Portable Animatronic Fish

OpenMouth AI

This project reimagines the classic 1998 Big Mouth Billy Bass for 2026: a smartphone-controlled animatronic fish that plays soundboard clips, speaks typed phrases, or delivers lightweight GPT responses, with mouth, body, and tail motion synchronized through a custom audio-to-motion tuning pipeline.

Raspberry Pi 4B · HiFiBerry DAC+ ADC · 2× DRV8833 · Phone hotspot uplink · Portable 80/20 frame
Mode 01

Soundboard

Phone buttons trigger stored WAV or MP3 clips on the Pi. This is the fastest path to a reliable demo and the best first build target.

Mode 02

Text-to-Speech

Typed phrases are converted to speech, saved as audio, analyzed for mouth motion, and played through the fish speakers.

Mode 03

Lightweight GPT

The Pi connects through the phone hotspot, calls a lightweight GPT model, converts the response to speech, and animates the fish from the generated waveform.

Architecture at a glance

The phone is both the remote control and the internet source. The Pi remains the local brain responsible for audio playback, motion extraction, and motor control.

Smartphone: Browser UI · Phone hotspot · Soundboard / text / GPT prompt
Raspberry Pi 4B: FastAPI app · Audio job queue · TTS + GPT clients
Audio Path: WAV/MP3 playback · HiFiBerry DAC+ ADC · Lightweight speakers
Motion Path: Speech-band envelope · Onset pulses · Motion JSON timeline
Gemmy Fish: Mouth motor · Body motor · Tail motor

Hardware stack

Subsystem | Selected hardware
Compute | Raspberry Pi 4B, 2GB
Audio I/O | HiFiBerry DAC+ ADC, primarily used for DAC output in this architecture
Motor control | Two DRV8833 dual H-bridge motor drivers with fault/current-protection capability
Motors | Existing Gemmy fish mouth, front body, and tail DC motors
Speakers | Two lightweight powered speakers or compact amplifier + passive speakers
Power | USB-C PD power bank for Pi/audio; separate small battery pack for fish motors
Structure | 80/20 aluminum extrusion frame for portable mounting and cable management

Software stack

Layer | Downselected choice
Web server | FastAPI for phone UI routes, status, and future WebSocket control
Laptop simulation | Anaconda environment with ASCII digital twin, MP3/WAV input, and motion JSON export
Audio engine | ALSA-backed WAV playback through the HiFiBerry output
TTS | Piper as the default local TTS path; cloud TTS as an optional premium path
GPT | Lightweight cloud model accessed through phone hotspot internet
Motor API | gpiozero first; pigpio fallback if PWM timing jitter becomes visible
Startup | systemd service to boot directly into fish-control mode

Audio-to-motion pipeline

The build treats soundboard clips, TTS, and GPT speech as the same downstream object: an audio file with a motion timeline.

Input command → load or generate audio file → compute speech-band envelope → detect speech onsets, phrase starts, and strong accents → assign mouth/body/tail pulse timeline → clamp pulse widths and enforce recovery time → start audio playback and motor timeline together

Motor channel map

Two dual H-bridge DRV8833 boards provide four motor outputs. Three are used for the fish, leaving one spare channel for future lighting, a second mouth action, or an auxiliary prop. Each axis is driven as a short pulse into the original spring-return cam mechanism, not as a continuously held servo axis.

    DRV8833 #1
      Channel A → mouth motor
      Channel B → front body/head motor
    DRV8833 #2
      Channel A → tail motor
      Channel B → spare
    Default command
      forward pulse → coast/off → spring return
      Do not command continuous hold against endstop

Mechanical protection layer

The stock Gemmy axes are not servo-controlled. Each motor is treated as a small DC actuator pushing a cam or linkage into motion, then released so the original spring can return the mechanism home.

Protection rule | Design intent
Pulse, do not hold | Command short forward pulses only. Avoid continuous drive into mechanical endstops.
Coast after each pulse | Set both DRV8833 inputs low after each command so the motor is off and the spring can return the axis.
Max-on watchdog | Every motor command is clamped in software even if a higher-level animation request is wrong.
Minimum off-time | Allow mechanical recovery before the next pulse, especially for the mouth axis during speech.
Fault-aware control | Use DRV8833 nFAULT, if exposed, as a diagnostic signal. A fault means the pulse/PWM/current limit is too aggressive.

DRV8833 pulse policy

The DRV8833 improves the electrical safety envelope, but it does not turn the fish into a closed-loop servo. Current protection is a backstop; safe pulse choreography is the primary protection.

    Forward pulse
      IN1 = PWM
      IN2 = LOW
      wait pulse_ms
    Coast / release
      IN1 = LOW
      IN2 = LOW
    Initial tuning targets
      mouth: 50–120 ms, 35–55% PWM
      body: 120–300 ms, 35–55% PWM
      tail: 100–250 ms, 35–55% PWM
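The max-on watchdog can be sketched as a small clamp applied to every requested pulse. This is only an illustration: the `CEILINGS` table and the `clamp_pulse` name are hypothetical, and using the upper ends of the initial tuning targets as hard ceilings is an assumption, not a measured limit.

```python
# Hypothetical ceilings: the upper ends of the initial tuning targets.
# Real limits should come from bench testing on the actual mechanism.
CEILINGS = {
    "mouth": (120, 0.55),  # (max pulse in ms, max PWM duty)
    "body": (300, 0.55),
    "tail": (250, 0.55),
}


def clamp_pulse(motor: str, pulse_ms: float, pwm: float) -> tuple[int, float]:
    """Return a (pulse_ms, pwm) pair that never exceeds the motor's ceilings."""
    max_ms, max_pwm = CEILINGS[motor]
    safe_ms = int(max(0, min(pulse_ms, max_ms)))
    safe_pwm = max(0.0, min(pwm, max_pwm))
    return safe_ms, safe_pwm


# A runaway animation request is cut down before it reaches the driver.
print(clamp_pulse("mouth", 400, 0.9))
```

Keeping the clamp in the lowest motor layer means a wrong timeline file or UI slider value still cannot exceed the ceilings.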

Prototype Plan

Bring up the Pi and HiFiBerry

Install Raspberry Pi OS, configure the HiFiBerry overlay, verify audio playback through the RCA output, and confirm the speaker path.

Create the phone UI

Build a minimal FastAPI page with three buttons that trigger known local WAV files from the phone browser.

Drive one motor first

Connect only the mouth motor through one DRV8833 channel. Use short, conservative pulse tests before attempting audio-following motion.

Add envelope-following

Compute RMS amplitude from the WAV file, detect speech onsets, and map them into short mouth pulses with smoothing, thresholds, clamp limits, and mandatory off-time.

Add body and tail choreography

Use phrase starts, beat estimates, or simple timed accents to trigger the body and tail motors without making the fish look overactive.

Add TTS and GPT modes

Start with Piper or cached phrases, then add phone-hotspot GPT mode once the audio and motion systems are reliable.

Fish Risk Register

Risk | Mitigation
Motor noise corrupts audio/Pi power | Use separate motor battery, common ground, short motor wiring, and bulk capacitance near drivers.
Mouth motion lags speech | Precompute motion timelines from WAV files and start audio + motor playback from one monotonic clock.
PWM jitter | Prototype with gpiozero, then switch the motor layer to pigpio if visible jitter appears.
GPT mode needs internet | Use the phone hotspot as uplink. Keep soundboard and local TTS usable without cloud access.
DC motors are not position-controlled | Treat each motor as a pulse-driven cam mechanism. Enforce max-on time, minimum off-time, and spring-return recovery.
Endstop stall damages pinions/cams | Use DRV8833 current/fault protection as a safety net, but rely primarily on software pulse limits and conservative PWM.
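The "one monotonic clock" mitigation can be illustrated with a small scheduler. This is a sketch under assumptions: `play_timeline` and its `fire` callback are hypothetical names, and starting the audio process at the same instant as `start` is left to the caller.

```python
import time
from typing import Callable, Iterable


def play_timeline(
    events: Iterable[dict],
    fire: Callable[[dict], None],
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Fire each event when the shared clock reaches its timestamp."""
    start = clock()  # audio playback should be launched at this same instant
    for event in sorted(events, key=lambda e: e["t"]):
        # Delay is recomputed from the absolute start time, so timing
        # errors from earlier events do not accumulate.
        delay = event["t"] - (clock() - start)
        if delay > 0:
            sleep(delay)
        fire(event)
```

Injecting `clock` and `sleep` keeps the scheduler testable on a laptop without real waiting.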

Laptop audio-to-motion development

Before the Raspberry Pi arrives, develop the motion algorithm on macOS as an offline simulator. The working development loop is: audio file → speech-band analysis → safe motor pulse timeline → ASCII fish digital twin → exported motion JSON.

Mac laptop development loop → create or load WAV/MP3 audio → run openmouth_digital_twin_ascii.py → view ASCII fish + timeline cursor → inspect mouth/body/tail trigger events → export outputs/test.motion.json → later copy JSON + audio to Raspberry Pi runtime

1. Create the Anaconda environment

    conda create -n openmouth python=3.11 -y
    conda activate openmouth
    conda install -c conda-forge numpy scipy matplotlib librosa soundfile ffmpeg jupyterlab ipykernel -y
    pip install sounddevice
    python -m ipykernel install --user --name openmouth --display-name "Python (openmouth)"

2. Create the project folders

    mkdir openmouth-audio-motion
    cd openmouth-audio-motion
    mkdir sounds outputs
    # Place openmouth_digital_twin_ascii.py in this folder.

3. Generate a known-good test WAV

Use macOS text-to-speech to create a deterministic first test clip. This avoids debugging the algorithm with noisy or compressed source audio.

    say "Hello, I am Open Mouth AI. I am a talking fish." -o sounds/test.aiff
    ffmpeg -i sounds/test.aiff sounds/test.wav

4. Run the ASCII digital twin from Terminal

    conda activate openmouth
    python openmouth_digital_twin_ascii.py sounds/test.wav --preset gentle

The simulator uses macOS afplay by default for cleaner playback while Matplotlib animates. If audio clips or crackles, reduce playback gain.

python openmouth_digital_twin_ascii.py sounds/test.wav --preset gentle --playback-gain 0.35

5. Try the motion presets

Preset | Use
mouth_only | Safest first-pass algorithm mode. Only the mouth axis receives pulse events.
gentle | Conservative mouth motion with sparse body and tail accents. Best default.
animated | More lively puppet-like behavior. Useful for showmanship, but more aggressive.

    python openmouth_digital_twin_ascii.py sounds/test.wav --preset mouth_only
    python openmouth_digital_twin_ascii.py sounds/test.wav --preset gentle
    python openmouth_digital_twin_ascii.py sounds/test.wav --preset animated

6. Export the future Pi motion file

python openmouth_digital_twin_ascii.py sounds/test.wav --preset gentle --export-json outputs/test.motion.json

The output JSON is the contract between the offline algorithm and the future Raspberry Pi motor runtime.

{ "t": 0.320, "motor": "mouth", "pulse_ms": 82, "pwm": 0.43, "reason": "speech_onset" }
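Because this record is the contract between the two runtimes, it helps to check each event before it reaches the motors. The checker below is a minimal sketch: `check_event` is a hypothetical helper, the field names come from the sample record above, and the accepted ranges are assumptions.

```python
import json

# Field names taken from the sample record; the type expectations are assumed.
REQUIRED = {
    "t": (int, float),
    "motor": str,
    "pulse_ms": (int, float),
    "pwm": (int, float),
    "reason": str,
}
MOTORS = {"mouth", "body", "tail"}


def check_event(event: dict) -> list[str]:
    """Return a list of problems; an empty list means the event looks sane."""
    problems = []
    for key, kind in REQUIRED.items():
        if key not in event:
            problems.append(f"missing field: {key}")
        elif not isinstance(event[key], kind):
            problems.append(f"wrong type for {key}")
    if problems:
        return problems
    if event["motor"] not in MOTORS:
        problems.append(f"unknown motor: {event['motor']}")
    if event["t"] < 0:
        problems.append("negative timestamp")
    if event["pulse_ms"] <= 0:
        problems.append("pulse_ms must be positive")
    if not 0.0 < event["pwm"] <= 1.0:
        problems.append("pwm must be in (0, 1]")
    return problems


sample = json.loads(
    '{ "t": 0.320, "motor": "mouth", "pulse_ms": 82, "pwm": 0.43, "reason": "speech_onset" }'
)
print(check_event(sample))
```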

7. Run inside Jupyter when tuning parameters

Use Jupyter for inspecting variables, event counts, and thresholds. Use Terminal for the most reliable synchronized audio + animation.

    conda activate openmouth
    jupyter lab

Inside a notebook, select Python (openmouth) as the kernel, then either run the script directly:

%run openmouth_digital_twin_ascii.py sounds/test.wav --preset gentle

Or import the analyzer for interactive tuning:

    from openmouth_digital_twin_ascii import analyze_audio, PRESETS, DigitalTwin

    audio_path = "sounds/test.wav"
    preset = PRESETS["gentle"]

    # Run the analyzer once, then inspect the results interactively.
    audio, sr, times, env_norm, threshold, events = analyze_audio(audio_path, preset)
    print(f"sample rate: {sr}")
    print(f"event count: {len(events)}")
    events[:5]

8. Parameters to tune first

Symptom | Adjustment
Mouth flaps too often | Increase threshold percentile, onset delta, or mouth minimum gap.
Mouth misses syllables | Lower threshold percentile or onset delta.
Motion looks twitchy | Increase envelope smoothing or mouth minimum gap.
Motion looks sluggish | Decrease smoothing or mouth minimum gap.
Pulse events seem too aggressive | Lower pulse width and PWM ranges before testing hardware.

Design philosophy

The first milestone is not conversational AI. The first milestone is a robust, portable, phone-controlled fish that can play one audio clip and flap its mouth convincingly. Once that is stable, TTS and GPT become input modes rather than architectural risks.

Exact bring-up instructions

The goal of the first bring-up milestone is intentionally narrow:

Phone browser → FastAPI UI running on Raspberry Pi → button press → WAV playback through HiFiBerry DAC → speakers verified

1. Flash Raspberry Pi OS

Install Raspberry Pi Imager

Download Raspberry Pi Imager on your laptop and insert a 32–64 GB microSD card.

Select OS

Choose Raspberry Pi OS Lite (64-bit). Lite is preferred because this project does not need a desktop environment.

Preconfigure Wi-Fi and SSH

In the advanced settings menu, enable SSH, configure your Wi-Fi network or phone hotspot SSID/password, and set a hostname such as openmouth.

Flash the SD card

Write the image, insert the card into the Pi, connect Ethernet or Wi-Fi, and power the Pi from the USB-C PD battery.

2. Connect to the Raspberry Pi

    ssh pi@openmouth.local
    # or
    ssh pi@<Pi_IP_Address>

Update the system:

    sudo apt update
    sudo apt upgrade -y

3. Install the HiFiBerry DAC+ ADC

Power OFF the Pi

Never attach the HiFiBerry board while powered.

Attach the HiFiBerry board

Mount the DAC+ ADC onto the Pi GPIO header carefully and verify all pins align correctly.

Connect speakers

Use RCA or RCA-to-3.5 mm cables into powered speakers.

4. Enable the HiFiBerry overlay

Edit the Raspberry Pi boot config:

sudo nano /boot/firmware/config.txt # on Raspberry Pi OS releases before Bookworm, edit /boot/config.txt instead

Add these lines near the bottom:

    dtoverlay=hifiberry-dacplusadc
    # OR for the Pro variant:
    # dtoverlay=hifiberry-dacplusadcpro

Disable onboard audio:

dtparam=audio=off

Reboot:

sudo reboot

5. Verify the HiFiBerry audio device

After reboot:

    aplay -l
    arecord -l

You should see a HiFiBerry audio device listed.

6. Install Python dependencies

    sudo apt install -y python3-pip python3-venv git
    mkdir ~/openmouth
    cd ~/openmouth
    python3 -m venv venv
    source venv/bin/activate
    pip install fastapi uvicorn sounddevice numpy scipy gpiozero

7. Create a minimal FastAPI app

Create app.py:

    from fastapi import FastAPI
    from fastapi.responses import HTMLResponse
    import subprocess

    app = FastAPI()

    HTML = """
    <html>
    <body style='font-family:sans-serif;padding:40px;'>
    <h1>OpenMouth AI</h1>
    <button onclick="fetch('/play')">Play Test Audio</button>
    </body>
    </html>
    """

    @app.get("/", response_class=HTMLResponse)
    def root():
        return HTML

    @app.get("/play")
    def play():
        # Popen returns immediately, so the request does not block on playback.
        subprocess.Popen(["aplay", "sounds/test.wav"])
        return {"status": "playing"}

8. Add a test WAV file

mkdir sounds

Place a small WAV file inside:

sounds/test.wav

9. Run the FastAPI server

    source venv/bin/activate
    uvicorn app:app --host 0.0.0.0 --port 5000

10. Verify the phone UI

Connect phone and Pi to same network

Usually your phone hotspot network.

Open browser

Navigate to http://openmouth.local:5000 or the Pi IP address.

Press Play Test Audio

You should hear audio through the speakers connected to the HiFiBerry board.

11. First DRV8833 motor bring-up

After audio and phone UI are stable, test only one motor channel first. The goal is to find the minimum pulse that creates visible motion without audible buzzing at the hard stop.

Motor test order
1. Disconnect all fish motors except the mouth motor.
2. Use a separate low-voltage motor supply if available.
3. Command 40 ms at 35–40% PWM.
4. Increase pulse duration in small steps: 60 ms, 80 ms, 100 ms.
5. Stop increasing once the mechanism clearly actuates.
6. Set software max-on below the first pulse that causes hard-stop buzz.
7. Repeat for body and tail only after mouth behavior is safe.

Use coast mode as the default release state:

    DRV8833 channel convention
    Forward pulse: IN1 = PWM,  IN2 = LOW
    Reverse pulse: IN1 = LOW,  IN2 = PWM   # usually unused
    Coast/off:     IN1 = LOW,  IN2 = LOW
    Brake:         IN1 = HIGH, IN2 = HIGH  # avoid initially
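The forward-then-coast convention can be sketched without touching real GPIO. Everything named here is hypothetical: `LoggingChannel` stands in for whatever drives IN1/IN2 on the Pi (for example a pair of PWM-capable output pins), and `forward_pulse` is one possible shape for the lowest motor layer.

```python
import time


class LoggingChannel:
    """Stand-in for one DRV8833 channel: records the commanded input levels."""

    def __init__(self) -> None:
        self.log: list[tuple[float, float]] = []

    def set_inputs(self, in1: float, in2: float) -> None:
        # On hardware this would set the PWM duty on IN1 and IN2.
        self.log.append((in1, in2))


def forward_pulse(channel, pulse_ms, pwm, max_on_ms=120, sleep=time.sleep):
    """Drive forward for a clamped duration, then always release to coast."""
    on_ms = min(pulse_ms, max_on_ms)
    channel.set_inputs(pwm, 0.0)  # forward: IN1 = PWM, IN2 = LOW
    try:
        sleep(on_ms / 1000.0)
    finally:
        # Coast even if the wait is interrupted, so the spring can return the axis.
        channel.set_inputs(0.0, 0.0)
    return on_ms


channel = LoggingChannel()
forward_pulse(channel, pulse_ms=80, pwm=0.4)
print(channel.log)
```

The `finally` block is the important part: an exception or Ctrl-C during the wait still ends with both inputs low.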

12. Troubleshooting checklist

Problem | Likely cause
No sound | Wrong audio output selected, onboard audio not disabled, or powered speakers not enabled.
FastAPI inaccessible from phone | Phone and Pi not on same network, firewall issue, or wrong IP.
Audio stutters | Weak power supply or USB battery incapable of stable Pi 4 current delivery.
HiFiBerry not detected | Overlay typo, improper seating on GPIO header, or reboot not performed.
Wrong audio device | HDMI audio still selected instead of HiFiBerry ALSA device.

13. First success milestone

At this point, the system should support:

Smartphone browser → FastAPI button → WAV playback → HiFiBerry → powered speakers.

Hold off on the motor bring-up in step 11 until this milestone is solid. Verify audio stability and phone UI reliability first.

Python pipeline — modular architecture & song tuning

The offline laptop prototype is built as a proper Python library (openmouth/) with a strict boundary between Pi-safe signal processing and Jupyter-only visualisation tools. Five Jupyter notebooks walk through the full workflow from raw audio to validated motion JSON.

Library modules

openmouth/ is Pi-safe (numpy/scipy/librosa only). The ASCII twin imports it without circular dependency.

audio.py | load_audio · bandpass_speech (300–3400 Hz) · RMS envelope · smooth · normalize
onset.py | envelope_crossings → mouth times · librosa onsets → body/tail times
motion.py | build_timeline · Preset dataclass · PRESETS dict · safety clamps
exporter.py | validate_timeline · export_json · .motion.json contract
twin.py | ASCII fish render · big-O mouth indicator · afplay sync · animate()

Jupyter notebook workflow

Notebook | Purpose | Key output
01_audio_analysis | Inspect the raw signal chain: waveform, bandpass filter, RMS envelope stages | Visual understanding of why speech-band filtering matters
02_onset_tuning | Interactive ipywidgets sliders for threshold, smoothing, gap, onset_delta | Tuned parameter set for a specific audio file
03_motion_timeline | Stacked timeline plot: waveform, envelope, per-motor event bars (width = pulse_ms, opacity = PWM) | Visual QA before export
04_export_and_validate | Safety validation then export to .motion.json | Pi-ready motion file + event density chart
05_ascii_twin | ASCII fish animates in Jupyter cell in sync with afplay audio | Real-time sanity check of the full motor timeline

Built-in preset library

Six presets ship with the library. Each is a Preset dataclass instance registered in PRESETS. batch_tune.py selects one automatically by matching filename keywords.

Preset | threshold | smooth | mouth_gap | Best for
hiphop | 0.54 | 4 | 0.13 s | Default: rap, hip-hop, compressed pop vocals
pop | 0.45 | 7 | 0.10 s | Sustained pop vocals (All Star, I Will Survive, This Love)
jpop | 0.32 | 5 | 0.10 s | Dynamic J-pop / anime (Cruel Angel's Thesis)
ballad | 0.22 | 9 | 0.12 s | Slow ballads and acoustic tracks with wide dynamics
edm | 0.50 | 3 | 0.08 s | Electronic / dance, very fast transient response
speech | 0.18 | 6 | 0.09 s | Speech-heavy tracks, podcasts, McDonald's jingle
    from openmouth.motion import PRESETS, Preset

    # Use a built-in preset
    preset = PRESETS["hiphop"]

    # Keyword auto-selection (same logic as batch_tune.py)
    # cruelangel*.mp3 → jpop | mcdonalds*.mp3 → speech | default → hiphop

pop_song preset

All Star — Smash Mouth

A maximally compressed pop track. The speech-band envelope stays above 0.20 for ~80% of the clip — the default gentle threshold of 0.18 barely crosses upward, producing only ~0.5 mouth events/second. Raising to 0.45 puts it in the zone where the envelope fluctuates with syllable energy.

Parameter | Value | Rationale
threshold | 0.45 | Envelope stays high; need a higher crossing point
smoothing_frames | 7 | Standard; keeps envelope clean
body_min_gap_s | 0.50 s | Body bobs at ~104 BPM beat rate
tail_min_gap_s | 1.00 s | One sweep per bar
onset_delta | 0.07 | Standard onset sensitivity

    Duration : 60.0 s | 235 events total
    mouth : 137 (2.28/s) pulse 60–118 ms
    body : 64 (1.07/s) pulse 141–237 ms
    tail : 34 (0.57/s) pulse 120–199 ms
    ✅ No safety warnings

jpop preset

Cruel Angel's Thesis

A traditionally mastered J-pop track with real dynamic range. The envelope only exceeds 0.45 for ~31% of the clip, so pop_song starves the vocal sections. The first ~20 s is the instrumental organ intro — fewer events there is correct and expected.

Parameter | Value | Rationale
threshold | 0.32 | Peak crossing zone for this song's dynamic range
smoothing_frames | 5 | Lighter; crisper syllable tracking
body_min_gap_s | 0.55 s | ~130 BPM; slightly sparser body bobs
tail_min_gap_s | 1.00 s | One sweep per bar
onset_delta | 0.06 | More sensitive to J-pop percussion transients

    Duration : 60.0 s | 196 events total
    mouth : 102 (1.70/s) pulse 61–108 ms
    body : 61 (1.02/s) pulse 141–228 ms
    tail : 33 (0.55/s) pulse 126–190 ms
    ✅ No safety warnings

Audio analysis — signal chain & event timelines

Top row: raw waveform amplitude. Bottom row: normalized speech-band envelope (orange fill), threshold (red dashed), mouth events (orange lines), body events (purple, bottom 40%), tail events (teal, bottom 25%).

[Figure: Envelope and event timeline]

Why different thresholds?

Each curve shows the percentage of time the normalized envelope exceeds a given threshold. All Star (orange) stays high everywhere — a direct result of heavy pop compression. Cruel Angel's Thesis (purple) has a steeper drop-off, reflecting its traditional dynamic range. The shaded bands mark each song's tuned threshold, sitting at the inflection point where crossings are most meaningful.

[Figure: Dynamic range comparison]

Event density per 10-second window

Mouth, body, and tail events distributed across each 60-second clip. The low count in the first two windows of Cruel Angel's Thesis is intentional — those are the instrumental intro bars before the vocalist enters. Both songs maintain consistent density across their vocal sections.

[Figure: Event density]

Tuning guide — adapting a new song

The key diagnostic is the envelope distribution: check what percentage of time the envelope exceeds various thresholds, then pick the value at the inflection point where the curve bends steeply — that is where crossings are most responsive to actual vocal energy.

Symptom | Diagnosis | Fix
Too few mouth events (<0.8/s) | Threshold too high for this song's dynamic range | Lower threshold toward the envelope's inflection point
Too many mouth events (>3.5/s) | Threshold too low, firing on background energy | Raise threshold or increase smoothing_frames
Events bunched, then silent | Uneven dynamics (intro vs chorus) | Use norm_percentile=95 for more aggressive ceiling, or clip to vocal section only
Events feel jittery / chattery | Smoothing too light for this style | Increase smoothing_frames (5 → 7 → 10)
Syllables blurring together | Smoothing too heavy | Decrease smoothing_frames; increase mouth_min_gap_s slightly
Body / tail too active | Onset detector too sensitive | Increase onset_delta (0.06 → 0.08 → 0.10)

    # Quick diagnostic: run before picking a threshold
    import numpy as np
    from openmouth.audio import (load_audio, bandpass_speech,
                                 rms_envelope, smooth_envelope, normalize_envelope)

    audio, sr = load_audio("sounds/my_song.mp3", target_sr=22050)
    filtered = bandpass_speech(audio, sr)
    env_raw = rms_envelope(filtered, frame_length=512, hop_length=256)
    env_s = smooth_envelope(env_raw, window_frames=7)
    env_n = normalize_envelope(env_s, percentile=98.0)

    print("Envelope distribution:")
    for t in [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50]:
        pct = 100 * np.mean(env_n > t)
        bar = "X" * int(pct / 2)
        print(f" > {t:.2f}: {pct:5.1f}% {bar}")

    # Rule of thumb: pick the threshold where the value drops from >60% to ~30-40%.
    # That inflection is where the envelope has the most meaningful crossings.
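One way to turn the rule of thumb into a starting value is to scan candidate thresholds and take the first one where the fraction of frames above it falls to roughly 40% or less. This is only one reading of the rule: `suggest_threshold` and its `target_fraction` default are hypothetical and are not part of the openmouth library.

```python
def fraction_above(env, threshold):
    """Fraction of envelope frames strictly above the threshold."""
    if not env:
        return 0.0
    return sum(1 for v in env if v > threshold) / len(env)


def suggest_threshold(env, candidates, target_fraction=0.40):
    """Return the lowest candidate whose fraction-above is at or below the target."""
    for threshold in sorted(candidates):
        if fraction_above(env, threshold) <= target_fraction:
            return threshold
    return None  # the envelope stays high for every candidate


# Synthetic envelope spread evenly over 0.00 to 0.99, for illustration only.
env = [i / 100 for i in range(100)]
print(suggest_threshold(env, [0.2, 0.4, 0.6, 0.8]))
```

The suggestion is a starting point for listening tests, not a replacement for checking the event rate against the symptom table.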

How events are triggered — mouth, body, and tail

The mouth and the body/tail motors are driven by two completely independent audio features. Understanding the difference is the key to tuning them independently.

MOUTH — RMS ENVELOPE THRESHOLD CROSSINGS

The audio is bandpass-filtered (≈80–3000 Hz, the speech band) to remove low rumble and high noise. The RMS energy is computed in short overlapping windows (~23 ms, i.e. 512 samples at 22 050 Hz), smoothed by a moving average (smoothing_frames), and normalized against a high percentile of the envelope (the 98th in the diagnostic script), so the loudest passages sit at about 1.0.
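The envelope stages can be sketched in plain Python to make the arithmetic concrete. This is a simplified stand-in, not the library code: it skips the bandpass filter, the function names are illustrative, and clipping the normalized envelope at 1.0 is an assumption.

```python
import math


def rms_envelope(samples, frame_length=512, hop_length=256):
    """RMS energy of overlapping frames (512 samples is ~23 ms at 22 050 Hz)."""
    if not samples:
        return []
    env = []
    last_start = max(len(samples) - frame_length, 0)
    for start in range(0, last_start + 1, hop_length):
        frame = samples[start:start + frame_length]
        env.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return env


def smooth(env, window_frames=7):
    """Centered moving average; the window shrinks at the edges."""
    half = window_frames // 2
    out = []
    for i in range(len(env)):
        window = env[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def normalize(env, percentile=98.0):
    """Scale so the chosen percentile maps to 1.0, clipping anything above it."""
    if not env:
        return []
    ordered = sorted(env)
    idx = min(len(ordered) - 1, int(len(ordered) * percentile / 100.0))
    ceiling = ordered[idx] or 1.0  # avoid dividing by zero on silence
    return [min(v / ceiling, 1.0) for v in env]


# Half a second of quiet tone followed by half a second of loud tone.
tone = [math.sin(2 * math.pi * 440 * n / 22050) for n in range(22050)]
samples = [0.1 * s for s in tone[:11025]] + tone[11025:]
env = normalize(smooth(rms_envelope(samples)))
print(len(env), round(env[0], 2), round(env[-1], 2))
```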

The mouth fires on upward crossings of threshold — the exact frame where the envelope rises from below to above the value. A sustained loud note produces exactly one event at the moment it starts, not a continuous stream.

After all crossings are collected, a gap filter (mouth_min_gap_s) discards any crossing that arrives too soon after the previous one, preventing rapid double-fires on staccato syllables.

Each crossing becomes a MotionEvent with a randomly drawn pulse_ms (e.g. 60–115 ms) and duty cycle. On the physical fish the motor drives for exactly that many milliseconds, then power cuts and a spring returns the jaw to closed. Duration of the open state is controlled entirely by pulse_ms and the spring — there is no separate "close" command.
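A minimal sketch of the crossing-to-event step follows. The names are illustrative, the pulse and duty ranges are taken from the figures quoted in this document, and the gap filter is left out to keep the example short.

```python
import random


def upward_crossings(times, env, threshold):
    """Times of frames where the envelope rises from below to at or above threshold."""
    hits = []
    for i in range(1, len(env)):
        if env[i - 1] < threshold <= env[i]:
            hits.append(times[i])
    return hits


def mouth_events(times, env, threshold,
                 pulse_range_ms=(60, 115), pwm_range=(0.35, 0.55), seed=0):
    """One pulse event per crossing, with a randomly drawn width and duty cycle."""
    rng = random.Random(seed)  # seeded so a given clip always yields the same timeline
    return [
        {
            "t": t,
            "motor": "mouth",
            "pulse_ms": rng.randint(*pulse_range_ms),
            "pwm": round(rng.uniform(*pwm_range), 2),
            "reason": "speech_onset",
        }
        for t in upward_crossings(times, env, threshold)
    ]


# The sustained stretch (0.5 then 0.6) yields one event, not two.
times = [0.0, 0.1, 0.2, 0.3, 0.4]
env = [0.1, 0.5, 0.6, 0.2, 0.7]
print(upward_crossings(times, env, threshold=0.45))
```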

BODY & TAIL — LIBROSA ONSET DETECTION

Body and tail events are driven by a completely different signal: librosa's spectral flux onset detector, which measures how quickly the frequency content changes. It is sensitive to percussive transients — drum hits, consonant bursts, strong beat attacks — not to sustained loudness.

onset_delta sets the minimum strength a local peak must exceed its local average. Lower delta → more onsets detected → denser body/tail events. Both motors draw from the same onset pool; what differentiates them is only the gap filter applied afterward.

Body vs Tail spacing example
Given onsets at [0.1, 0.3, 0.5, 0.9, 1.1, 1.8] s:
Body (gap 0.50 s) → keeps [0.1, 0.9, 1.8]
Tail (gap 1.00 s) → keeps [0.1, 1.1]
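Tail is always sparser than body, but it is not strictly a subset: here the onset at 1.1 s fires the tail and not the body, because each filter measures the gap from its own last kept event.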
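The spacing example can be reproduced with a greedy gap filter. This is a sketch that assumes the gap is measured from the last kept event; `gap_filter` is an illustrative name, and the small tolerance only guards against floating-point rounding.

```python
def gap_filter(times, min_gap_s, tolerance=1e-9):
    """Keep an onset only if it is at least min_gap_s after the last kept one."""
    kept = []
    for t in sorted(times):
        if not kept or t - kept[-1] >= min_gap_s - tolerance:
            kept.append(t)
    return kept


onsets = [0.1, 0.3, 0.5, 0.9, 1.1, 1.8]
body = gap_filter(onsets, 0.50)
tail = gap_filter(onsets, 1.00)
print(body)
print(tail)
```

Since the two filters run independently on the same onset pool, an onset such as 1.1 s can survive the tail filter while being dropped by the body filter.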

Like the mouth, each event fires for a randomly drawn pulse_ms and returns to rest when power cuts. The fin position (up vs. down) is a deterministic alternation in the digital twin; on the physical fish both fins have a single motor each, so direction is not controllable — only duration and duty cycle.

KEY ARCHITECTURAL POINT

Mouth and body/tail are driven by entirely independent features of the audio. A quiet verse with active drums can produce dense body/tail events with a completely closed mouth. A sustained loud note triggers a single mouth pulse at its start and no body or tail activity. Tuning them never conflicts: adjust threshold and mouth_min_gap_s for the mouth; adjust onset_delta and the two gap parameters for body and tail.

Threshold intuition

threshold is relative to the track's loudest moment after normalization. The envelope distribution table printed by tune_song.py tells you what percentage of the clip sits above each candidate value — use it to find the inflection point.

Scenario | What you see | Threshold guidance
Heavily compressed pop (All Star) | Envelope above 0.20 for ~80% of the track | Need a high threshold (≈0.45–0.54) to catch only syllable peaks
Dynamic J-pop (Cruel Angel) | Envelope above 0.45 for only ~31% of the track | Lower threshold (≈0.30–0.35) to avoid starving vocal moments
Instrumental section firing | Mouth opens when nobody is singing | Raise threshold, or drop the --no-vad flag to re-enable VAD gating
Vocals barely trigger mouth | Mouth stays closed through singing | Lower threshold; also check smoothing_frames isn't washing out syllable peaks

Smoothing and gap intuition

smoothing_frames sets the moving-average window on the RMS envelope before threshold comparison. Larger windows blur rapid syllables into a single sustained lump; smaller windows track individual syllables but also track noise.

Parameter | Lower value | Higher value
smoothing_frames | Crisp per-syllable tracking (fast speech, J-pop) | Smooth phrase-level shapes (slow ballads, instruments)
mouth_min_gap_s | More events; can blur adjacent syllables together | Fewer events; sparser but cleaner open-close cycles
onset_delta | More body/tail events; tracks subtle transients | Fewer events; only strong beats and accents fire
body_min_gap_s | Body bobs at beat rate | Body bobs at bar rate
tail_min_gap_s | Tail moves more frequently (approaching body density) | Tail sweeps slowly, only on the strongest beats

batch_tune.py — one-command batch processor

batch_tune.py is the primary workflow for generating motion files. Run it once from openmouth-audio-motion/ to process every audio file in sounds/ and write .motion.json files to outputs/. The SoundPond web app picks up the results immediately — no server restart required.

Usage

    # Tune everything with auto-selected presets
    python batch_tune.py

    # Force one preset for all files
    python batch_tune.py --preset hiphop

    # Per-file preset overrides
    python batch_tune.py \
        --map McDonalds_60s.wav=speech \
        --map LionKing_60s.wav=pop

    # Only process files whose JSON doesn't exist yet
    python batch_tune.py --skip-existing

    # Preview without writing any files
    python batch_tune.py --dry-run

Auto-preset keyword rules

Keyword in filename | Preset selected
mcdonalds, speech, podcast | speech
cruelangel, jpop, anime | jpop
lionking, musical, broadway | pop
survive, thislove | pop
ballad, acoustic, slow | ballad
edm, electronic, techno | edm
(anything else) | hiphop
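The keyword table maps naturally onto a first-match lookup. The sketch below is an assumption about how such a rule set could be applied (case-insensitive substring match, evaluated in table order); it is not the actual batch_tune.py code, and `auto_preset` is a hypothetical name.

```python
# Rules copied from the table above, checked in order; the first match wins.
RULES = [
    (("mcdonalds", "speech", "podcast"), "speech"),
    (("cruelangel", "jpop", "anime"), "jpop"),
    (("lionking", "musical", "broadway"), "pop"),
    (("survive", "thislove"), "pop"),
    (("ballad", "acoustic", "slow"), "ballad"),
    (("edm", "electronic", "techno"), "edm"),
]


def auto_preset(filename: str, default: str = "hiphop") -> str:
    """Pick a preset name from keywords in the filename."""
    name = filename.lower()
    for keywords, preset in RULES:
        if any(keyword in name for keyword in keywords):
            return preset
    return default


for example in ["cruelangel_60s.wav", "McDonalds_60s.wav", "gold_digger.mp3"]:
    print(example, "→", auto_preset(example))
```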

Pipeline steps (per file)

load_audio | librosa · 22 050 Hz mono
bandpass | speech 80–3400 Hz + bass 60–300 Hz
RMS envelope | 512-frame windows · smooth · normalize
envelope_crossings | mouth times (upward threshold crossings)
detect_speech_onsets | body + tail times (librosa onset detector)
build_timeline | pure pulse events sorted by time

build_timeline — simplified event model

All events are pure discrete pulses. The timeline builder applies no hold/sustain classification, no tempo scaling, no anticipation offsets, and no phrase-boundary suppression — features that were found to cause gear stalling and unnatural motion. Each crossing or onset maps directly to one MotionEvent.

Motor | Source | Mode | Spacing enforced by
Mouth | Upward envelope crossings of threshold | pulse only | mouth_min_gap_s
Body | Broadband librosa onset detector | pulse only | body_min_gap_s
Tail | Same broadband onset pool (sparser gap) | pulse only | tail_min_gap_s

    ## Sample batch_tune.py output
    Batch tuning 9 file(s)
    Sounds  : /…/openmouth-audio-motion/sounds
    Outputs : /…/openmouth-audio-motion/outputs

    → All Star.mp3 [pop] ✓ 205.8s 104.3bpm mouth:147 body:183 tail: 94 phrases:0 [8.2s]
    → cruelangel_60s.wav [jpop] ✓ 60.0s 130.6bpm mouth:132 body:156 tail: 78 phrases:0 [2.1s]
    → McDonalds_60s.wav [speech] ✓ 60.0s 95.2bpm mouth:119 body: 88 tail: 44 phrases:0 [2.3s]

    ──────────────────────────────────────────────────────────────────────────────
    File                 Preset   Dur    BPM    Scale  Mth  Bdy  Tl   Time
    ──────────────────────────────────────────────────────────────────────────────
    All Star.mp3         pop      205.8  104.3  ×1.00  147  183  94   8.2s
    cruelangel_60s.wav   jpop     60.0   130.6  ×1.00  132  156  78   2.1s
    McDonalds_60s.wav    speech   60.0   95.2   ×1.00  119  88   44   2.3s
    ──────────────────────────────────────────────────────────────────────────────
    9/9 succeeded

🎬 Digital Twin Case Studies

The fish digital twin is generated in a Jupyter notebook, which makes each tuning iteration easier to inspect. Here are two real songs rendered through the full pipeline: audio analysis → motion timeline → image-frame animation → MP4. Each clip shows 30 seconds of the digital twin animating with all 8 body states active.

The 8 animation states

Every animation frame is one of these 8 images. The pipeline picks the correct image at each timestamp based on which motors are currently active. Front fin and tail fin each have two states (flat / flapped); mouth has two states (closed / open) — giving 2 × 2 × 2 = 8 combinations.

State | Mouth | Front fin | Tail fin | mouth front tail
Idle | closed | flat | flat | 0 0 0
Body accent | closed | flap | flat | 0 1 0
Tail sweep | closed | flat | flap | 0 0 1
Body + tail | closed | flap | flap | 0 1 1
Mouth only | open | flat | flat | 1 0 0
Mouth + body | open | flap | flat | 1 1 0
Mouth + tail | open | flat | flap | 1 0 1
Full expression | open | flap | flap | 1 1 1
Key: orange border = mouth open · blue pill = front fin active · green pill = tail fin active
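Frame selection can be sketched as a lookup keyed on three activity bits. The sketch assumes a motor counts as active from an event's start time until `pulse_ms` later, and that the front fin follows the body motor; the real renderer may hold a pose for longer. The function names are illustrative.

```python
# (mouth, front, tail) bits → state name, as listed above.
STATE_NAMES = {
    (0, 0, 0): "Idle",
    (0, 1, 0): "Body accent",
    (0, 0, 1): "Tail sweep",
    (0, 1, 1): "Body + tail",
    (1, 0, 0): "Mouth only",
    (1, 1, 0): "Mouth + body",
    (1, 0, 1): "Mouth + tail",
    (1, 1, 1): "Full expression",
}


def is_active(motor, t, events):
    """True if any event for this motor is mid-pulse at time t (seconds)."""
    return any(
        e["motor"] == motor and e["t"] <= t < e["t"] + e["pulse_ms"] / 1000.0
        for e in events
    )


def state_at(t, events):
    """Name of the animation state to show at time t."""
    key = (
        int(is_active("mouth", t, events)),
        int(is_active("body", t, events)),  # front fin is driven by the body motor
        int(is_active("tail", t, events)),
    )
    return STATE_NAMES[key]


events = [
    {"t": 0.00, "motor": "mouth", "pulse_ms": 100},
    {"t": 0.05, "motor": "body", "pulse_ms": 200},
]
for t in (0.06, 0.20, 0.30):
    print(t, state_at(t, events))
```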

"All Star" — Smash Mouth

hiphop preset

Clip: 00:10 – 00:40  ·  Preset: hiphop  ·  Mode: full (all 8 states)

[Chart: event density per motor (Mouth, Body, Tail) in 5 s windows; top axis = track time]
PRESET PARAMETERS
Smoothing frames | 4
Threshold | 0.54
Mouth min gap | 0.13 s
Body min gap | 0.50 s
Tail min gap | 1.00 s
Onset delta | 0.07
TUNING RATIONALE

All Star is maximally compressed: the RMS envelope stays above 0.20 for ~80% of the track. A high threshold of 0.54 sits at the inflection point where the envelope actually fluctuates with lyric energy, preventing the mouth from firing constantly on flat sections. Body bobs follow the beat (capped at 2/s by the 0.50 s gap) and tail sweeps are sparser (capped at 1/s by the 1.00 s gap).

"Cruel Angel's Thesis" — Yoko Takahashi

jpop preset

Clip: 00:25 – 00:55  ·  Preset: jpop  ·  Mode: full (all 8 states)

[Chart: event density per motor (Mouth, Body, Tail) in 5 s windows; top axis = track time]
PRESET PARAMETERS
Smoothing frames | 5
Threshold | 0.32
Mouth min gap | 0.10 s
Body min gap | 0.55 s
Tail min gap | 1.00 s
Onset delta | 0.06
TUNING RATIONALE

J-pop has real dynamic range, unlike maximally compressed Western pop: the envelope only exceeds 0.45 for ~31% of the track. Using pop_song's threshold of 0.45 would starve the vocals. A lower threshold of 0.32 with lighter smoothing (5 frames) preserves crisp Japanese syllable articulation. Body bobs follow the ~130 BPM beat.

Pipeline Comparison

Same 60-second clip processed through multiple presets — event counts per motor illustrate how preset parameters shape animation density.

Preset | Song character | Mouth | Body | Tail | Total / 60s
hiphop | Hip-hop / rap, sustained | 80 | 97 | 61 | 238
jpop | J-pop / anime, dynamic range | 102 | 61 | 33 | 196
pop_song | Western pop, compressed | 307 | 197 | 140 | 644
animated | High-energy / expressive | 196 | 205 | 205 | 606

Event counts from the full Cruel Angel's Thesis track (252s) across presets. Higher isn't always better — the goal is matching the song's rhythmic character, not maximizing raw count.

SoundPond — Web Controller

FastAPI server running on the Raspberry Pi at http://openbass.local:8000. Any device on the same Wi-Fi can open the page. SoundPond is a SoundCloud-inspired interface that auto-discovers every audio file in sounds/ and displays waveform previews with motion-sync status.

🎵
Auto-discovery
Scans sounds/ on startup. Drop a file in, restart, it appears. WAV · MP3 · FLAC · AIFF · M4A.
〰️
Live waveforms
200-bar RMS waveform for every track — hero scrubber + per-card mini waveform. Decoded in the background.
Motion sync
⚡ badge on tracks with a matching .motion.json. Play fires audio + all three motors in perfect sync.
🗣️
Live TTS
Type anything, fish speaks it via Piper neural TTS with full mouth sync via the same pipeline.
⚙️
DOF controls
Enable / disable mouth, body, or tail independently. Event divider (÷1 ÷2 ÷3) and carrier frequency per motor.
🎛️
Live PWM tuning
Drive PWM, pulse ms, coast ms sliders per motor. Changes apply to the very next event — no JSON regen needed.
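
A 200-bar RMS waveform like the one described under Live waveforms can be computed by splitting the decoded samples into equal chunks and taking the root-mean-square of each. The sketch below is an assumption about the approach, not the server's actual code: the function name, the mono sample list input, and the peak normalisation are hypothetical.

```python
import math


def rms_bars(samples, bars=200):
    """Collapse a mono sample list into `bars` RMS values normalised to 0..1."""
    if not samples:
        return [0.0] * bars
    n = len(samples)
    out = []
    for b in range(bars):
        start = b * n // bars
        # Guarantee at least one sample per bar for very short clips.
        end = max(start + 1, (b + 1) * n // bars)
        chunk = samples[start:end]
        out.append(math.sqrt(sum(s * s for s in chunk) / len(chunk)))
    peak = max(out)
    # Normalise so the loudest bar is 1.0 (assumed display convention).
    return [v / peak for v in out] if peak > 0 else out
```

Because the result is a fixed-length list regardless of track duration, it is cheap to cache and to send to the browser as JSON.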
[Interface mockup: SoundPond at http://openbass.local:8000. Track list: All Star (3:25), Cruel Angel (1:00), Ave Maria (--:--), Gold Digger (1:00), McDonald's (1:00), and more. Now Playing: All Star at 0:38 / 3:25, Audio + Motor sync ⚡, hiphop preset. Active DOFs: Mouth on, Body on, Tail off. Event Divider for Body and Tail: ÷1, ÷2, ÷3. PWM Carrier (all motors): 125 Hz, 250 Hz, 1 kHz, 2 kHz. PWM Tuning for Mouth: Drive PWM 96%, Pulse 120 ms, Coast 5 ms. Live TTS: text box with a Speak button.]

REST API Endpoints

| Method | Path | Description |
|---|---|---|
| GET | /sounds | List all discovered tracks with label, filename, and has_motion flag. |
| GET | /waveform/{stem} | 200-bar RMS waveform + duration for any audio file. Cached after first decode. Supports WAV (stdlib), librosa, or ffmpeg fallback. |
| GET | /play?file={name} | Play audio + fire motor timeline if .motion.json exists. Kills any current playback first. |
| GET | /status | Returns {"playing", "elapsed", "duration"}; polled every second to drive the waveform scrubber. |
| GET | /stop | Kill all playback and reset motor state immediately. |
| GET | /tts?text={phrase} | Synthesize speech via Piper TTS, compute motion timeline on the fly, play with mouth sync. |
| GET | /set_pwm | Live-update any motor parameter (drive_pwm, pulse_ms, coast_ms, freq, divider). No restart required. |
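
Since every endpoint is a plain GET, a client only needs to build correctly escaped URLs. The helpers below are a hypothetical sketch using the Python standard library; the helper names are assumptions, and only the paths and the file and text query keys come from the table above. The query format for /set_pwm is not documented here, so it is left out.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://openbass.local:8000"  # address given for the Pi server


def play_url(filename):
    """URL for GET /play?file={name}; spaces and symbols are escaped."""
    return f"{BASE_URL}/play?{urlencode({'file': filename})}"


def tts_url(phrase):
    """URL for GET /tts?text={phrase}."""
    return f"{BASE_URL}/tts?{urlencode({'text': phrase})}"


def waveform_url(stem):
    """URL for GET /waveform/{stem}; the stem is escaped as a path segment."""
    return f"{BASE_URL}/waveform/{quote(stem, safe='')}"


# Example use on a device on the same network (requires the server to be up):
#   from urllib.request import urlopen
#   urlopen(play_url("All Star.mp3")).read()
```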

🎙 Recording Studio — /record

Segment-based punch-in recorder. Play audio on the Pi, hold a key to fire each DOF, release to save the event. Each take is stored as a time-stamped segment; overlapping takes use a newer-wins compile strategy. Bake to a .motion.json when satisfied.

[Interface mockup: Recording Studio at http://openbass.local:8000/record. Navigation: Soundboard, SoundPond, Record. Key bindings: J Mouth, K Body, L Tail, Space Start. Track: All Star.mp3 at 0:38 / 3:25; 🔊 65%. Motor selector: Mouth, Body, Tail. Audio row: Play, Pause, Stop, STOP ALL. Record row: Rec, Pause, Stop (Space). Reaction offset: 150 ms. Status: REC mouth 0:31 → 0:38 (4 events). Segments: Mouth has 2 segments (0:00 – 0:31 with 12 events; 0:31 – now with 4 events), Body has 1 segment (0:00 – 3:25 with 38 events), Tail has no segments. Playback on Pi: Play on Pi, Stop, Bake to JSON; status "mouth + auto body/tail: 50 compiled events → Pi".]
📼
Segment punch-in
Each ⏺ Rec → ⏸ Pause cycle saves a named segment with its exact time range. Later takes auto-replace only the covered window — the rest of the song is untouched.
Reaction offset
Human reaction time is ~150 ms. The slider shifts every event timestamp backward by that amount so the fish movement lines up with the beat you heard, not the beat you reacted to.
🤖
Auto body & tail
Check this to pull body and tail events straight from the existing .motion.json during Pi Playback. Focus entirely on mouth — the fish moves naturally on its own.
Bake to JSON
Compiles all segments with newer-wins logic and patches the final {stem}.motion.json. The soundboard picks it up instantly — no restart needed.

🎙 Tuning Custom Songs

The auto-generated motion pipeline is a strong starting point, but for songs where the beat is irregular, the lyrics are dense, or you simply want a specific performance character, the Recording Studio lets you hand-craft every movement. This page explains the full workflow from audio file to finished .motion.json.

Three-phase workflow

Each phase builds on the last. You can stop after any phase and still get a working performance.

1️⃣
Auto-generate
Run generate_motion.py to produce a baseline .motion.json from the audio envelope. Body and tail follow the beat; mouth follows speech-band energy.
2️⃣
Tune PWM live
On the Soundboard, play the track and adjust Drive PWM, pulse ms, hold PWM, close PWM, and PWM carrier frequency until each DOF moves exactly as wanted.
3️⃣
Record & punch-in
Open the Recording Studio, enable Auto body & tail, and focus on mouth only. Use Space / J to record holds in real time. Punch in over any phrase that needs a redo.

Step-by-step guide

Drop the audio file

Copy your WAV, MP3, FLAC, or M4A file into openmouth-audio-motion/sounds/. Restart the server (or let it auto-reload if you started it with --reload). Once the server is back up, the track appears in both the Soundboard and Recording Studio selectors.

Run the auto-generator

From the project root: python generate_motion.py "Track Name". This writes outputs/Track Name.motion.json. The Soundboard will now show the ⚡ badge and fire all three DOFs when you play the track. This is your baseline — every subsequent step refines it.

Tune PWM parameters on the Soundboard

Play the track on the Soundboard and watch the fish. Adjust these sliders until the mechanics feel right:

| Parameter | What it controls | Start point |
|---|---|---|
| Mouth Drive PWM | Opening speed & force. Too low → mouth stalls. Too high → impact noise. | 96% |
| Mouth Drive ms | How long the drive pulse lasts before switching to hold. Longer = wider open. | 120 ms |
| Mouth Hold PWM | Minimum duty cycle to keep mouth open against the spring. Just above stall point. | 40% |
| Mouth Close PWM | Reverse pulse strength for the active-close phase. | 85% |
| Body Kick PWM | Initial burst to overcome static friction before travel phase. | 71% |
| Body Travel PWM | Sustained travel duty cycle after the kick. | 47% |
| Body Hold PWM | Stall current to hold body at end-of-travel against spring. | 26% |
| PWM Carrier (body/tail) | Carrier frequency. Lower = more torque ripple but cooler motor. Higher = smoother but more heat. | 125 Hz |

💡 Tip: Changes apply to the very next motor event — no JSON regeneration needed. Tune live while the fish is moving.

Open the Recording Studio

Click 🎙 Record in the Soundboard nav. Select your track from the dropdown. The waveform loads automatically. The three motor key bindings are shown in the top-right corner: J Mouth · K Body · L Tail.

Enable Auto body & tail

In the Playback on Pi panel, check 🤖 Auto body & tail (ignore segments). This tells Pi Playback to pull body and tail events from the existing .motion.json instead of any manually recorded segments. You can then focus 100% on mouth quality without managing body and tail timing simultaneously.

When to turn this off: Once mouth is finalized, uncheck Auto body & tail, select Body or Tail as the active motor, and record those DOFs with the same punch-in workflow. Each motor's segments compile independently.

Set the reaction offset

Human reaction time between hearing a beat and pressing a key is typically 120–200 ms. The ⏱ Reaction offset slider (default 150 ms) automatically shifts every recorded event timestamp backward by that amount. You press the key when you hear the beat; the system records it as if you pressed it on the beat.

Calibrate by recording a single obvious beat, playing it back, and listening for the lag. Increase the offset if the fish lags the beat; decrease if it leads.
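
The offset itself is a simple subtraction applied to every recorded timestamp. A minimal sketch, assuming events are dicts with a time field named t in seconds (a hypothetical layout) and that shifted times are clamped at zero (also an assumption):

```python
def apply_reaction_offset(events, offset_ms=150):
    """Shift every event earlier by `offset_ms`, never before t = 0."""
    offset_s = offset_ms / 1000.0
    return [
        {**ev, "t": round(max(0.0, ev["t"] - offset_s), 3)}
        for ev in events
    ]
```

For example, a key press recorded at 1.000 s with the default 150 ms offset is stored at 0.850 s, which is closer to where the beat actually fell.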

Record your first segment

Press ▶ Play (Audio row) to start the song on the Pi speaker. Press Space or ⏺ Rec when you're ready to start capturing. Hold J whenever you want the mouth open — release to close. Press Space again to save the segment.

Each key-hold records a single event. Short taps produce pulse events (< 200 ms); longer holds produce hold-mode events that keep the mouth open for the full duration.

Punch in over any section

Click the timeline canvas to seek to any position, or use ⏮ Rewind. Press ▶ Play to resume, then Space to start recording again. The new segment will cover only its own time range — all other existing events are preserved. This is the newer-wins compile strategy: later recordings always win within their window.

Workflow pattern: Record the whole song roughly → punch in chorus → punch in verses → punch in any awkward transitions. Each pass is non-destructive outside its own time range.

Preview with Pi Playback

Press ▶ Play on Pi in the Playback panel. This compiles all mouth segments (and pulls auto body/tail from the motion file), pushes the event list to the server, then plays audio + motors in full sync. This is the same path as the Soundboard — what you see is what you ship.

Bake to JSON

When the performance is final, press ⬇ Bake to JSON. This compiles all segments and patches them into outputs/{stem}.motion.json using newer-wins logic. The Soundboard immediately uses the new file — no server restart required. The raw segment files are kept, so you can always re-bake with different settings.

🧠 Concepts to know

Segment

A named recording take with a t_start, t_end, and list of motor events. Stored in {stem}.{motor}.segments.json alongside the audio file.

Newer-wins compile

Segments are sorted by recorded_at. Each later segment erases all events in its time range from earlier segments, then inserts its own. The result is a clean, flat, sorted event list.
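
The newer-wins rule can be sketched in a few lines. The segment keys t_start, t_end, and recorded_at are taken from the descriptions above; the events list with a per-event t field and the half-open window [t_start, t_end) are assumptions made for illustration.

```python
def compile_newer_wins(segments):
    """Flatten overlapping takes: later recordings replace earlier events
    inside their own time window and leave everything else untouched."""
    compiled = []
    for seg in sorted(segments, key=lambda s: s["recorded_at"]):
        # Erase older events that fall inside this segment's window.
        compiled = [
            ev for ev in compiled
            if not (seg["t_start"] <= ev["t"] < seg["t_end"])
        ]
        compiled.extend(seg["events"])
    return sorted(compiled, key=lambda ev: ev["t"])
```

A full-song rough take followed by a chorus punch-in therefore keeps the rough events outside the chorus and only the new events inside it.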

Hold vs Pulse events

A key held ≥ 200 ms becomes a hold event — the motor opens and holds until hold_ms elapses. A shorter tap becomes a pulse — a single drive–coast cycle. Both are stored the same way; the dispatcher picks the right motion sequence at play time.
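
Turning a key press into an event is a matter of measuring how long the key was down. The sketch below assumes timestamps in seconds and a hypothetical event layout; the 200 ms boundary, the mode values, and the hold_ms name follow the text.

```python
HOLD_MIN_MS = 200  # key held at least this long becomes a hold event


def event_from_keypress(t_down, t_up):
    """Build a motion event from key-down and key-up times (seconds)."""
    duration_ms = round((t_up - t_down) * 1000)
    if duration_ms >= HOLD_MIN_MS:
        return {"t": t_down, "mode": "hold", "hold_ms": duration_ms}
    return {"t": t_down, "mode": "pulse"}
```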

Motion override

Pi Playback pushes the compiled event list into an in-memory _motion_override dict keyed by stem. /play checks this before opening the .motion.json file. Bake writes the override back to disk permanently.

✅ Tips for great recordings

🎧

Use headphones while recording so the speaker audio doesn't bleed into your reaction time calibration. Wired is better — Bluetooth adds 50–200 ms of its own latency.

🔁

Record sections, not the whole song. Tackle the chorus first (it repeats), then verses. Keep segments short — a 30-second segment is easier to redo than a 3-minute one.

👄

Mouth on vowels, not consonants. Open on the vowel onset of each word; close during the consonant gap. This is how the original fish firmware worked and it looks the most natural.

🐟

Watch the fish, not the screen. After a rough pass, sit across the room, press Pi Playback, and judge it from the audience perspective. That's what matters.

⚙️

Retune before you re-record. If the mouth doesn't look right during playback, it's usually a PWM tuning issue, not a timing issue. Go back to the Soundboard sliders first.

Mechanical Motion & PWM

Wire colour guide, pin mapping, tuned PWM parameters, and the mechanical physics behind spring-return stall control. All values are live-tunable in the web controller.

Wiring Diagram

[Wiring diagram; labels recovered from the figure]
Raspberry Pi 4B (BCM GPIO numbering): GND · GPIO 17 (Pin 11) · GPIO 27 (Pin 13) · GPIO 22 (Pin 15) · GPIO 23 (Pin 16) · GPIO 24 (Pin 18) · GPIO 25 (Pin 22). Powered via USB-C PD bank. Logic: 3.3 V · Bus: 5 V.
I2S bus: GPIO 18 · 19 · 20 · 21, reserved for the HiFiBerry HAT; avoid for motors.
HiFiBerry DAC+ ADC: GPIO HAT · I2S · RCA / 3.5 mm line out to the speakers.
DRV8833 #1 (Mouth + Tail; one channel labelled Tail): VCC, AIN1, AIN2, BIN1, BIN2, GND; VM → motor battery; outputs AO1, AO2, BO1, BO2.
DRV8833 #2 (Body): VCC, AIN1, AIN2, GND; VM → motor battery; outputs AO1, AO2.
nSLEEP on each DRV8833 is tied to motor 5 V (always on).
👄 Mouth: DC motor, spring-return; + = open, − = close (reverse pulse).
〰 Tail: DC motor, spring-return; + = flap, − = unused.
🐟 Body: DC motor, spring-return; kick → sustain → hold.
⚡ Motor battery → VM + VCC pins.
Legend: Motor 5 V (VCC) · Motor 5 V (VM drive) · Ground · Mouth GPIO 17/27 · Tail GPIO 22/23 · Body GPIO 24/25 · I2S (GPIO 18–21) · Audio out (RCA) · Solid = + terminal · Dashed = − terminal.
Direction logic: pwm > 0 → IN1 low, IN2 driven · pwm < 0 → IN1 driven, IN2 low · pwm = 0 → both low (coast).
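| Motor | DOF | Motor Wire | DRV8833 Terminal | Logic Wire | Logic Signal | Signal Wire | Arduino (Legacy) | RPi GPIO | RPi Pin |
|---|---|---|---|---|---|---|---|---|---|
| #1 Body | Front − | Yellow | #2, Out 3 | Brown | INT 4 | Yellow | D3 | 25 | P22 |
| #1 Body | Front + | White | #2, Out 4 | Red | INT 3 | Orange | D11 | 24 | P18 |
| #2 Mouth | Mouth − | Black | #1, Out 3 | Brown | INT 4 | Grey | D5 | 27 | P13 |
| #2 Mouth | Mouth + | Red | #1, Out 4 | Red | INT 3 | Purple | D6 | 17 | P11 |
| #3 Tail | Tail − | Black | #1, Out 1 | Brown | INT 2 | Blue | D9 | 23 | P16 |
| #3 Tail | Tail + | Orange | #1, Out 2 | Red | INT 1 | Green | D10 | 22 | P15 |
| GND | GND (shared) | | | | | | | GND | P6 |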

⚠ GPIO 18 — HiFiBerry I2S Conflict

The Arduino schematic assigned Mouth − to Arduino D5 → GPIO 18 (Pin 12). GPIO 18 is the I2S BCLK clock line claimed by the HiFiBerry DAC+ADC driver. Connecting a motor signal here kills audio and prevents PWM from working. On the Pi, Mouth − is remapped to GPIO 27 (Pin 13). Only the PWM signal wire moves — the DRV8833 power rails stay unchanged. GPIOs 18, 19, 20, 21 must never be used for motor control.

DRV8833 H-Bridge Direction Logic

pwm > 0 → forward : IN1 = 0 %, IN2 = pwm %
pwm < 0 → reverse : IN1 = |pwm| %, IN2 = 0 %
pwm = 0 → coast : IN1 = 0 %, IN2 = 0 %

Each motor uses two GPIO pins. Mouth − is on GPIO 27 (remapped from GPIO 18 which is reserved for HiFiBerry I2S BCLK).
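
The direction logic above maps a signed PWM value onto two duty cycles. A minimal sketch of that mapping, with a hypothetical function name and no GPIO access so it can run anywhere:

```python
def drv8833_duties(pwm):
    """Map a signed PWM percentage (-100..100) to (IN1 duty %, IN2 duty %)."""
    if not -100 <= pwm <= 100:
        raise ValueError("pwm must be between -100 and 100")
    if pwm > 0:
        return (0.0, float(pwm))    # forward: IN1 low, IN2 driven
    if pwm < 0:
        return (float(-pwm), 0.0)   # reverse: IN1 driven, IN2 low
    return (0.0, 0.0)               # coast: both inputs low
```

On the Pi, the two returned values would be applied to the PWM objects for that motor's GPIO pair, for example GPIO 17 and 27 for the mouth.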

Tuned PWM Parameters

Values below were found empirically by testing against real tracks on the BreadVolt 5 V supply. All parameters are live-adjustable from the web controller — changes take effect on the next motor event without restarting the server or regenerating JSON. Carrier frequency is 125 Hz for all three motors (see below).

| Motor | Phase | Parameter | Value | Notes |
|---|---|---|---|---|
| Mouth | Drive | Open PWM | 96 % | Overcomes jaw cam stiction quickly |
| Mouth | Drive | Open duration | 120 ms | Drives jaw to full-open position |
| Mouth | Hold | Hold PWM | 40 % | Minimum duty to stall against return spring |
| Mouth | Hold | Hold duration | 300 ms | Sustained vowel window (energy-duration gated) |
| Mouth | Close | Rev PWM | 85 % | Assisted close: spring + motor together |
| Mouth | Close | Rev duration / Coast | 45 ms / 5 ms | Coast gap prevents current spike on direction reversal |
| Body | Kick | Kick PWM | 55 % | Breaks static friction on body cam |
| Body | Kick | Kick duration | 45 ms | Short impulse; transitions straight to travel |
| Body | Travel | Travel PWM | 26 % | Sustains motion to end-of-travel at low current |
| Body | Travel | Travel duration | 160 ms | |
| Body | Hold | Hold PWM | 26 % | Stall against spring at end-of-travel (sustained accent) |
| Body | Hold | Hold duration | 275 ms | Release to coast; spring returns body to rest |
| Tail | Drive | Drive PWM | 90 % | Tail needs high initial force (longer lever arm) |
| Tail | Drive | Drive duration | 130 ms | |
| Tail | Hold | Hold PWM | 45 % | IOI-gated: fires on long inter-onset gaps ≥ 300 ms |
| Tail | Hold | Hold duration | 300 ms | 70 % of IOI gap, capped at 500 ms |

Hold / Sustain Classification

The pipeline classifies every motor event as either a short pulse or a longer hold before writing the motion JSON. The runtime dispatches them through different code paths — pulses fire a quick open→coast→close sequence; holds drive to end-of-travel, stall at a reduced duty cycle against the spring, then release. The hold PWM is the key tuned value: too low and the mechanism drifts back mid-hold; too high and current (and heat) accumulates unnecessarily.

Mouth & Body — Energy-Duration Method

At each onset, the algorithm walks forward through the normalised RMS envelope. It measures how long the envelope stays above a floor threshold (mouth: same as detection threshold; body: 50 % of that). If the duration exceeds the hold_threshold_ms (mouth: 300 ms, body: 275 ms), the event is classified as a hold and hold_ms is set to that measured duration, capped at a maximum (mouth: 800 ms, body: 600 ms). Events below threshold are pulses.
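
The energy-duration walk can be sketched as below. The function signature, the fixed frame length, and the returned dict layout are assumptions; the thresholds and caps are passed in so the same code covers both mouth (300 ms, 800 ms) and body (275 ms, 600 ms).

```python
def classify_energy_duration(envelope, onset_idx, frame_ms, floor,
                             hold_threshold_ms, max_hold_ms):
    """Walk forward from the onset while the envelope stays above `floor`
    and classify the event by how long that run lasts."""
    i = onset_idx
    while i < len(envelope) and envelope[i] > floor:
        i += 1
    duration_ms = (i - onset_idx) * frame_ms
    if duration_ms > hold_threshold_ms:
        return {"mode": "hold", "hold_ms": min(duration_ms, max_hold_ms)}
    return {"mode": "pulse"}
```

For the body, the floor argument would be half the detection threshold, which makes body holds trigger on quieter sustained passages than mouth holds.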

Tail — IOI (Inter-Onset Interval) Method

Tail events are driven by beat-onset detection rather than the speech envelope. Hold classification uses the gap to the next tail event: if that gap is ≥ 300 ms (tail_hold_ioi_ms), the current event becomes a hold. The hold duration is set to 70 % of the gap (giving the spring time to return) capped at 500 ms. This keeps the tail raised through long musical phrases rather than snapping back immediately after every beat.
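
A sketch of the IOI rule, assuming onsets arrive as a sorted list of times in seconds. The function name and event layout are hypothetical, and treating the final onset as a pulse (it has no following gap) is an assumption; the 300 ms gate, the 70 % factor, and the 500 ms cap come from the description above.

```python
def classify_tail(onsets_s, hold_ioi_ms=300, hold_fraction=0.7, max_hold_ms=500):
    """Classify each tail onset as pulse or hold from the gap to the next onset."""
    events = []
    for i, t in enumerate(onsets_s):
        if i + 1 < len(onsets_s):
            gap_ms = (onsets_s[i + 1] - t) * 1000
        else:
            gap_ms = 0  # assumption: last onset has no gap, so it is a pulse
        if gap_ms >= hold_ioi_ms:
            hold_ms = min(round(gap_ms * hold_fraction), max_hold_ms)
            events.append({"t": t, "mode": "hold", "hold_ms": hold_ms})
        else:
            events.append({"t": t, "mode": "pulse"})
    return events
```

A 500 ms gap yields a 350 ms hold, leaving roughly 150 ms for the spring to return the tail before the next beat.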

Runtime dispatch: The mode field in the motion JSON ("pulse" or "hold") determines which function runs at playback time. Each DOF has a _pp_* pulse function and a _*_sustain_for(hold_ms) variant. Crucially, both only stop their own motor (never _stop_all()), so concurrent DOF movements survive. The event divider and DOF enable/disable checks run before dispatch, so they apply equally to pulse and hold events.
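
The ordering of checks can be illustrated with a small dispatcher. Everything here is a hypothetical sketch: the dof field, the callable tables standing in for the _pp_* and _*_sustain_for functions, and the reading of ÷N as "fire every Nth event" are assumptions.

```python
def make_dispatcher(pulse_fns, hold_fns, enabled, dividers):
    """Return a dispatch(event) function that applies the enable and divider
    checks first, then routes by the event's mode."""
    counters = {dof: 0 for dof in pulse_fns}

    def dispatch(event):
        dof = event["dof"]
        if not enabled.get(dof, False):
            return False                      # DOF switched off in the UI
        counters[dof] += 1
        if (counters[dof] - 1) % dividers.get(dof, 1) != 0:
            return False                      # skipped by the event divider
        if event["mode"] == "hold":
            hold_fns[dof](event["hold_ms"])   # sustain variant
        else:
            pulse_fns[dof]()                  # quick pulse variant
        return True

    return dispatch
```

Because the checks sit in front of the mode branch, a disabled or divided-out event never reaches either motor function.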

Why Spring Return Beats a Hard Stop

The Gemmy fish uses spring-return cam mechanisms on all three axes, not continuous-rotation or position-controlled servos. Understanding the spring's role is why the hold strategy works safely at all.

❌ Hard Stop (mechanical endstop)

When a DC motor drives into a hard mechanical endstop, shaft velocity drops to zero. Back-EMF, which is proportional to velocity, also drops to zero. With no back-EMF to limit current, the winding resistance alone determines the current draw — often 5–10× the running current. This causes rapid thermal buildup in the motor windings, and the static load puts full torque into the gearbox, risking gear tooth shear or cam binding. There is no stable equilibrium: the motor simply dissipates power as heat until something fails.
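
The current rise follows from the standard steady-state model of a brushed DC motor. This is a textbook relation rather than a measurement from the prototype; the symbols are introduced here for illustration: $V$ is the applied voltage, $R$ the winding resistance, $k_e$ the back-EMF constant, and $\omega$ the shaft speed.

```latex
I = \frac{V - k_e\,\omega}{R},
\qquad
\omega = 0 \;\Rightarrow\; I_{\text{stall}} = \frac{V}{R}
```

While the motor is turning, the $k_e\,\omega$ term subtracts from the applied voltage; at a hard stop it vanishes and the full voltage appears across the winding resistance.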

✓ Spring Return (stall against restoring force)

The return spring provides a continuously increasing restoring force as the mechanism approaches end-of-travel. At the hold PWM setpoint, motor torque exactly balances the spring's restoring force: a stable equilibrium. Any small perturbation is self-correcting: if the mechanism slips slightly backward, the motor torque now exceeds the spring force and re-drives it forward; if it over-travels, the spring pushes back harder than the motor. Because the shaft is nearly stationary during the hold, back-EMF is small and the current is limited mainly by the reduced hold duty cycle, which is why the hold runs at a fraction of the drive PWM. On release (PWM = 0 / coast), the spring returns the mechanism to its rest position without any reverse motor command.

Practical implication: The hold PWM must be set just above the minimum duty that prevents the mechanism from drifting back under spring load. Set it too low and the mechanism creeps back (visible as a partial hold); set it too high and unnecessary current heats the coil. For the body and mouth motors on the BreadVolt 5 V supply, 26 % and 40 % respectively were found to be stable stall points.

PWM Carrier Frequency — Why Lower is Quieter

RPi.GPIO software PWM works by toggling GPIO pins on OS-scheduler ticks. The useful range is roughly 125 Hz – 10 kHz; above ~10 kHz the OS jitter makes pulse widths unreliable. All three motors default to 125 Hz.
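
To make the timing concrete, the period and on-time at a given carrier frequency and duty cycle can be computed directly. This is plain arithmetic, with a hypothetical helper name:

```python
def pwm_timing(freq_hz, duty_pct):
    """Return (period_ms, on_ms, off_ms) for one PWM cycle."""
    period_ms = 1000.0 / freq_hz
    on_ms = period_ms * duty_pct / 100.0
    return period_ms, on_ms, period_ms - on_ms
```

At the default 125 Hz carrier one cycle lasts 8 ms, so the 40 % mouth hold duty keeps the pin high for about 3.2 ms per cycle. At 10 kHz the whole cycle is 0.1 ms, which leaves little margin for scheduler jitter.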

Inductance smoothing

Motor windings are inductors. At low carrier frequency the on-time is long, so the coil has time to build current smoothly. At high frequency the short pulses cause rapid current rise and fall (high di/dt), creating switching noise and magnetic ripple at the carrier frequency.

Buck converter resonance

The BreadVolt and any upstream supply contain LC output filters. High-frequency PWM switching transients couple onto the supply rail and can excite the LC filter's resonant frequency, producing audible coil whine whose pitch tracks the PWM carrier — particularly loud on the body motor due to lower winding inductance and larger current transients.

125 Hz as the sweet spot

At 125 Hz the carrier sits near the bottom of the audible range (~20 Hz–20 kHz), so any tone it produces is a low hum rather than a high-pitched whine. The long PWM period gives the motor inductance maximum time to integrate each pulse into smooth average torque. Empirically, switching noise was lowest at 125 Hz across all three motors on the 5 V battery supply; carriers from 1 kHz to 10 kHz all produced varying degrees of audible whine.

Hardware fix still recommended

A 470–1 000 µF bulk capacitor placed directly across the DRV8833 VM / GND supply pins would further reduce supply-coupled switching noise by providing local charge storage — smoothing the current spikes that the motor driver draws from the supply rail on each PWM transition. This hardware modification has not yet been applied to the prototype.

Results

Billy Bass in action — three songs, fully animated mouth, body, and tail driven by the OpenMouth pipeline. Recorded on the Raspberry Pi with the HiFiBerry DAC+ ADC for audio output.

🌟

All Star

Smash Mouth · Astro Lounge (1999)

💰

Gold Digger

Kanye West ft. Jamie Foxx · Late Registration (2005)

🤖

Cruel Angel's Thesis

Yoko Takahashi · Neon Genesis Evangelion OST (1995)