feat: TTS audio mode — Kokoro voice personas, seekbar, conversational AI#236

Open
alichherawalla wants to merge 98 commits into main from feat/tts-implementation

@alichherawalla (Owner) commented Apr 7, 2026

Summary

Complete TTS audio mode implementation with Kokoro text-to-speech integration:

  • Voice Personas: 8 mood-based personas (Warm, Calm, Clear, Steady, Bold, Cheerful, Gentle, Refined) with pre-configured playback speeds, selectable via popover in the audio mode bottom bar
  • Audio Mode System Prompt: When audio mode is active, appends instructions telling the LLM to respond conversationally — short sentences, no markdown, expressive punctuation for natural prosody
  • Seekable Progress Bar: Full-width draggable seekbar on AI audio bubbles — tap or drag to jump to any position. Re-speaks from nearest sentence boundary
  • Waveform Animation: Wave bounce animation during actual audio playback only (not during loading). Stops when paused
  • Playback Controls: Play/pause, speed cycling (0.5x–2.0x), progress tracking with wall-clock timer using targeted Zustand selectors (no re-render on every amplitude update)
  • Thinking Block: Renders directly in audio mode without ChatMessage bubble wrapper — clean collapsible block above audio bubble
  • Tap-to-Toggle Recording: Audio mode mic uses tap-to-start/tap-to-stop instead of hold-to-record
  • App Lifecycle: Pauses TTS on app background, resumes on foreground. Stops on back navigation (blur + beforeRemove)
  • Crash Prevention: Defers Kokoro voice config changes until native ExecuTorch worker is idle. Always waits 300ms after stop before new speak
  • Bottom Bar: All quick settings (image gen, thinking, tools, voice) directly accessible — no popover needed. Disabled states shown for unavailable features
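The seekbar bullet says playback re-speaks from the nearest sentence boundary. A minimal sketch of that mapping, assuming the seek position is approximated by a fraction of the message text's length (Kokoro gives no word timestamps, as a later commit notes) and that sentences end in `.`, `!` or `?` followed by whitespace. `sentenceStartForFraction` is a hypothetical helper, not code from the PR.

```typescript
// Hypothetical helper: map a seek fraction (0..1) to the start offset of the
// sentence containing that point, using character offset as a proxy for time.
function sentenceStartForFraction(text: string, fraction: number): number {
  const clamped = Math.min(Math.max(fraction, 0), 1);
  const target = Math.floor(clamped * text.length);

  // Sentence starts: offset 0, plus the index just after each ". ", "! " or "? " run.
  const starts: number[] = [0];
  const boundary = /[.!?]+\s+/g;
  let match: RegExpExecArray | null;
  while ((match = boundary.exec(text)) !== null) {
    starts.push(match.index + match[0].length);
  }

  // Snap back to the last sentence start at or before the target.
  let best = 0;
  for (const start of starts) {
    if (start <= target) best = start;
  }
  return best;
}

const reply = "Hello there. How are you today? I'm fine!";
const offset = sentenceStartForFraction(reply, 0.5);
console.log(reply.slice(offset)); // "How are you today? I'm fine!"
```

Snapping backward rather than forward means the listener never skips the words that were under the thumb.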

Test plan

  • Play AI audio message — waveform animates, timer counts up smoothly
  • Tap seekbar to jump to position — audio resumes from nearest sentence
  • Drag seekbar thumb — follows finger, seeks on release
  • Change voice persona during playback — no crash, voice changes on next play
  • Switch between audio clips — previous stops, new one starts
  • Pause/resume — waveform stops/starts, timer freezes/resumes
  • Press back button — TTS stops
  • Switch to another app — TTS pauses, resumes when returning
  • Change playback speed — takes effect on next chunk
  • Audio mode system prompt — AI responds conversationally, not in markdown
  • Thinking block in audio mode — collapsible, no empty bubble wrapper

alichherawalla and others added 2 commits April 7, 2026 16:41
Implements on-device text-to-speech using OuteTTS 0.3 (454 MB) +
WavTokenizer (73 MB) via llama.rn, with react-native-audio-api for playback.

Two interface modes (user-switchable from Settings):
- Chat Mode: play/stop TTSButton on each assistant message bubble
- Audio Mode: waveform bubbles with auto-TTS after streaming, transcript expand,
  speed cycling, and PCM audio persisted to disk per message for repeat playback

New files:
- src/constants/ttsModels.ts — model URLs, RAM thresholds, cache config
- src/services/ttsService.ts — download, load, generate, persist, play
- src/stores/ttsStore.ts — Zustand store with Chat + Audio Mode actions
- src/hooks/useTTS.ts — convenience hook with RAM gate and weighted progress
- src/components/TTSButton/index.tsx — Chat Mode play/stop per message
- src/components/AudioMessageBubble/index.tsx — waveform bubble component
- src/screens/TTSSettingsScreen/index.tsx — download, mode, speed, cache

Modified:
- Message type: audioPath, waveformData, audioDurationSeconds, isGeneratingAudio
- ChatMessage: Audio Mode branch + TTSButton in meta row
- SettingsScreen: Text to Speech nav row
- Navigation: TTSSettings route
- stores/index.ts, services/index.ts: exports

Tests: 42 unit + integration tests covering service, store, and full flows

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
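The commit describes useTTS as reporting weighted progress over the two downloads. One plausible weighting, assumed here rather than taken from the PR, scales each file's fraction by its size (454 MB for OuteTTS 0.3, 73 MB for WavTokenizer), so the bar advances in proportion to bytes fetched. `weightedProgress` and `DownloadPart` are hypothetical names.

```typescript
interface DownloadPart {
  sizeMB: number; // approximate file size, used as the weight
  fraction: number; // 0..1 progress for this file
}

// Size-weighted overall progress across several downloads.
function weightedProgress(parts: DownloadPart[]): number {
  const total = parts.reduce((sum, p) => sum + p.sizeMB, 0);
  if (total === 0) return 0;
  const done = parts.reduce(
    (sum, p) => sum + p.sizeMB * Math.min(Math.max(p.fraction, 0), 1),
    0,
  );
  return done / total;
}

// OuteTTS fully downloaded, WavTokenizer not started yet:
const overall = weightedProgress([
  { sizeMB: 454, fraction: 1 },
  { sizeMB: 73, fraction: 0 },
]);
// overall ≈ 0.861 (454 / 527)
```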
- Revert ChatMessage to main (avoids pre-existing complexity lint failure
  when the file enters the push-range diff)
- Add Audio Mode + TTSButton to MessageRenderer instead — clean, under limit
- Move audioPath/waveformData/audioDurationSeconds/isGeneratingAudio fields
  from types/index.ts to types/tts.ts via module augmentation (keeps index.ts
  under the 350-line max)
- Add react-native-audio-api global mock to jest.setup.ts so all test suites
  that transitively import ttsService can resolve the native module

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
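The module-augmentation move relies on TypeScript declaration merging: two `interface Message` declarations combine into one type. In the real two-file layout, types/tts.ts would wrap its declaration in `declare module './index' { ... }` and must itself be a module (contain an `import` or `export`). The single-file sketch below shows the same merging so it can run standalone; the base fields and the `waveformData` element type are assumptions.

```typescript
// Base declaration (stands in for src/types/index.ts; fields assumed).
interface Message {
  id: string;
  role: "user" | "assistant";
  content: string;
}

// Second declaration (stands in for src/types/tts.ts). TypeScript merges it
// into the one above. Across files it would sit inside
// `declare module "./index" { ... }`.
interface Message {
  audioPath?: string;
  waveformData?: number[];
  audioDurationSeconds?: number;
  isGeneratingAudio?: boolean;
}

// Both sets of fields are now available on the same type.
const msg: Message = {
  id: "m1",
  role: "assistant",
  content: "Hi!",
  audioPath: "/tmp/m1.pcm",
  isGeneratingAudio: false,
};
```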

@greptile-apps greptile-apps bot left a comment

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

In finalizeStreamingMessage, after addMessage() saves the assistant reply,
check if Audio Mode is active and model is loaded — if so, fire
useTTSStore.generateAndSave() in the background so the waveform bubble
auto-generates instead of spinning indefinitely.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
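A minimal sketch of the background trigger, assuming `generateAndSave` returns a Promise. The call is deliberately not awaited, so finalizeStreamingMessage is not blocked, while the `.catch` logs failures instead of leaving an unhandled rejection. `TTSStoreLike` and `maybeStartAudio` are hypothetical stand-ins for the store and the check described above.

```typescript
interface TTSStoreLike {
  interfaceMode: "chat" | "audio";
  isModelLoaded: boolean;
  generateAndSave: (messageId: string, text: string) => Promise<void>;
}

// Fire-and-forget: start generation without blocking the caller,
// but log any failure so it is not silently swallowed.
function maybeStartAudio(tts: TTSStoreLike, messageId: string, text: string): boolean {
  if (tts.interfaceMode !== "audio" || !tts.isModelLoaded) return false;
  void tts.generateAndSave(messageId, text).catch((err: unknown) => {
    console.warn(`[TTS] background generation failed for ${messageId}`, err);
  });
  return true;
}
```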
@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request implements a Text-to-Speech (TTS) service and store, enabling both Chat and Audio interface modes. The implementation includes model management, audio generation, file persistence, and playback controls. My feedback highlights that btoa and atob may not be available in every React Native JavaScript environment and may need a polyfill or an alternative base64 utility, and suggests adding user feedback and logging when TTS generation fails because the model is not loaded.

for (let i = 0; i < uint8.length; i++) {
  binary += String.fromCharCode(uint8[i]);
}
return btoa(binary);

high

btoa may not be available in every React Native environment without a polyfill. Use a base64 encoding utility (such as the buffer package or a dedicated library) to ensure cross-platform compatibility.

}

private base64ToFloat32(base64: string): Float32Array {
const binary = atob(base64);

high

atob may not be available in every React Native environment without a polyfill. Use a base64 decoding utility to ensure cross-platform compatibility.
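One way to address both comments without depending on `btoa`/`atob` is a small pure-TypeScript base64 codec applied to the Float32 PCM bytes. This is a sketch under that assumption, not the PR's eventual fix; an established package such as `buffer` or `base64-js` would serve the same purpose. The function names echo the reviewed method but are illustrative.

```typescript
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode raw bytes as standard, padded base64 without btoa.
function bytesToBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i];
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    const triple = (b0 << 16) | (b1 << 8) | b2;
    out += ALPHABET[(triple >> 18) & 63];
    out += ALPHABET[(triple >> 12) & 63];
    out += i + 1 < bytes.length ? ALPHABET[(triple >> 6) & 63] : "=";
    out += i + 2 < bytes.length ? ALPHABET[triple & 63] : "=";
  }
  return out;
}

// Decode standard base64 back to bytes without atob.
function base64ToBytes(b64: string): Uint8Array {
  const clean = b64.replace(/=+$/, "");
  const bytes = new Uint8Array(Math.floor((clean.length * 3) / 4));
  let acc = 0; // bit accumulator, masked so it never grows unbounded
  let bits = 0;
  let pos = 0;
  for (const ch of clean) {
    const value = ALPHABET.indexOf(ch);
    if (value < 0) throw new Error(`Invalid base64 character: ${ch}`);
    acc = ((acc << 6) | value) & 0xffffff;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      bytes[pos++] = (acc >> bits) & 0xff;
    }
  }
  return bytes;
}

// Float32 PCM <-> base64 via the samples' underlying bytes (platform endianness).
function float32ToBase64(samples: Float32Array): string {
  return bytesToBase64(new Uint8Array(samples.buffer, samples.byteOffset, samples.byteLength));
}

function base64ToFloat32(b64: string): Float32Array {
  const bytes = base64ToBytes(b64);
  // Copy into a fresh buffer so the Float32Array view starts at offset 0.
  return new Float32Array(bytes.buffer.slice(bytes.byteOffset, bytes.byteOffset + bytes.byteLength));
}
```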

Comment on lines +155 to +157
if (!settings.enabled || !isModelLoaded) {
return;
}

medium

The check if (!settings.enabled || !isModelLoaded) is correct, but consider giving the user feedback when they try to speak while the model is not loaded, rather than failing silently. Also log this failure to aid debugging, since swallowed failures are harder to trace.

References
  1. When catching errors or handling failures, log them instead of swallowing them to ensure failures are visible and to aid in debugging.
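A sketch of the reviewer's suggestion: return a reason instead of a bare `return`, log it, and let the caller tell the user. `SpeakResult`, `checkCanSpeak`, and the `notifyUser` callback are hypothetical, and the message text is only an example.

```typescript
type SpeakResult =
  | { ok: true }
  | { ok: false; reason: "tts-disabled" | "model-not-loaded" };

function checkCanSpeak(
  settings: { enabled: boolean },
  isModelLoaded: boolean,
  notifyUser: (text: string) => void,
): SpeakResult {
  if (!settings.enabled) {
    // A deliberate user setting: log quietly, no alert needed.
    console.info("[TTS] speak skipped: TTS disabled in settings");
    return { ok: false, reason: "tts-disabled" };
  }
  if (!isModelLoaded) {
    // Unexpected from the user's point of view: log and explain.
    console.warn("[TTS] speak skipped: model not loaded");
    notifyUser("The voice model is not loaded yet. Load it from Text to Speech settings.");
    return { ok: false, reason: "model-not-loaded" };
  }
  return { ok: true };
}
```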

codecov bot commented Apr 7, 2026

Codecov Report

❌ Patch coverage is 47.79051% with 319 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.66%. Comparing base (9c8a2d7) to head (c602566).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/services/ttsService.ts | 57.14% | 56 Missing and 10 partials ⚠️ |
| src/screens/TTSSettingsScreen/index.tsx | 8.57% | 64 Missing ⚠️ |
| src/components/AudioMessageBubble/index.tsx | 12.76% | 41 Missing ⚠️ |
| src/screens/ModelDownloadScreen.tsx | 33.33% | 28 Missing and 4 partials ⚠️ |
| src/components/TTSButton/index.tsx | 6.66% | 28 Missing ⚠️ |
| src/screens/ModelsScreen/useTextModels.ts | 76.00% | 12 Missing and 6 partials ⚠️ |
| src/hooks/useTTS.ts | 0.00% | 14 Missing ⚠️ |
| src/screens/ModelsScreen/TextModelsTab.tsx | 67.44% | 6 Missing and 8 partials ⚠️ |
| src/stores/ttsStore.ts | 83.33% | 10 Missing and 2 partials ⚠️ |
| src/components/ChatInput/Popovers.tsx | 26.66% | 7 Missing and 4 partials ⚠️ |

... and 4 more

❌ Your patch check has failed because the patch coverage (47.79%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #236      +/-   ##
==========================================
- Coverage   85.65%   83.66%   -2.00%     
==========================================
  Files         217      224       +7     
  Lines       10766    11289     +523     
  Branches     2888     3023     +135     
==========================================
+ Hits         9222     9445     +223     
- Misses        870     1138     +268     
- Partials      674      706      +32     

| Files with missing lines | Coverage Δ |
| --- | --- |
| src/components/ChatMessage/index.tsx | 95.57% <100.00%> (ø) |
| src/components/ModelCard.tsx | 82.50% <ø> (ø) |
| src/constants/models.ts | 100.00% <100.00%> (ø) |
| src/constants/ttsModels.ts | 100.00% <100.00%> (ø) |
| src/screens/ModelDownloadHelpers.tsx | 96.77% <100.00%> (ø) |
| src/screens/ModelsScreen/TextFiltersSection.tsx | 85.29% <ø> (ø) |
| src/screens/ModelsScreen/constants.ts | 100.00% <100.00%> (ø) |
| src/screens/ModelsScreen/index.tsx | 82.35% <ø> (ø) |
| src/screens/ModelsScreen/styles.ts | 100.00% <ø> (ø) |
| src/screens/ModelsScreen/useModelsScreen.ts | 91.26% <ø> (ø) |

... and 16 more

... and 1 file with indirect coverage changes

alichherawalla and others added 24 commits April 7, 2026 18:04
…, TTSButton placement

Critical fixes for TTS Audio Mode:

- Add updateMessageAudio() to chatStore — writes audioPath, waveformData,
  audioDurationSeconds, isGeneratingAudio back to the conversation message
  (without this, the waveform bubble spun forever after generation)

- Wire auto-TTS trigger in useChatScreen via useEffect on isStreamingForThisConversation:
  detects streaming → stopped, checks Audio Mode + model loaded, calls
  triggerAudioModeGeneration() which sets isGeneratingAudio:true, fires
  generateAndSave, then writes audio fields or clears the flag on error

- Fix isGenerating logic: show spinner only when isGeneratingAudio===true,
  not for every assistant message missing audioPath (which made all old
  messages spin forever in Audio Mode)

- Fix TTSButton placement: add metaExtra prop to ChatMessage/MessageMetaRow
  so TTSButton renders inline in the timestamp row rather than below the bubble

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
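The two behavioral rules in this commit reduce to small pure functions: fire only on the streaming-to-stopped transition, and show the spinner only when `isGeneratingAudio` is explicitly true. The sketch below is illustrative, not the hook's code; in a React hook the previous value would typically live in a `useRef`. `makeEdgeDetector` and `showAudioSpinner` are hypothetical names.

```typescript
// Edge detector: remembers the previous value and reports true only
// when streaming goes from true to false.
function makeEdgeDetector(): (isStreaming: boolean) => boolean {
  let previous = false;
  return (isStreaming: boolean): boolean => {
    const justEnded = previous && !isStreaming;
    previous = isStreaming;
    return justEnded;
  };
}

// Spinner only when generation was explicitly started for this message.
// A missing audioPath alone is not enough: every older message lacks one.
function showAudioSpinner(msg: { audioPath?: string; isGeneratingAudio?: boolean }): boolean {
  return msg.isGeneratingAudio === true;
}

const streamEnded = makeEdgeDetector();
```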
Adds a Voice row (volume icon + Chat/Audio/N/A badge) to the quick
settings popover in the chat input. Tapping it:
- Toggles between Chat and Audio mode when models are downloaded
- Auto-loads/unloads the TTS model on switch
- Navigates to TTSSettings when models are not yet downloaded

This makes Audio Mode accessible without leaving the chat screen.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The ChatInput test mock for src/stores was missing useTTSStore, causing
Popovers.tsx (which now uses useTTSStore) to throw on render.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1. checkDownloadStatus() never called on TTSSettingsScreen mount
   → store always showed models as not downloaded after fresh app start

2. speak() race condition: stop() during generation didn't prevent playback
   → set isSpeakingFlag=true before generate(), check it after, use finally

3. RNFS.stat() on directory reports block size (~0), not total file size
   → replaced with readDir() recursive sum of individual .pcm file sizes

4. Historical messages without audio showed broken play button in Audio Mode
   → AudioMessageBubble only rendered when msg.audioPath || msg.isGeneratingAudio

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
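A sketch of the fix in item 3, summing individual `.pcm` file sizes recursively instead of calling `stat()` on the directory. `DirEntry` and `listDir` are assumptions, loosely shaped like react-native-fs `readDir()` results and injected so the logic runs without the native module.

```typescript
interface DirEntry {
  path: string;
  name: string;
  size: number;
  isDirectory(): boolean;
}

type ListDir = (dir: string) => Promise<DirEntry[]>;

// Sum the sizes of all .pcm files under `dir`, descending into subdirectories.
// A directory's own size entry is ignored: it does not reflect its contents.
async function pcmCacheSize(dir: string, listDir: ListDir): Promise<number> {
  let total = 0;
  for (const entry of await listDir(dir)) {
    if (entry.isDirectory()) {
      total += await pcmCacheSize(entry.path, listDir);
    } else if (entry.name.endsWith(".pcm")) {
      total += entry.size;
    }
  }
  return total;
}
```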
Replaced stat() mock with readDir() mocks matching the new recursive
file-size summation approach.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces slider controls with a [–] value [+] stepper row for
precise numeric input in settings screens. Supports min/max/step,
optional decimal formatting, and testID for E2E automation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
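The stepper's arithmetic is small but easy to get wrong with floating point (for example, 0.7 + 0.1 evaluates to 0.7999999999999999). A sketch assuming the component clamps to [min, max] and rounds to the step's decimal places; `stepValue`, `decimalsOf`, and `formatStepper` are hypothetical helpers, not the component's API.

```typescript
// Decimal places in a step written in plain notation: 0.05 -> 2, 1 -> 0.
function decimalsOf(step: number): number {
  const s = String(step);
  const dot = s.indexOf(".");
  return dot < 0 ? 0 : s.length - dot - 1;
}

// Apply one increment (+1) or decrement (-1), clamp, then round away float drift.
function stepValue(
  value: number,
  direction: 1 | -1,
  min: number,
  max: number,
  step: number,
): number {
  const next = Math.min(max, Math.max(min, value + direction * step));
  return Number(next.toFixed(decimalsOf(step)));
}

// Optional fixed-decimal display, e.g. "0.80" for a speed value.
function formatStepper(value: number, decimals?: number): string {
  return decimals === undefined ? String(value) : value.toFixed(decimals);
}
```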
Removes @react-native-community/slider from GenerationSettingsModal,
ModelSettingsScreen, and TTSSettingsScreen. Every numeric control
(temperature, top-p, GPU layers, speed, etc.) now uses the stepper
for touch-friendly precise adjustment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- MediaAttachment gains audioFormat and audioDurationSeconds fields
- audioRecorderService.stopRecording() now returns { path, durationSeconds }
  instead of just the path, enabling accurate audio bubble scrubbing
- ChatInput/Attachments.addAudioAttachment stores the duration

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…send

In Audio Mode, user voice recordings now appear as right-aligned audio
bubbles instead of text messages, making both sides of the conversation
audio-native.

- Voice.ts: adds file-based transcription path (audioRecorderService +
  whisperService.transcribeFile) and onAutoSend callback for atomic send
  with audio attachment. Multimodal models skip transcription entirely.
- ChatInput: passes onAutoSend in Audio Mode; builds MediaAttachment
  inline to avoid async state-update race; uses attachmentsRef for sync reads.
- AudioMessageBubble: adds isUser prop for right-aligned primary-tinted style.
- MessageRenderer: renders user audio attachments as AudioMessageBubble
  before the normal message path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The streaming-complete useEffect only listed isStreamingForThisConversation
in its deps, so activeConversation was captured stale. When streaming ended,
the last message was always the old value — TTS generation was never triggered.

Fix: read conversation and last message directly from useChatStore.getState()
inside the effect instead of relying on the closed-over activeConversation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
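Why the effect saw an old conversation: its callback closed over the value from the render that created it. The tiny store below stands in for Zustand's `getState()` so the difference is visible without React; it is an illustration, not the hook's code, and `createMiniStore` is hypothetical.

```typescript
interface ChatState {
  messages: string[];
}

// Minimal stand-in for a Zustand store: mutable state plus getState().
function createMiniStore(initial: ChatState) {
  let state = initial;
  return {
    getState: () => state,
    setState: (next: ChatState) => {
      state = next;
    },
  };
}

const store = createMiniStore({ messages: ["user: hi"] });

// Captured once, like activeConversation in the effect's closure.
const captured = store.getState();
const staleLastMessage = () => captured.messages[captured.messages.length - 1];

// Reads the latest state each time it runs, like useChatStore.getState() in the fix.
const freshLastMessage = () => {
  const { messages } = store.getState();
  return messages[messages.length - 1];
};

// Streaming finishes and the assistant reply is appended.
store.setState({ messages: ["user: hi", "assistant: hello!"] });
// staleLastMessage() still returns "user: hi"; freshLastMessage() returns "assistant: hello!"
```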
When no Whisper model is installed and the user taps the mic, show a
CustomAlert offering to download Whisper Small (466 MB) immediately,
rather than navigating away to VoiceSettings.

UnavailableButton also now shows a download icon + percentage while
the model is being fetched, so feedback is in-place.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a TEXT TO SPEECH section alongside IMAGE GENERATION and TEXT
GENERATION in the chat settings modal. Shows mode toggle (chat/audio),
enable switch, speed stepper, and auto-play toggle. Deep-links to
TTSSettingsScreen for full configuration.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
WHISPER_MODELS grows from 5 to 10 entries covering English-only and
Multilingual variants for tiny/base/small/medium, plus Large v3 Turbo
and Large v3.

whisperService.downloadFromUrl(url, modelId) downloads any ggml .bin
file from an arbitrary URL — enables installing community models from
HuggingFace. whisperStore exposes it as downloadFromUrl action.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Rewrites the voice settings screen with three sections:
- Active model card with inline download progress and remove action
- Curated models grouped by English-only / Multilingual (all sizes,
  tiny → large-v3)
- Live HuggingFace search bar (500 ms debounce) that queries ASR repos;
  tap a repo to expand and browse its ggml .bin files; tap a file to
  confirm and download via downloadFromUrl

huggingFaceService gains searchWhisperRepos() and getWhisperFiles()
to power the HF search without coupling to the LLM model browser.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
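A sketch of the 500 ms debounce behind the search bar: each keystroke cancels the pending query and schedules a new one, so only the last input in the window reaches HuggingFace. The timer functions are injectable only so the logic can be exercised without real time; in the app the `setTimeout`/`clearTimeout` defaults would apply. `debounce` here is illustrative, not the screen's code.

```typescript
type TimerHandle = ReturnType<typeof setTimeout>;

interface Timers {
  set: (fn: () => void, ms: number) => TimerHandle;
  clear: (handle: TimerHandle) => void;
}

const realTimers: Timers = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle),
};

// Delay `fn` until `ms` have passed with no new calls; only the latest args are used.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  ms: number,
  timers: Timers = realTimers,
): (...args: A) => void {
  let pending: TimerHandle | undefined;
  return (...args: A) => {
    if (pending !== undefined) timers.clear(pending);
    pending = timers.set(() => {
      pending = undefined;
      fn(...args);
    }, ms);
  };
}

// Possible usage (signature of searchWhisperRepos assumed):
// const onChangeText = debounce((q: string) => huggingFaceService.searchWhisperRepos(q), 500);
```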
llmMessages builds an input_audio content block from audio attachments
when the active model reports audio support, bypassing Whisper entirely.
llm.ts exposes getMultimodalSupport() so the voice layer can detect this.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- ttsStore: adds interfaceMode, speed, autoPlay, enabled settings;
  generateAndSave flow for Audio Mode; updateMessageAudio
- ttsService: OuteTTS generate+save path for AI audio bubbles
- TTSButton: play/stop per-message with generation spinner
- KokoroTTSManager + kokoroModels: scaffold for Tier 1 Kokoro TTS
  (not yet wired to react-native-executorch, marked not started)
- App.tsx: mounts KokoroTTSManager near root
- packages: react-native-executorch, background-downloader, @dr.pogodin/react-native-fs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- ChatMessage: long-press action sheet gains Speak option (delegates to ttsStore)
- ModelSettingsScreen: suppress pre-existing exhaustive-deps lint warning
- Tests: update GenerationSettingsModal and ModelSettingsScreen tests for
  NumericStepper (gpu-layers-stepper-increment) replacing slider testIDs
- TTS_IMPLEMENTATION_PLAN: rewritten to reflect Audio Mode bidirectional
  voice conversation, stale closure fix, and implementation status

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…sages

Two bugs causing broken Audio Mode:

1. AudioRecorder was recording at the system default rate (~44.1 kHz),
   producing WAV that Whisper interprets as static ('TV static' / [SOUND]).
   Fix: pass a preset with sampleRate:16000, BitDepth.Bit16 so the file
   is Whisper-compatible 16 kHz mono int16 PCM from the start.

2. buildOAIMessages was always including audio attachments as input_audio
   content blocks, even for models that don't support audio input (e.g.
   remote Qwen 3.5 2B / Gemma 42B). Those models replied 'I cannot hear
   audio'. Fix: buildOAIMessages now accepts supportsAudio flag (default
   false) and only emits input_audio parts when the model declares audio
   support. llm.ts passes multimodalSupport.audio when calling it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
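Fix 2 can be sketched as a content-part builder that emits `input_audio` parts only when `supportsAudio` is true. The part shapes follow the OpenAI chat-completions format (`{ type: 'input_audio', input_audio: { data, format } }`); the attachment fields, the transcript fallback for non-audio models, and `buildUserContent` are assumptions, not the PR's `buildOAIMessages`.

```typescript
interface AudioAttachment {
  type: "audio";
  base64Data: string; // assumed: audio already base64-encoded
  audioFormat: "wav" | "mp3";
  transcript?: string; // assumed: Whisper output, when available
}

type ContentPart =
  | { type: "text"; text: string }
  | { type: "input_audio"; input_audio: { data: string; format: "wav" | "mp3" } };

function buildUserContent(
  text: string,
  attachments: AudioAttachment[],
  supportsAudio = false, // defaults to false, as in the commit
): ContentPart[] {
  const parts: ContentPart[] = [];
  if (text) parts.push({ type: "text", text });
  for (const a of attachments) {
    if (supportsAudio) {
      parts.push({ type: "input_audio", input_audio: { data: a.base64Data, format: a.audioFormat } });
    } else if (a.transcript) {
      // Models without audio input only ever receive text.
      parts.push({ type: "text", text: a.transcript });
    }
  }
  return parts;
}
```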
playFromFile was treating WAV bytes as raw Float32 PCM — designed for
OuteTTS output only. WAV files have a 44-byte RIFF header plus int16
samples; reinterpreting them as Float32 produces pure static.

Fix: use AudioContext.decodeAudioData(filePath) which properly parses
the WAV header and decodes samples. The file:// prefix is added if
missing.

MessageRenderer now wraps user and assistant audio bubbles in a
container View with paddingHorizontal:16 and marginVertical:8,
matching the ChatMessage container layout so bubbles align correctly
with the chat edges instead of touching screen borders.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
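Why the old path produced static: each 16-bit integer sample has to be scaled into [-1, 1), not read as half of a 32-bit float. The PR delegates this to `decodeAudioData`; purely to illustrate what such a decoder does in the simplest case, the sketch below reads a canonical 44-byte PCM WAV header (assuming no extra chunks and 16-bit samples) and converts the samples. `decodeWav16` is a hypothetical helper.

```typescript
interface DecodedPcm {
  sampleRate: number;
  channels: number;
  samples: Float32Array; // interleaved when channels > 1
}

// Decode a canonical 16-bit PCM WAV (44-byte header, "data" chunk at offset 36).
function decodeWav16(bytes: Uint8Array): DecodedPcm {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const tag = (offset: number): string =>
    String.fromCharCode(bytes[offset], bytes[offset + 1], bytes[offset + 2], bytes[offset + 3]);
  if (bytes.byteLength < 44 || tag(0) !== "RIFF" || tag(8) !== "WAVE") {
    throw new Error("Not a RIFF/WAVE file");
  }

  const audioFormat = view.getUint16(20, true); // 1 = integer PCM
  const channels = view.getUint16(22, true);
  const sampleRate = view.getUint32(24, true);
  const bitsPerSample = view.getUint16(34, true);
  if (audioFormat !== 1 || bitsPerSample !== 16 || tag(36) !== "data") {
    throw new Error("Only canonical 16-bit PCM WAV is handled in this sketch");
  }

  // Sample count from the data chunk size, capped by the bytes actually present.
  const dataBytes = view.getUint32(40, true);
  const count = Math.floor(Math.min(dataBytes, bytes.byteLength - 44) / 2);
  const samples = new Float32Array(count);
  for (let i = 0; i < count; i++) {
    // Little-endian int16 scaled to [-1, 1).
    samples[i] = view.getInt16(44 + i * 2, true) / 32768;
  }
  return { sampleRate, channels, samples };
}
```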
Audio type attachments were falling through to the FadeInImage branch,
causing Image to try to load the WAV file path — resulting in a broken
image placeholder that stretched the user bubble very wide (the 'super
long' bubble issue).

Audio attachments now render as a compact mic icon + 'Voice message'
badge (matching the document badge style), keeping the bubble compact.
In Audio Mode they never reach this code — they render as AudioMessageBubble.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add isAudioModeMessage to Message type and updateMessageAudio signature.
Set flag in triggerAudioModeGeneration so mode switches don't reformat
old text messages. MessageRenderer now checks msg.isAudioModeMessage
instead of global ttsMode for assistant audio bubbles.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Bug 2: handlePlayPause calls speak() for AI bubbles (empty audioPath)
instead of playMessage with empty string. Remove isGenerating spinner.
Bug 3: WaveformBars gets flex:1 + overflow:hidden, WAVEFORM_BARS 40→28,
bubble overflow:hidden, maxWidth 80%→88%.
Bug 4: user bubble flips play row order (speed+duration left, play right).
Bug 5: voice cycling chip on AI bubbles reads/writes kokoroVoiceId.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
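For context on `WAVEFORM_BARS`: a waveform bubble reduces thousands of PCM samples to a fixed number of bars. A sketch, assuming peak amplitude per bucket normalized to the loudest bucket and a minimum bar height so silent stretches stay visible; `toBarHeights` is hypothetical, and the PR may compute its waveform differently.

```typescript
// Reduce PCM samples to `barCount` bar heights in pixels.
function toBarHeights(
  samples: Float32Array,
  barCount: number,
  maxHeightPx: number,
  minHeightPx: number,
): number[] {
  if (samples.length === 0 || barCount <= 0) {
    return new Array(Math.max(barCount, 0)).fill(minHeightPx);
  }
  const bucket = samples.length / barCount;
  const peaks: number[] = [];
  for (let b = 0; b < barCount; b++) {
    const start = Math.floor(b * bucket);
    const end = Math.max(start + 1, Math.floor((b + 1) * bucket));
    let peak = 0;
    for (let i = start; i < end && i < samples.length; i++) {
      peak = Math.max(peak, Math.abs(samples[i]));
    }
    peaks.push(peak);
  }
  const loudest = Math.max(...peaks) || 1; // avoid dividing by zero on silence
  return peaks.map((p) => Math.max(minHeightPx, (p / loudest) * maxHeightPx));
}
```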
- Fix guard: was checking isModelLoaded (OuteTTS, always false) instead
  of kokoroReady — so isAudioModeMessage was never stamped and all AI
  messages rendered as text in audio mode
- Add sentence-level streaming TTS: Kokoro now starts speaking each
  sentence as soon as LLM finishes generating it, instead of waiting
  for the full response
- Fix waveform invisible in idle state: min bar height 3→6px and
  empty waveform now renders a sine-wave placeholder instead of
  nearly-invisible flat bars

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
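Sentence-level streaming can be sketched as a buffer that accumulates streamed tokens and releases only complete sentences, holding the unfinished tail for the next token. The splitter is naive and assumed (an abbreviation such as "e.g." followed by a space would split early); `SentenceChunker` is a hypothetical name. A later commit in this PR drops streaming TTS in favor of speaking the full response once streaming ends.

```typescript
class SentenceChunker {
  private buffer = "";

  // Append streamed text; return any sentences that are now complete.
  // Punctuation counts as a boundary only once whitespace follows it,
  // so "3.5" is not split and a trailing "?" waits for the next token.
  push(token: string): string[] {
    this.buffer += token;
    const sentences: string[] = [];
    const boundary = /[.!?]+(?=\s)/g;
    let consumed = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.buffer)) !== null) {
      const end = match.index + match[0].length;
      const sentence = this.buffer.slice(consumed, end).trim();
      if (sentence) sentences.push(sentence);
      consumed = end;
    }
    this.buffer = this.buffer.slice(consumed);
    return sentences;
  }

  // Call when the stream ends to release whatever is left.
  flush(): string[] {
    const rest = this.buffer.trim();
    this.buffer = "";
    return rest ? [rest] : [];
  }
}
```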
alichherawalla and others added 29 commits April 9, 2026 11:10
- Add react-native-executorch mock to jest.setup.ts (voice configs + useTextToSpeech)
- Fix tts integration test: speak() now passes callback as 3rd arg
- Update VoiceRecordButton tests: tap-to-toggle, download prompt, no "Transcribing..." text
- Update VoiceSettingsScreen tests: new UI with English/Multilingual sections, Active badge
- Update DownloadManagerScreen tests: conditional active section, filter bar touchables
- Update messageContent test: stripControlTokens now trims output

157 suites, 5181 tests, all passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Use @react-native-community/slider (already installed) instead of
custom PanResponder-based seekbar. Native component handles drag
natively at 60fps — no JS thread bottleneck. Removes ~60 lines of
PanResponder/measure/layout tracking code. Added slider mock to
jest.setup.ts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace animated WaveformBars (VU-meter, wave bounce, 3 animation modes,
Animated.Value refs) with simple static bars. Progress is now shown
entirely by the native Slider component. Remove RMS amplitude calculation
from KokoroTTSManager onNext callback. ~80 lines of animation code
removed. No more JS thread contention from per-chunk amplitude updates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…click play

- Transcript shows karaoke-style word highlighting based on playback
  progress — spoken words in full color, upcoming words muted
- Stop any TTS playback when user starts recording (mic + speaker
  shouldn't overlap)
- Set isSpeaking + currentMessageId immediately before the 300ms Kokoro
  cleanup wait, so UI shows loading state right away when switching clips

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- KokoroTTSManager: 500ms cooldown after isSpeaking→false before applying
  voice config change, giving native ExecuTorch thread time to fully stop
- Transcript highlight: only the currently spoken word is highlighted
  (primary color + subtle background), not all spoken words
- Auto-scroll: ScrollView with maxHeight 120px, scrolls to keep the
  active word visible as playback progresses

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove word-level transcript highlighting — Kokoro doesn't provide
  word timestamps, so it was always off. Keep transcript as plain text
  in a scrollable container (max 120px)
- Waveform bars now visually distinguish playing vs idle: playing bars
  are brighter (0.6–1.0 opacity), idle bars are dimmer (0.25–0.6)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Waveform bars now tint as the playhead passes: played bars are bright,
  unplayed bars are muted — like WhatsApp voice messages
- Progress is shown directly on the bars, with the Slider below for
  drag-to-seek interaction
- Increase voice change cooldown to 1500ms to prevent native crash

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Audio bubble uses fixed width: 88% (not maxWidth) so it doesn't
  resize when transcript opens
- Thinking block wrapper matches at width: 88% (was maxWidth: 85%)
- Both bubbles now render at exactly the same width

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Slider is now positioned on top of the waveform bars (centered
  vertically) instead of as a separate row below
- Slider track is transparent — waveform bar coloring shows progress
- Slider thumb (dot) sits on top of the waveform at the current position
- Seekbar visible on both user and AI audio bubbles
- Removed separate seekbar row — cleaner layout

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Thumb is transparent when progress=0 and not seeking. Only becomes
visible (primary color) when audio is actively playing or user is
dragging the slider.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Thumb always shows (primary color) so users know they can seek
- Expand seekOverlay to left/right -16px to compensate for Android
  Slider's built-in ~16px internal padding — thumb now aligns with
  the waveform bar highlighting

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Play button + waveform in top row (waveform takes full remaining width)
- Show transcript, duration, speed chip in a single meta row below
- Matches WhatsApp voice message layout: play + waveform on top, info below

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Bars now distribute evenly across the entire container width instead
of clustering together with fixed 2px gaps.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Increase to 48 bars with 1.5px gaps — fills full width, looks denser
- Bigger speed chip (more padding, larger border radius) — easier to tap
- Voice change cooldown now uses actual stream end timestamp instead of
  isSpeaking state — waits 2 seconds from when the native stream actually
  stopped, not from when JS flag flipped
- Both user and AI bubbles use same width: 88%

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
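The cooldown change can be sketched as a small gate that queues the requested voice and releases it only after the native stream has been stopped for the full cooldown. The 2000 ms default mirrors this commit; `VoiceChangeGate` and its methods are hypothetical, and timestamps are passed in so the logic is testable.

```typescript
class VoiceChangeGate {
  private readonly cooldownMs: number;
  private streaming = false;
  private streamEndedAt: number | null = null;
  private pendingVoice: string | null = null;

  constructor(cooldownMs = 2000) {
    this.cooldownMs = cooldownMs;
  }

  onStreamStart(): void {
    this.streaming = true;
  }

  // Record when the native stream actually stopped, not when a JS flag flipped.
  onStreamEnd(nowMs: number): void {
    this.streaming = false;
    this.streamEndedAt = nowMs;
  }

  // Remember only the latest requested voice.
  requestVoice(voiceId: string): void {
    this.pendingVoice = voiceId;
  }

  // Return the voice to apply now, or null if the change must keep waiting.
  poll(nowMs: number): string | null {
    if (this.pendingVoice === null || this.streaming) return null;
    if (this.streamEndedAt !== null && nowMs - this.streamEndedAt < this.cooldownMs) {
      return null;
    }
    const voice = this.pendingVoice;
    this.pendingVoice = null;
    return voice;
  }
}
```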
Waveform bars now span edge-to-edge across the entire bubble width.
Play button sits in the meta row below alongside show transcript,
duration, and speed chip. No more asymmetric padding.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Reverted play button to left of waveform (standard layout). Reduced
playRow gap from SPACING.sm to SPACING.xs so waveform extends further
right.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Voice switch: key-based remount of KokoroTTSManager avoids native
  SIGSEGV when executorch re-initializes with a new voice config.
  Outer component manages cooldown, inner component holds the hook.
  Sets kokoroReady=false during switch so UI shows loader.

- Seekbar progress: playMessage finally block now checks ownership
  (currentMessageId === messageId) before clearing state, preventing
  it from clobbering an in-flight speak() call's isSpeaking/isAudioPlaying.
  Added playSessionId counter + retry loop (up to 10x 200ms) when
  executorch reports "model is currently generating" (code 104).

- Seekbar smoothness: timer interval 500ms→50ms, fractional seconds
  instead of Math.floor for continuous waveform bar progress.

- Transcript layout: split TranscriptSection into TranscriptToggle
  (stays in metaRow with time/speed) and TranscriptContent (renders
  below), preventing text from squeezing against duration/speed chip.

- Chat scroll: FlatList hidden (opacity:0) during initial layout,
  revealed after first scrollToEnd settles. Mode switch (chat↔audio)
  resets scroll via extraData + scrollToEnd.

- Voice loader UI: track kokoroActiveVoiceId in store, derive
  isChangingVoice in UI components from settings vs active mismatch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
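The ownership check in the seekbar-progress bullet can be sketched with a session counter: each play call takes a new session id, and its cleanup only resets shared state if no newer call has taken over. `PlaybackState`, `beginPlayback`, and `endPlayback` are hypothetical stand-ins for the store fields and the `finally` block; like the PR's check, the sketch also compares `currentMessageId`.

```typescript
interface PlaybackState {
  currentMessageId: string | null;
  isSpeaking: boolean;
  sessionId: number;
}

const playback: PlaybackState = { currentMessageId: null, isSpeaking: false, sessionId: 0 };

// Start playback for a message and return the session that owns it.
function beginPlayback(messageId: string): number {
  playback.sessionId += 1;
  playback.currentMessageId = messageId;
  playback.isSpeaking = true;
  return playback.sessionId;
}

// Called from a finally block: only the owning session may reset shared state.
function endPlayback(session: number, messageId: string): boolean {
  if (session !== playback.sessionId || playback.currentMessageId !== messageId) {
    return false; // a newer speak()/playMessage() call owns the state now
  }
  playback.currentMessageId = null;
  playback.isSpeaking = false;
  return true;
}
```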
…tional Kokoro

- Audio mode now renders tool-call messages via ChatMessage (proper
  bubble + tool call UI) instead of dropping them as raw unstyled text.
  Plain assistant messages still render as AudioMessageBubble.

- Transcript ScrollView uses react-native-gesture-handler for reliable
  nested scrolling inside FlatList on Android. Moved transcript outside
  the TouchableOpacity wrapper so it can capture scroll gestures.

- Action menu (long-press + 3-dot) added to both user and assistant
  audio bubbles: Copy + Resend for user, Copy + Regenerate for assistant.

- Kokoro TTS only loads in audio interface mode (App.tsx), saving RAM
  when in chat mode.

- Post-stream ownership transfer: when all text was spoken by streaming
  chunks, transfers currentMessageId from 'streaming' to the real
  message ID so the AudioMessageBubble seekbar works.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When retrying a message while TTS is speaking, the audio bubble
disappears but Kokoro continues playing natively. Now calls
ttsStore.stop() before deleting messages in the retry handler.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Conditional mounting (audio mode only) caused Kokoro to not be ready
during streaming — it takes ~10s to initialize, but fast models finish
streaming before that. Streaming TTS chunks silently skipped because
kokoroReady was false. Reverting to always-mounted so Kokoro is warm
when streaming starts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Streaming TTS chunks couldn't keep up with fast cloud models — Kokoro
speaks slower than tokens arrive, causing a growing backlog of unspoken
chunks, word skipping at transitions, and unpredictable playback.

Replaced with a simpler approach: text streams normally as a ChatMessage,
then when streaming ends the full response is spoken as a single TTS
call with the real message ID. Clean, predictable, no word skipping.

Also includes: stop in-flight TTS when new streaming begins, TTS stop
on retry/resend, and text offset fix for post-stream remaining calc.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

sonarqubecloud bot commented Apr 9, 2026

Quality Gate failed

Failed conditions
6 Security Hotspots

See analysis details on SonarQube Cloud

Labels: None yet

Projects: None yet

1 participant