feat(vad): implement native voice activity detection for Linux #846
Open
wchest wants to merge 2 commits into EpicenterHQ:main from
This adds native voice activity detection for Linux using the Silero VAD model, providing better speech detection performance compared to the web-based VAD.

Key features:
- Configurable sensitivity slider (0.1-0.9 threshold)
- Automatic session cleanup to prevent conflicts
- Event-based communication between Rust backend and TypeScript frontend
- Proper state management matching web VAD behavior
- Device enumeration support for consistent UI

Technical implementation:
- Uses voice_activity_detector crate with Silero v5 model
- CPAL for audio capture with 16kHz sample rate preference
- Separate events for speech start/end with proper timing
- File contents embedded in events to bypass permission issues
- Dynamic service selection based on user settings

UI improvements:
- Fixed icon mapping: ear (👂) for listening, chat bubble (💬) for speech detected
- Sensitivity slider only shown when native VAD is enabled
- Settings require page reload to apply VAD mode changes

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
… VAD

The VAD mode description now dynamically shows whether native Silero VAD or web-based VAD is being used, providing accurate information to users about the underlying implementation.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Summary
Implements native voice activity detection for Linux using CPAL audio capture and the Silero VAD model, providing better platform integration and reliability compared to browser-based audio processing.
Motivation
While the existing web-based VAD works well, browser audio APIs can have limitations on Linux systems. This native implementation provides:
Implementation Details
Architecture
- NativeVadService alongside existing VadService with identical interface
- … recording.vad.useNative)
- src-tauri/src/recorder/vad.rs
Key Features
Technical Implementation
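The commit message mentions a 0.1-0.9 threshold and separate events for speech start and end. As a rough illustration of that idea only (this is not the PR's Rust code), the sketch below turns a sequence of per-frame speech probabilities into start/end events. The `VadEvent` shape, the function names, and the `hangoverFrames` parameter are all assumptions made for this example.

```typescript
// Hypothetical event shape; the PR's real event names and payloads are not shown here.
type VadEvent =
  | { kind: "speech-start"; frameIndex: number }
  | { kind: "speech-end"; frameIndex: number };

// Clamp a slider value into the 0.1-0.9 range mentioned in the commit message.
function clampThreshold(value: number): number {
  return Math.min(0.9, Math.max(0.1, value));
}

// Convert per-frame speech probabilities into start/end events.
// `hangoverFrames` (an assumption) is the number of consecutive
// below-threshold frames required before a speech segment is closed.
function detectSpeechEvents(
  probabilities: number[],
  threshold: number,
  hangoverFrames: number = 2,
): VadEvent[] {
  const limit = clampThreshold(threshold);
  const events: VadEvent[] = [];
  let speaking = false;
  let quietFrames = 0;

  for (let i = 0; i < probabilities.length; i++) {
    if (probabilities[i] >= limit) {
      // Any speech frame resets the silence counter.
      quietFrames = 0;
      if (!speaking) {
        speaking = true;
        events.push({ kind: "speech-start", frameIndex: i });
      }
    } else if (speaking) {
      quietFrames += 1;
      if (quietFrames >= hangoverFrames) {
        speaking = false;
        quietFrames = 0;
        events.push({ kind: "speech-end", frameIndex: i });
      }
    }
  }
  return events;
}
```

Requiring a few quiet frames before emitting the end event keeps a brief dip in probability from splitting one utterance into several segments. Whether the PR applies a similar rule is not stated.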
User Experience
Settings Integration
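The description refers to a `recording.vad.useNative` setting and to a native service that sits alongside the existing one with an identical interface. A minimal sketch of how such a selection might look follows. The `VadLike` interface, the stub factory, the `platform` string, and `selectVadService` are hypothetical names for illustration, not the PR's actual code.

```typescript
// Shared surface both services are assumed to expose (hypothetical;
// the real VadService interface is not shown in this PR text).
interface VadLike {
  readonly kind: "native" | "web";
  start(): void;
  stop(): void;
  isListening(): boolean;
}

// Stand-in service used only to make the sketch runnable.
function createStubVad(kind: "native" | "web"): VadLike {
  let listening = false;
  return {
    kind,
    start: () => { listening = true; },
    stop: () => { listening = false; },
    isListening: () => listening,
  };
}

// Settings modeled as a flat record; the key comes from the PR description.
type Settings = { "recording.vad.useNative": boolean };

// Pick the native service only when the setting is on and the platform is Linux;
// otherwise fall back to the web-based service. The platform check is an assumption
// based on the PR targeting Linux.
function selectVadService(
  settings: Settings,
  platform: string,
  native: VadLike,
  web: VadLike,
): VadLike {
  const useNative = settings["recording.vad.useNative"] && platform === "linux";
  return useNative ? native : web;
}
```

Because both services share one interface, callers can hold a single `VadLike` reference and never branch on the mode again. Selecting the service once at startup would also be consistent with the commit note that VAD mode changes require a page reload.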
Testing
Breaking Changes
None. This is purely additive:
Dependencies
Added voice_activity_detector = "0.2.1" to provide Silero VAD model integration.
Files Changed
- src-tauri/src/recorder/vad.rs - New native VAD implementation
- src/lib/services/native-vad.ts - TypeScript service wrapper
- src/lib/settings/settings.ts - Added VAD configuration options
- src/routes/(config)/settings/recording/+page.svelte - UI controls and descriptions
Future Considerations
This implementation maintains Epicenter's local-first philosophy while providing Linux users with improved audio processing reliability through native platform integration.