feat(whispering): add configurable VAD silence detection latency#1241
hujuDev wants to merge 4 commits into EpicenterHQ:main
Instead of debouncing, could you bind to the
<Input
  id="vad-pause-buffer"
  type="number"
  min="0"
What happens if the user sets it to 0 ms?
It sets redemptionFrames to 0 in the vad-web library. That works in theory, but I think anything below 500 ms is pretty much unusable, as it keeps detecting a pause mid-sentence unless you speak very quickly.
Makes sense.
Several days ago, when I found your PR on Discord, I decided to vibe-code this solution. I chose to provide predefined options within a meaningful range, e.g., [0.25s, 0.5s, 1.0s, 1.5s, 2.0s, 3.0s, 4.0s, 5.0s, 7.5s, 10.0s]. Potentially, this could be more user-friendly.
Additionally, I've added the ability to configure the start buffer (preSpeechPadFrames) and input sensitivity (positiveSpeechThreshold). Not sure if it's something useful - I need to test it.
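As a rough sketch of the predefined-options idea from this comment, the listed values could be turned into select options like this. The names `PAUSE_PRESETS_SECONDS`, `PauseOption`, and `pauseOptions` are hypothetical and not taken from the PR; only the list of values comes from the comment.

```typescript
// Preset pause durations in seconds, as listed in the review comment.
const PAUSE_PRESETS_SECONDS = [0.25, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 7.5, 10.0] as const;

// Hypothetical option shape for a select component.
type PauseOption = { label: string; valueMs: number };

// Convert each preset to milliseconds, since the setting is stored in ms.
const pauseOptions: PauseOption[] = PAUSE_PRESETS_SECONDS.map((seconds) => ({
  label: `${seconds} s`,
  valueMs: Math.round(seconds * 1000),
}));
```

A fixed list like this trades fine-grained tuning for a simpler UI, which is the trade-off the thread goes on to discuss.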
Yeah, I was thinking about predefined values too, but decided to go with an input field for more flexibility, since this is something users might want to tune precisely (especially in the sub-3-second range, where even 200 ms changes felt noticeable to me).
Setting it to 0ms didn't really seem any different from setting it to 1ms, so maybe the info text underneath could just show a warning for small values below 250ms or 500ms or something? What do you think?
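A minimal sketch of the warning idea, assuming a 500 ms cutoff (the thread floats either 250 ms or 500 ms, so the threshold here is a placeholder). `LOW_PAUSE_WARNING_MS` and `pauseBufferHint` are hypothetical names, not code from the PR.

```typescript
// Hypothetical threshold; the discussion suggests 250 ms or 500 ms.
const LOW_PAUSE_WARNING_MS = 500;

// Returns a warning string for small pause buffers, or null when no warning is needed.
function pauseBufferHint(pauseMs: number): string | null {
  if (pauseMs < LOW_PAUSE_WARNING_MS) {
    return `Values below ${LOW_PAUSE_WARNING_MS} ms may end the recording mid-sentence.`;
  }
  return null;
}
```

The info text under the input could render this string when it is non-null and fall back to the regular description otherwise.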
I also played around with preSpeechPadFrames and positiveSpeechThreshold but found the default values to be working quite well and decided against including them to keep the feature minimal, but if you think they're useful to change around, I can include them!
I haven't tested the sensitivity thoroughly yet, but I was having issues in VAD mode where the first word was being clipped. I don't recall the exact default value used here, but in newer versions of vad-web, preSpeechPadMs defaults to 800ms, which is usually plenty.
I agree we should keep the settings minimal.
Regarding the pause buffer - it’s possible that setting it to 0ms causes it to reset to the default value. Adding a note to the description or showing a warning sounds like a great idea; either option would be more than enough.
@hujuDev You just need to bind directly to
eb28f64 to 7a919f3
Added the direct bind to
Looking forward to this!
Summary
Type of Change
feat: New feature
fix: Bug fix
docs: Documentation update
refactor: Code refactoring (no functional changes)
perf: Performance improvement
test: Test additions or changes
chore: Maintenance tasks
style: Code style changes
Related Issue
Closes #462
Changes Made
I implemented the redemptionFrames option, which is available in the @ricky0123/vad-web library the app already uses. To keep the changes minimal, I didn't update the dependency; instead I left a commented TODO noting what would need to change after an upgrade (using redemptionMs directly instead of redemptionFrames).
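For illustration, converting a millisecond setting into a frame count might look like the sketch below. It assumes the VAD model runs at 16 kHz with 1536 samples per frame (about 96 ms per frame); these values and the helper name `msToRedemptionFrames` are assumptions for the example, not taken from the PR, and the real frame size depends on the vad-web version and model in use.

```typescript
// Assumed values: check the installed vad-web version before relying on them.
const MODEL_SAMPLE_RATE_HZ = 16000;
const FRAME_SAMPLES = 1536;

// Hypothetical helper: convert a pause buffer in milliseconds to a whole number of frames.
function msToRedemptionFrames(
  pauseMs: number,
  frameSamples: number = FRAME_SAMPLES,
  sampleRateHz: number = MODEL_SAMPLE_RATE_HZ,
): number {
  if (!Number.isFinite(pauseMs) || pauseMs < 0) {
    throw new RangeError("pauseMs must be a non-negative finite number");
  }
  // Duration of one frame in milliseconds (about 96 ms under the assumed values).
  const frameMs = (frameSamples / sampleRateHz) * 1000;
  // Round to the nearest frame; very small inputs round down to 0 frames.
  return Math.round(pauseMs / frameMs);
}
```

Because the frame count is derived from the model's frame length rather than the microphone's sample rate, the resulting pause should not depend on the recording device's configured rate, under the assumptions above.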
I also added a setting to the "Recording" tab when Recording Mode is set to "Voice Activated" called "VAD Pause Buffer (ms)" that allows users to set a custom value.
I also added a small debounce before saving the setting input field using the DEBOUNCE_TIME_MS constant, as it's a numeric input and it would otherwise trigger a save on every digit.
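The debounce described here can be sketched roughly as follows. This is a generic illustration, not the PR's code: the value of `DEBOUNCE_TIME_MS` is not stated in the description, so 500 is a placeholder, and the `debounce` helper with its `flush` method is hypothetical. Note that the review thread suggests binding directly instead of debouncing.

```typescript
// Placeholder value; the PR names the constant but does not state its value.
const DEBOUNCE_TIME_MS = 500;

// Wrap fn so that rapid calls collapse into one call with the latest arguments.
function debounce<A extends unknown[]>(fn: (...args: A) => void, waitMs: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  let pendingArgs: A | undefined;

  // Invoke fn with the most recent arguments, if any call is pending.
  const run = (): void => {
    timer = undefined;
    if (pendingArgs !== undefined) {
      const args = pendingArgs;
      pendingArgs = undefined;
      fn(...args);
    }
  };

  const debounced = (...args: A): void => {
    pendingArgs = args;
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(run, waitMs); // restart the wait on every call
  };

  // flush() runs any pending call immediately, e.g. when the input loses focus.
  const flush = (): void => {
    if (timer !== undefined) clearTimeout(timer);
    run();
  };

  return Object.assign(debounced, { flush });
}
```

With this shape, typing "1500" into the field would schedule four saves but only the last one, with the full value, would actually run.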
Testing
I tested the feature both using the global shortcut and locally in-app and ensured that the pause buffer matches the configured time using a stopwatch.
I also tested the feature with multiple different sample rates and made sure it works as expected regardless of sample rate configured.
Desktop App Testing
General Testing
Checklist
type instead of interface in TypeScript
Screenshots/Recordings
Additional Notes