feat(whispering): add configurable VAD silence detection latency #1241

Open
hujuDev wants to merge 4 commits into EpicenterHQ:main from hujuDev:feat/vad-silence-detection-latency

hujuDev commented Jan 10, 2026

Summary

Type of Change

  • feat: New feature
  • fix: Bug fix
  • docs: Documentation update
  • refactor: Code refactoring (no functional changes)
  • perf: Performance improvement
  • test: Test additions or changes
  • chore: Maintenance tasks
  • style: Code style changes

Related Issue

Closes #462

Changes Made

I implemented the "redemptionFrames" option, which is exposed by the @ricky0123/vad-web library the project already uses. To keep the changes minimal, I didn't update the dependency. Instead, a TODO comment notes what would need to change after updating @ricky0123/vad-web (using redemptionMs directly instead of redemptionFrames).
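
For readers unfamiliar with frame-based options, here is a minimal sketch of how a millisecond value could be mapped to a frame count. It is not the code from this PR: the helper name is hypothetical, and the frame size (1536 samples) and sample rate (16 kHz) are assumptions that should be checked against the vad-web version in use.

```typescript
// Hypothetical helper, not taken from the PR.
// ASSUMPTION: 16 kHz audio and 1536 samples per frame (96 ms per frame).
// Verify both values against the vad-web version you actually use.
const SAMPLE_RATE_HZ = 16_000;
const FRAME_SAMPLES = 1536;

function msToRedemptionFrames(
  pauseMs: number,
  frameSamples: number = FRAME_SAMPLES,
  sampleRateHz: number = SAMPLE_RATE_HZ,
): number {
  // Duration of one frame in milliseconds.
  const frameMs = (frameSamples / sampleRateHz) * 1000;
  // Round to the nearest whole frame and never return a negative count.
  return Math.max(0, Math.round(pauseMs / frameMs));
}

// With the assumed 96 ms frames, 1000 ms is about 10.4 frames, which rounds to 10.
const frames = msToRedemptionFrames(1000);
```

Because the result is rounded to whole frames, the effective pause can differ from the configured value by up to half a frame under these assumptions.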

I also added a setting called "VAD Pause Buffer (ms)" to the "Recording" tab. It appears when Recording Mode is set to "Voice Activated" and lets users set a custom value.
The input field is debounced with the DEBOUNCE_TIME_MS constant before saving; because it's a numeric input, it would otherwise trigger a save on every digit typed.

Testing

I tested the feature both via the global shortcut and in-app, and used a stopwatch to confirm that the pause buffer matches the configured time.
I also tested with several different sample rates and confirmed that it works as expected regardless of the configured sample rate.

Desktop App Testing

  • Tested on macOS
  • Tested on Windows
  • Tested on Linux
  • Not applicable (web-only change)

General Testing

  • Tested with multiple API providers (if applicable)
  • Verified no API keys are exposed in logs or storage
  • Checked for console errors
  • Tested on different screen sizes (if UI change)

Checklist

  • My code follows the project's coding standards (see CONTRIBUTING.md)
  • I've used type instead of interface in TypeScript
  • I've used absolute imports where applicable
  • I've tested my changes thoroughly
  • I've added/updated tests for my changes (if applicable)
  • My changes don't break existing functionality
  • I've updated documentation (if needed)

Screenshots/Recordings

image

Additional Notes

Leftium (Member) commented Jan 10, 2026

Instead of debouncing, could you bind to the onchange event? See: #1236

<Input
id="vad-pause-buffer"
type="number"
min="0"

What happens if the user sets 0 ms?

hujuDev (Author)

It sets redemptionFrames to 0 in the vad-web library. That works in theory, but I think anything below 500 ms is pretty much unusable: it keeps detecting a pause mid-sentence unless you speak very quickly.

stepansoboliev commented Jan 14, 2026

Makes sense.

Several days ago, when I found your PR on Discord, I decided to vibe-code this solution. I chose to provide predefined options within a meaningful range, e.g., [0.25s, 0.5s, 1.0s, 1.5s, 2.0s, 3.0s, 4.0s, 5.0s, 7.5s, 10.0s]. Potentially, this could be more user-friendly.
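
As an illustration of the preset idea, the list above could be turned into select options roughly like this. This is only a sketch: the names are hypothetical and it is not the code from either implementation.

```typescript
// Hypothetical preset list, using the values mentioned in the comment above.
const PAUSE_PRESETS_SECONDS = [0.25, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 7.5, 10.0];

// `type` rather than `interface`, matching the project's checklist.
type PauseOption = { label: string; valueMs: number };

// Build options with a human-readable label and a millisecond value.
const pauseOptions: PauseOption[] = PAUSE_PRESETS_SECONDS.map((seconds) => ({
  label: `${seconds}s`,
  valueMs: Math.round(seconds * 1000),
}));
```

Storing milliseconds keeps the presets compatible with a free-form millisecond input, should both be offered.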

Additionally, I've added the ability to configure the start buffer (preSpeechPadFrames) and input sensitivity (positiveSpeechThreshold). Not sure if it's something useful - I need to test it.

image

hujuDev (Author)

Yeah, I was thinking about predefined values too, but went with an input field for more flexibility, since users might want to tune this precisely (especially in the sub-3-second range, where even 200 ms changes felt noticeable to me).

Setting it to 0ms didn't really seem any different from setting it to 1ms, so maybe the info text underneath could just show a warning for small values below 250ms or 500ms or something? What do you think?
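
A warning like that could be sketched as follows. The function name, the message wording, and the 500 ms threshold are all assumptions for illustration; the threshold comes from this discussion, not from the library.

```typescript
// ASSUMPTION: 500 ms threshold taken from the discussion, not from vad-web.
const LOW_PAUSE_WARNING_MS = 500;

// Hypothetical helper: returns a warning to show under the input, or null if none applies.
function pauseBufferWarning(pauseMs: number): string | null {
  if (!Number.isFinite(pauseMs) || pauseMs < 0) {
    return 'Enter a non-negative number of milliseconds.';
  }
  if (pauseMs < LOW_PAUSE_WARNING_MS) {
    return `Values below ${LOW_PAUSE_WARNING_MS} ms may end recordings mid-sentence.`;
  }
  return null;
}
```

Returning a message instead of rejecting the value keeps the input flexible while still steering users away from very small buffers.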

hujuDev (Author) commented Jan 14, 2026

I also played around with preSpeechPadFrames and positiveSpeechThreshold but found the default values to be working quite well and decided against including them to keep the feature minimal, but if you think they're useful to change around, I can include them!

stepansoboliev commented Jan 16, 2026

I haven't tested the sensitivity thoroughly yet, but I was having issues in VAD mode where the first word was being clipped. I don't recall the exact default value used here, but in newer versions of vad-web, preSpeechPadMs defaults to 800ms, which is usually plenty.

I agree we should keep the settings minimal.

Regarding the pause buffer - it’s possible that setting it to 0ms causes it to reset to the default value. Adding a note to the description or showing a warning sounds like a great idea; either option would be more than enough.

Leftium (Member) commented Jan 20, 2026

Sorry for the back-and-forth... Instead of #1236, this PR simply removed notifications on update: #1256

It's been merged, and the proper way to bind the settings input is with value, without any debouncing.

hujuDev (Author) commented Jan 21, 2026

> Sorry for the back-and-forth... Instead of #1236, this PR simply removed notifications on update: #1256
>
> It's been merged, and the proper way to bind the settings input is with value, without any debouncing.

All good! I'll take a look at #1256 and will update this PR to match it sometime this week.

Leftium (Member) commented Jan 21, 2026

@hujuDev You just need to bind directly to value like before, and omit any debouncing logic (not needed any more).

hujuDev force-pushed the feat/vad-silence-detection-latency branch from eb28f64 to 7a919f3 on January 26, 2026 12:11

hujuDev (Author) commented Jan 26, 2026

The input now binds directly to value.
I also changed the input to show a placeholder with the default value used by the backend, so users can easily see what the default is, and expanded the description to explain what the setting does and which values are recommended.

@nikosbosse
Looking forward to this!

Development

Successfully merging this pull request may close these issues.

Allow changing silence detection on Voice activated mode.

4 participants