Skip to content

Latest commit

 

History

History
182 lines (135 loc) · 4.34 KB

File metadata and controls

182 lines (135 loc) · 4.34 KB

🚀 Performance Improvements Applied

Issues Fixed

1. ⚡ Reduced Latency

Changes:

  • Silence detection: 3.0s → 2.0s (faster response)
  • Typing speed: 0.01s/char → 0.005s/char (2x faster typing)
  • Overall: Should feel more responsive now

Note about "typing as you speak":

  • Current limitation: Whisper needs complete audio to transcribe
  • Can't do real-time streaming with standard Whisper
  • Alternative: Could switch to faster-whisper or streaming model (more complex)

2. 🎯 Improved Command Detection

Enhanced Matching:

Old: "move left" in text New: any(phrase in text for phrase in ["move left", "go left", "left arrow", "cursor left"])

More command variations now recognized:

  • "move left" OR "go left" OR "left arrow" OR "cursor left"
  • "move to start" OR "beginning" OR "start of line"
  • "move to end" OR "end of line"
  • etc.

3. 📊 Added Debug Logging

Now shows:

📝 Raw transcription: 'move left five'
📝 Lowercase: 'move left five'
🎯 Command detected: move_left(5)
✅ Command executed successfully

Or if not a command:

📝 Raw transcription: 'echo hello'
📝 Lowercase: 'echo hello'
💬 No command detected, typing text...
⌨️  Typing: echo hello

How to Debug Intermittent Commands

Watch logs in real-time:

./test-commands.sh

or

tail -f /tmp/voice-dictation.log

What to look for:

If command works:

📝 Raw transcription: 'move left'
🎯 Command detected: move_left(1)

If command doesn't work:

📝 Raw transcription: 'muv left'    ← Whisper misheard!
💬 No command detected, typing text...

This tells you exactly how Whisper heard your voice.

Common Issues & Solutions

Issue: Commands work sometimes but not always

Possible causes:

  1. Background noise - Whisper mishears words
  2. Speech clarity - Speaking too fast/slow
  3. Whisper transcription variance - May spell words differently

Solutions:

  • Speak clearly and at normal pace
  • Reduce background noise
  • Check logs to see what Whisper actually transcribed
  • If Whisper consistently mishears, add that variant to code

Issue: Still feels laggy

Current bottleneck: Whisper transcription takes ~1-2 seconds

Transcription time depends on:

  • Audio length (shorter = faster)
  • Model size (tiny = fastest, base = medium, small = slower)
  • CPU speed

To speed up:

# Stop current
./toggle.sh

# Start with tiny model (fastest)
export PICOVOICE_ACCESS_KEY="your-key"
python main.py --no-gui --model tiny

Issue: Typing still seems slow

Current settings:

  • 0.005s per character (200 chars/second)
  • Can be reduced further but may cause input buffer issues

To make even faster, edit main.py line 63:

self.injector = InputInjector(delay=0.001)  # Even faster

Timeline Breakdown

Current flow:

Say "computer"          → <instant>
System hears wake word  → <instant>
Start recording         → <instant>
Speak your command      → <you control>
Silence for 2 seconds   → 2 seconds
Transcribe with Whisper → 1-2 seconds
Type the text          → 0.5-1 second (depends on length)
───────────────────────────────────────
Total: ~4-5 seconds from silence to completion

The 1-2 second Whisper transcription is unavoidable with current setup.

Advanced: Real Streaming (Future Enhancement)

To get true "type as you speak", would need:

  1. Switch to faster-whisper library (supports streaming)
  2. Transcribe in small chunks (0.5s segments)
  3. Type each word as it's transcribed

This is more complex but achievable. Let me know if you want this!

Current Performance Summary

Metric Before After Improvement
Silence detection 3.0s 2.0s 33% faster
Typing speed 0.01s/char 0.005s/char 2x faster
Command variations 2-3 4-5 More flexible
Debug visibility None Full logging Much better

Testing Commands

Try these phrases and watch the logs:

Navigation:

  • "move left"
  • "move right five"
  • "go to start"
  • "end of line"

Editing:

  • "delete word"
  • "delete line"
  • "backspace"

Special:

  • "press enter"
  • "press tab"

Regular text:

  • "echo hello world"
  • "ls dash la"

See logs: ./test-commands.sh