🎤 VoxTerm

Talk to your terminal. It's listening.

Say "computer", speak your commands, and watch them appear—like you're in Star Trek. All processing happens locally on your Mac. No cloud, no surveillance, just you and your machine.

MIT License · Python 3.9+ · macOS


💡 Why VoxTerm?

macOS Has Voice Control. Why Do I Need This?

Great question. macOS Voice Control is fantastic for accessibility and general UI control. But if you live in the terminal, it's not built for you.

VoxTerm is different:

| What You Get | Why It Matters |
|---|---|
| 🎯 Always Listening | Just say "computer"—no manual key press every time |
| ⚡ Terminal-Native Commands | "delete word", "move left 5", "send it"—built for CLI workflows |
| 🤖 AI Integration | Claude mode switching with voice—no other tool does this |
| 🔓 Open Source | MIT licensed. Know exactly what's listening and where your audio goes |
| 🎨 Fully Customizable | 13+ wake words, tunable sensitivity, model choice—make it yours |
| 🚀 Developer-First | Built by devs for devs, not adapted from accessibility tools |

The Cool Factor: You're not just using voice control—you're having a conversation with your terminal. Say "computer", dictate your Git commit message, say "send it", and it's done. Hands never leave your coffee cup.

The Real Reason: Because typing git commit -m "fix: resolve issue with async handler in webhook processor" when you're 8 coffees deep is a pain. Just say it.


✨ What Makes VoxTerm Special

🎙️ Always-Listening Wake Word

Say "computer" (or jarvis, or alexa, or 10+ other options) and VoxTerm springs to life. No keyboard shortcuts, no mouse clicks, no interruption to your flow.

vs macOS Voice Control: Requires manual activation key every single time.

🧠 Terminal-Specific Commands

  • "move left 5" → Cursor jumps
  • "delete word" → Previous word gone
  • "send it" → Enter pressed
  • "change mode twice" → Cycle through Claude AI modes

vs macOS Voice Control: Designed for UI navigation, not CLI text editing.

🔒 Privacy-First Architecture

  • All transcription runs locally via OpenAI Whisper
  • No cloud APIs for speech-to-text
  • Open source—audit the code yourself
  • Picovoice key is used only for wake word detection (we're transparent about this)

vs macOS Voice Control: Proprietary black box. You trust Apple, but you can't verify.

🤖 Claude AI Integration

Unique to VoxTerm: Voice-controlled mode cycling for Claude AI prompts. "Change mode" switches between plan/edit/default without touching the keyboard.

vs macOS Voice Control: No AI assistant integration.

🎨 Deep Customization

  • 13+ wake words (computer, jarvis, alexa, hey google, etc.)
  • 4 model sizes (tiny for speed, large for accuracy)
  • Sensitivity tuning (0.0-1.0)
  • Background daemon mode

vs macOS Voice Control: Limited OS-level settings.

🛠️ Developer-Friendly

  • MIT licensed
  • Well-documented codebase
  • Extensible architecture
  • Active development

vs macOS Voice Control: Closed source, no extension points.


📊 VoxTerm vs Alternatives

| Feature | VoxTerm | macOS Voice Control | Talon Voice | Dragon |
|---|---|---|---|---|
| Terminal-optimized | ✅ | ❌ | ❌ | ❌ |
| Wake word activation | ✅ | ❌ | ❌ | ❌ |
| Open source | ✅ | ❌ | ❌ | ❌ |
| Local processing | ✅ | ✅ | ✅ | ✅ |
| Free | ✅ | ✅ | ❌ ($$$) | ❌ ($$$) |
| Claude AI integration | ✅ | ❌ | ❌ | ❌ |
| macOS native | ✅ | ✅ | ❌ | ❌ |
| Setup difficulty | Medium | Easy | Hard | Easy |

🏗️ Technology Stack

| Component | Technology |
|---|---|
| Audio Capture | PyAudio (16 kHz, 16-bit PCM) |
| Wake Word | Porcupine (Picovoice, requires a free API key) |
| Speech Recognition | OpenAI Whisper (local/offline) |
| Streaming STT | faster-whisper (real-time transcription) |
| VAD | WebRTC Voice Activity Detection |
| Keyboard Simulation | pynput |
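The capture format above lines up with what WebRTC VAD expects: 16-bit mono PCM at a supported sample rate, delivered in 10, 20, or 30 ms frames. A quick sanity check of the frame arithmetic (the 30 ms frame length is an assumption, not confirmed from the repo):

```python
# Frame-size arithmetic for 16 kHz, 16-bit mono PCM capture.
SAMPLE_RATE = 16_000   # samples per second (from the table above)
SAMPLE_WIDTH = 2       # bytes per sample (16-bit PCM)
FRAME_MS = 30          # WebRTC VAD accepts 10, 20, or 30 ms frames

samples_per_frame = SAMPLE_RATE * FRAME_MS // 1000   # 480 samples
bytes_per_frame = samples_per_frame * SAMPLE_WIDTH   # 960 bytes

print(samples_per_frame, bytes_per_frame)  # → 480 960
```

Any buffer handed to the VAD must be exactly this size, which is why capture and VAD settings have to agree.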

🚀 Get Started in 5 Minutes

Requirements: macOS 14+, Python 3.9+, Microphone

One-command setup:

# Clone and run setup
git clone https://github.com/not2technical/voxterm.git
cd voxterm
./setup.sh
./setup-access-key.sh
./toggle.sh

# Say "computer" and start talking! 🎉

That's it. Your terminal now understands English.


👥 Who's This For?

Perfect For:

  • Terminal power users who want hands-free command execution
  • Developers who dictate Git commits, docs, and scripts
  • AI enthusiasts who use Claude/ChatGPT and want voice integration
  • Privacy-conscious users who want local-only processing
  • Accessibility users who need more than macOS Voice Control offers
  • Anyone who thinks talking to computers is cool

Not For:

  • People happy with macOS Voice Control
  • Users who rarely use the terminal
  • Those unwilling to set up Python dependencies

Use Cases:

  • Dictating long Git commit messages
  • Writing documentation while coding
  • Hands-free command execution during demos
  • Accessibility when keyboard is difficult
  • Just being cool at coffee shops 😎

📋 Prerequisites

  • macOS (tested on macOS 14+)
  • Python 3.9+
  • Homebrew (for installing PortAudio)
  • Microphone access
  • Picovoice Access Key (free tier available)

🔑 Getting Your Picovoice API Key

The wake word detection (Porcupine) requires a free API key from Picovoice:

  1. Visit: https://console.picovoice.ai/
  2. Sign up for a free account (no credit card required)
  3. Create a new access key
  4. Copy the key

Then run our automated setup:

./setup-access-key.sh

Or manually create a .env file:

cp .env.example .env
# Edit .env and paste your key

Important:

  • The key is stored locally in .env (excluded from git)
  • Only the wake word detection uses this key
  • Speech-to-text (Whisper) runs 100% locally

See ACCESS_KEY_SETUP.md for detailed instructions.
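For illustration, a minimal stdlib reader for the `.env` file format might look like the sketch below. The real scripts may load the file differently, and the `PICOVOICE_ACCESS_KEY` variable name is an assumption:

```python
import os

def load_env(path=".env"):
    """Parse simple KEY=VALUE lines into os.environ; comments and blanks are skipped."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault: a value already exported in the shell wins over the file
            os.environ.setdefault(key.strip(), value.strip().strip('"'))

# After load_env(), the wake word engine could read the key via
# os.environ["PICOVOICE_ACCESS_KEY"]  (variable name is hypothetical)
```

Tools like python-dotenv do the same job with more edge-case handling; the point is only that the key stays in a local file, outside version control.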


📦 Installation

Automated Setup (Recommended)

./setup.sh

This script will:

  • Create a Python virtual environment
  • Install PortAudio via Homebrew
  • Install all Python dependencies
  • Download the Whisper model

Manual Installation

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install PortAudio
brew install portaudio

# Install Python packages
pip install --upgrade pip
pip install -r requirements.txt

Note: First run will download the Whisper model (~140MB for base model).


🎮 Usage

Background Service (Recommended)

Start/stop as background service:

./toggle.sh

Run again to toggle off.

Foreground Mode

Run in foreground for testing/debugging:

./run-streaming.sh

Press Ctrl+C to stop.

Monitoring Background Service

View logs in real-time:

tail -f /tmp/voice-dictation-streaming.log

Shows wake word detections, transcriptions, command executions, and errors.

Custom Configuration

# Use a different wake word
python main_streaming.py --wake-word jarvis

# Use a different model
python main_streaming.py --model small

# Adjust sensitivity
python main_streaming.py --sensitivity 0.7
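These flags could plausibly be wired up with stdlib `argparse`. The sketch below mirrors the option names shown above, but the defaults and the exact set of model choices are assumptions, not taken from the repo:

```python
import argparse

def build_parser():
    p = argparse.ArgumentParser(description="VoxTerm streaming voice dictation")
    p.add_argument("--wake-word", default="computer",
                   help="Porcupine keyword to listen for (e.g. jarvis, alexa)")
    p.add_argument("--model", default="base",
                   choices=["tiny", "base", "small", "large"],  # assumed set of sizes
                   help="Whisper model size: tiny is fastest, large is most accurate")
    p.add_argument("--sensitivity", type=float, default=0.5,
                   help="Wake word sensitivity, 0.0 (strict) to 1.0 (permissive)")
    return p

args = build_parser().parse_args(["--wake-word", "jarvis", "--sensitivity", "0.7"])
print(args.wake_word, args.model, args.sensitivity)
```

Unset options fall back to their defaults, so `--model` above still reports the assumed default of `base`.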

🎤 Voice Commands

Text Input

Just speak normally:

You: "computer"
You: "echo hello world"
→ Types: echo hello world

Claude Mode Toggle

| Command | Action |
|---|---|
| `change mode` | Cycle through Claude's plan/edit/default modes |
| `change mode twice` | Cycle through modes twice |
| `change mode three times` | Cycle three times |

Text Submission

| Command | Action |
|---|---|
| `send it` | Submit the current input (press Enter) |
| `submit` | Submit the current input (press Enter) |

Navigation Commands

| Command | Action |
|---|---|
| `move left [N]` | Move cursor left N positions (default: 1) |
| `move right [N]` | Move cursor right N positions |
| `move to start` | Jump to start of line |
| `move to end` | Jump to end of line |
| `beginning` | Jump to start of line |

Editing Commands

| Command | Action |
|---|---|
| `delete word` | Delete previous word |
| `delete line` | Delete entire line |
| `delete [N]` | Delete N characters (default: 1) |
| `backspace [N]` | Delete N characters |

See CHEATSHEET.md for the complete command reference.
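Conceptually, the command processor has to decide whether each utterance is one of these commands or literal text to type. A simplified, self-contained sketch of that dispatch follows; the repo's actual grammar is more complete, and the command identifiers here are illustrative:

```python
import re

# Spoken-number words the sketch understands (extend as needed).
WORDS = {"once": 1, "twice": 2, "two": 2, "three": 3, "four": 4, "five": 5}

def _count(token, default=1):
    """Turn an optional spoken count ('5', 'twice', 'three', ...) into an int."""
    if token is None:
        return default
    if token.isdigit():
        return int(token)
    return WORDS.get(token, default)

# Ordered rules: more specific patterns first ("delete word" before "delete N").
RULES = [
    (r"delete word", lambda m: ("delete_word", 1)),
    (r"delete line", lambda m: ("delete_line", 1)),
    (r"(?:delete|backspace)(?: (\w+))?", lambda m: ("delete_char", _count(m.group(1)))),
    (r"move (left|right)(?: (\w+))?", lambda m: ("move_" + m.group(1), _count(m.group(2)))),
    (r"send it|submit", lambda m: ("enter", 1)),
    (r"change mode(?: (\w+))?(?: times)?", lambda m: ("change_mode", _count(m.group(1)))),
]

def parse(utterance):
    """Classify an utterance: a (command, count) tuple, or free text to type."""
    text = utterance.lower().strip()
    for pattern, action in RULES:
        m = re.fullmatch(pattern, text)
        if m:
            return action(m)
    return ("type_text", utterance)  # not a command: inject as literal keystrokes
```

So `parse("move left 5")` yields `("move_left", 5)`, while `parse("echo hello world")` falls through to `("type_text", "echo hello world")` and gets typed verbatim.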


📚 Documentation

  • CHEATSHEET.md: complete voice command reference
  • ACCESS_KEY_SETUP.md: Picovoice access key setup guide

🏛️ Architecture

┌─────────────────────┐
│  Wake Word Detector │  (Porcupine)
│  "computer"         │
└──────────┬──────────┘
           │ Activated!
           ▼
┌─────────────────────┐
│  Audio Recorder     │  (PyAudio + WebRTC VAD)
│  Record until       │
│  silence detected   │
└──────────┬──────────┘
           │ Audio data
           ▼
┌─────────────────────┐
│  Transcriber        │  (OpenAI Whisper - Local)
│  Speech → Text      │  (or faster-whisper)
└──────────┬──────────┘
           │ Transcribed text
           ▼
┌─────────────────────┐
│  Command Processor  │  (Parse commands vs text)
│  "move left" → cmd  │
│  "hello" → text     │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Input Injector     │  (pynput)
│  Simulate keyboard  │
└─────────────────────┘
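The recorder stage stops once the VAD reports sustained silence. VoxTerm uses WebRTC VAD for this; as a rough stand-in, here is an energy-based version of the same stop condition over 16-bit PCM frames. The threshold and hangover values are assumed tuning knobs, not the project's:

```python
import math
import struct

SILENCE_RMS = 500       # energy below this counts as silence (assumed tuning value)
HANGOVER_FRAMES = 10    # consecutive silent frames before recording stops (assumed)

def frame_rms(frame_bytes):
    """Root-mean-square amplitude of a 16-bit little-endian mono PCM frame."""
    samples = struct.unpack(f"<{len(frame_bytes) // 2}h", frame_bytes)
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def record_until_silence(frames):
    """Consume frames until HANGOVER_FRAMES quiet ones in a row; return the audio."""
    captured, quiet = [], 0
    for frame in frames:
        captured.append(frame)
        quiet = quiet + 1 if frame_rms(frame) < SILENCE_RMS else 0
        if quiet >= HANGOVER_FRAMES:
            break
    return b"".join(captured)
```

A real VAD is far more robust than a fixed RMS threshold (it handles background noise and quiet speech), which is why the project uses WebRTC VAD rather than this heuristic.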

📂 Project Structure

voxterm/
├── main_streaming.py          # Streaming mode entry point
├── streaming_recorder.py      # Real-time audio recording
├── streaming_transcriber.py   # faster-whisper transcription
├── wake_word_detector.py      # Porcupine wake word detection
├── input_injector.py          # Keyboard simulation
├── audio_recorder.py          # Audio recording with VAD
├── transcriber.py             # Whisper transcription
├── test_mic.py                # Microphone testing utility
├── setup.sh                   # Automated installation
├── setup-access-key.sh        # API key configuration
├── run-streaming.sh           # Launch streaming mode (foreground)
├── toggle.sh                  # Start/stop service (background)
├── toggle-streaming.sh        # Streaming service toggle
├── status.sh                  # Check service status
└── requirements.txt           # Python dependencies

🔒 Security & Privacy

  • All transcription happens locally - No cloud APIs for speech-to-text
  • Picovoice key only used for wake word - Not for transcription
  • Audio never leaves your machine - 100% local processing
  • No telemetry - No usage data collected
  • Open source - Audit the code yourself

Your .env file containing the API key is automatically excluded from git via .gitignore. Never commit API keys to version control.


🐛 Troubleshooting

Issue: "No module named 'pyaudio'"

Solution:

brew install portaudio
pip install pyaudio

Issue: "Permission denied" for microphone

Solution:

  1. Go to System Settings → Privacy & Security → Microphone
  2. Enable microphone access for Terminal or your terminal app

Issue: Wake word not detected

Solution:

  1. Speak clearly and at normal volume
  2. Increase sensitivity: python main_streaming.py --sensitivity 0.7
  3. Test the microphone: python test_mic.py
  4. Try different wake word: python main_streaming.py --wake-word jarvis

Issue: Slow transcription

Solution:

  1. Use streaming mode: ./run-streaming.sh
  2. Use smaller model: python main_streaming.py --model tiny
  3. Close other applications to free up CPU/RAM

Testing Microphone

cd voxterm
source venv/bin/activate
python test_mic.py

Plays a test sound and shows live audio input levels. Press Ctrl+C to stop.


🛠️ Advanced Usage

Toggle On/Off

# Start or stop the service
./toggle-streaming.sh

# Check status
./status.sh

Create macOS Application

./create-app.sh

Creates a VoxTerm.app that you can add to your Dock.


🤝 Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Before submitting:

  • Test your changes thoroughly
  • Update documentation if needed
  • Ensure no API keys in code

📝 License

MIT License - see LICENSE file for details.

Free to use for personal and commercial projects.


🙏 Credits & Acknowledgments

VoxTerm builds on:

  • OpenAI Whisper and faster-whisper: local speech recognition
  • Picovoice Porcupine: wake word detection
  • WebRTC VAD: voice activity detection
  • PyAudio and pynput: audio capture and keyboard simulation

💬 Support

Found a bug or have a question? Open an issue on the GitHub repository.

⭐ Show Your Support

If you find this project useful, please consider giving it a star on GitHub! It helps others discover the project.


Made with ❤️ for the command line

Talk to your terminal. It's listening.
