🎤 VoxTerm

Talk to your terminal. It's listening.

Say "computer", speak your commands, and watch them appear—like you're in Star Trek. All processing happens locally on your Mac. No cloud, no surveillance, just you and your machine.

MIT License · Python 3.9+ · macOS


💡 Why VoxTerm?

macOS Has Voice Control. Why Do I Need This?

Great question. macOS Voice Control is fantastic for accessibility and general UI control. But if you live in the terminal, it's not built for you.

VoxTerm is different:

| What You Get | Why It Matters |
|---|---|
| 🎯 Always Listening | Just say "computer"—no manual key press every time |
| ⚡ Terminal-Native Commands | "delete word", "move left 5", "send it"—built for CLI workflows |
| 🤖 AI Integration | Claude mode switching with voice—no other tool does this |
| 🔓 Open Source | MIT licensed. Know exactly what's listening and where your audio goes |
| 🎨 Fully Customizable | 13+ wake words, tunable sensitivity, model choice—make it yours |
| 🚀 Developer-First | Built by devs for devs, not adapted from accessibility tools |

The Cool Factor: You're not just using voice control—you're having a conversation with your terminal. Say "computer", dictate your Git commit message, say "send it", and it's done. Hands never leave your coffee cup.

The Real Reason: Because typing git commit -m "fix: resolve issue with async handler in webhook processor" when you're 8 coffees deep is a pain. Just say it.


✨ What Makes VoxTerm Special

🎙️ Always-Listening Wake Word

Say "computer" (or jarvis, or alexa, or 10+ other options) and VoxTerm springs to life. No keyboard shortcuts, no mouse clicks, no interruption to your flow.

vs macOS Voice Control: Requires manual activation key every single time.

🧠 Terminal-Specific Commands

  • "move left 5" → Cursor jumps
  • "delete word" → Previous word gone
  • "send it" → Enter pressed
  • "change mode twice" → Cycle through Claude AI modes

vs macOS Voice Control: Designed for UI navigation, not CLI text editing.

🔒 Privacy-First Architecture

  • All transcription runs locally via OpenAI Whisper
  • No cloud APIs for speech-to-text
  • Open source—audit the code yourself
  • Picovoice key is used only for wake word detection (we're transparent about this)

vs macOS Voice Control: Proprietary black box. You trust Apple, but you can't verify.

🤖 Claude AI Integration

Unique to VoxTerm: Voice-controlled mode cycling for Claude AI prompts. "Change mode" switches between plan/edit/default without touching the keyboard.

vs macOS Voice Control: No AI assistant integration.

🎨 Deep Customization

  • 13+ wake words (computer, jarvis, alexa, hey google, etc.)
  • 4 model sizes (tiny for speed, large for accuracy)
  • Sensitivity tuning (0.0-1.0)
  • Background daemon mode

vs macOS Voice Control: Limited OS-level settings.

🛠️ Developer-Friendly

  • MIT licensed
  • Well-documented codebase
  • Extensible architecture
  • Active development

vs macOS Voice Control: Closed source, no extension points.


📊 VoxTerm vs Alternatives

| Feature | VoxTerm | macOS Voice Control | Talon Voice | Dragon |
|---|---|---|---|---|
| Terminal-optimized | ✅ | ❌ | ❌ | ❌ |
| Wake word activation | ✅ | ❌ | ❌ | ❌ |
| Open source | ✅ | ❌ | ❌ | ❌ |
| Local processing | ✅ | ✅ | ✅ | ✅ |
| Free | ✅ | ✅ | ❌ ($$$) | ❌ ($$$) |
| Claude AI integration | ✅ | ❌ | ❌ | ❌ |
| macOS native | ✅ | ✅ | ❌ | ❌ |
| Setup difficulty | Medium | Easy | Hard | Easy |

🏗️ Technology Stack

| Component | Technology |
|---|---|
| Audio Capture | PyAudio (16 kHz, 16-bit PCM) |
| Wake Word | Porcupine (Picovoice, requires a free API key) |
| Speech Recognition | OpenAI Whisper (local/offline) |
| Streaming STT | faster-whisper (real-time transcription) |
| VAD | WebRTC Voice Activity Detection |
| Keyboard Simulation | pynput |
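The capture format above lines up with what WebRTC VAD expects: 16-bit mono PCM at a supported sample rate, delivered in 10, 20, or 30 ms frames. A quick sanity check of the frame arithmetic (the 30 ms frame length is an assumption, not confirmed from the repo):

```python
# Frame-size arithmetic for 16 kHz, 16-bit mono PCM capture.
SAMPLE_RATE = 16_000   # samples per second (from the table above)
SAMPLE_WIDTH = 2       # bytes per sample (16-bit PCM)
FRAME_MS = 30          # WebRTC VAD accepts 10, 20, or 30 ms frames

samples_per_frame = SAMPLE_RATE * FRAME_MS // 1000   # 480 samples
bytes_per_frame = samples_per_frame * SAMPLE_WIDTH   # 960 bytes

print(samples_per_frame, bytes_per_frame)  # → 480 960
```

Any buffer handed to the VAD must be exactly this size, which is why capture and VAD settings have to agree.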

🚀 Get Started in 5 Minutes

Requirements: macOS 14+, Python 3.9+, Microphone

One-command setup:

# Clone and run setup
git clone https://github.com/not2technical/voxterm.git
cd voxterm
./setup.sh
./setup-access-key.sh
./toggle.sh

# Say "computer" and start talking! 🎉

That's it. Your terminal now understands English.


👥 Who's This For?

Perfect For:

  • Terminal power users who want hands-free command execution
  • Developers who dictate Git commits, docs, and scripts
  • AI enthusiasts who use Claude/ChatGPT and want voice integration
  • Privacy-conscious users who want local-only processing
  • Accessibility users who need more than macOS Voice Control offers
  • Anyone who thinks talking to computers is cool

Not For:

  • People happy with macOS Voice Control
  • Users who rarely use the terminal
  • Those unwilling to set up Python dependencies

Use Cases:

  • Dictating long Git commit messages
  • Writing documentation while coding
  • Hands-free command execution during demos
  • Accessibility when keyboard is difficult
  • Just being cool at coffee shops 😎

📋 Prerequisites

  • macOS (tested on macOS 14+)
  • Python 3.9+
  • Homebrew (for installing PortAudio)
  • Microphone access
  • Picovoice Access Key (free tier available)

🔑 Getting Your Picovoice API Key

The wake word detection (Porcupine) requires a free API key from Picovoice:

  1. Visit: https://console.picovoice.ai/
  2. Sign up for a free account (no credit card required)
  3. Create a new access key
  4. Copy the key

Then run our automated setup:

./setup-access-key.sh

Or manually create a .env file:

cp .env.example .env
# Edit .env and paste your key

Important:

  • The key is stored locally in .env (excluded from git)
  • Only the wake word detection uses this key
  • Speech-to-text (Whisper) runs 100% locally

See ACCESS_KEY_SETUP.md for detailed instructions.
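For illustration, a minimal stdlib reader for the `.env` file format might look like the sketch below. The real scripts may load the file differently, and the `PICOVOICE_ACCESS_KEY` variable name is an assumption:

```python
import os

def load_env(path=".env"):
    """Parse simple KEY=VALUE lines into os.environ; comments and blanks are skipped."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault: a value already exported in the shell wins over the file
            os.environ.setdefault(key.strip(), value.strip().strip('"'))

# After load_env(), the wake word engine could read the key via
# os.environ["PICOVOICE_ACCESS_KEY"]  (variable name is hypothetical)
```

Tools like python-dotenv do the same job with more edge-case handling; the point is only that the key stays in a local file, outside version control.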


📦 Installation

Automated Setup (Recommended)

./setup.sh

This script will:

  • Create a Python virtual environment
  • Install PortAudio via Homebrew
  • Install all Python dependencies
  • Download the Whisper model

Manual Installation

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install PortAudio
brew install portaudio

# Install Python packages
pip install --upgrade pip
pip install -r requirements.txt

Note: First run will download the Whisper model (~140MB for base model).


🎮 Usage

Background Service (Recommended)

Start/stop as background service:

./toggle.sh

Run again to toggle off.

Foreground Mode

Run in foreground for testing/debugging:

./run-streaming.sh

Press Ctrl+C to stop.

Monitoring Background Service

View logs in real-time:

tail -f /tmp/voice-dictation-streaming.log

Shows wake word detections, transcriptions, command executions, and errors.

Custom Configuration

# Use a different wake word
python main_streaming.py --wake-word jarvis

# Use a different model
python main_streaming.py --model small

# Adjust sensitivity
python main_streaming.py --sensitivity 0.7
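These flags could plausibly be wired up with stdlib `argparse`. The sketch below mirrors the option names shown above, but the defaults and the exact set of model choices are assumptions, not taken from the repo:

```python
import argparse

def build_parser():
    p = argparse.ArgumentParser(description="VoxTerm streaming voice dictation")
    p.add_argument("--wake-word", default="computer",
                   help="Porcupine keyword to listen for (e.g. jarvis, alexa)")
    p.add_argument("--model", default="base",
                   choices=["tiny", "base", "small", "large"],  # assumed set of sizes
                   help="Whisper model size: tiny is fastest, large is most accurate")
    p.add_argument("--sensitivity", type=float, default=0.5,
                   help="Wake word sensitivity, 0.0 (strict) to 1.0 (permissive)")
    return p

args = build_parser().parse_args(["--wake-word", "jarvis", "--sensitivity", "0.7"])
print(args.wake_word, args.model, args.sensitivity)
```

Unset options fall back to their defaults, so `--model` above still reports the assumed default of `base`.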

🎤 Voice Commands

Text Input

Just speak normally:

You: "computer"
You: "echo hello world"
→ Types: echo hello world

Claude Mode Toggle

| Command | Action |
|---|---|
| `change mode` | Cycle through Claude's plan/edit/default modes |
| `change mode twice` | Cycle through modes twice |
| `change mode three times` | Cycle three times |

Text Submission

| Command | Action |
|---|---|
| `send it` | Submit the current input (press Enter) |
| `submit` | Submit the current input (press Enter) |

Navigation Commands

| Command | Action |
|---|---|
| `move left [N]` | Move cursor left N positions (default: 1) |
| `move right [N]` | Move cursor right N positions |
| `move to start` | Jump to start of line |
| `move to end` | Jump to end of line |
| `beginning` | Jump to start of line |

Editing Commands

| Command | Action |
|---|---|
| `delete word` | Delete previous word |
| `delete line` | Delete entire line |
| `delete [N]` | Delete N characters (default: 1) |
| `backspace [N]` | Delete N characters |

See CHEATSHEET.md for the complete command reference.
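Conceptually, the command processor has to decide whether each utterance is one of these commands or literal text to type. A simplified, self-contained sketch of that dispatch follows; the repo's actual grammar is more complete, and the command identifiers here are illustrative:

```python
import re

# Spoken-number words the sketch understands (extend as needed).
WORDS = {"once": 1, "twice": 2, "two": 2, "three": 3, "four": 4, "five": 5}

def _count(token, default=1):
    """Turn an optional spoken count ('5', 'twice', 'three', ...) into an int."""
    if token is None:
        return default
    if token.isdigit():
        return int(token)
    return WORDS.get(token, default)

# Ordered rules: more specific patterns first ("delete word" before "delete N").
RULES = [
    (r"delete word", lambda m: ("delete_word", 1)),
    (r"delete line", lambda m: ("delete_line", 1)),
    (r"(?:delete|backspace)(?: (\w+))?", lambda m: ("delete_char", _count(m.group(1)))),
    (r"move (left|right)(?: (\w+))?", lambda m: ("move_" + m.group(1), _count(m.group(2)))),
    (r"send it|submit", lambda m: ("enter", 1)),
    (r"change mode(?: (\w+))?(?: times)?", lambda m: ("change_mode", _count(m.group(1)))),
]

def parse(utterance):
    """Classify an utterance: a (command, count) tuple, or free text to type."""
    text = utterance.lower().strip()
    for pattern, action in RULES:
        m = re.fullmatch(pattern, text)
        if m:
            return action(m)
    return ("type_text", utterance)  # not a command: inject as literal keystrokes
```

So `parse("move left 5")` yields `("move_left", 5)`, while `parse("echo hello world")` falls through to `("type_text", "echo hello world")` and gets typed verbatim.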


📚 Documentation

  • CHEATSHEET.md: complete voice command reference
  • ACCESS_KEY_SETUP.md: Picovoice access key setup guide

🏛️ Architecture

┌─────────────────────┐
│  Wake Word Detector │  (Porcupine)
│  "computer"         │
└──────────┬──────────┘
           │ Activated!
           ▼
┌─────────────────────┐
│  Audio Recorder     │  (PyAudio + WebRTC VAD)
│  Record until       │
│  silence detected   │
└──────────┬──────────┘
           │ Audio data
           ▼
┌─────────────────────┐
│  Transcriber        │  (OpenAI Whisper - Local)
│  Speech → Text      │  (or faster-whisper)
└──────────┬──────────┘
           │ Transcribed text
           ▼
┌─────────────────────┐
│  Command Processor  │  (Parse commands vs text)
│  "move left" → cmd  │
│  "hello" → text     │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Input Injector     │  (pynput)
│  Simulate keyboard  │
└─────────────────────┘
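The recorder stage stops once the VAD reports sustained silence. VoxTerm uses WebRTC VAD for this; as a rough stand-in, here is an energy-based version of the same stop condition over 16-bit PCM frames. The threshold and hangover values are assumed tuning knobs, not the project's:

```python
import math
import struct

SILENCE_RMS = 500       # energy below this counts as silence (assumed tuning value)
HANGOVER_FRAMES = 10    # consecutive silent frames before recording stops (assumed)

def frame_rms(frame_bytes):
    """Root-mean-square amplitude of a 16-bit little-endian mono PCM frame."""
    samples = struct.unpack(f"<{len(frame_bytes) // 2}h", frame_bytes)
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def record_until_silence(frames):
    """Consume frames until HANGOVER_FRAMES quiet ones in a row; return the audio."""
    captured, quiet = [], 0
    for frame in frames:
        captured.append(frame)
        quiet = quiet + 1 if frame_rms(frame) < SILENCE_RMS else 0
        if quiet >= HANGOVER_FRAMES:
            break
    return b"".join(captured)
```

A real VAD is far more robust than a fixed RMS threshold (it handles background noise and quiet speech), which is why the project uses WebRTC VAD rather than this heuristic.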

📂 Project Structure

voxterm/
├── main_streaming.py          # Streaming mode entry point
├── streaming_recorder.py      # Real-time audio recording
├── streaming_transcriber.py   # faster-whisper transcription
├── wake_word_detector.py      # Porcupine wake word detection
├── input_injector.py          # Keyboard simulation
├── audio_recorder.py          # Audio recording with VAD
├── transcriber.py             # Whisper transcription
├── test_mic.py                # Microphone testing utility
├── setup.sh                   # Automated installation
├── setup-access-key.sh        # API key configuration
├── run-streaming.sh           # Launch streaming mode (foreground)
├── toggle.sh                  # Start/stop service (background)
├── toggle-streaming.sh        # Streaming service toggle
├── status.sh                  # Check service status
└── requirements.txt           # Python dependencies

🔒 Security & Privacy

  • All transcription happens locally - No cloud APIs for speech-to-text
  • Picovoice key only used for wake word - Not for transcription
  • Audio never leaves your machine - 100% local processing
  • No telemetry - No usage data collected
  • Open source - Audit the code yourself

Your .env file containing the API key is automatically excluded from git via .gitignore. Never commit API keys to version control.


🐛 Troubleshooting

Issue: "No module named 'pyaudio'"

Solution:

brew install portaudio
pip install pyaudio

Issue: "Permission denied" for microphone

Solution:

  1. Go to System Settings → Privacy & Security → Microphone
  2. Enable microphone access for Terminal or your terminal app

Issue: Wake word not detected

Solution:

  1. Speak clearly and at normal volume
  2. Increase sensitivity: python main_streaming.py --sensitivity 0.7
  3. Test the microphone: python test_mic.py
  4. Try different wake word: python main_streaming.py --wake-word jarvis

Issue: Slow transcription

Solution:

  1. Use streaming mode: ./run-streaming.sh
  2. Use smaller model: python main_streaming.py --model tiny
  3. Close other applications to free up CPU/RAM

Testing Microphone

cd voxterm
source venv/bin/activate
python test_mic.py

Plays a test sound and shows live audio input levels. Press Ctrl+C to stop.


🛠️ Advanced Usage

Toggle On/Off

# Start or stop the service
./toggle-streaming.sh

# Check status
./status.sh

Create macOS Application

./create-app.sh

Creates a VoxTerm.app that you can add to your Dock.


🤝 Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Before submitting:

  • Test your changes thoroughly
  • Update documentation if needed
  • Ensure no API keys in code

📝 License

MIT License - see LICENSE file for details.

Free to use for personal and commercial projects.


🙏 Credits & Acknowledgments

VoxTerm builds on:

  • OpenAI Whisper and faster-whisper: local speech recognition
  • Picovoice Porcupine: wake word detection
  • WebRTC VAD: voice activity detection
  • PyAudio and pynput: audio capture and keyboard simulation

💬 Support

Found a bug or have a question? Open an issue on the GitHub repository.

⭐ Show Your Support

If you find this project useful, please consider giving it a star on GitHub! It helps others discover the project.


Made with ❤️ for the command line

Talk to your terminal. It's listening.
