High-quality audio separation and MIDI transcription CLI.
Separate audio into stems (vocals, drums, bass, guitar, piano, other) and transcribe each to MIDI using state-of-the-art ML models.
- Best-in-class separation: Uses BS-RoFormer (SDR 12.97) and Demucs models
- 6-stem separation: Vocals, drums, bass, guitar, piano, other
- Accurate transcription: Spotify's Basic Pitch for MIDI conversion
- GPU acceleration: CUDA (NVIDIA), MPS (Apple Silicon), CPU fallback
- BPM & key detection: Automatic tempo and musical key analysis (see the sketch after this list)
- DAW-ready MIDI: Proper instrument assignments and multi-track export
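The BPM and key numbers come from the analysis step (they also land in `analysis.json`, shown later). As a rough illustration of how such an analysis can work, here is a generic librosa sketch; it is not audio2midi's actual implementation, and the key estimate is a deliberately crude chroma-template match:

```python
# Generic sketch of tempo + key estimation with librosa.
# This is NOT audio2midi's implementation, just an illustration of the idea.
import librosa
import numpy as np

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def analyze(path: str) -> dict:
    y, sr = librosa.load(path, sr=None, mono=True)

    # Tempo: librosa's beat tracker returns an estimated BPM and beat frames.
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)

    # Key (very rough): pick the pitch class with the most average chroma energy.
    chroma = librosa.feature.chroma_cqt(y=y, sr=sr)
    tonic = PITCH_CLASSES[int(np.argmax(chroma.mean(axis=1)))]

    return {"bpm": float(np.atleast_1d(tempo)[0]), "key": tonic}

print(analyze("song.mp3"))
```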
Install with pip:

```bash
pip install audio2midi
```

For GPU acceleration (NVIDIA):

```bash
pip install audio2midi[gpu]
```

RTX 5070/5080/5090 require PyTorch with CUDA 12.8 support:

```bash
pip install --pre torch torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
```
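After installing, it is worth confirming that PyTorch actually sees the GPU, and the fallback order from the feature list (CUDA, then MPS, then CPU) is easy to reproduce. The check below is plain PyTorch, not an audio2midi API; the built-in `audio2midi device` command shown later covers similar ground.

```python
# Quick PyTorch device check -- plain PyTorch, not an audio2midi API.
import torch

print("torch", torch.__version__, "| CUDA build:", torch.version.cuda)

if torch.cuda.is_available():
    device = torch.device("cuda")
    print("Using CUDA:", torch.cuda.get_device_name(0))
elif torch.backends.mps.is_available():
    device = torch.device("mps")
    print("Using Apple Silicon (MPS)")
else:
    device = torch.device("cpu")
    print("Falling back to CPU")
```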
Separate and transcribe in one command:

```bash
audio2midi convert song.mp3 -o output/
```

This will:
- Analyze BPM and key
- Separate into 6 stems
- Transcribe each stem to MIDI with Basic Pitch (see the sketch after this list)
- Output individual + combined MIDI files
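The transcription step is built on Spotify's Basic Pitch, per the feature list. To see what that step does in isolation, Basic Pitch's own Python API can be called directly; the snippet below is a generic Basic Pitch example, not audio2midi's internal code, and the `stems/vocals.wav` path is only an illustration:

```python
# Transcribing one audio file with Spotify's Basic Pitch directly.
# This illustrates the underlying step; it is not audio2midi's internal code.
from basic_pitch.inference import predict

model_output, midi_data, note_events = predict("stems/vocals.wav")

# midi_data is a pretty_midi.PrettyMIDI object; write it out as a .mid file.
midi_data.write("vocals.mid")

print(f"{len(note_events)} notes detected")
```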
Separate stems only, choosing a model:

```bash
audio2midi separate song.mp3 --model htdemucs_6s
```

Available models:
- `htdemucs_6s` - 6 stems (default)
- `htdemucs` - 4 stems (faster)
- `bs_roformer` - Best vocal separation
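The `htdemucs_6s` and `htdemucs` options are standard Demucs model names, so the same separation can also be driven from Demucs directly if you ever need to bypass the wrapper. The sketch below uses Demucs's documented Python entry point and is not an audio2midi API; `bs_roformer` is not a Demucs model and is not covered by it.

```python
# Running Demucs directly with the same 6-stem model name.
# Uses Demucs's documented Python entry point; not an audio2midi API.
import demucs.separate

# Equivalent to the CLI: demucs -n htdemucs_6s -o separated song.mp3
demucs.separate.main(["-n", "htdemucs_6s", "-o", "separated", "song.mp3"])
```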
Transcribe a single stem to MIDI:

```bash
audio2midi transcribe vocals.wav -o vocals.mid --instrument vocals
```

Analyze BPM and key without separating:

```bash
audio2midi analyze song.mp3
```

Check which compute device will be used:

```bash
audio2midi device
```

Output structure:
```
output/
└── song/
    ├── analysis.json
    ├── stems/
    │   ├── vocals.wav
    │   ├── drums.wav
    │   ├── bass.wav
    │   ├── guitar.wav
    │   ├── piano.wav
    │   └── other.wav
    └── midi/
        ├── vocals.mid
        ├── drums.mid
        ├── bass.mid
        ├── guitar.mid
        ├── piano.mid
        ├── other.mid
        └── combined.mid
```
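Because `combined.mid` is a multi-track file with per-stem instrument assignments, it can be sanity-checked with any MIDI library before importing it into a DAW. A small pretty_midi example (a generic check, not part of audio2midi):

```python
# Inspect the tracks inside the combined MIDI export with pretty_midi.
# Generic check, not part of audio2midi.
import pretty_midi

pm = pretty_midi.PrettyMIDI("output/song/midi/combined.mid")
print(f"Duration: {pm.get_end_time():.1f}s, {len(pm.instruments)} tracks")

for inst in pm.instruments:
    kind = "drum kit" if inst.is_drum else pretty_midi.program_to_instrument_name(inst.program)
    print(f"  '{inst.name}': {kind}, {len(inst.notes)} notes")
```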
Requirements:

- Python 3.10+
- CUDA-capable GPU recommended (10x faster than CPU)
License: MIT