Worker for the Sunet transcription service (Sunet Scribe).
This project is developed by Sunet. Contributor: Kristofer Hallin.
This project is licensed under the Apache License, Version 2.0. See LICENSE for details.
Copyright (c) 2025-2026 Sunet. Contributor: Kristofer Hallin.
Contributions are welcome! Please feel free to open issues or submit pull requests.
- Transcription Processing: Processes audio/video transcription jobs from the backend queue
- Whisper.cpp Integration: Uses whisper.cpp for efficient local transcription
- Multiple Output Formats: Generates JSON and SRT transcription outputs
- Multi-worker Support: Run multiple workers in parallel for increased throughput
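As an illustration of the SRT output mentioned in the feature list, here is a minimal sketch of how timed segments could be rendered as SRT text. The segment layout (dictionaries with `start`, `end`, and `text` keys, times in seconds) and the function names are assumptions made for this example; they are not taken from the worker's code.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render segments as SRT cues.

    Each segment is assumed (hypothetically) to look like
    {"start": 0.0, "end": 1.25, "text": "Hello"}.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        # An SRT cue is: sequence number, time range, text, blank line.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```

Calling `segments_to_srt` on a list of such segments returns a single string that can be written to a `.srt` file.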
- Python 3.13+
- uv (recommended package manager)
- whisper.cpp (must be built separately)
- FFmpeg (for audio/video processing)
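FFmpeg is typically needed because whisper.cpp works on 16 kHz, mono, 16-bit PCM WAV input. The sketch below builds an FFmpeg command that performs that conversion. The helper name `build_ffmpeg_command` is hypothetical, and the worker's actual FFmpeg invocation may differ.

```python
from pathlib import Path


def build_ffmpeg_command(source: Path, target: Path) -> list[str]:
    """Build an FFmpeg command converting `source` to 16 kHz mono 16-bit WAV."""
    return [
        "ffmpeg",
        "-y",                  # overwrite the target file if it exists
        "-i", str(source),     # input audio or video file
        "-ar", "16000",        # resample to 16 kHz
        "-ac", "1",            # downmix to a single channel
        "-c:a", "pcm_s16le",   # encode as 16-bit little-endian PCM
        str(target),
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; passing arguments as a list avoids shell quoting problems with unusual file names.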
git clone <repository-url>
cd scribe-worker
uv sync
Build and install whisper.cpp from source. See https://github.com/ggml-org/whisper.cpp for detailed instructions.
./download_models.sh
Create a .env file in the project root with the following settings:
# Debug mode
DEBUG=True
# Backend API configuration
API_BACKEND_URL="http://localhost:8000"
API_VERSION="v1"
# Worker configuration
WORKERS=2
WHISPER_CPP_PATH=<Path to whisper.cpp>
FILE_STORAGE_DIR=<Your file storage directory>
uv run main.py --foreground --debug
Build and run with Docker:
docker build -t scribe-worker .
docker run --env-file .env scribe-worker
scribe-worker/
├── main.py # Worker entry point
├── utils/ # Utility modules
├── models/ # Whisper model files
└── downloaded/ # Downloaded files for processing
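For reference, the `.env` settings described earlier could be read into typed values roughly as follows. This is a minimal standard-library sketch: `parse_env`, `WorkerSettings`, and `load_settings` are hypothetical names, and the worker itself may load its configuration differently.

```python
from dataclasses import dataclass


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, as in API_VERSION="v1".
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


@dataclass(frozen=True)
class WorkerSettings:
    debug: bool
    api_backend_url: str
    api_version: str
    workers: int
    whisper_cpp_path: str
    file_storage_dir: str


def load_settings(env: dict[str, str]) -> WorkerSettings:
    """Convert raw strings to typed settings; a missing key raises KeyError."""
    return WorkerSettings(
        debug=env["DEBUG"].lower() in {"1", "true", "yes"},
        api_backend_url=env["API_BACKEND_URL"],
        api_version=env["API_VERSION"],
        workers=int(env["WORKERS"]),
        whisper_cpp_path=env["WHISPER_CPP_PATH"],
        file_storage_dir=env["FILE_STORAGE_DIR"],
    )
```

A typical call would be `load_settings(parse_env(Path(".env").read_text()))`, after importing `Path` from `pathlib`. Treating every key as required is a choice made for this sketch so that a misconfigured worker fails early.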