This repository contains research code and checkpoints for SpidR-Adapt.
SpidR-Adapt enables rapid adaptation to new languages using minimal unlabeled data. The pipeline consists of three main phases:
- **Meta-Init Stage**: Multi-task pre-training with interleaved supervision, learning a robust meta-initialization $\mathbf{\phi}_0$ from a mixture of known domains.
- **Meta-Training (MAdaPT-FOBLO)**: Further optimizes the initialization across multiple domains. Each worker performs inner-loop adaptation with active forgetting (AF) on raw, unlabeled data, followed by outer-loop updates that refine $\mathbf{\phi}$ by minimizing the expected task loss on labeled data. This bi-level optimization yields a meta-learner optimized for rapid adaptation.
- **Meta-Test**: The learned $\mathbf{\phi}^*$ quickly adapts to a new, unseen domain using only its raw data. Each domain corresponds to a single language.
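The bi-level loop above can be sketched in a few lines of Python. This is a toy, first-order Reptile-style sketch, not the repository's implementation: a quadratic surrogate stands in for the SSL/SL objectives, active forgetting and the FOBLO outer gradient on labeled data are omitted, and all names are illustrative.

```python
import random


def inner_adapt(phi, domain, steps=5, lr=0.1):
    """Inner loop: a few SGD steps on a per-domain surrogate loss.

    The real system adapts on raw, unlabeled audio (with active
    forgetting); here a toy quadratic loss (w - target)^2 stands in.
    """
    w = list(phi)
    for _ in range(steps):
        w = [wi - lr * 2.0 * (wi - ti) for wi, ti in zip(w, domain)]
    return w


def madapt_reptile(phi, domains, meta_steps=200, eps=0.5):
    """Outer loop (Reptile-style, first-order): after adapting to a
    sampled domain, move the meta-initialization toward the adapted
    weights, so phi ends up easy to adapt to any training domain."""
    for _ in range(meta_steps):
        domain = random.choice(domains)
        adapted = inner_adapt(phi, domain)
        phi = [p + eps * (a - p) for p, a in zip(phi, adapted)]
    return phi


random.seed(0)
# Each "domain" is the optimum of its toy loss (one per training language).
domains = [[1.0, 2.0], [3.0, 0.0], [2.0, 1.0]]
phi_star = madapt_reptile([0.0, 0.0], domains)
```

Per the description above, FOBLO differs from this Reptile sketch in the outer update: instead of the parameter difference, it uses the gradient of the labeled-data task loss evaluated at the adapted weights.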
SpidR-Adapt achieves rapid gains in phonemic discriminability (ABX) and spoken language modeling (sWUGGY, sBLIMP, tSC), outperforming in-domain language models after training on less than 1 hour of target-language audio—over 100× more data-efficient than standard training.
We recommend using uv:
```sh
uv sync
```

Alternatively, use conda (also works for mamba/micromamba):

```sh
conda create -n spidr-adapt python=3.12 -c conda-forge
conda activate spidr-adapt
uv pip install -e . --group dev
```

Or standard pip:

```sh
python3 -m venv .venv
source .venv/bin/activate
pip install -e . --group dev
```

Note: FFmpeg is required for torchcodec audio loading. FFmpeg 7 is recommended; FFmpeg 4/8 may be incompatible. Check your version with `ffmpeg -version`. If it is missing or incompatible, install it via:

```sh
conda install ffmpeg=7.0.0 -c conda-forge
```

See these instructions for dataset preparation.
- Create a TOML config file for pretraining (see `src/spidr/config.py` for available fields).
- Start from `configs/multitask_pt_ssl.toml` or `configs/multitask_pt_sl.toml` (these specify the required fields).
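A pretraining config is plain TOML. The fragment below is a hypothetical sketch only — every field name in it is illustrative rather than taken from the repository; check `src/spidr/config.py` for the actual schema before use.

```toml
# Hypothetical sketch — all field names here are illustrative;
# consult src/spidr/config.py for the real ones.
run_name = "multitask_pt_ssl_custom"  # illustrative
data_dir = "/path/to/vp19/manifests"  # illustrative
```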
You will obtain two meta-initializations:
- **Multi-Task-PT [SSL]**: standard multi-task pretraining (self-supervised loss).
- **Multi-Task-PT [SSL/SL]**: supervised loss interleaved into the self-supervised training.
Both are pretrained on VP19 languages (excluding target languages).
Launch the two MAdaPT implementations (Reptile and FOBLO) with different meta-initializations by setting `init_ckpt` in the configs, or use:

- `configs/*_ssl.toml`
- `configs/*_ssl+sl.toml`
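For example, pointing a MAdaPT config at one of the meta-initializations is a one-line override. The checkpoint path below is illustrative, not an actual file in the repository:

```toml
# Path is illustrative — point init_ckpt at your Multi-Task-PT checkpoint.
init_ckpt = "checkpoints/multitask_pt_ssl/last.pt"
```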
Utilities are provided for SLURM clusters. Training and validation are separate jobs.
Example:

```sh
python -m spidr_adapt --help
```

Key options:

- `-A ACCOUNT` (SLURM account)
- `-N NODES` (number of nodes)
- `-G GPUS_PER_NODE` (GPUs per node)
- `-c CPUS_PER_TASK` (CPUs per task)
- `--mem-per-gpu MEM_PER_GPU` (memory per GPU)
- `-t TIME` (time limit)
- `-C CONSTRAINT` (SLURM constraint)
- `-q QOS` (SLURM QoS)
- `--dump DUMP` (Submitit dump)
- `configs [configs ...]` (TOML config files)
| Method | Avg. ABX score excluding 0 h (↓) | Checkpoints |
|---|---|---|
| Multi-Task-PT [SSL] | 4.33 | link |
| +MAdaPT-Reptile | 4.19 | link |
| +MAdaPT-FOBLO | 4.01 | link |
| Multi-Task-PT [SSL/SL] | 3.88 | link |
| +MAdaPT-Reptile | 3.76 | link |
| +MAdaPT-FOBLO | 3.80 | link |
- Use the last meta-training checkpoint as initialization (`init_ckpt` in `configs/finetuning.toml`).
- For fast fine-tuning to OoD target languages, use a single GPU.
- Evaluate each model variant (Multi-Task-PT, MAdaPT-Reptile, MAdaPT-FOBLO, with SSL or SSL/SL) with various adaptation dataset sizes (10 minutes to 100 hours).

```sh
python src/spidr_adapt/finetuning.py configs/finetuning.toml
```

The source code and model checkpoints are provided under the CC BY-NC 4.0 License.
