retrain is a TOML-first RLVR (Reinforcement Learning with Verifiable Rewards) trainer for LLMs, built to make experiments easier to run, compare, and repeat.
If you are new, start with install -> explore commands -> run a tiny config.
Requires Python 3.11+.
```shell
# CLI + docs exploration
uv tool install retrain

# Local GPU training (adds torch)
uv tool install "retrain[local]"

# Remote Tinker backend
uv tool install "retrain[tinker]"
```

If you are developing this repo directly:

```shell
pip install -e ".[dev]"
```

Use these first to understand what exists before you train:
```shell
retrain --help
retrain man
retrain man --topic quickstart
retrain man --list-topics
retrain backends
retrain doctor
```

Useful inspection commands while iterating:
```shell
retrain explain retrain.toml   # dry-run: what this config would do
retrain status logs            # summarize runs/campaigns under logs/
retrain plugins                # list built-ins + discovered plugins
retrain init-plugin --kind transform --name my_transform --with-test
retrain man --json --topic quickstart
retrain man --path             # editable bundled manual source
```

Create `mini.toml`. Note that `max_tokens = 1024` below is an intentional smoke-test profile; the standard default for full runs is `max_tokens = 10240`.
```toml
[model]
model = "Qwen/Qwen3-4B-Instruct-2507"

[algorithm]
advantage_mode = "grpo"
transform_mode = "none"

[training]
max_steps = 20
batch_size = 2
group_size = 8
max_tokens = 1024
lr = 4e-5

[backend]
backend = "local"
adapter_path = "adapters/mini"

[logging]
log_dir = "logs/mini"
```

Run it:

```shell
retrain mini.toml
```

Override fields from the CLI without editing the TOML:
```shell
retrain mini.toml --seed 42 --max-steps 40 --wandb-project my-project
```

Or scaffold a config from a template:

```shell
retrain init --template quickstart
retrain retrain.toml
```

Other templates:
```shell
retrain init --list
retrain init --template experiment
retrain init --template campaign
retrain init --interactive
```

The normal retrain loop is:
- Define a TOML config (`retrain.toml` or `campaign.toml`)
- Dry-run with `retrain explain ...`
- Train with `retrain ...`
- Inspect with `retrain status logs`
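Put together, one iteration of the loop looks like this. This is a sketch using only the commands shown above and assumes a `retrain.toml` already exists (e.g. from `retrain init`):

```shell
retrain explain retrain.toml   # dry-run: sanity-check what the config would do
retrain retrain.toml           # train
retrain status logs            # inspect the resulting run
```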
Use `retrain man --topic capacity` only when you are sizing longer runs.
- Experiment-first workflow: config -> explain -> run -> compare
- Composable advantage pipeline: GRPO/MaxRL + GTPO/HICRA/SEPA
- Pluggable backends and inference engines
- Pluggable rewards (match, math, judge, custom)
- Campaign sweeps from one TOML
- LoRA-Squeeze rank analysis/compression
- Checkpoint resume and run status tooling
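As one illustration of the composable pipeline, an advantage mode can be paired with a transform in `[algorithm]`. Only `advantage_mode = "grpo"` and `transform_mode = "none"` are confirmed by the quickstart config; the `"gtpo"` string below is a hypothetical name for the GTPO transform, so check `retrain man` for the exact spelling:

```toml
[algorithm]
advantage_mode = "grpo"   # base advantage, as in the quickstart config
transform_mode = "gtpo"   # hypothetical mode name for the GTPO transform
```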
Use verifiers environments from TOML:
```toml
[environment]
provider = "verifiers"
id = "primeintellect/gsm8k"
args = { split = "train" }
auto_install = true
max_turns = 8
```

Use custom advantage + transform plugins from TOML:
```toml
[algorithm]
advantage_mode = "my_advantages.hipa_like_advantages"
transform_mode = "my_transforms.make_transform_spec"
```

Use a full algorithm plugin (this overrides the composable advantage + transform path):
```toml
[algorithm]
algorithm_mode = "my_algorithms.my_algorithm"
```

Full docs: retrain.readthedocs.io
- Getting Started
- Configuration Reference
- Advantage Functions
- SEPA Scheduling
- Campaigns
- Capacity Planning
- LoRA-Squeeze
- Reward Functions
- Inference Engines
Contributor note: run `retrain man --check` in CI to detect stale auto-generated manual blocks, run `retrain man --sync` locally to refresh them, and run `uv run mkdocs build --strict` before publishing docs changes.