Runs GPT-2 text generation implemented in Magnetron. Includes KV caching and optional streaming output. Uses transformers for weights and tiktoken for tokenization.
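The KV cache mentioned above avoids recomputing attention keys and values for tokens that were already generated: each decode step appends one new row per cache instead of re-projecting the whole sequence. A minimal NumPy sketch of the idea (illustrative only, not Magnetron's actual implementation; all names here are made up):

```python
import numpy as np

def attend(q, k_cache, v_cache):
    # q: (d,) query for the newest token; caches hold one row per prior token.
    scores = k_cache @ q / np.sqrt(q.shape[0])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ v_cache

d = 4
k_cache = np.empty((0, d))
v_cache = np.empty((0, d))
for step in range(3):
    # Stand-ins for the real Q/K/V projections of the newest token.
    q = k_new = v_new = np.ones(d) * (step + 1)
    # Append one row instead of recomputing K/V for the whole prefix.
    k_cache = np.vstack([k_cache, k_new])
    v_cache = np.vstack([v_cache, v_new])
    out = attend(q, k_cache, v_cache)

print(k_cache.shape)  # cache grows by one row per generated token
```

Because the cache grows linearly with the generated sequence, long generations (e.g. `--max_tokens 128`) pay only one attention row per step rather than recomputing the full prefix.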
From the repo root:
```shell
uv pip install -e .[examples]
python examples/gpt2/main.py "What is the answer to life?"
```

Pick a model and generation settings:

```shell
python examples/gpt2/main.py "Write a haiku about compilers" --model gpt2-xl --max_tokens 128 --temp 0.7
```

Disable streaming:

```shell
python examples/gpt2/main.py "Hello" --no-stream
```

- First run downloads model weights from Hugging Face.