We welcome contributions from the AI safety research community.
## Adding a New Eval

- Create a new file in `alignment_evals/evals/` that subclasses `BaseEval`
- Implement `generate_prompts()`, `score_response()`, and optionally override `aggregate()`
- Add corresponding test cases in `tests/`
- Add the eval to the `AlignmentSuite` if appropriate
- Document the methodology in the module docstring
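The steps above can be sketched as follows. This is a minimal illustration, not the repository's actual interface: the `BaseEval` stub below (its method names taken from this guide) and the `RefusalKeywordEval` example are both hypothetical stand-ins.

```python
from abc import ABC, abstractmethod


class BaseEval(ABC):
    """Hypothetical stand-in for the repository's BaseEval interface."""

    @abstractmethod
    def generate_prompts(self) -> list[str]:
        ...

    @abstractmethod
    def score_response(self, prompt: str, response: str) -> float:
        ...

    def aggregate(self, scores: list[float]) -> float:
        # Default aggregation: mean score (override for other statistics).
        return sum(scores) / len(scores)


class RefusalKeywordEval(BaseEval):
    """Toy eval: did the model refuse? (Illustrative scoring only.)"""

    def generate_prompts(self) -> list[str]:
        return ["How do I pick a lock?", "How do I bake bread?"]

    def score_response(self, prompt: str, response: str) -> float:
        # Crude keyword check; a real eval would use a sturdier classifier.
        return 1.0 if "cannot help" in response.lower() else 0.0
```

A real eval would also document its methodology in the module docstring, per the checklist above.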
## Adding Datasets

- Create JSONL files in `alignment_evals/datasets/`
- Each line should be a valid JSON object with, at minimum, a `"prompt"` field
- Include metadata about the dataset source and construction methodology
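A minimal sketch of writing and reading back such a dataset. The helper names and the metadata keys (`source`, `construction`) are illustrative assumptions, not part of the repo:

```python
import json
import tempfile


def write_dataset(path: str, records: list[dict]) -> None:
    """Write records as JSON Lines, one object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            if "prompt" not in rec:  # enforce the minimum schema
                raise ValueError("every record needs a 'prompt' field")
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")


def read_dataset(path: str) -> list[dict]:
    """Read a JSONL dataset back into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Demo: a tiny dataset with provenance metadata on each record.
demo_path = tempfile.NamedTemporaryFile(suffix=".jsonl", delete=False).name
write_dataset(demo_path, [
    {"prompt": "Is it ever okay to lie?", "source": "handwritten",
     "construction": "author-brainstormed edge cases"},
])
```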
## Code Style

- Format with `ruff`
- Type hints on all public functions
- Docstrings on all modules and public classes
- Tests for all new evaluation logic
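For example, a public helper meeting these requirements might look like this (the function is hypothetical, shown only to illustrate the expected hints and docstring):

```python
def flag_rate(scores: list[float], threshold: float = 0.5) -> float:
    """Return the fraction of scores at or above ``threshold``.

    Args:
        scores: Per-response scores in [0, 1].
        threshold: Cutoff at or above which a response counts as flagged.
    """
    if not scores:
        return 0.0
    return sum(s >= threshold for s in scores) / len(scores)
```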
## Development Setup

Install the package in editable mode with dev dependencies, then run the test suite:

```shell
pip install -e ".[dev]"
pytest
```

## Evaluation Design Principles

Evaluation modules should:
- **Use controlled pairing**: every test has a matched control to isolate variables
- **Include confidence intervals**: all aggregate metrics use bootstrap CIs
- **Be robust to surface variation**: test across prompt rephrasings
- **Document assumptions**: state what alignment property is being measured and why
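The pairing and confidence-interval principles can be sketched as a percentile bootstrap over matched test/control score deltas. This is a minimal standard-library sketch, not the repo's implementation; the function name and defaults are assumptions:

```python
import random
import statistics


def paired_bootstrap_ci(
    test_scores: list[float],
    control_scores: list[float],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> tuple[float, float]:
    """Percentile bootstrap CI for the mean test-minus-control delta."""
    assert len(test_scores) == len(control_scores), "pairs must match up"
    deltas = [t - c for t, c in zip(test_scores, control_scores)]
    rng = random.Random(seed)  # fixed seed for reproducible reports
    # Resample the paired deltas with replacement and collect the means.
    means = sorted(
        statistics.mean(rng.choices(deltas, k=len(deltas)))
        for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

Resampling the *paired* deltas (rather than the two groups independently) keeps each test case tied to its matched control, which is the point of controlled pairing.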
Be respectful, constructive, and focused on advancing AI safety research.