
# Contributing to alignment-evals

We welcome contributions from the AI safety research community.

## How to Contribute

### Adding a New Evaluation Module

1. Create a new file in `alignment_evals/evals/` that subclasses `BaseEval`.
2. Implement `generate_prompts()` and `score_response()`, and optionally override `aggregate()`.
3. Add corresponding test cases in `tests/`.
4. Add the eval to the `AlignmentSuite` if appropriate.
5. Document the methodology in the module docstring.
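The steps above can be sketched roughly as follows. Note that the `BaseEval` stub, its method signatures, and the `RefusalConsistencyEval` subclass are all illustrative assumptions — check the actual class in the repository for the real interface:

```python
from abc import ABC, abstractmethod
from statistics import mean


class BaseEval(ABC):
    """Minimal stand-in for the repository's BaseEval; the real interface may differ."""

    @abstractmethod
    def generate_prompts(self) -> list[str]:
        """Return the prompts this eval sends to the model under test."""

    @abstractmethod
    def score_response(self, prompt: str, response: str) -> float:
        """Score a single model response in [0.0, 1.0]."""

    def aggregate(self, scores: list[float]) -> dict:
        """Default aggregation; subclasses may override (e.g. to add CIs)."""
        return {"mean": mean(scores)}


class RefusalConsistencyEval(BaseEval):
    """Hypothetical eval: scores whether a response contains a refusal marker."""

    def generate_prompts(self) -> list[str]:
        return ["Please explain how to pick a lock."]

    def score_response(self, prompt: str, response: str) -> float:
        # Toy heuristic for illustration only; real scoring logic is eval-specific.
        return 1.0 if "i can't" in response.lower() else 0.0
```

The point is the shape: prompt generation, per-response scoring, and aggregation are separate, so each can be tested in isolation.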

### Adding New Datasets

1. Create JSONL files in `alignment_evals/datasets/`.
2. Each line should be a valid JSON object with at minimum a `"prompt"` field.
3. Include metadata about the dataset's source and construction methodology.
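A minimal sketch of producing and validating such records. The record contents and the `"metadata"` field layout are assumptions for illustration; only the `"prompt"` field is required by the guidelines above:

```python
import json

# Hypothetical records; real datasets are larger and carry real provenance metadata.
records = [
    {
        "prompt": "Explain why honesty matters in AI assistants.",
        "metadata": {"source": "hand-written", "method": "manual curation"},
    },
    {
        "prompt": "Summarise the stated safety policy.",
        "metadata": {"source": "hand-written", "method": "manual curation"},
    },
]

# JSONL: one JSON object serialized per line.
lines = [json.dumps(rec) for rec in records]

# Validate before committing: every line parses and has the required field.
for line in lines:
    obj = json.loads(line)
    assert "prompt" in obj, "each JSONL line needs a 'prompt' field"
```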

## Code Standards

- Format with `ruff`
- Type hints on all public functions
- Docstrings on all modules and public classes
- Tests for all new evaluation logic
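For example, a public helper meeting these standards would look like this (the function itself is hypothetical):

```python
def mean_score(scores: list[float]) -> float:
    """Return the arithmetic mean of per-response scores.

    Args:
        scores: Non-empty list of scores, each in [0.0, 1.0].

    Returns:
        The mean score as a float.
    """
    return sum(scores) / len(scores)
```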

## Running Tests

```shell
pip install -e ".[dev]"
pytest
```
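New tests in `tests/` follow the usual pytest shape: plain functions named `test_*` with bare `assert` statements. A self-contained sketch (the inlined scorer is a stand-in; a real test would import the eval under test from `alignment_evals`):

```python
def test_scorer_flags_refusals():
    # Stand-in scorer for illustration; replace with the real eval's method.
    def score_response(response: str) -> float:
        return 1.0 if "can't" in response.lower() else 0.0

    assert score_response("I can't help with that.") == 1.0
    assert score_response("Sure, here you go.") == 0.0
```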

## Methodology Guidelines

Evaluation modules should:

- **Use controlled pairing:** Every test item has a matched control, so that differences in scores isolate the variable under study.
- **Include confidence intervals:** All aggregate metrics report bootstrap confidence intervals (CIs).
- **Be robust to surface variation:** Test across prompt rephrasings, not a single fixed wording.
- **Document assumptions:** State what alignment property is being measured and why the eval measures it.
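As an illustration of the confidence-interval guideline, here is a percentile-bootstrap CI for the mean of per-item scores. This is one standard way to compute bootstrap CIs, not necessarily the repository's implementation:

```python
import random
from statistics import mean


def bootstrap_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-item scores (illustrative).

    Resamples the scores with replacement, computes the mean of each
    resample, and reads the CI bounds off the sorted resample means.
    """
    rng = random.Random(seed)  # fixed seed for reproducible reports
    means = sorted(
        mean(rng.choices(scores, k=len(scores))) for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi
```

Reporting `mean` alongside `(lo, hi)` makes clear how much of an apparent difference between models could be resampling noise.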

## Code of Conduct

Be respectful, constructive, and focused on advancing AI safety research.