evaluation-engineering

Here are 2 public repositories matching this topic:

Towards Evaluation Engineering: An Empirical Study of ML Evaluation Harnesses in the Wild

Evaluation-first AI case study for evolving retry/backoff policies with local LLMs, strict QA gates, and holdout validation.
