Actions: EleutherAI/lm-evaluation-harness
Showing runs from all workflows
2,500+ workflow runs
tool_calls and reasoning: Tracking and evaluation
Tasks Modified #6203: Pull request #3685 synchronize by RawthiL

tool_calls and reasoning: Tracking and evaluation
Unit Tests #6176: Pull request #3685 synchronize by RawthiL

answer-not-found and invalid-filter tracking
Tasks Modified #6200: Pull request #3709 synchronize by baberabb

answer-not-found and invalid-filter tracking
Unit Tests #6173: Pull request #3709 synchronize by baberabb