Counterfactual testing outside of loopback #275

@rolyp

As we discussed for the ICLR submission, the counterfactual tests are part of our evaluation of the LLMs' performance, not part of the proposed Interpretation Assistant architecture, since counterfactual tests aren't available in general. The current implementation treats counterfactual test failures like any other loopback error, which distorts the reported success rate.

We need to run the counterfactual testing separately (conceptually, if not necessarily "algorithmically") and then revisit RQ2.
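To make the separation concrete, here is a minimal sketch of how the reporting could keep loopback outcomes and counterfactual outcomes apart. It is in Python purely for illustration; `TestOutcome`, `summarise` and the field names are hypothetical and not taken from the codebase.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TestOutcome:
    """Hypothetical record of one test problem."""
    loopback_ok: bool                          # did the loopback checks succeed?
    counterfactual_ok: Optional[bool] = None   # None: no counterfactual test available


def summarise(outcomes: Sequence[TestOutcome]) -> dict:
    """Report loopback success and counterfactual results as separate figures."""
    total = len(outcomes)
    with_cf = [o for o in outcomes if o.counterfactual_ok is not None]
    return {
        # Success rate of the assistant itself: counterfactual failures do not count here.
        "loopback_success_rate": sum(o.loopback_ok for o in outcomes) / total if total else None,
        # Proportion of tests that have a counterfactual test at all.
        "counterfactual_coverage": len(with_cf) / total if total else None,
        # Evaluation-only metric, computed over the tests that have counterfactuals.
        "counterfactual_pass_rate": (
            sum(o.counterfactual_ok for o in with_cf) / len(with_cf) if with_cf else None
        ),
    }
```

With this shape, a counterfactual failure lowers only `counterfactual_pass_rate` and leaves `loopback_success_rate` untouched.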

  • Fail validation if any testing-variables are not found in the dataset
  • Fail validation if any counterfactual variable fails to generate a counterfactual output under the gold solution
  • Report the proportion of tests that have counterfactual tests
  • Push the call to evaluateExpression inside validate
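
A rough sketch of what the two validation checks might look like once the evaluation call lives inside validate. This is illustrative Python, not the project's code: `ValidationError`, the argument names and the toy evaluator are all assumptions, and `evaluate_expression` is only a stand-in for evaluateExpression. It also assumes that "fails to generate a counterfactual output" means either that evaluation raises or that the output is unchanged from the baseline; the intended meaning may be narrower.

```python
from typing import Any, Callable, Mapping


class ValidationError(Exception):
    """Hypothetical error: the test case is unusable for counterfactual testing."""


def validate(
    data: Mapping[str, Any],               # dataset values the gold solution reads
    counterfactuals: Mapping[str, Any],    # testing variable -> counterfactual value
    gold_solution: str,
    evaluate_expression: Callable[[str, Mapping[str, Any]], Any],
) -> None:
    # Check 1: every testing variable must exist in the dataset.
    missing = sorted(v for v in counterfactuals if v not in data)
    if missing:
        raise ValidationError(f"testing variables not found in dataset: {missing}")

    # Check 2: each counterfactual must yield a different output under the gold solution.
    baseline = evaluate_expression(gold_solution, data)
    for var, value in counterfactuals.items():
        try:
            output = evaluate_expression(gold_solution, {**data, var: value})
        except Exception as err:
            raise ValidationError(f"gold solution failed for counterfactual {var!r}") from err
        if output == baseline:
            raise ValidationError(f"counterfactual {var!r} leaves the gold output unchanged")


def toy_evaluate(expr: str, env: Mapping[str, Any]) -> Any:
    """Toy evaluator for the example only; arithmetic over the given variables."""
    return eval(expr, {"__builtins__": {}}, dict(env))


# Passes: changing `total` changes the output of the gold solution.
validate({"total": 10, "count": 4}, {"total": 20}, "total / count", toy_evaluate)
```

Because validation runs against the gold solution only, it can be done once per test case before any LLM output is involved, which keeps these failures out of the loopback error counts.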

Done/dropped:

  • Fix weird way to exit loop
  • Report on total number of test "problems"
  • Why is the logs folder still populated?


Status: Proposed
