As we discussed for the ICLR submission, the counterfactual tests are part of our evaluation of the LLMs' performance, not part of the proposed Interpretation Assistant architecture (since counterfactual tests aren't in general available). The current implementation treats counterfactual test failures like the other loopback errors, which corrupts the success rate reporting.
We need to (conceptually, not necessarily "algorithmically") do the counterfactual testing separately and then revisit RQ2.
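A minimal sketch of what "separately" could look like at the reporting level, assuming each run is logged with one flag for the architecture-level outcome and one for the counterfactual test. The record layout and all names here (`Attempt`, `architecture_ok`, `counterfactual_ok`) are hypothetical and not taken from the current implementation:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Attempt:
    """One run on one dataset item (hypothetical record layout)."""
    architecture_ok: bool              # assumed: run ended without an unresolved loopback error
    counterfactual_ok: Optional[bool]  # evaluation-only signal; None if no counterfactual test exists


def architecture_success_rate(attempts: List[Attempt]) -> float:
    """Success rate as the architecture sees it, ignoring counterfactual results."""
    if not attempts:
        return 0.0
    return sum(a.architecture_ok for a in attempts) / len(attempts)


def counterfactual_pass_rate(attempts: List[Attempt]) -> float:
    """Share of architecture-level successes that also pass their counterfactual test."""
    tested = [a for a in attempts if a.architecture_ok and a.counterfactual_ok is not None]
    if not tested:
        return 0.0
    return sum(bool(a.counterfactual_ok) for a in tested) / len(tested)


attempts = [
    Attempt(architecture_ok=True, counterfactual_ok=True),
    Attempt(architecture_ok=True, counterfactual_ok=False),
    Attempt(architecture_ok=False, counterfactual_ok=None),
    Attempt(architecture_ok=True, counterfactual_ok=True),
]
print(architecture_success_rate(attempts))  # 0.75
print(counterfactual_pass_rate(attempts))
```

Reported this way, a counterfactual failure lowers only the second number, so the architecture's own success rate is no longer mixed with an evaluation-only signal.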
- `testing-variables` not found in dataset
- `evaluateExpression` inside `validate`

Done/dropped:

- `logs` folder still populated?

See also: