
Which version of swebench is sb-cli using? #14

@daa233

Description


I evaluated the same SWE-bench Multimodal run with both OpenHands and sb-cli, but got different results:

  • With OpenHands' eval_infer.py, the final result is 25 / 94 (26.60%).
  • With sb-cli submit (following the instructions here), the final result is 14 / 94 (14.89%).
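To narrow down which instances account for the gap of 11 resolved instances, one option is to compare the sets of resolved instance IDs from the two reports. The following is a minimal sketch that assumes you have already pulled the resolved IDs from each tool's report into Python sets (how to extract them depends on each report's format and is not shown). The function names resolved_rate and diff_resolved are hypothetical helpers, not part of OpenHands, sb-cli, or swebench.

```python
def resolved_rate(resolved: int, total: int) -> float:
    """Return the resolution rate as a percentage, rounded to two decimals."""
    return round(100 * resolved / total, 2)


def diff_resolved(a: set, b: set) -> tuple:
    """Return (IDs resolved only in run a, IDs resolved only in run b)."""
    return a - b, b - a


# Reproduce the two headline numbers from the issue.
print(resolved_rate(25, 94))  # OpenHands eval_infer.py
print(resolved_rate(14, 94))  # sb-cli submit
```

Inspecting the instances returned by diff_resolved (for example, checking their test logs in both harnesses) is usually more informative than the aggregate percentages alone.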

The discrepancy is also described in OpenHands/OpenHands#10452.

Could you please tell me which version of swebench sb-cli uses?
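For comparison on the local side, one way to check which swebench version is installed in the environment that runs OpenHands' eval_infer.py is the standard library's importlib.metadata. This sketch assumes swebench was installed as a regular package under the distribution name swebench; it only reports the local version and says nothing about what sb-cli's evaluation backend runs. The helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("swebench:", installed_version("swebench") or "not installed")
```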

Thanks a lot!
