I created a gold predictions file using the value of the 'patch' key for each instance in the SWE-bench Verified dataset and submitted it to sb_cli.
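For anyone trying to reproduce this, here is a minimal sketch of how such a gold predictions file might be built. It assumes the predictions format is a JSON object keyed by `instance_id` with `model_patch` and `model_name_or_path` fields; the helper name `build_gold_predictions`, the model name `"gold"`, and the placeholder rows are hypothetical and only for illustration.

```python
import json


def build_gold_predictions(rows, model_name="gold"):
    """Map each dataset row to a prediction that reuses its reference patch."""
    predictions = {}
    for row in rows:
        predictions[row["instance_id"]] = {
            # The reference fix stored under the 'patch' key becomes the prediction.
            "model_patch": row["patch"],
            "model_name_or_path": model_name,
        }
    return predictions


# Placeholder rows standing in for the real dataset (patch bodies are dummies).
# With the Hugging Face `datasets` package, the real rows could be loaded via:
#   load_dataset("princeton-nlp/SWE-bench_Verified", split="test")
sample_rows = [
    {"instance_id": "astropy__astropy-7606", "patch": "diff --git a/x b/x\n"},
    {"instance_id": "django__django-10097", "patch": "diff --git a/y b/y\n"},
]

gold = build_gold_predictions(sample_rows)
gold_json = json.dumps(gold, indent=2)  # write this string to the predictions file
```

One thing worth checking in a setup like this is that each patch is copied byte for byte, including its trailing newline, since a patch altered in transit may no longer apply cleanly.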
I ran the same file five times with different run IDs,
expecting 100% resolution,
but the following instances failed in every run (100% failures):
astropy__astropy-7606
astropy__astropy-8707
astropy__astropy-8872
django__django-10097
pylint-dev__pylint-6528
pylint-dev__pylint-7277
Intermittent failures:
psf__requests-1724
sympy__sympy-13091
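The split between "always fails" and "intermittent" can be computed from the per-run results. Below is a minimal sketch, assuming each run has already been reduced to a set of resolved instance IDs; the function name `classify_failures` and the toy IDs are hypothetical, and the data is illustrative, not the actual results from these runs.

```python
from collections import Counter


def classify_failures(runs, all_ids):
    """Split instances into those failing in every run and those failing in some.

    runs: list of sets, each holding the instance IDs resolved in one run.
    all_ids: every instance ID that was submitted.
    """
    fail_counts = Counter()
    for resolved in runs:
        for instance_id in all_ids:
            if instance_id not in resolved:
                fail_counts[instance_id] += 1

    total = len(runs)
    always = sorted(i for i, n in fail_counts.items() if n == total)
    sometimes = sorted(i for i, n in fail_counts.items() if 0 < n < total)
    return always, sometimes


# Toy data: three runs over three made-up instance IDs.
all_ids = ["repo__proj-1", "repo__proj-2", "repo__proj-3"]
runs = [
    {"repo__proj-2", "repo__proj-3"},
    {"repo__proj-3"},
    {"repo__proj-2", "repo__proj-3"},
]

always, sometimes = classify_failures(runs, all_ids)
print("always failing:", always)
print("intermittent:", sometimes)
```

Instances that fail only in some runs with identical input point to non-determinism in the evaluation environment rather than in the patch itself.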
I checked the results.
Did I miss something?