Skip to content

Fixed bug in reset logic caused by dictionary comprehension#998

Merged
bennybp merged 2 commits intoMolSSI:mainfrom
vlita:retry
Mar 27, 2026
Merged

Fixed bug in reset logic caused by dictionary comprehension#998
bennybp merged 2 commits intoMolSSI:mainfrom
vlita:retry

Conversation

@vlita
Copy link
Copy Markdown
Contributor

@vlita vlita commented Mar 27, 2026

Description

Our server had auto_reset for failed jobs set to a maximum of 5 retries for all error types. I discovered that qcfractal was continuously resetting my failed jobs (I eventually manually canceled them after they were attempted >1600 times). This was occurring because of line 56 in ~/qcfractal/qcfractal/components/tasks/reset_logic.py :

error_counts = {error_map.get(k, "unknown_error"): v for k, v in error_counts.items()}

This dictionary comprehension overwrites the count when there are multiple error types mapped to a single general error category. The unique_errors set defined on line 48 was correctly accumulating errors, but since set ordering is deterministic within a python process, the comprehension discarded all but one of the error types consistently, each time should_reset was called. In my case I had 1 BadStateException and 1616 TooManyJobFailuresError. While all errors were being captured, only the BadStateException would have triggered should_reset to halt.

I added a fix and a testcase.

Status

  • [ x ] fixed bug
  • [ x ] added representative testcase
  • [ x ] bug fails test, fix passes test
  • [ ] ran full test suite

@bennybp
Copy link
Copy Markdown
Contributor

bennybp commented Mar 27, 2026

Thanks for finding this! I had some users with similar errors but I couldn't quite find the error. This looks to be it, though.

I'm largely ok with merging as-is, unless you have something more to add.

I do have on my todo list to clean up the reset logic to make it more standalone and testable by not having it take ORM but plain python objects. When that happens, the test you wrote will become much simpler. But that is a ways in the future.

@vlita
Copy link
Copy Markdown
Contributor Author

vlita commented Mar 27, 2026

Yeah no problem, I don't have anything else to add on my end!

@bennybp bennybp merged commit 549eed7 into MolSSI:main Mar 27, 2026
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants