Fixed bug in reset logic caused by dictionary comprehension#998
Merged
bennybp merged 2 commits intoMolSSI:mainfrom Mar 27, 2026
Merged
Fixed bug in reset logic caused by dictionary comprehension#998bennybp merged 2 commits intoMolSSI:mainfrom
bennybp merged 2 commits intoMolSSI:mainfrom
Conversation
Contributor
|
Thanks for finding this! I had some users with similar errors but I couldn't quite find the error. This looks to be it, though. I'm largely ok with merging as-is, unless you have something more to add. I do have on my todo list to clean up the reset logic to make it more standalone and testable by not having it take ORM but plain python objects. When that happens, the test you wrote will become much simpler. But that is a ways in the future. |
Contributor
Author
|
Yeah no problem, I don't have anything else to add on my end! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Our server had auto_reset for failed jobs set to a maximum of 5 retries for all error types. I discovered that qcfractal was continuously resetting my failed jobs (I eventually manually canceled them after they were attempted >1600 times). This was occurring because of line 56 in ~/qcfractal/qcfractal/components/tasks/reset_logic.py :
error_counts = {error_map.get(k, "unknown_error"): v for k, v in error_counts.items()}
This dictionary comprehension overwrites the count when there are multiple error types mapped to a single general error category. The unique_errors set defined on line 48 was correctly accumulating errors, but since set ordering is deterministic within a python process, the comprehension discarded all but one of the error types consistently, each time should_reset was called. In my case I had 1 BadStateException and 1616 TooManyJobFailuresError. While all errors were being captured, only the BadStateException would have triggered should_reset to halt.
I added a fix and a testcase.
Status