[WIP] Wait to reenqueue job post-interrupt until after callbacks are finished#66
Draft
adrianna-chang-shopify wants to merge 2 commits intomainfrom
Draft
[WIP] Wait to reenqueue job post-interrupt until after callbacks are finished#66adrianna-chang-shopify wants to merge 2 commits intomainfrom
adrianna-chang-shopify wants to merge 2 commits intomainfrom
Conversation
4897d42 to
5340d0f
Compare
When Sidekiq is being used as the queue adapter, it is possible for job-iteration to reenqueue the job after an interruption prior to the callbacks for the original job completing. This can create race conditions if shutdown or after_perform callbacks touch data that the new job will also touch. This can be solved by waiting until after all the callbacks on the original job are complete to reenqueue the iteration job. Rather than reenqueuing the job from within #iterate_with_enumerator, we can prepend an after_perform callback (which ensures the callback runs last) when JobIteration::Iteration is included and reenqueue the job there.
5340d0f to
bf83338
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
[WIP] Test suite on Core is failing, need to investigate
Problem
When Sidekiq is being used as the queue adapter, it is possible for
job-iterationto reenqueue the job after an interruption prior to the callbacks for the original job completing. This can create race conditions ifon_shutdownor{after|around}_performcallbacks touch data that the new job will also touch.This can be demonstrated by keeping the new test
"allows callbacks to finish before reenqueuing job after interrupt"but removing the rest of the changes. The callback order becomes[["before_enqueue"], ["before_enqueue"], ["on_shutdown"]], which is not what we want.Solution
This can be solved by waiting until after all the callbacks on the original job are complete to reenqueue the iteration job. Rather than reenqueuing the job from within
#iterate_with_enumerator, we can prepend anafter_performcallback (which ensures the callback runs last) whenJobIteration::Iterationis included and reenqueue the job there.More Context
We're leveraging the
job-iterationgem in the new Maintenance Tasks Framework, and we use aRunobject to keep track of a maintenance task's progress under the hood. We encountered a race condition which lead to an invalid state and raised an exception. This happened because theon_shutdowncallback:https://github.com/Shopify/maintenance_tasks/blob/fd096a5d2bcbd8340370b950c927076296ece245/app/jobs/maintenance_tasks/task_job.rb#L88-L98
occurred after the new job had already gone through the
before_performhook:https://github.com/Shopify/maintenance_tasks/blob/fd096a5d2bcbd8340370b950c927076296ece245/app/jobs/maintenance_tasks/task_job.rb#L64-L77
and started performing, which led the
@runobject to be in an invalid state.Considerations
I don't think there are any unintended consequences to delaying the reenqueue - the only incompatibility would be if users were depending on
reenqueue_iteration_jobbeing called prior to their callbacks finishing and making use of one of the variables (ie.self.already_in_queueor@retried). I've double checked inShopify/shopifyand it doesn't seem that this is the case.Suggestions for a better way to test the order of the callbacks are also welcome - matching on output sent to
$STDOUTin the callbacks seemed like a reasonable and simple way to do it, rather than fiddling around with class variables or something similar.Any other things I'm missing?