drenv: Add dependency-based addon scheduler #2412
Draft
nirs wants to merge 2 commits into RamenDR:main
When using per-cluster schedulers, addons on one cluster may depend on
addons completing on another cluster (e.g. ocm-cluster on dr1 requires
ocm-hub on hub).
The PubSub class provides a simple notification mechanism for this:
schedulers subscribe to external dependency keys, and when an addon
completes, it posts a notification to all subscribers.
Callbacks are invoked outside the lock, so subscribers can use
queue.put() to wake up a blocking watcher thread without holding the
lock.
Assisted-by: Cursor/Claude Opus 4.6
Signed-off-by: Nir Soffer <[email protected]>
Add a simple task scheduler that runs tasks based on dependencies.
Tasks with no unmet dependencies run in parallel, up to a configurable
concurrency limit (max_workers). When a task completes, newly unblocked
tasks are scheduled automatically.
The scheduler is generic - it does not know about addons or clusters.
It works with Task objects that carry a Key (context + name), a list
of dependency keys, and opaque data passed to the run function.
Tasks from different contexts (e.g. hub, dr1) can be mixed in the
same task list and depend on each other. Per-cluster separation and
cross-cluster notifications are handled externally using pubsub.
The key types:
Key(context, name) - unique task identifier, context used for
log message prefixes
Task(key, requires, data) - unit of work with dependencies
Internal state uses three stages: pending (ordered dict), running
(mapped by future), completed (set of keys). The _validate method
builds the pending map and checks for empty input, unknown
dependencies, and cycles.
Assisted-by: Cursor/Claude Opus 4.6
Signed-off-by: Nir Soffer <[email protected]>
Dependency-based addon scheduler
Replace the rigid worker-group model with dependency-based scheduling.
Each cluster gets its own scheduler that runs addons in parallel based
on their dependencies, with cross-cluster notifications handled by a
simple pubsub mechanism. This eliminates wasted worker slots, enables
accurate timeouts, and lets addons start as soon as their dependencies
are satisfied.
PubSub
Simple thread-safe publish-subscribe for decoupled notifications.
Subscribers register callbacks for specific keys, and publishers post
keys to notify all subscribers. Used to signal cross-cluster addon
completion without schedulers knowing about each other.
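A minimal sketch of such a publish-subscribe helper, assuming a `subscribe(key, callback)` / `post(key)` interface; these method names and the internal layout are illustrative assumptions, not the class from this PR. It shows callbacks being copied under the lock and invoked outside it, so a subscriber can pass `queue.put` as the callback.

```python
import queue
import threading
from collections import defaultdict


class PubSub:
    """Hypothetical sketch of a thread-safe publish-subscribe helper."""

    def __init__(self):
        self._lock = threading.Lock()
        self._subscribers = defaultdict(list)  # key -> list of callbacks

    def subscribe(self, key, callback):
        # Register a callback to be called with the key when it is posted.
        with self._lock:
            self._subscribers[key].append(callback)

    def post(self, key):
        # Copy the callbacks while holding the lock, then call them
        # without the lock so a callback can block or call back into
        # PubSub without deadlocking.
        with self._lock:
            callbacks = list(self._subscribers.get(key, ()))
        for callback in callbacks:
            callback(key)


# Usage: a subscriber wakes up by reading the posted key from its queue.
events = queue.Queue()
pubsub = PubSub()
pubsub.subscribe(("hub", "ocm-hub"), events.put)
pubsub.post(("hub", "ocm-hub"))
received = events.get(timeout=1)
```

In this sketch a key posted before anyone subscribes is simply dropped, so subscribers must register before publishers start posting.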
Scheduler
Runs tasks based on dependencies with configurable concurrency. Tasks
with no unmet dependencies run in parallel up to max_workers. When a
task completes, newly unblocked tasks are scheduled automatically.
Tasks carry a Key(context, name) for identification and logging, a set
of dependency keys, and opaque data passed to the run function.
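The scheduling loop described above can be sketched as follows. This is a simplified illustration under assumptions, not the scheduler from this PR: `run_tasks` is a hypothetical function name, and `Key` and `Task` are modeled as namedtuples. It keeps the three stages from the commit message (pending, running, completed) and starts a task only when all its dependencies are in the completed set.

```python
from collections import namedtuple
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

Key = namedtuple("Key", ["context", "name"])
Task = namedtuple("Task", ["key", "requires", "data"])


def run_tasks(tasks, run, max_workers=2):
    """Run tasks in dependency order; return keys in completion order."""
    pending = {task.key: task for task in tasks}  # keeps insertion order
    running = {}  # future -> task
    completed = set()
    order = []

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        while pending or running:
            # Start ready tasks while worker slots are free.
            for key in list(pending):
                if len(running) >= max_workers:
                    break
                task = pending[key]
                if all(dep in completed for dep in task.requires):
                    del pending[key]
                    running[executor.submit(run, task)] = task

            if not running:
                # Nothing is running and nothing can start.
                raise RuntimeError(f"Blocked tasks: {list(pending)}")

            # Block until at least one running task finishes.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                task = running.pop(future)
                future.result()  # re-raise an error from the task
                completed.add(task.key)
                order.append(task.key)

    return order
```

A task list given in any order still runs in dependency order, because a task leaves the pending map only after all its dependencies have completed.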
Validation catches empty input, unknown dependencies, and cycles at
construction time.
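One way to perform these three checks is sketched below. The `validate` function and its error messages are assumptions for illustration; the PR's `_validate` method may be structured differently. Cycles are detected by repeatedly resolving tasks whose dependencies are already resolved and failing when no progress can be made.

```python
from collections import namedtuple

Key = namedtuple("Key", ["context", "name"])
Task = namedtuple("Task", ["key", "requires", "data"])


def validate(tasks):
    """Return the pending map (key -> task) or raise ValueError."""
    if not tasks:
        raise ValueError("No tasks to run")

    pending = {task.key: task for task in tasks}

    # Every dependency must refer to a task in the list.
    for task in tasks:
        for dep in task.requires:
            if dep not in pending:
                raise ValueError(f"{task.key} requires unknown task {dep}")

    # Resolve tasks layer by layer; if a pass resolves nothing while
    # tasks remain, the remaining tasks form a dependency cycle.
    resolved = set()
    remaining = dict(pending)
    while remaining:
        ready = [
            key
            for key, task in remaining.items()
            if all(dep in resolved for dep in task.requires)
        ]
        if not ready:
            raise ValueError(f"Dependency cycle among {sorted(remaining)}")
        for key in ready:
            resolved.add(key)
            del remaining[key]

    return pending
```

Failing at construction time means a misconfigured task list is reported before any task starts, instead of leaving the scheduler with tasks that can never become ready.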
How they work together
Each cluster gets its own scheduler with max_workers matching the
cluster's capacity. Cross-cluster dependencies (e.g. dr1/ocm-cluster
requires hub/ocm-hub) are handled by pubsub:
... and subscribe to the relevant pubsub keys.
... the key via their queue.
... picks up the newly ready task.
This keeps each scheduler simple -- it just checks _completed for all
deps. External deps arrive via notifications instead of local task
completion, but the scheduler doesn't know the difference.
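The interaction can be sketched end to end with two clusters. Everything here is a simplified assumption for illustration: `subscribe_external` and `run_cluster` are hypothetical helpers, tasks are reduced to a mapping from key to dependency keys, and each cluster "runs" an addon by appending its key to a log instead of using a worker pool. The point is that an external key and a local key both end up in the same completed set.

```python
import queue
import threading
from collections import defaultdict, namedtuple

Key = namedtuple("Key", ["context", "name"])


class PubSub:
    """Hypothetical minimal pubsub; callbacks run outside the lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._subscribers = defaultdict(list)

    def subscribe(self, key, callback):
        with self._lock:
            self._subscribers[key].append(callback)

    def post(self, key):
        with self._lock:
            callbacks = list(self._subscribers.get(key, ()))
        for callback in callbacks:
            callback(key)


def subscribe_external(context, tasks, pubsub):
    """Subscribe a new queue to every dependency owned by another cluster."""
    events = queue.Queue()
    for deps in tasks.values():
        for dep in deps:
            if dep.context != context:
                pubsub.subscribe(dep, events.put)
    return events


def run_cluster(tasks, events, pubsub, log):
    """Run one cluster's tasks; tasks maps Key -> set of dependency Keys."""
    pending = dict(tasks)
    completed = set()
    while pending:
        ready = [key for key, deps in pending.items() if deps <= completed]
        if not ready:
            # Block until another cluster posts a key we depend on.
            completed.add(events.get(timeout=5))
            continue
        for key in ready:
            del pending[key]
            log.append(key)  # stands in for running the addon
            completed.add(key)
            pubsub.post(key)


clusters = {
    "hub": {Key("hub", "ocm-hub"): set()},
    "dr1": {Key("dr1", "ocm-cluster"): {Key("hub", "ocm-hub")}},
}
pubsub = PubSub()
log = []

# Subscribe all clusters before starting any, so no notification is lost.
queues = {
    context: subscribe_external(context, tasks, pubsub)
    for context, tasks in clusters.items()
}
threads = [
    threading.Thread(
        target=run_cluster,
        args=(tasks, queues[context], pubsub, log),
    )
    for context, tasks in clusters.items()
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

The readiness check (`deps <= completed`) is the same for local and external dependencies; only the way a key reaches the completed set differs.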
Example: waiting for a cluster
Currently, addons like ocm-cluster and submariner call
cluster.wait_until_ready() to poll another cluster's status, occupying
a worker slot while sleeping. With the scheduler, this becomes a
dependency on a "ready" key posted when cluster startup completes.
The scheduler won't start dr1/ocm-cluster until hub/ready is posted
via pubsub. No polling, no wasted slots -- the task stays pending until
the hub cluster is actually ready.
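Such a dependency might be declared as in the sketch below. The `Task` and `Key` namedtuples, the `ready_tasks` helper and the `example-addon` task are hypothetical and only illustrate which tasks are startable before and after the hub/ready key arrives.

```python
from collections import namedtuple

Key = namedtuple("Key", ["context", "name"])
Task = namedtuple("Task", ["key", "requires", "data"])

dr1_tasks = [
    # Waits for the hub cluster instead of polling it from a worker.
    Task(Key("dr1", "ocm-cluster"), {Key("hub", "ready")}, None),
    # Hypothetical addon with no dependencies; can start immediately.
    Task(Key("dr1", "example-addon"), set(), None),
]


def ready_tasks(tasks, completed):
    """Return names of tasks whose dependencies are all completed."""
    return [task.key.name for task in tasks if task.requires <= completed]


# Before the hub posts its "ready" key, only the independent addon can start.
before = ready_tasks(dr1_tasks, set())

# After hub/ready arrives via pubsub, ocm-cluster becomes startable too.
after = ready_tasks(dr1_tasks, {Key("hub", "ready")})
```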
Integration with current code
The current model uses nested ThreadPoolExecutors with rigid worker
groups. The main executor starts clusters in parallel, each cluster
runs its own per-cluster executor with addon groups.
The new model replaces per-cluster executors with schedulers:
This preserves the executor-per-cluster model while replacing rigid
worker groups with dependency-based scheduling.
Next steps
... context parameter to scheduler for detecting external deps and
simplifying validation and logging.
... are satisfied via pubsub notifications.
... list of addons per profile.
... __main__.py.
... rbd-mirror, volsync) to the hub scheduler. They depend on addons
from other clusters and start as soon as their dependencies are
satisfied via pubsub.
... do_stop (reverse dependency order).