
util: add GroupLock for two-group mutual exclusion #6183

Open
gadididi wants to merge 1 commit into ceph:devel from gadididi:util/add_group_locking

Conversation


@gadididi gadididi commented Mar 16, 2026

Add a GroupLock type that allows operations within the same group to run concurrently while blocking operations from the other group. This will be used to coordinate NVMe-oF Stage and Unstage operations: multiple Stage calls can run in parallel,
multiple Unstage calls can run in parallel, but Stage and Unstage block each other.

This can be used in any place that needs mutual exclusion between two groups of operations.

The implementation is based on Keane and Moir's "group mutual exclusion algorithm", but simplified for exactly two groups using Golang condition variables.
Operations in group A wait if any group B operations are active, and vice versa.
When the last operation in a group completes, all waiting operations in the other group are woken up together.

Also added tests that simulate parallel workers:

  • Multiple same-group operations run concurrently
  • Different groups block each other
  • No deadlock under heavy contention
  • Mutual exclusion is never violated
  • All waiting operations start together when unblocked
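The two-group scheme described above can be sketched in Go with a sync.Cond roughly as follows. This is a minimal illustration under stated assumptions, not the PR's actual code: the Group/GroupA/GroupB names, the per-group counter array, and the contention check in main are all made up here for demonstration.

```go
package main

import (
	"fmt"
	"sync"
)

// Group identifies one of the two operation groups (names assumed).
type Group int

const (
	GroupA Group = 0
	GroupB Group = 1
)

// GroupLock lets any number of operations from one group run
// concurrently while operations from the other group wait.
type GroupLock struct {
	mu     sync.Mutex
	cond   *sync.Cond
	active [2]int // active operation count per group
}

func NewGroupLock() *GroupLock {
	gl := &GroupLock{}
	gl.cond = sync.NewCond(&gl.mu)
	return gl
}

// Lock blocks while any operation of the other group is active,
// then marks the caller active in group g.
func (gl *GroupLock) Lock(g Group) {
	gl.mu.Lock()
	defer gl.mu.Unlock()
	for gl.active[1-g] > 0 {
		gl.cond.Wait()
	}
	gl.active[g]++
}

// Unlock marks the caller done; when the group's count drains to
// zero, every waiter from the other group is woken at once.
func (gl *GroupLock) Unlock(g Group) {
	gl.mu.Lock()
	defer gl.mu.Unlock()
	gl.active[g]--
	if gl.active[g] == 0 {
		gl.cond.Broadcast()
	}
}

func main() {
	gl := NewGroupLock()
	var wg sync.WaitGroup
	var checkMu sync.Mutex
	violations := 0

	// Many workers from both groups under contention; mutual
	// exclusion between the groups must never be violated.
	for i := 0; i < 50; i++ {
		g := Group(i % 2)
		wg.Add(1)
		go func(g Group) {
			defer wg.Done()
			gl.Lock(g)
			gl.mu.Lock() // peek at the counters for the check only
			bad := gl.active[1-g] > 0
			gl.mu.Unlock()
			if bad {
				checkMu.Lock()
				violations++
				checkMu.Unlock()
			}
			gl.Unlock(g)
		}(g)
	}
	wg.Wait()
	fmt.Println("violations:", violations) // prints "violations: 0"
}
```

Note the Broadcast (rather than Signal) when a group drains: that is what makes all waiting operations of the other group start together, matching the last bullet above.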

Related PR

#6163 - once this phase-1 PR is merged, the next phase adds a group lock for the connect/disconnect operations.

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next major release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.


These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:

  • /retest ci/centos/<job-name>: retest the <job-name> after unrelated
    failure (please report the failure too!)

@gadididi gadididi requested a review from nixpanic March 16, 2026 13:36
@gadididi gadididi self-assigned this Mar 16, 2026
@gadididi gadididi added the component/util Utility functions shared between CephFS and RBD label Mar 16, 2026
@gadididi gadididi force-pushed the util/add_group_locking branch from d07791d to 667953c Compare March 16, 2026 14:58
@gadididi gadididi requested review from Madhu-1 and Rakshith-R March 16, 2026 14:58
Member

@nixpanic nixpanic left a comment


Sounds like what we can use.

I wonder if this is overkill for the rare cases where this would be an issue? Is there an alternative that would be simpler?

In theory this could cause blocking for a long time. If one group (say NodeStageVolume) gets called often, in quick succession, a NodeUnstageVolume call could be blocked until all NodeStageVolume calls are handled. I don't think that will be problematic for our use now, but it may be something to document so other users are warned?


gadididi commented Mar 17, 2026

@nixpanic, Hi!

Is there an alternative that would be simpler?

I searched for a group mutual exclusion utility in Go, but I did not find one.

In theory this could cause blocking for a long time

Right. I can make it more fair with a kind of queue (like in the original algorithm here: https://dl.acm.org/doi/10.1145/301308.301319).
That way we would definitely not have any starvation. The big question is whether it is really common for hundreds of pods to be created together.
I can document it.
I will figure out how difficult/over-kill it would be to add the queue.

UPDATE ABOUT THE QUEUE:

Adding fairness (like a FIFO queue) would make the code significantly more complex: we would need to manage a wait queue, allocate queue entries, and handle wakeup ordering.
Also, I think Go's own sync.RWMutex doesn't guarantee fairness; writers can starve readers and vice versa.

I'll add a comment documenting this limitation so future users are aware.
something like:

// GroupLock does not guarantee fairness. Under heavy load from one group,
// the other group may experience temporary delays.

what do you think?


In our case, when a few pods are created/deleted at the same time, I think there is a big chance for:

time | description
T1 | NodeStageVolume() for podA is called; it tries to run nvme connect but sees that it is already connected - ✅
T2 | NodeUnstageVolume() for podB is called; it sees no more mounted paths exist, so it runs nvme disconnect - ✅
T3 | NodeStageVolume() for podA then continues and tries to mount the staging path, but it fails - the NVMe device does not exist anymore - ❌
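The T1 → T3 interleaving above can be forced deterministically in a small sketch. This is a toy simulation under assumptions: simulate, the connected flag, and the channel-based ordering are stand-ins invented here, not the real CSI calls or nvme state.

```go
package main

import "fmt"

// simulate forces the T1 -> T2 -> T3 ordering with channels and
// returns what podA's mount attempt would report.
func simulate() string {
	connected := true // podA's volume is already nvme-connected
	t1 := make(chan struct{})
	t2 := make(chan struct{})
	result := make(chan string)

	go func() {
		// T1: NodeStageVolume for podA sees an existing connection,
		// so it skips running nvme connect.
		_ = connected
		close(t1)
		<-t2 // T3 only runs after the unstage finished
		// T3: podA's stage continues and tries to mount the staging
		// path, but the device was torn down in the meantime.
		if !connected {
			result <- "mount failed: nvme device is gone"
			return
		}
		result <- "mounted"
	}()

	<-t1
	// T2: NodeUnstageVolume for podB sees no mounted paths left and
	// runs nvme disconnect, dropping the shared device.
	connected = false
	close(t2)

	return <-result
}

func main() {
	fmt.Println(simulate()) // prints "mount failed: nvme device is gone"
}
```

With a group lock, T2 would be held back until the whole Stage group (T1 + T3) finished, so the disconnect could not land between the connection check and the mount.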

@gadididi gadididi requested a review from nixpanic March 17, 2026 08:08
@nixpanic
Member

There is no need to implement fairness/queuing. I do not expect significant issues for the intended use-case (NVMe-oF connect/disconnect). The number of operations that will get blocked should be minimal as (un)staging volumes is not done in a continuous stream anyway.

A note like the proposed

// GroupLock does not guarantee fairness. Under heavy load from one group,
// the other group may experience temporary delays.

would be very good to have as a reminder though.

Add GroupLock type that allows operations within
the same group to run concurrently while blocking operations
from the other group. This will be used to coordinate
NVMe-oF Stage and Unstage operations - multiple
Stage calls can run in parallel,
multiple Unstage calls can run in parallel,
but Stage and Unstage block each other.

This can be used in any place that needs locking
between two groups of operations.

The implementation is based on Keane and Moir's
"group mutual exclusion algorithm",
but simplified for exactly two groups using Golang condition
variables.
Operations in group A wait if any group B operations are
active, and vice versa.
When the last operation in a group completes, all
waiting operations in the other group are woken up together.

Also added tests that simulate parallel workers:
- Multiple same-group operations run concurrently
- Different groups block each other
- No deadlock under heavy contention
- Mutual exclusion is never violated
- All waiting operations start together when unblocked

Signed-off-by: gadi-didi <[email protected]>
@gadididi gadididi force-pushed the util/add_group_locking branch from 667953c to 5574e4f Compare March 17, 2026 13:57