core: Use correct locks while running live migration of multiple disks of one VM #1010
Conversation
JasperB-TeamBlue
left a comment
Looks good to me, fixes the issue described. Tested out the API and no disk-move functionality is affected.
Seems like a rather important fix for a very common workflow which many competing products (VMware, Hyper-V, etc.) have been handling correctly for years. How many more months have to pass to get maintainer approval? @peter-boden et al.?
…ne VM. Live disk migration may fail with the error "The VM is performing an operation on a Snapshot". This is caused by the oVirt#670 (https://bugzilla.redhat.com/show_bug.cgi?id=2110186) fix. There are two different scenarios for LiveMigrateDiskCommand:
1. Generic scenario, when the live migration is initiated via UI/API. In this case the needed lock is created by MoveDiskCommand.lockVmWithWait, and it should be used in LiveMigrateDiskCommand.performNextOperation, as it was before the "Bug 2110186" fix.
2. "ovirt-engine restarted during the live disks migration" scenario. In this case another lock (introduced by the "Bug 2110186" fix) should be used to clean up the interrupted commands.
Signed-off-by: Stepan Ermakov <sermakov@orionsoft.ru>
Force-pushed from 3b514a5 to fcb1779
/ost
⏳ Running ost suite 'basic-suite-master' on distro 'el9stream'. Follow the progress here.
😭💔 ost suite 'basic-suite-master' on distro 'el9stream' failed. (details)
/ost
⏳ Running ost suite 'basic-suite-master' on distro 'el9stream'. Follow the progress here.
😭💔 ost suite 'basic-suite-master' on distro 'el9stream' failed. (details)
Hi @JasperB-TeamBlue
dupondje
left a comment
Hi,
Sorry for the long waiting time, but I lack the time to review PRs.
I did a deeper dive into the code, and I don't think we have the correct solution for the issue here.
Let me explain:
createEngineLockForSnapshotRemove() creates 2 locks:
- LockingGroup.DISK for the disk, which is disk-specific, so it always needs to be created for that particular disk, as it prevents other actions on that disk during the snapshot removal.
- LockingGroup.VM -> a shared! lock on the VM, to, well, lock the VM :)
The lockVmWithWait in MoveDisk creates a LockingGroup.LIVE_STORAGE_MIGRATION lock, which is an exclusive lock.
I think (this should be validated in a test scenario) that because the LIVE_STORAGE_MIGRATION lock is exclusive, the snapshots are created/removed one by one, which is what makes your change appear to 'fix' the issue.
Can you check the logs for the "Failed to acquire VM lock, will retry on the next polling cycle" message during your test?
I think the resolution can either be to also acquire the getLock() (which is the LIVE_STORAGE_MIGRATION lock on the VM), so having 3 locks,
or to allow the code to create multiple snapshots concurrently if the disks don't overlap (which I think will be the hard way :))
Thanks
Jean-Louis
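The shared-vs-exclusive distinction in the review above can be sketched with a toy lock model. This is NOT oVirt code; the class and method names are illustrative. It shows why a shared VM lock plus an exclusive per-disk lock (the `createEngineLockForSnapshotRemove()` scheme) lets two disk operations run concurrently, while an exclusive VM-wide lock (like LIVE_STORAGE_MIGRATION) serializes everything:

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the lock semantics described in the review.
class VmLockModel {
    private int sharedVmHolders = 0;          // holders of the shared VM lock
    private boolean exclusiveVmHeld = false;  // exclusive VM-wide lock held?
    private final Set<String> lockedDisks = new HashSet<>();

    // Snapshot removal for one disk: shared VM lock + exclusive disk lock.
    synchronized boolean tryLockForSnapshotRemove(String diskId) {
        // Fails if an exclusive VM lock is held, or the disk is already locked.
        if (exclusiveVmHeld || !lockedDisks.add(diskId)) {
            return false;
        }
        sharedVmHolders++;
        return true;
    }

    synchronized void unlockForSnapshotRemove(String diskId) {
        lockedDisks.remove(diskId);
        sharedVmHolders--;
    }

    // Exclusive VM-wide lock: conflicts with shared holders and other
    // exclusive holders alike.
    synchronized boolean tryLockVmExclusive() {
        if (exclusiveVmHeld || sharedVmHolders > 0) {
            return false;
        }
        exclusiveVmHeld = true;
        return true;
    }

    synchronized void unlockVmExclusive() {
        exclusiveVmHeld = false;
    }
}
```

Under this model, migrations of two different disks can proceed in parallel, but an exclusive VM-wide lock blocks (and is blocked by) any of them — which is the serialization the reviewer suspects is masking the bug.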
@sermakov-orion: Any follow-up?
Live disk migration may fail with the error "The VM is performing an operation on a Snapshot".
Steps to reproduce:
Some of the disks are moved successfully, but the migration of the remaining disks fails with the "The VM is performing an operation on a Snapshot" error message.
This is caused by the #670 (https://bugzilla.redhat.com/show_bug.cgi?id=2110186) fix.
There are two different scenarios for LiveMigrateDiskCommand:
1. Generic scenario, when the live migration is initiated via UI/API. In this case the needed lock is created by MoveDiskCommand.lockVmWithWait.
2. "ovirt-engine restarted during the live disks migration" scenario. In this case another lock (introduced by the "Bug 2110186" fix) should be used to clean up the interrupted commands.
Changes introduced with this PR
- Use the locks created by MoveDiskCommand.lockVmWithWait during the generic scenario
- Use the locks introduced in the "Bug 2110186" fix during the "ovirt-engine restarted during the live disks migration" scenario
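The two bullets above reduce to one decision: which path must create its own lock. A hedged sketch, not the actual PR diff — the enum and method names here are illustrative only:

```java
// Sketch of the PR's lock selection, per its description: the generic path
// reuses the lock MoveDiskCommand.lockVmWithWait already holds, so
// performNextOperation must not create a new one; only the engine-restart
// cleanup path creates the lock introduced by the Bug 2110186 fix.
class LiveMigrationLockChoice {
    enum Scenario { GENERIC, ENGINE_RESTARTED }

    // True when LiveMigrateDiskCommand must create the snapshot lock itself
    // (the original lock is gone after an ovirt-engine restart).
    static boolean mustCreateSnapshotLock(Scenario scenario) {
        return scenario == Scenario.ENGINE_RESTARTED;
    }
}
```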
Are you the owner of the code you are sending in, or do you have permission of the owner?
y