Miscellaneous data race fixes detected by test runs with -race option #460

Draft

booxter wants to merge 4 commits into ovn-kubernetes:main from booxter:data-races-fixes

Conversation


booxter commented Feb 6, 2026

  • client: fix send on closed channel panic during disconnect
  • client: don't close trafficSeen channel on disconnect
  • client: start handleDisconnectNotification after all WaitGroup Add() calls
  • cache: add read locks to Mapper() and DatabaseModel()

When handleDisconnectNotification() closes the trafficSeen channel,
there is a race window in which transact() may attempt to send on the
already-closed channel, causing a panic:

  goroutine 1 (transact):          goroutine 2 (handleDisconnectNotification):
    if trafficSeen != nil {
                                     close(trafficSeen)
      trafficSeen <- struct{}{}    // PANIC: send on closed channel
    }

This was observed in CI as a coredump during e2e tests when the
services controller was syncing a service while the OVSDB connection
was being closed.

Fix this by adding a disconnected flag (protected by a mutex) that is
set before closing the trafficSeen channel. The transact() function
checks this flag before attempting to send on the channel.

This follows the same pattern as the isShutdown() mechanism proposed in:
ovn-kubernetes#457

A test using a TCP proxy to simulate abrupt network disconnections is
included to verify the fix.
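The pattern described above can be sketched as follows. This is a minimal standalone illustration, not the real ovsdbClient: the type, field, and method names (client, markTrafficSeen, disconnect) are placeholders chosen for the example. The key point is that the flag and the send are checked under the same mutex that disconnect takes before teardown.

```go
package main

import (
	"fmt"
	"sync"
)

// client is a minimal sketch (not the real ovsdbClient) of the
// disconnected-flag pattern: the flag is set under a mutex during
// disconnect, and any would-be sender checks it first.
type client struct {
	mu           sync.Mutex
	disconnected bool
	trafficSeen  chan struct{}
}

// markTrafficSeen stands in for the send transact() performs after a
// successful RPC. It reports whether the notification was attempted.
func (c *client) markTrafficSeen() bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.disconnected || c.trafficSeen == nil {
		return false // connection is gone: do not touch the channel
	}
	select {
	case c.trafficSeen <- struct{}{}:
	default: // monitor goroutine is busy; dropping the signal is fine
	}
	return true
}

// disconnect sets the flag under the same mutex, so no send can race
// with the connection teardown.
func (c *client) disconnect() {
	c.mu.Lock()
	c.disconnected = true
	c.mu.Unlock()
}

func main() {
	c := &client{trafficSeen: make(chan struct{}, 1)}
	fmt.Println(c.markTrafficSeen()) // true: still connected
	c.disconnect()
	fmt.Println(c.markTrafficSeen()) // false: send suppressed
}
```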

Signed-off-by: Ihar Hrachyshka <ihrachyshka@nvidia.com>
Assisted-by: opus (claude-opus-4-5-20251101)

Instead of closing the trafficSeen channel during disconnect, rely on
the disconnected flag to stop sends and let handleInactivityProbes()
exit via stopCh (which is closed first).

Closing the channel races with concurrent sends in transact(), even
with the disconnected flag check, because the check and send are not
atomic:

  WARNING: DATA RACE
  Write at 0x00c0000a6160 by goroutine 7312:
    runtime.recvDirect()
    client.(*ovsdbClient).handleDisconnectNotification()
        client/client.go:1255   <-- close(o.trafficSeen)

  Previous read at 0x00c0000a6160 by goroutine 7315:
    runtime.chansend1()
    client.(*ovsdbClient).transact()
        client/client.go:861    <-- o.trafficSeen <- struct{}{}

The channel doesn't need to be explicitly closed:
- handleInactivityProbes() watches stopCh which is closed first
- The channel will be garbage collected when no longer referenced
- A new channel is created on reconnection anyway
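The shutdown ordering in the bullets above can be sketched like this. The watcher goroutine stands in for handleInactivityProbes(); the real client's loop body is elided. The point is that closing stopCh is enough to stop the consumer, and trafficSeen is never closed, so a concurrent send cannot panic.

```go
package main

import (
	"fmt"
	"sync"
)

// Sketch of the shutdown ordering: the watcher goroutine (standing in
// for handleInactivityProbes) exits when stopCh is closed. trafficSeen
// itself is never closed, so concurrent senders cannot panic, and the
// channel is simply garbage collected once unreferenced.
func main() {
	stopCh := make(chan struct{})
	trafficSeen := make(chan struct{}, 1)

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for {
			select {
			case <-stopCh: // closed first during disconnect
				return
			case <-trafficSeen:
				// the real client resets its inactivity timer here
			}
		}
	}()

	trafficSeen <- struct{}{} // a normal traffic notification
	close(stopCh)             // disconnect: stop the watcher...
	wg.Wait()

	// ...but leave trafficSeen open; a send can still be attempted
	// safely (non-blocking here, since nobody is draining anymore).
	select {
	case trafficSeen <- struct{}{}:
	default:
	}
	fmt.Println("clean shutdown, channel left open")
}
```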

Signed-off-by: Ihar Hrachyshka <ihrachyshka@nvidia.com>
Assisted-by: opus (claude-opus-4-5-20251101)

client: start handleDisconnectNotification after all WaitGroup Add() calls

Move the goroutine spawn for handleDisconnectNotification() to after
all handlerShutdown.Add() calls complete. This fixes a race between
Add() and Wait() on the same WaitGroup during reconnection.

The race occurs because handleDisconnectNotification() calls Wait()
on handlerShutdown, but it was spawned before all Add() calls:

  WARNING: DATA RACE
  Write at 0x00c00058e158 by goroutine 14049:
    client.(*ovsdbClient).handleDisconnectNotification()
        client/client.go:1256   <-- o.handlerShutdown.Wait()

  Previous read at 0x00c00058e158 by goroutine 13633:
    client.(*ovsdbClient).connect()
        client/client.go:317    <-- o.handlerShutdown.Add(1)

Per sync.WaitGroup documentation: "calls with a positive delta that
occur when the counter is zero must happen before a Wait."

Signed-off-by: Ihar Hrachyshka <ihrachyshka@nvidia.com>
Assisted-by: opus (claude-opus-4-5-20251101)

Add RLock protection to Mapper() and DatabaseModel() methods to prevent
races with Purge(), which writes to t.dbModel under the write lock.

Without the lock, concurrent access during reconnection causes a race:

  WARNING: DATA RACE
  Read at 0x00c0006c95d0 by goroutine 1279:
    cache.(*TableCache).DatabaseModel()
        cache/cache.go:886      <-- return t.dbModel (no lock)

  Previous write at 0x00c0006c95d0 by goroutine 1277:
    cache.(*TableCache).Purge()
        cache/cache.go:1017     <-- t.dbModel = dbModel (with lock)

Signed-off-by: Ihar Hrachyshka <ihrachyshka@nvidia.com>
Assisted-by: opus (claude-opus-4-5-20251101)