Miscellaneous data race fixes detected by test runs with -race option#460
Draft
booxter wants to merge 4 commits intoovn-kubernetes:mainfrom
Draft
Miscellaneous data race fixes detected by test runs with -race option#460booxter wants to merge 4 commits intoovn-kubernetes:mainfrom
booxter wants to merge 4 commits intoovn-kubernetes:mainfrom
Conversation
booxter
commented
Feb 6, 2026
- client: fix send on closed channel panic during disconnect
- client: don't close trafficSeen channel on disconnect
- client: start handleDisconnectNotification after all WaitGroup Add() calls
- cache: add read locks to Mapper() and DatabaseModel()
When handleDisconnectNotification() closes the trafficSeen channel,
there's a race window where transact() may attempt to send on the
already-closed channel, causing a panic:
goroutine 1 (transact): goroutine 2 (handleDisconnectNotification):
if trafficSeen != nil {
close(trafficSeen)
trafficSeen <- struct{}{} // PANIC: send on closed channel
}
This was observed in CI as a coredump during e2e tests when the
services controller was syncing a service while the OVSDB connection
was being closed.
Fix this by adding a disconnected flag (protected by a mutex) that is
set before closing the trafficSeen channel. The transact() function
checks this flag before attempting to send on the channel.
This follows the same pattern as the isShutdown() mechanism proposed in:
ovn-kubernetes#457
A test using a TCP proxy to simulate abrupt network disconnections is
included to verify the fix.
Signed-off-by: Ihar Hrachyshka <ihrachyshka@nvidia.com>
Assisted-by: opus (claude-opus-4-5-20251101)
Instead of closing trafficSeen channel during disconnect, rely on the
disconnected flag to stop sends and let handleInactivityProbes() exit
via stopCh (which is closed first).
Closing the channel races with concurrent sends in transact(), even
with the disconnected flag check, because the check and send are not
atomic:
WARNING: DATA RACE
Write at 0x00c0000a6160 by goroutine 7312:
runtime.recvDirect()
client.(*ovsdbClient).handleDisconnectNotification()
client/client.go:1255 <-- close(o.trafficSeen)
Previous read at 0x00c0000a6160 by goroutine 7315:
runtime.chansend1()
client.(*ovsdbClient).transact()
client/client.go:861 <-- o.trafficSeen <- struct{}{}
The channel doesn't need to be explicitly closed:
- handleInactivityProbes() watches stopCh which is closed first
- The channel will be garbage collected when no longer referenced
- A new channel is created on reconnection anyway
Signed-off-by: Ihar Hrachyshka <ihrachyshka@nvidia.com>
Assisted-by: opus (claude-opus-4-5-20251101)
…calls
Move the goroutine spawn for handleDisconnectNotification() to after
all handlerShutdown.Add() calls complete. This fixes a race between
Add() and Wait() on the same WaitGroup during reconnection.
The race occurs because handleDisconnectNotification() calls Wait()
on handlerShutdown, but it was spawned before all Add() calls:
WARNING: DATA RACE
Write at 0x00c00058e158 by goroutine 14049:
client.(*ovsdbClient).handleDisconnectNotification()
client/client.go:1256 <-- o.handlerShutdown.Wait()
Previous read at 0x00c00058e158 by goroutine 13633:
client.(*ovsdbClient).connect()
client/client.go:317 <-- o.handlerShutdown.Add(1)
Per sync.WaitGroup documentation: "calls with a positive delta that
occur when the counter is zero must happen before a Wait."
Signed-off-by: Ihar Hrachyshka <ihrachyshka@nvidia.com>
Assisted-by: opus (claude-opus-4-5-20251101)
Add RLock protection to Mapper() and DatabaseModel() methods to prevent
races with Purge() which writes to t.dbModel under the write lock.
Without the lock, concurrent access during reconnection causes a race:
WARNING: DATA RACE
Read at 0x00c0006c95d0 by goroutine 1279:
cache.(*TableCache).DatabaseModel()
cache/cache.go:886 <-- return t.dbModel (no lock)
Previous write at 0x00c0006c95d0 by goroutine 1277:
cache.(*TableCache).Purge()
cache/cache.go:1017 <-- t.dbModel = dbModel (with lock)
Signed-off-by: Ihar Hrachyshka <ihrachyshka@nvidia.com>
Assisted-by: opus (claude-opus-4-5-20251101)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.