Skip to content

Comments

client: fix race condition in Echo() function#458

Open
booxter wants to merge 1 commit intoovn-kubernetes:mainfrom
booxter:fix-trafficSeen-race
Open

client: fix race condition in Echo() function#458
booxter wants to merge 1 commit intoovn-kubernetes:mainfrom
booxter:fix-trafficSeen-race

Conversation

@booxter
Copy link

@booxter booxter commented Feb 5, 2026

The Echo() function had a race condition where it would fall through
and access the reply variable after CallWithContext() returned an error
(other than ErrShutdown). This caused a data race between the calling
goroutine reading reply in reflect.DeepEqual() and the RPC readLoop
goroutine potentially still writing to it during JSON unmarshaling.

The fix adds the missing error return, matching the pattern used in
all other RPC call sites (transact, getSchema, listDbs, etc).

Race detector output before fix:

WARNING: DATA RACE
Write at 0x00c000474930 by goroutine 438:
  reflect.Value.grow()
      /nix/store/60z37432vmgkg54krwr1z057bqwp7583-go-1.25.5/share/go/src/reflect/value.go:2677 +0x151
  reflect.Value.Grow()
      /nix/store/60z37432vmgkg54krwr1z057bqwp7583-go-1.25.5/share/go/src/reflect/value.go:2664 +0x7c
  encoding/json.(*decodeState).array()
      /nix/store/60z37432vmgkg54krwr1z057bqwp7583-go-1.25.5/share/go/src/encoding/json/decode.go:552 +0x784
  encoding/json.(*decodeState).value()
      /nix/store/60z37432vmgkg54krwr1z057bqwp7583-go-1.25.5/share/go/src/encoding/json/decode.go:370 +0x10d
  encoding/json.(*decodeState).unmarshal()
      /nix/store/60z37432vmgkg54krwr1z057bqwp7583-go-1.25.5/share/go/src/encoding/json/decode.go:183 +0x284
  encoding/json.Unmarshal()
      /nix/store/60z37432vmgkg54krwr1z057bqwp7583-go-1.25.5/share/go/src/encoding/json/decode.go:113 +0x224
  github.com/cenkalti/rpc2/jsonrpc.(*jsonCodec).ReadResponseBody()
      /home/ihrachyshka/go/pkg/mod/github.com/cenkalti/rpc2@v1.0.4/jsonrpc/jsonrpc.go:174 +0x7b
  github.com/cenkalti/rpc2.(*Client).readResponse()
      /home/ihrachyshka/go/pkg/mod/github.com/cenkalti/rpc2@v1.0.4/client.go:223 +0x395
  github.com/cenkalti/rpc2.(*Client).readLoop()
      /home/ihrachyshka/go/pkg/mod/github.com/cenkalti/rpc2@v1.0.4/client.go:99 +0x2d9
  github.com/cenkalti/rpc2.(*Client).Run()
      /home/ihrachyshka/go/pkg/mod/github.com/cenkalti/rpc2@v1.0.4/client.go:63 +0x33
  github.com/ovn-kubernetes/libovsdb/client.(*ovsdbClient).createRPC2Client.gowrap1()
      /home/ihrachyshka/src/libovsdb/client/client.go:446 +0x17

Previous read at 0x00c000474930 by goroutine 132:
  github.com/ovn-kubernetes/libovsdb/client.(*ovsdbClient).Echo()
      /home/ihrachyshka/src/libovsdb/client/client.go:1095 +0x279
  github.com/ovn-kubernetes/libovsdb/client.TestEchoRace.func1()
      /home/ihrachyshka/src/libovsdb/client/echo_concurrency_test.go:51 +0xd0

The race is reliably reproduced by the new TestEchoRace test which
runs concurrent Echo() calls while triggering disconnects/reconnects.

@booxter booxter force-pushed the fix-trafficSeen-race branch from 905b89f to 7764aa5 Compare February 5, 2026 21:01
The Echo() function had a race condition where it would fall through
and access the reply variable after CallWithContext() returned an error
(other than ErrShutdown). This caused a data race between the calling
goroutine reading reply in reflect.DeepEqual() and the RPC readLoop
goroutine potentially still writing to it during JSON unmarshaling.

The fix adds the missing error return, matching the pattern used in
all other RPC call sites (transact, getSchema, listDbs, etc).

Race detector output before fix:
==================
WARNING: DATA RACE
Write at reflect.Value.grow() during json.Unmarshal()
  github.com/cenkalti/rpc2.(*Client).readLoop()

Previous read at:
  github.com/ovn-kubernetes/libovsdb/client.(*ovsdbClient).Echo()
  if !reflect.DeepEqual(args, reply) {  // Line 1095
==================

The race is reliably reproduced by the new TestEchoRace test which
runs concurrent Echo() calls while triggering disconnects/reconnects.

Signed-off-by: Ihar Hrachyshka <ihrachyshka@nvidia.com>
Assisted-by: opus (claude-opus-4-5-20251101)
@booxter booxter force-pushed the fix-trafficSeen-race branch from 7764aa5 to 6c8c8d4 Compare February 5, 2026 21:07
@booxter booxter marked this pull request as ready for review February 5, 2026 21:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant