Open
Labels
affects-8.5 (This bug affects the 8.5.x (LTS) versions.), severity/major, type/bug (The issue is confirmed as a bug.)
Description
Problem
When a keyspace group has multiple TSO nodes (>2), client and server behavior is not robust if a request reaches a node that has watched the keyspace group metadata but is not currently serving that group's allocator.
This can lead to:
- `FindGroupByKeyspaceID` returning an error instead of usable keyspace group metadata
- TSO client discovery failing to retry through another valid node
- Health checks reporting internal errors when the allocator is absent on the current node
- split-transition paths touching allocator state without guarding nil allocators
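To make the last failure mode concrete, here is a minimal Go sketch of a nil-guarded allocator update. The `allocator` and `manager` types and the `markSplitting` method are hypothetical stand-ins, not the project's actual types; the sketch only assumes that a node keeps one allocator slot per keyspace group and that the slot can be empty.

```go
package main

import "fmt"

// allocator is a hypothetical stand-in for a per-group TSO allocator.
type allocator struct {
	splitting bool
}

// manager is a hypothetical holder of allocators keyed by keyspace group ID.
// A missing key or a nil entry both mean "not hosted on this node".
type manager struct {
	allocators map[uint32]*allocator
}

// markSplitting flags the group's allocator as being in a split transition.
// It reports false instead of dereferencing a nil allocator.
func (m *manager) markSplitting(groupID uint32) bool {
	a := m.allocators[groupID] // nil for both missing keys and nil entries
	if a == nil {
		return false
	}
	a.splitting = true
	return true
}

func main() {
	m := &manager{allocators: map[uint32]*allocator{1: {}, 2: nil}}
	for _, id := range []uint32{1, 2, 3} {
		fmt.Printf("group %d updated: %v\n", id, m.markSplitting(id))
	}
}
```

Without the nil check, the write to `a.splitting` would panic for groups 2 and 3 in this example.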
What is expected?
For multi-node TSO deployments:
- non-serving nodes should still return enough keyspace group metadata for clients to continue discovery
- clients should retry against another TSO node when the first node is not serving the allocator
- health checks should distinguish "allocator not found" from internal failures
- membership update logic should be safe when allocators are temporarily absent
Reproduction
- Start a TSO cluster with 3 nodes.
- Create a keyspace group with only a subset of nodes as members.
- Send `FindGroupByKeyspaceID` or client discovery traffic to a non-member TSO node.
- Observe discovery and health behavior.
Proposal
- treat `ErrGetAllocator` as a non-fatal "not served here" signal
- keep returning keyspace group metadata to the client
- retry client discovery through another TSO node
- return 404 from health API when allocator is absent
- add integration coverage for multi-node behavior and split flow