perftest: Fix handshake deadlock by adding timeouts#381

Open: SherrinZhou wants to merge 1 commit into linux-rdma:master from SherrinZhou:fix/fix_handshake

@SherrinZhou
Contributor

Currently, if one side (e.g., the server) fails during the handshake
phase inside `ctx_hand_shake` (specifically in `rdma_write_keys`) and
exits, the peer (client) remains blocked indefinitely.

This deadlock occurs because the `rdma_read_keys` function polls the CQ
in an infinite loop with no timeout, so it never notices that the peer
has exited. This patch prevents the deadlock by adding a timeout to the
`ibv_poll_cq` loop in `rdma_read_keys`.

With the timeout in place, the waiting process terminates with an error
if the peer fails to respond within the timeout (60s by default).

Signed-off-by: Ruizhe Zhou <zhouruizhe@resnics.com>
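
For readers who want to see the shape of the change, here is a minimal sketch of a bounded polling loop. It is not the patch itself: `poll_with_timeout`, `poll_once_fn`, and the two stub pollers are hypothetical names introduced for illustration, and the callback stands in for a single non-blocking `ibv_poll_cq(cq, 1, &wc)` call so the example builds without libibverbs. The return codes are also an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for one non-blocking poll such as
 * ibv_poll_cq(cq, 1, &wc): >0 means a completion arrived,
 * 0 means the queue is still empty, <0 means the poll failed. */
typedef int (*poll_once_fn)(void *ctx);

/* Monotonic clock, so wall-clock adjustments cannot stretch the wait. */
static double monotonic_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Poll until a completion arrives or timeout_sec elapses.
 * Returns 0 on completion, -1 on poll error, -2 on timeout
 * (assumed codes, chosen only for this sketch). */
static int poll_with_timeout(poll_once_fn poll_once, void *ctx,
                             double timeout_sec)
{
    assert(poll_once != NULL);
    const double deadline = monotonic_seconds() + timeout_sec;

    for (;;) {
        int ne = poll_once(ctx);
        if (ne > 0)
            return 0;
        if (ne < 0)
            return -1;
        if (monotonic_seconds() >= deadline) {
            fprintf(stderr, "handshake timed out after %.2f s\n",
                    timeout_sec);
            return -2;
        }
    }
}

/* Stub: models a peer that died and will never post a completion. */
static int never_completes(void *ctx)
{
    (void)ctx;
    return 0;
}

/* Stub: reports an empty queue *ctx times, then one completion. */
static int completes_after_n(void *ctx)
{
    int *remaining = ctx;
    return (*remaining)-- <= 0 ? 1 : 0;
}
```

In the real handshake the timeout would be the 60s default mentioned above; the stubs only exist to exercise both the success path and the timeout path with a short deadline.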