issue: 4946495 Merge Ultra API gtests by categories#592
issue: 4946495 Merge Ultra API gtests by categories#592pasis wants to merge 1 commit intoMellanox:vNextfrom
Conversation
Greptile SummaryThis PR consolidates four test files into two by merging Key observations:
Confidence Score: 4/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
subgraph ultra_api_socket_send_receive ["ultra_api_socket_send_receive fixture (xlio_socket_send_receive.cc)"]
SU[SetUp: reset counters, mr_buf=NULL,\nmemset sndbuf with 'A']
SU --> ti1 & ti2 & ti3
subgraph ti1["ti_1: initiator sends → target receives"]
A1[fork] --> A2[child: bind/listen\nparent: connect]
A2 --> A3[socket_event_cb: ESTABLISHED\nregister MR 256B, send data_to_send]
A3 --> A4[child: socket_rx_cb\nmemcmp check]
A4 --> A5[barrier + teardown]
end
subgraph ti2["ti_2: target sends → initiator receives"]
B1[fork] --> B2[child: bind/listen\nparent: connect]
B2 --> B3[socket_accept_cb_send: ACCEPT\nregister MR 256B, send data_to_send]
B3 --> B4[parent: socket_rx_cb\nmemcmp check]
B4 --> B5[barrier + teardown]
end
subgraph ti3["ti_3: full SQ stress (XLIO_TCP_CC_ALGO=2 required)"]
C1[fork] --> C2[child: bind/listen\nparent: connect]
C2 --> C3[socket_event_cb_full_sq: ESTABLISHED\nregister MR 32B, send 10 initial msgs,\nset send_attr.mkey]
C3 --> C4[parent: fill SQ with 16384 × 32B WQEs]
C4 --> C5[wait for completions\nsocket_rx_cb_nocheck]
C5 --> C6[barrier + teardown]
end
end
subgraph migrate ["ultra_api_socket_migrate fixture (xlio_socket_migrate.cc)"]
SM[SetUp: reset all counters,\ndo_migrate=false]
SM --> m1 & m2
subgraph m1["ti_1: immediate migrate"]
D1[fork → connect] --> D2[first rx triggers do_migrate\ndetach+attach to group_2 immediately]
end
subgraph m2["ti_2: delayed migrate (10s timer)"]
E1[fork → connect] --> E2[first rx triggers do_migrate\ndetach, then attach after 10s]
end
end
Reviews (2): Last reviewed commit: "issue: 4946495 Merge Ultra API gtests by..." | Re-trigger Greptile |
|
|
||
| # Full SQ completion test - Needs XLIO_TCP_CC_ALGO=2 XLIO_TCP_NODELAY=1 to fill SQ deterministically without handling initially small congestion window and delayed packets. | ||
| eval "${sudo_cmd} $timeout_exe env GTEST_TAP=2 LD_PRELOAD=$gtest_lib XLIO_TCP_CC_ALGO=2 XLIO_TCP_NODELAY=1 $gtest_app $gtest_opt --gtest_filter=ultra_api_socket_send_receive_full_sq* --gtest_output=xml:${WORKSPACE}/${prefix}/test-xlio_ultra_api_full_sq_completion.xml" | ||
| eval "${sudo_cmd} $timeout_exe env GTEST_TAP=2 LD_PRELOAD=$gtest_lib XLIO_TCP_CC_ALGO=2 XLIO_TCP_NODELAY=1 $gtest_app $gtest_opt --gtest_filter=ultra_api_socket_send_receive.* --gtest_output=xml:${WORKSPACE}/${prefix}/test-xlio_ultra_api_full_sq_completion.xml" |
There was a problem hiding this comment.
Overly broad filter re-runs
ti_1/ti_2 with unintended env vars
The old filter ultra_api_socket_send_receive_full_sq* was designed to run only the full-SQ test with XLIO_TCP_CC_ALGO=2 XLIO_TCP_NODELAY=1. The new filter ultra_api_socket_send_receive.* matches all three tests (ti_1, ti_2, ti_3) so:
ti_1andti_2are now executed twice — once already by the genericultra_api*filter on line 108 (without special vars), and again here withXLIO_TCP_CC_ALGO=2 XLIO_TCP_NODELAY=1.- Those two tests were not designed to run under the CC-algo/no-delay configuration, which may silently change their behaviour or mask real failures.
- The output XML file is still named
test-xlio_ultra_api_full_sq_completion.xml, making the results forti_1/ti_2hard to diagnose if they fail in this context.
The intent is clearly to run only the full-SQ test (ti_3) with the special vars. The filter should target that test specifically:
| eval "${sudo_cmd} $timeout_exe env GTEST_TAP=2 LD_PRELOAD=$gtest_lib XLIO_TCP_CC_ALGO=2 XLIO_TCP_NODELAY=1 $gtest_app $gtest_opt --gtest_filter=ultra_api_socket_send_receive.* --gtest_output=xml:${WORKSPACE}/${prefix}/test-xlio_ultra_api_full_sq_completion.xml" | |
| eval "${sudo_cmd} $timeout_exe env GTEST_TAP=2 LD_PRELOAD=$gtest_lib XLIO_TCP_CC_ALGO=2 XLIO_TCP_NODELAY=1 $gtest_app $gtest_opt --gtest_filter=ultra_api_socket_send_receive.ti_3 --gtest_output=xml:${WORKSPACE}/${prefix}/test-xlio_ultra_api_full_sq_completion.xml" |
| pd = NULL; | ||
| mr_buf = NULL; | ||
| accepted_sockets.clear(); | ||
| memset(sndbuf, 'A', sizeof(sndbuf)); |
There was a problem hiding this comment.
send_attr not reset between test runs
send_attr.mkey is set inside socket_event_cb_full_sq when the XLIO_SOCKET_EVENT_ESTABLISHED event fires, but send_attr is never cleared in SetUp(). If a test run leaves a stale mkey (e.g. from a crash or early exit in the child process before the event fires) and the memory region has already been deregistered by TearDown(), the next run of ti_3 could attempt a send with an invalid lkey before the ESTABLISHED callback has a chance to reset it.
Consider resetting send_attr.mkey to 0U in SetUp() to make the state fully deterministic:
| memset(sndbuf, 'A', sizeof(sndbuf)); | |
| memset(sndbuf, 'A', sizeof(sndbuf)); | |
| send_attr.mkey = 0U; |
Merge multiple send_receive and migrate test suites into single respective test suites. This reduces number of files and copy-paste. Signed-off-by: Dmytro Podgornyi <dmytrop@nvidia.com>
|
bot:retest |
Description
Merge multiple send_receive and migrate test suites into single respective test suites. This reduces number of files and copy-paste.
What
Merge multiple send_receive and migrate test suites into single respective test suites.
Why ?
Reduce numbe of files and noise during testing.
Change type
What kind of change does this PR introduce?
Check list