Skip to content

issue: 4946495 Merge Ultra API gtests by categories#592

Open
pasis wants to merge 1 commit intoMellanox:vNextfrom
pasis:ultra-api-gtests
Open

issue: 4946495 Merge Ultra API gtests by categories#592
pasis wants to merge 1 commit intoMellanox:vNextfrom
pasis:ultra-api-gtests

Conversation

@pasis
Copy link
Copy Markdown
Member

@pasis pasis commented Mar 24, 2026

Description

Merge multiple send_receive and migrate test suites into single respective test suites. This reduces number of files and copy-paste.

What

Merge multiple send_receive and migrate test suites into single respective test suites.

Why ?

Reduce numbe of files and noise during testing.

Change type

What kind of change does this PR introduce?

  • Bugfix
  • Feature
  • Code style update
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • CI related changes
  • Documentation content changes
  • Tests
  • Other

Check list

  • Code follows the style de facto guidelines of this project
  • Comments have been inserted in hard to understand places
  • Documentation has been updated (if necessary)
  • Test has been added (if possible)

@pasis pasis requested a review from BasharRadya March 24, 2026 15:11
@greptile-apps
Copy link
Copy Markdown

greptile-apps bot commented Mar 24, 2026

Greptile Summary

This PR consolidates four test files into two by merging xlio_socket_migrate_2.ccxlio_socket_migrate.cc (as ti_2) and both xlio_socket_send_receive_2.cc + xlio_socket_send_receive_full_sq.ccxlio_socket_send_receive.cc (as ti_2 and ti_3). The consolidation is functionally faithful to the deleted originals, reduces boilerplate, and introduces helpful named constants (NUMBER_OF_MESSAGES, MIGRATE_AFTER_SECONDS, full_sq_segment_size) in place of magic numbers.

Key observations:

  • Behavioral fidelity: The migrated test logic — including fork/barrier patterns, MR registration sizes, and completion-counter thresholds — is preserved exactly from the deleted files.
  • sndbuf sizing: The original full-SQ test used char sndbuf[32]; the merged file uses char sndbuf[256] but limits the MR registration and all sends to full_sq_segment_size = 32, keeping the RDMA behaviour identical.
  • send_attr linkage improvement: send_attr is now correctly declared static (file-local linkage) rather than the non-static declaration in the deleted file.
  • memset in SetUp: Adding memset(sndbuf, 'A', sizeof(sndbuf)) to the shared SetUp is safe for ti_1 and ti_2 because base_send_single_msg copies data_to_send into sndbuf before posting the send, and the existing socket_rx_cb memcmp check continues to pass.
  • Open concerns from prior review threads (not re-raised here): the --gtest_filter=ultra_api_socket_send_receive.* in gtest.sh re-runs ti_1/ti_2 under XLIO_TCP_CC_ALGO=2 XLIO_TCP_NODELAY=1, and send_attr.mkey is not reset in SetUp() before ti_3.

Confidence Score: 4/5

  • Safe to merge once the two open concerns from prior review threads are resolved — test logic itself is faithfully preserved from the deleted files.
  • The consolidation is mechanically correct with no new logic bugs introduced. All migrated tests are functionally identical to their deleted originals. The only outstanding issues (gtest.sh filter scope and send_attr.mkey reset) were flagged in earlier review rounds; if the author has addressed or consciously deferred those, the PR is effectively ready to land.
  • contrib/jenkins_tests/gtest.sh — the broadened gtest filter and stale XML filename warrant a targeted fix before merging.

Important Files Changed

Filename Overview
contrib/jenkins_tests/gtest.sh Filter changed from ultra_api_socket_send_receive_full_sq* to ultra_api_socket_send_receive.*, causing all three merged tests to run with special CC/nodelay env vars instead of only ti_3; XML output filename is stale.
tests/gtest/Makefile.am Correctly removes the four deleted source files and keeps only the two merged files (xlio_socket_migrate.cc and xlio_socket_send_receive.cc).
tests/gtest/xlio_ultra_api/xlio_socket_send_receive.cc Merges ti_2 (server-sends) and ti_3 (full-SQ) into the existing fixture; introduces new callbacks and full_sq_segment_size constant; send_attr.mkey is not reset in SetUp(), leaving stale state across test runs.
tests/gtest/xlio_ultra_api/xlio_socket_migrate.cc Merges delayed-migrate test (formerly xlio_socket_migrate_2.cc ti_1) into new ti_2; adds NUMBER_OF_MESSAGES / MIGRATE_AFTER_SECONDS macros replacing magic numbers; logic is functionally identical to the deleted file.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph ultra_api_socket_send_receive ["ultra_api_socket_send_receive fixture (xlio_socket_send_receive.cc)"]
        SU[SetUp: reset counters, mr_buf=NULL,\nmemset sndbuf with 'A']
        SU --> ti1 & ti2 & ti3

        subgraph ti1["ti_1: initiator sends → target receives"]
            A1[fork] --> A2[child: bind/listen\nparent: connect]
            A2 --> A3[socket_event_cb: ESTABLISHED\nregister MR 256B, send data_to_send]
            A3 --> A4[child: socket_rx_cb\nmemcmp check]
            A4 --> A5[barrier + teardown]
        end

        subgraph ti2["ti_2: target sends → initiator receives"]
            B1[fork] --> B2[child: bind/listen\nparent: connect]
            B2 --> B3[socket_accept_cb_send: ACCEPT\nregister MR 256B, send data_to_send]
            B3 --> B4[parent: socket_rx_cb\nmemcmp check]
            B4 --> B5[barrier + teardown]
        end

        subgraph ti3["ti_3: full SQ stress (XLIO_TCP_CC_ALGO=2 required)"]
            C1[fork] --> C2[child: bind/listen\nparent: connect]
            C2 --> C3[socket_event_cb_full_sq: ESTABLISHED\nregister MR 32B, send 10 initial msgs,\nset send_attr.mkey]
            C3 --> C4[parent: fill SQ with 16384 × 32B WQEs]
            C4 --> C5[wait for completions\nsocket_rx_cb_nocheck]
            C5 --> C6[barrier + teardown]
        end
    end

    subgraph migrate ["ultra_api_socket_migrate fixture (xlio_socket_migrate.cc)"]
        SM[SetUp: reset all counters,\ndo_migrate=false]
        SM --> m1 & m2

        subgraph m1["ti_1: immediate migrate"]
            D1[fork → connect] --> D2[first rx triggers do_migrate\ndetach+attach to group_2 immediately]
        end

        subgraph m2["ti_2: delayed migrate (10s timer)"]
            E1[fork → connect] --> E2[first rx triggers do_migrate\ndetach, then attach after 10s]
        end
    end
Loading

Reviews (2): Last reviewed commit: "issue: 4946495 Merge Ultra API gtests by..." | Re-trigger Greptile


# Full SQ completion test - Needs XLIO_TCP_CC_ALGO=2 XLIO_TCP_NODELAY=1 to fill SQ deterministically without handling initially small congestion window and delayed packets.
eval "${sudo_cmd} $timeout_exe env GTEST_TAP=2 LD_PRELOAD=$gtest_lib XLIO_TCP_CC_ALGO=2 XLIO_TCP_NODELAY=1 $gtest_app $gtest_opt --gtest_filter=ultra_api_socket_send_receive_full_sq* --gtest_output=xml:${WORKSPACE}/${prefix}/test-xlio_ultra_api_full_sq_completion.xml"
eval "${sudo_cmd} $timeout_exe env GTEST_TAP=2 LD_PRELOAD=$gtest_lib XLIO_TCP_CC_ALGO=2 XLIO_TCP_NODELAY=1 $gtest_app $gtest_opt --gtest_filter=ultra_api_socket_send_receive.* --gtest_output=xml:${WORKSPACE}/${prefix}/test-xlio_ultra_api_full_sq_completion.xml"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Overly broad filter re-runs ti_1/ti_2 with unintended env vars

The old filter ultra_api_socket_send_receive_full_sq* was designed to run only the full-SQ test with XLIO_TCP_CC_ALGO=2 XLIO_TCP_NODELAY=1. The new filter ultra_api_socket_send_receive.* matches all three tests (ti_1, ti_2, ti_3) so:

  1. ti_1 and ti_2 are now executed twice — once already by the generic ultra_api* filter on line 108 (without special vars), and again here with XLIO_TCP_CC_ALGO=2 XLIO_TCP_NODELAY=1.
  2. Those two tests were not designed to run under the CC-algo/no-delay configuration, which may silently change their behaviour or mask real failures.
  3. The output XML file is still named test-xlio_ultra_api_full_sq_completion.xml, making the results for ti_1/ti_2 hard to diagnose if they fail in this context.

The intent is clearly to run only the full-SQ test (ti_3) with the special vars. The filter should target that test specifically:

Suggested change
eval "${sudo_cmd} $timeout_exe env GTEST_TAP=2 LD_PRELOAD=$gtest_lib XLIO_TCP_CC_ALGO=2 XLIO_TCP_NODELAY=1 $gtest_app $gtest_opt --gtest_filter=ultra_api_socket_send_receive.* --gtest_output=xml:${WORKSPACE}/${prefix}/test-xlio_ultra_api_full_sq_completion.xml"
eval "${sudo_cmd} $timeout_exe env GTEST_TAP=2 LD_PRELOAD=$gtest_lib XLIO_TCP_CC_ALGO=2 XLIO_TCP_NODELAY=1 $gtest_app $gtest_opt --gtest_filter=ultra_api_socket_send_receive.ti_3 --gtest_output=xml:${WORKSPACE}/${prefix}/test-xlio_ultra_api_full_sq_completion.xml"

pd = NULL;
mr_buf = NULL;
accepted_sockets.clear();
memset(sndbuf, 'A', sizeof(sndbuf));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 send_attr not reset between test runs

send_attr.mkey is set inside socket_event_cb_full_sq when the XLIO_SOCKET_EVENT_ESTABLISHED event fires, but send_attr is never cleared in SetUp(). If a test run leaves a stale mkey (e.g. from a crash or early exit in the child process before the event fires) and the memory region has already been deregistered by TearDown(), the next run of ti_3 could attempt a send with an invalid lkey before the ESTABLISHED callback has a chance to reset it.

Consider resetting send_attr.mkey to 0U in SetUp() to make the state fully deterministic:

Suggested change
memset(sndbuf, 'A', sizeof(sndbuf));
memset(sndbuf, 'A', sizeof(sndbuf));
send_attr.mkey = 0U;

Merge multiple send_receive and migrate test suites into single respective
test suites. This reduces number of files and copy-paste.

Signed-off-by: Dmytro Podgornyi <dmytrop@nvidia.com>
@pasis pasis force-pushed the ultra-api-gtests branch from d70f448 to 2dafbb1 Compare March 25, 2026 07:57
@pasis
Copy link
Copy Markdown
Member Author

pasis commented Mar 25, 2026

bot:retest

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant