Skip to content

Add test for --unload-on-kill spawner lock behavior#3035

Open
naitikpahwa18 wants to merge 9 commits intoros-controls:masterfrom
naitikpahwa18:test-unload-on-kill-2566
Open

Add test for --unload-on-kill spawner lock behavior#3035
naitikpahwa18 wants to merge 9 commits intoros-controls:masterfrom
naitikpahwa18:test-unload-on-kill-2566

Conversation

@naitikpahwa18
Copy link
Copy Markdown

This PR adds a regression test to verify that a spawner started with
--unload-on-kill does not block other spawners while waiting for an
interrupt.

The test launches one spawner in the background and ensures a second
spawner can start without being delayed, covering the locking fix
introduced in #2559

Closes #2566

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a C++ regression test in controller_manager to ensure a spawner started with --unload-on-kill does not hold the global spawner lock while waiting for interrupt, allowing other spawners to start promptly (covers the fix from #2559; closes #2566).

Changes:

  • Add unload_on_kill_does_not_block_other_spawners test that runs one spawner in the background with --unload-on-kill.
  • Measure that a second spawner invocation completes within a bounded time, indicating it wasn’t blocked on the lock.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

bmagyar
bmagyar previously approved these changes Feb 26, 2026
@codecov
Copy link
Copy Markdown

codecov bot commented Feb 26, 2026

Codecov Report

❌ Patch coverage is 84.37500% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.17%. Comparing base (31fa961) to head (aecf629).
⚠️ Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
controller_manager/test/test_spawner_unspawner.cpp 84.37% 0 Missing and 5 partials ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3035   +/-   ##
=======================================
  Coverage   89.16%   89.17%           
=======================================
  Files         158      158           
  Lines       19538    19570   +32     
  Branches     1594     1599    +5     
=======================================
+ Hits        17422    17452   +30     
+ Misses       1462     1458    -4     
- Partials      654      660    +6     
Flag Coverage Δ
unittests 89.17% <84.37%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
controller_manager/test/test_spawner_unspawner.cpp 95.19% <84.37%> (-0.82%) ⬇️

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +617 to +618
"timeout --signal=INT 5 "
"$(ros2 pkg prefix controller_manager)/lib/controller_manager/spawner "
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file’s call_spawner(...) helper prefixes commands with coveragepy_script (see the helper near the top of the file), and the existing timeout --signal=INT ... tests here also include that prefix. spawner_a_cmd omits it, which can make coverage runs inconsistent. Consider adding the same prefix (or reusing call_spawner via a timeout wrapper) so this spawner path is covered consistently.

Suggested change
"timeout --signal=INT 5 "
"$(ros2 pkg prefix controller_manager)/lib/controller_manager/spawner "
std::string("timeout --signal=INT 5 ") +
coveragepy_script +
" $(ros2 pkg prefix controller_manager)/lib/controller_manager/spawner "

Copilot uses AI. Check for mistakes.
Comment on lines +635 to +641
EXPECT_EQ(call_spawner("ctrl_2 -c test_controller_manager"), 0)
<< "Spawner B should not be blocked by Spawner A's --unload-on-kill lock";
auto spawner_b_elapsed = std::chrono::steady_clock::now() - spawner_b_start;

EXPECT_LT(std::chrono::duration_cast<std::chrono::seconds>(spawner_b_elapsed).count(), 30)
<< "Spawner B took too long — likely blocked by Spawner A holding the lock";

Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call_spawner(...) is a blocking system() call. If the lock regression comes back, this call can still block for ~100s (20s timeout × 5 retries) before returning, and only then will the < 30s assertion fail. Consider adding a hard upper bound (e.g., wrap Spawner B in timeout, or run it in a future and fail if it doesn’t complete within the expected time).

Suggested change
EXPECT_EQ(call_spawner("ctrl_2 -c test_controller_manager"), 0)
<< "Spawner B should not be blocked by Spawner A's --unload-on-kill lock";
auto spawner_b_elapsed = std::chrono::steady_clock::now() - spawner_b_start;
EXPECT_LT(std::chrono::duration_cast<std::chrono::seconds>(spawner_b_elapsed).count(), 30)
<< "Spawner B took too long — likely blocked by Spawner A holding the lock";
auto spawner_b_future = std::async(
std::launch::async,
[]() { return call_spawner("ctrl_2 -c test_controller_manager"); });
// Enforce a hard upper bound on Spawner B execution time.
auto spawner_b_status = spawner_b_future.wait_for(std::chrono::seconds(30));
EXPECT_NE(spawner_b_status, std::future_status::timeout)
<< "Spawner B call_spawner did not complete within 30 seconds — likely blocked by "
<< "Spawner A holding the lock";
if (spawner_b_status != std::future_status::timeout)
{
int spawner_b_result = spawner_b_future.get();
EXPECT_EQ(spawner_b_result, 0)
<< "Spawner B should not be blocked by Spawner A's --unload-on-kill lock";
auto spawner_b_elapsed = std::chrono::steady_clock::now() - spawner_b_start;
EXPECT_LT(std::chrono::duration_cast<std::chrono::seconds>(spawner_b_elapsed).count(), 30)
<< "Spawner B took too long — likely blocked by Spawner A holding the lock";
}

Copilot uses AI. Check for mistakes.
Comment on lines +615 to +619
// Run Spawner A with --unload-on-kill in background; timeout sends SIGINT after 5s for cleanup.
std::string spawner_a_cmd =
"timeout --signal=INT 5 "
"$(ros2 pkg prefix controller_manager)/lib/controller_manager/spawner "
"ctrl_1 -c test_controller_manager --unload-on-kill";
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To exercise the lock regression deterministically, Spawner B should start while Spawner A is still alive and waiting for the interrupt. With a 5s timeout, there’s a race where Spawner A can exit before Spawner B starts, making the test pass without actually validating the lock-release behavior. Consider increasing the timeout or asserting Spawner A is still running (e.g., spawner_a_future.wait_for(0) not ready) before starting Spawner B.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member

@christophfroehlich christophfroehlich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please discuss the feedback by the copilot review, and don't close the review threads by yourself (this should only be done by the reviewer).

@naitikpahwa18
Copy link
Copy Markdown
Author

@christophfroehlich Sorry for closing the threads, I thought they were addressed and didn't realize only the reviewer should do that. I've addressed all Copilot feedback in the latest commit.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +648 to +662
// Spawner B must not be blocked by Spawner A's lock; bug would cause ~100s delay (20s x 5
// retries).
auto spawner_b_start = std::chrono::steady_clock::now();
std::string spawner_b_cmd =
"timeout --signal=INT 30 " +
std::string(coveragepy_script) +
" $(ros2 pkg prefix controller_manager)/lib/controller_manager/spawner "
"ctrl_2 -c test_controller_manager";
int spawner_b_exit_code = std::system(spawner_b_cmd.c_str());
EXPECT_EQ(spawner_b_exit_code, 0)
<< "Spawner B should not be blocked by Spawner A's --unload-on-kill lock";
auto spawner_b_elapsed = std::chrono::steady_clock::now() - spawner_b_start;

EXPECT_LT(std::chrono::duration_cast<std::chrono::seconds>(spawner_b_elapsed).count(), 30)
<< "Spawner B took too long and may have been blocked by Spawner A holding the lock";
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current timing/timeout setup can still pass even if Spawner B is blocked on the spawner lock: if Spawner A holds the lock until it is killed at 15s, Spawner B can unblock after ~15s and still exit 0 well under the 30s timeout/threshold. To make this a real regression test for “does not block”, the Spawner B timeout/allowed elapsed time should be well below Spawner A’s lifetime (e.g., B timeout 5–10s and assert elapsed < that), or keep A alive longer but explicitly interrupt it after B finishes. Also, comparing duration_cast<seconds>(...).count() truncates sub-second precision; prefer comparing durations directly (e.g., elapsed < std::chrono::seconds(...)).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

" $(ros2 pkg prefix controller_manager)/lib/controller_manager/spawner "
"ctrl_1 -c test_controller_manager --unload-on-kill";
auto spawner_a_future = std::async(
std::launch::async, [&spawner_a_cmd]() { return std::system(spawner_a_cmd.c_str()); });
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spawner_a_cmd is captured by reference in the async lambda. While it currently stays in scope until the end of the test, capturing by value would make the thread safety/lifetime guarantees explicit and avoid accidental future refactors turning this into a use-after-scope issue.

Suggested change
std::launch::async, [&spawner_a_cmd]() { return std::system(spawner_a_cmd.c_str()); });
std::launch::async, [spawner_a_cmd]() { return std::system(spawner_a_cmd.c_str()); });

Copilot uses AI. Check for mistakes.
Comment on lines +617 to +620
std::string spawner_a_cmd =
"timeout --signal=INT 30 " + std::string(coveragepy_script) +
" $(ros2 pkg prefix controller_manager)/lib/controller_manager/spawner "
"ctrl_1 -c test_controller_manager --unload-on-kill";
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spawner A and Spawner B are both invoked via coveragepy_script (python3 -m coverage run --append ...) and run concurrently. Coverage’s default .coverage data file is not safe to append from multiple processes at the same time, so this can corrupt coverage output and/or make the test flaky. Consider running only one spawner under coverage here, or switch to per-process coverage files (e.g., --parallel-mode / unique COVERAGE_FILE) and combine them after the test run.

Copilot uses AI. Check for mistakes.
Comment on lines +664 to +666
// Wait for Spawner A to be killed by timeout and verify it did not exit successfully
int spawner_a_exit_code = spawner_a_future.get();
EXPECT_NE(spawner_a_exit_code, 0)
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test will always wait for Spawner A’s timeout --signal=INT 30 to elapse before finishing (spawner_a_future.get()), adding ~30s to the test suite even on success. Please reduce the timeout substantially (it only needs to be > Spawner B’s timeout) or explicitly terminate Spawner A once Spawner B has completed (e.g., by tracking the PID and sending SIGINT) to keep CI runtime reasonable.

Copilot uses AI. Check for mistakes.
Comment on lines +651 to +662
auto spawner_b_start = std::chrono::steady_clock::now();
std::string spawner_b_cmd =
"timeout --signal=INT 10 " + std::string(coveragepy_script) +
" $(ros2 pkg prefix controller_manager)/lib/controller_manager/spawner "
"ctrl_2 -c test_controller_manager";
int spawner_b_exit_code = std::system(spawner_b_cmd.c_str());
EXPECT_EQ(spawner_b_exit_code, 0)
<< "Spawner B should not be blocked by Spawner A's --unload-on-kill lock";
auto spawner_b_elapsed = std::chrono::steady_clock::now() - spawner_b_start;

EXPECT_LT(spawner_b_elapsed, std::chrono::seconds(10))
<< "Spawner B took too long and may have been blocked by Spawner A holding the lock";
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The elapsed-time assertion is redundant and potentially flaky because the command is already wrapped in timeout ... 10, and the timer starts before building the command string (so small overhead can push the measured duration to ~10s even when timeout doesn’t trigger). Either remove this check, or assert against a smaller threshold with margin and measure only around the std::system() call.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Member

@saikishor saikishor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks good

Just some minor nitpicks

cm_->set_parameter(rclcpp::Parameter("ctrl_2.type", test_controller::TEST_CONTROLLER_CLASS_NAME));

// Run Spawner A with --unload-on-kill in background, keep it alive long enough to ensure
// Spawner B cannot succeed by waiting for Spawner A's timeout.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Spawner B cannot succeed by waiting for Spawner A's timeout.
// Spawner B should succeed, inspite of the waiting for Spawner A's timeout.

EXPECT_EQ(spawner_b_exit_code, 0)
<< "Spawner B should not be blocked by Spawner A's --unload-on-kill lock";

// Wait for Spawner A to be killed by timeout and verify it did not exit successfully
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Wait for Spawner A to be killed by timeout and verify it did not exit successfully
EXPECT_EQ(spawner_a_future.wait_for(std::chrono::seconds(0)), std::future_status::timeout)
<< "Spawner A exited already; lock-contention scenario might not be exercised";
// Wait for Spawner A to be killed by timeout and verify it did not exit successfully

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also verify that the ctrl_1 and ctrl_2 are active right after the spawner B success and then once the unload is done by the timeout, also verify that only ctrl_2 is active

@naitikpahwa18 naitikpahwa18 requested a review from saikishor March 30, 2026 13:43
Signed-off-by: Naitik Pahwa <naitikpahwa18@users.noreply.github.com>
Signed-off-by: Naitik Pahwa <naitikpahwa18@users.noreply.github.com>
Signed-off-by: Naitik Pahwa <naitikpahwa18@users.noreply.github.com>
Signed-off-by: Naitik Pahwa <naitikpahwa18@users.noreply.github.com>
Signed-off-by: Naitik Pahwa <naitikpahwa18@users.noreply.github.com>
Signed-off-by: Naitik Pahwa <naitikpahwa18@users.noreply.github.com>
Signed-off-by: Naitik Pahwa <naitikpahwa18@users.noreply.github.com>
Signed-off-by: Naitik Pahwa <naitikpahwa18@users.noreply.github.com>
Signed-off-by: Naitik Pahwa <naitikpahwa18@users.noreply.github.com>
@naitikpahwa18 naitikpahwa18 force-pushed the test-unload-on-kill-2566 branch from be135e6 to aecf629 Compare March 31, 2026 14:29
Copy link
Copy Markdown
Member

@christophfroehlich christophfroehlich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please don't do force pushes, this only requires more effort for another review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add Spawner test for the --unload-on-kill locking other spawners

5 participants