Add -ORBTransportIdleTimeout to purge idle transports#2523
Add -ORBTransportIdleTimeout to purge idle transports#2523jwillemsen merged 50 commits intoDOCGroup:masterfrom
-ORBTransportIdleTimeout to purge idle transports#2523Conversation
* TAO/tao/Transport_Idle_Timer.cpp:
* TAO/tao/Transport_Idle_Timer.h:
Added.
* TAO/tao/Cache_Entries_T.inl:
* TAO/tao/Transport.cpp:
* TAO/tao/Transport.h:
* TAO/tao/Transport_Cache_Manager_T.h:
* TAO/tao/Transport_Cache_Manager_T.inl:
* TAO/tao/tao.mpc:
* TAO/tao/Transport.h:
* TAO/tao/Transport_Idle_Timer.h:
* TAO/tao/Transport_Timer.cpp:
* TAO/tao/Transport_Timer.h:
* TAO/tao/Transport_Cache_Manager_T.inl:
* TAO/tao/default_resource.cpp:
* TAO/tao/default_resource.h:
* TAO/tao/Resource_Factory.h:
* TAO/tao/Transport.cpp:
* TAO/tao/Transport.h:
* TAO/tao/Transport.cpp:
* TAO/tao/Transport.h:
* TAO/tao/Transport_Idle_Timer.cpp:
* TAO/tao/Transport_Idle_Timer.h:
* TAO/tao/Transport.cpp:
* TAO/tao/Transport.h:
* TAO/tao/Transport_Cache_Manager_T.cpp:
* TAO/tao/Transport_Connector.cpp:
* TAO/tao/Transport_Idle_Timer.h:
* TAO/tests/Transport_Idle_Timeout/Echo_i.cpp:
* TAO/tests/Transport_Idle_Timeout/client.cpp:
* TAO/tests/Transport_Idle_Timeout/run_test.pl:
* TAO/tests/Transport_Idle_Timeout/server.cpp:
* TAO/tests/Transport_Idle_Timeout/svc.conf:
* TAO/tests/Transport_Idle_Timeout/svc_disabled.conf:
* TAO/tao/Transport.cpp:
* TAO/bin/tao_orb_tests.lst:
* TAO/tests/Transport_Idle_Timeout/Echo_i.cpp:
* TAO/tests/Transport_Idle_Timeout/Echo_i.h:
* TAO/tests/Transport_Idle_Timeout/client.cpp:
* TAO/tests/Transport_Idle_Timeout/test.idl:
* TAO/tao/Transport.cpp:
* TAO/tao/Transport_Idle_Timer.cpp:
* TAO/tests/Transport_Idle_Timeout/run_test.pl:
* TAO/docs/Options.html:
* TAO/tao/Transport_Idle_Timer.h:
* TAO/tao/Transport_Idle_Timer.h:
* TAO/tests/Transport_Idle_Timeout/Echo_i.cpp:
* TAO/tests/Transport_Idle_Timeout/Echo_i.h:
* TAO/tests/Transport_Idle_Timeout/client.cpp:
* TAO/tests/Transport_Idle_Timeout/test.idl:
* TAO/tests/Transport_Idle_Timeout/Echo_i.cpp:
* TAO/tests/Transport_Idle_Timeout/Echo_i.h:
* TAO/tests/Transport_Idle_Timeout/client.cpp:
* TAO/tests/Transport_Idle_Timeout/test.idl:
* TAO/tests/Transport_Idle_Timeout/svc_disabled.conf:
Deleted.
* TAO/tests/Transport_Idle_Timeout/run_test.pl:
* TAO/tests/Transport_Idle_Timeout/svc_disabled.conf:
Added.
* TAO/tests/Transport_Idle_Timeout/Echo_i.cpp:
* TAO/tests/Transport_Idle_Timeout/Echo_i.h:
* TAO/tests/Transport_Idle_Timeout/client.cpp:
* TAO/tests/Transport_Idle_Timeout/run_test.pl:
* TAO/tests/Transport_Idle_Timeout/svc.conf:
* TAO/tests/Transport_Idle_Timeout/test.idl:
…d multiple gets too complicated for a unit test
* TAO/tests/Transport_Idle_Timeout/client_multiple.cpp:
Added.
* TAO/tests/Transport_Idle_Timeout/Echo_i.cpp:
* TAO/tests/Transport_Idle_Timeout/Idle_Transport_Timeout.mpc:
* TAO/tests/Transport_Idle_Timeout/client.cpp:
* TAO/tests/Transport_Idle_Timeout/run_test.pl:
* TAO/tests/Transport_Idle_Timeout/svc.conf:
* TAO/tests/Transport_Idle_Timeout/svc_disabled.conf:
* TAO/tests/Transport_Idle_Timeout/Echo_i.cpp:
* TAO/tests/Transport_Idle_Timeout/Echo_i.h:
* TAO/tests/Transport_Idle_Timeout/client.cpp:
* TAO/tests/Transport_Idle_Timeout/client_multiple.cpp:
* TAO/tests/Transport_Idle_Timeout/run_test.pl:
* TAO/tests/Transport_Idle_Timeout/test.idl:
* TAO/tao/Transport_Cache_Manager_T.cpp:
* TAO/tao/Transport.cpp:
* TAO/tao/Transport_Cache_Manager_T.cpp:
* TAO/tao/Transport_Cache_Manager_T.h:
* TAO/tao/Transport.cpp:
* TAO/tests/Transport_Idle_Timeout/Echo_i.cpp:
* TAO/tests/Transport_Idle_Timeout/client_multiple.cpp:
* TAO/tests/Transport_Idle_Timeout/run_test.pl:
* TAO/tao/Transport.cpp:
* TAO/tao/Transport_Cache_Manager_T.cpp:
* TAO/tao/Transport_Cache_Manager_T.h:
* TAO/tao/Transport_Cache_Manager_T.inl:
* TAO/tao/Transport_Connector.cpp:
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds a transport idle-timeout feature: new Transport_Idle_Timer type, timer wiring in TAO_Transport (schedule/cancel/handler), cache-manager conditional purge path, Resource_Factory API + option parsing, build/test registrations, and a regression test suite covering enabled, disabled, and multi-server scenarios. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Client/App
participant Cache as Transport Cache Manager
participant Trans as TAO_Transport
participant Timer as Transport Idle Timer
participant Reactor as ORB Reactor
Client->>Cache: find_transport()
activate Cache
Cache->>Trans: entry->item().transport()->cancel_idle_timer()
activate Trans
Trans->>Reactor: cancel scheduled idle timer
deactivate Trans
Cache-->>Client: return transport
deactivate Cache
Client->>Trans: use transport, then make_idle()
activate Trans
Trans->>Cache: notify entry idle (mark ENTRY_IDLE_AND_PURGABLE)
Cache-->>Trans: confirmed idle/purgable
Trans->>Reactor: schedule idle timer (timeout seconds)
deactivate Trans
Note over Reactor: time passes
Reactor->>Timer: handle_timeout(current_time)
activate Timer
Timer->>Trans: add_reference()
Timer->>Trans: Trans.handle_idle_timeout(current_time)
activate Trans
Trans->>Cache: purge_entry_when_purgable(entry)
Cache-->>Trans: purged? (yes/no)
alt purged
Trans->>Trans: close_connection()
else not purged
Trans->>Reactor: schedule idle timer (reschedule)
end
Trans->>Timer: remove_reference()
deactivate Trans
deactivate Timer
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
* TAO/tao/Transport.cpp:
ORBTransportIdleTimeout to purge idle transports-ORBTransportIdleTimeout to purge idle transports
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
TAO/tao/Transport.cpp (1)
988-1026:⚠️ Potential issue | 🔴 CriticalStale idle callbacks can still clobber a newer timer.
TAO/tao/Transport_Cache_Manager_T.cpp:273-371cancels the timer on cache reuse, butschedule_idle_timer()still passesnullptras ACT andhandle_idle_timeout()still ignoresact. Combined with the unsynchronized reads/writes ofidle_timer_id_, an already-dequeued old callback can clear the ID for a newer timer and then reschedule or close the transport while it is busy. This still needs versioned timer state beforeidle_timer_id_is mutated.Verify that
idle_timer_id_is written from multiple paths and that there is no generation/version check guarding stale callbacks. Expected result: matches inschedule_idle_timer(),cancel_idle_timer(),handle_idle_timeout(), and the cache-reuse path, but noidle_timer_gen_-style state.#!/bin/bash set -euo pipefail rg -n -C3 '\bidle_timer_id_\b|\bidle_timer_gen_\b|handle_idle_timeout\s*\(|schedule_idle_timer\s*\(|cancel_idle_timer\s*\(' \ TAO/tao/Transport.cpp \ TAO/tao/Transport.h \ TAO/tao/Transport_Cache_Manager_T.cpp \ TAO/tao/Transport_Idle_Timer.cppAlso applies to: 2941-2983
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@TAO/tao/Transport.cpp`:
- Around line 566-571: post_open() is exposing the cache entry as
ENTRY_IDLE_AND_PURGABLE but the idle timer is being armed here instead of via
make_idle(), which can leave the initial idle window untracked; update the code
so the transition is routed through
transport_cache_manager().make_idle(cache_map_entry_) (or modify make_idle()) to
arm the idle timer internally rather than calling schedule_idle_timer() locally
— ensure make_idle() is responsible for calling schedule_idle_timer() when it
returns success (result == 0) for ENTRY_IDLE_AND_PURGABLE so the initial idle
transition uses the same -ORBTransportIdleTimeout path.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 43cb5897-8084-4754-82d2-fd253ef62bd2
📒 Files selected for processing (5)
TAO/docs/Options.htmlTAO/tao/Transport.cppTAO/tao/Transport.hTAO/tao/Transport_Cache_Manager_T.hTAO/tao/Transport_Cache_Manager_T.inl
✅ Files skipped from review due to trivial changes (1)
- TAO/docs/Options.html
🚧 Files skipped from review as they are similar to previous changes (2)
- TAO/tao/Transport_Cache_Manager_T.inl
- TAO/tao/Transport.h
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
TAO/tao/Transport.cpp (2)
2876-2877:⚠️ Potential issue | 🟠 MajorInitial idle transition still bypasses
make_idle().Line 2877 sets
ENTRY_IDLE_AND_PURGABLEdirectly, but timer arming now lives inmake_idle()(Line 566-Line 571). This still leaves the first idle window untracked for never-reused transports.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@TAO/tao/Transport.cpp` around lines 2876 - 2877, The transport entry is being moved to ENTRY_IDLE_AND_PURGABLE by directly calling transport_cache_manager().set_entry_state(this->cache_map_entry_, TAO::ENTRY_IDLE_AND_PURGABLE) which bypasses make_idle() (which arms the idle timer); instead use the existing make_idle() logic to perform the transition so the timer is armed for first-time idle entries — i.e., replace the direct set_entry_state call (or call make_idle() immediately when creating/marking the entry) so make_idle() handles the state change and timer arming for cache_map_entry_ via transport_cache_manager()/make_idle().
989-1001:⚠️ Potential issue | 🔴 CriticalIdle-timer state is still racy; stale callbacks can corrupt timer lifecycle.
Line 989 ignores
act, and Lines 1000, 2949, and 2979 read/writeidle_timer_id_across different threads without serialization. A stale timeout/cancel can clear the id for a newer timer, leaving orphaned reactor timers and unsafe callback lifetimes.Suggested direction
- int - TAO_Transport::handle_idle_timeout (const ACE_Time_Value &, const void */*act*/) + int + TAO_Transport::handle_idle_timeout (const ACE_Time_Value &, const void *act) { + // decode generation from act and ignore stale callbacks + // guard idle timer state updates under handler_lock_ (or equivalent) } void TAO_Transport::schedule_idle_timer () { + // serialize with handler_lock_ + // increment generation + // optionally cancel previous timer id before scheduling new one + // pass generation token via act } void TAO_Transport::cancel_idle_timer () { + // serialize with handler_lock_ + // clear/cancel only current generation }Also applies to: 2941-2980
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@TAO/tao/Transport.cpp` around lines 989 - 1001, The idle-timer handling is racy: TAO_Transport::handle_idle_timeout ignores the act and unsafely reads/writes idle_timer_id_ across threads; change the logic to validate the callback token before clearing or updating idle_timer_id_ (e.g., compare the incoming act/token to the transport's current timer identifier or a per-timer opaque token), and protect access to idle_timer_id_ with a synchronization mechanism (either use an atomic compare-exchange on idle_timer_id_ or a mutex used by all timer start/cancel paths such as the methods that set/clear idle_timer_id_ and in handle_idle_timeout); only clear/cancel the reactor timer when the token matches to prevent a stale callback from clearing a newer timer and leaking/reactor-callback lifetime corruption.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@TAO/tao/Transport.cpp`:
- Around line 2947-2950: The schedule_idle_timer() call currently dereferences
ACE_Reactor* reactor obtained from this->orb_core_->reactor() and blindly
assigns reactor->schedule_timer(...) to this->idle_timer_id_, which can crash if
reactor is null or hide failures when schedule_timer returns -1; update the
method to (1) check reactor for nullptr before calling schedule_timer and skip
scheduling if null, (2) call reactor->schedule_timer(...) and check the return
value for -1, (3) on failure log an appropriate message and set
this->idle_timer_id_ to an invalid sentinel (e.g. -1) or leave it unchanged, and
(4) ensure you still use this->transport_idle_timer_ as the handler; reference
symbols: schedule_idle_timer (method), reactor (ACE_Reactor*),
this->orb_core_->reactor(), reactor->schedule_timer(), this->idle_timer_id_,
this->transport_idle_timer_.
---
Duplicate comments:
In `@TAO/tao/Transport.cpp`:
- Around line 2876-2877: The transport entry is being moved to
ENTRY_IDLE_AND_PURGABLE by directly calling
transport_cache_manager().set_entry_state(this->cache_map_entry_,
TAO::ENTRY_IDLE_AND_PURGABLE) which bypasses make_idle() (which arms the idle
timer); instead use the existing make_idle() logic to perform the transition so
the timer is armed for first-time idle entries — i.e., replace the direct
set_entry_state call (or call make_idle() immediately when creating/marking the
entry) so make_idle() handles the state change and timer arming for
cache_map_entry_ via transport_cache_manager()/make_idle().
- Around line 989-1001: The idle-timer handling is racy:
TAO_Transport::handle_idle_timeout ignores the act and unsafely reads/writes
idle_timer_id_ across threads; change the logic to validate the callback token
before clearing or updating idle_timer_id_ (e.g., compare the incoming act/token
to the transport's current timer identifier or a per-timer opaque token), and
protect access to idle_timer_id_ with a synchronization mechanism (either use an
atomic compare-exchange on idle_timer_id_ or a mutex used by all timer
start/cancel paths such as the methods that set/clear idle_timer_id_ and in
handle_idle_timeout); only clear/cancel the reactor timer when the token matches
to prevent a stale callback from clearing a newer timer and
leaking/reactor-callback lifetime corruption.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 43cb5897-8084-4754-82d2-fd253ef62bd2
📒 Files selected for processing (5)
TAO/docs/Options.htmlTAO/tao/Transport.cppTAO/tao/Transport.hTAO/tao/Transport_Cache_Manager_T.hTAO/tao/Transport_Cache_Manager_T.inl
✅ Files skipped from review due to trivial changes (1)
- TAO/docs/Options.html
🚧 Files skipped from review as they are similar to previous changes (2)
- TAO/tao/Transport_Cache_Manager_T.h
- TAO/tao/Transport.h
* TAO/tao/Transport.cpp:
* TAO/tao/Transport_Cache_Manager_T.cpp:
* TAO/tao/Transport_Cache_Manager_T.inl:
* TAO/tao/Transport.cpp:
* ACE/ace/Event_Handler_Handle_Timeout_Upcall.cpp:
* TAO/tests/Transport_Idle_Timeout/Echo_i.cpp:
* TAO/tests/Transport_Idle_Timeout/Echo_i.h:
* TAO/tests/Transport_Idle_Timeout/client.cpp:
* TAO/tests/Transport_Idle_Timeout/client_multiple.cpp:
* TAO/tests/Transport_Idle_Timeout/run_test.pl:
* TAO/tests/Transport_Idle_Timeout/test.idl:
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
TAO/tests/Transport_Idle_Timeout/run_test.pl (1)
149-191:⚠️ Potential issue | 🟠 MajorStop both servers on every multi-server setup failure.
Once
server1is running, these early-return branches still only kill the process that directly failed. The sibling server can stay alive and contaminate the next scenario.Suggested cleanup pattern
sub run_multiple_scenario { my ($label, $svc_conf, $extra_client_args) = `@_`; + my ($sv1_started, $sv2_started) = (0, 0); print "\n### $label ###\n"; @@ my $SV1 = $server1->CreateProcess ( "server", "-ORBdebuglevel $debug_level -ORBVerboseLogging 1 -ORBSvcConf $svc_conf -o $server1_ior" ); my $SV2 = $server2->CreateProcess ( "server", "-ORBdebuglevel $debug_level -ORBVerboseLogging 1 -ORBSvcConf $svc_conf -o $server2_ior" ); + my $cleanup_servers = sub { + $SV1->Kill (); $SV1->TimedWait (1) if $sv1_started; + $SV2->Kill (); $SV2->TimedWait (1) if $sv2_started; + }; @@ if ($server1_status != 0) { print STDERR "ERROR: server1 Spawn returned $server1_status\n"; return 1; } + $sv1_started = 1; @@ if ($server2_status != 0) { print STDERR "ERROR: server2 Spawn returned $server2_status\n"; + $cleanup_servers->(); return 1; } + $sv2_started = 1; @@ - $SV1->Kill (); $SV1->TimedWait (1); + $cleanup_servers->(); return 1;Apply the same
$cleanup_servers->()replacement to the laterWaitForFileTimed,GetFile, andPutFilefailure branches too.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@TAO/tests/Transport_Idle_Timeout/run_test.pl` around lines 149 - 191, The failure branches after starting servers only kill the directly-failed server (e.g., the WaitForFileTimed, GetFile, PutFile error handlers referencing $SV1/$SV2), leaving the sibling server running; update those branches to call the shared cleanup routine (invoke $cleanup_servers->() or the existing cleanup function used elsewhere) instead of directly calling $SV1->Kill/$SV2->Kill and TimedWait so both servers are stopped on any multi-server setup failure (apply this to the WaitForFileTimed, GetFile, and PutFile failure blocks that reference $server1/$server2, $SV1/$SV2).
🧹 Nitpick comments (1)
TAO/tests/Transport_Idle_Timeout/run_test.pl (1)
91-106: Use a per-scenario status instead of the global$status.After the first failure, both helpers return non-zero on every later happy path because they mutate and return the script-wide flag. The final exit code is still right, but the helper result no longer describes the scenario that just ran.
Minimal refactor
sub run_scenario { my ($label, $svc_conf, $extra_client_args) = `@_`; + my $scenario_status = 0; @@ if ($client_status != 0) { print STDERR "ERROR: client returned $client_status\n"; - $status = 1; + $scenario_status = 1; } @@ if ($server_exit != 0) { print STDERR "ERROR: server returned $server_exit\n"; - $status = 1; + $scenario_status = 1; } @@ - return $status; + return $scenario_status; }Mirror the same change in
run_multiple_scenario.Also applies to: 200-223
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@TAO/tests/Transport_Idle_Timeout/run_test.pl` around lines 91 - 106, The helper currently mutates and returns the script-global $status, so after one failure subsequent scenario helper calls always return non-zero; change each helper (the block that checks $client_status and $server_exit in run_test.pl, and the analogous code in run_multiple_scenario) to use a local per-scenario variable (e.g., my $scenario_status = 0) instead of the global $status: set $scenario_status = 1 when $client_status != 0 or $server_exit != 0, delete files as before, and return $scenario_status; repeat the same refactor for the other occurrence (the code region around lines 200-223 / run_multiple_scenario) so each scenario result reflects only that scenario.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@TAO/tests/Transport_Idle_Timeout/client_multiple.cpp`:
- Around line 105-126: The code currently treats the second server IOR (ior2) as
optional; update parse_args to require and validate -l by adding a nil/empty
check for ior2 and return an error if missing, and also add a runtime check
after converting the IOR to an object/narrowing (the echo2 variable) to ensure
the narrowed servant is not nil and fail-fast with a clear error message; modify
the parse_args function to enforce ior2 presence and update the place where
echo2 is obtained to validate the result and exit on failure.
---
Duplicate comments:
In `@TAO/tests/Transport_Idle_Timeout/run_test.pl`:
- Around line 149-191: The failure branches after starting servers only kill the
directly-failed server (e.g., the WaitForFileTimed, GetFile, PutFile error
handlers referencing $SV1/$SV2), leaving the sibling server running; update
those branches to call the shared cleanup routine (invoke $cleanup_servers->()
or the existing cleanup function used elsewhere) instead of directly calling
$SV1->Kill/$SV2->Kill and TimedWait so both servers are stopped on any
multi-server setup failure (apply this to the WaitForFileTimed, GetFile, and
PutFile failure blocks that reference $server1/$server2, $SV1/$SV2).
---
Nitpick comments:
In `@TAO/tests/Transport_Idle_Timeout/run_test.pl`:
- Around line 91-106: The helper currently mutates and returns the script-global
$status, so after one failure subsequent scenario helper calls always return
non-zero; change each helper (the block that checks $client_status and
$server_exit in run_test.pl, and the analogous code in run_multiple_scenario) to
use a local per-scenario variable (e.g., my $scenario_status = 0) instead of the
global $status: set $scenario_status = 1 when $client_status != 0 or
$server_exit != 0, delete files as before, and return $scenario_status; repeat
the same refactor for the other occurrence (the code region around lines 200-223
/ run_multiple_scenario) so each scenario result reflects only that scenario.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: f17d0686-b922-4458-a73b-7218c2d5ca41
📒 Files selected for processing (7)
ACE/ace/Event_Handler_Handle_Timeout_Upcall.cppTAO/tests/Transport_Idle_Timeout/Echo_i.cppTAO/tests/Transport_Idle_Timeout/Echo_i.hTAO/tests/Transport_Idle_Timeout/client.cppTAO/tests/Transport_Idle_Timeout/client_multiple.cppTAO/tests/Transport_Idle_Timeout/run_test.plTAO/tests/Transport_Idle_Timeout/test.idl
✅ Files skipped from review due to trivial changes (2)
- ACE/ace/Event_Handler_Handle_Timeout_Upcall.cpp
- TAO/tests/Transport_Idle_Timeout/Echo_i.cpp
🚧 Files skipped from review as they are similar to previous changes (2)
- TAO/tests/Transport_Idle_Timeout/test.idl
- TAO/tests/Transport_Idle_Timeout/client.cpp
* TAO/tao/Transport.cpp:
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (3)
TAO/tests/Transport_Idle_Timeout/client_multiple.cpp (1)
105-126:⚠️ Potential issue | 🟠 MajorRequire and validate the second server IOR.
client_multiplepassesecho2into every scenario, but-lis still treated as optional and onlyechois nil-checked. If the second IOR is missing orecho2narrows to nil, the test may fail in non-obvious ways whenping()is called with a nil reference.🔧 Fail fast on a missing or invalid second endpoint
static int parse_args (int argc, ACE_TCHAR *argv[]) { ACE_Get_Opt get_opts (argc, argv, ACE_TEXT ("k:l:t:n:d")); // ... default: ACE_ERROR_RETURN ((LM_ERROR, - ACE_TEXT ("Usage: client -k <ior> [-l <ior>] [-t <sec>] [-n <N>] [-d]\n")), + ACE_TEXT ("Usage: client_multiple -k <ior1> -l <ior2> [-t <sec>] [-n <N>] [-d]\n")), -1); } - if (ior == nullptr) + if (ior == nullptr || ior2 == nullptr) ACE_ERROR_RETURN ((LM_ERROR, - ACE_TEXT ("client: -k <IOR> is required\n")), -1); + ACE_TEXT ("client_multiple: -k <IOR1> and -l <IOR2> are required\n")), -1); return 0; }And add nil check after narrowing:
Test::Echo_var echo = Test::Echo::_narrow (obj.in ()); - Test::Echo_var echo2 = Test::Echo::_narrow (obj2.in ());; + Test::Echo_var echo2 = Test::Echo::_narrow (obj2.in ()); if (CORBA::is_nil (echo.in ())) ACE_ERROR_RETURN ((LM_ERROR, ACE_TEXT ("Narrow to Test::Echo failed\n")), 1); + if (CORBA::is_nil (echo2.in ())) + ACE_ERROR_RETURN ((LM_ERROR, + ACE_TEXT ("Narrow to second Test::Echo failed\n")), 1);Also applies to: 289-301
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@TAO/tests/Transport_Idle_Timeout/client_multiple.cpp` around lines 105 - 126, Make the second server IOR mandatory and validate the narrowed reference before use: update parse_args to treat -l as required by checking ior2 != nullptr and returning an error if missing, and in the client initialization where echo2 is obtained via narrowing (the echo2 narrow call), add a nil-check after narrowing and fail fast with a clear ACE_ERROR_RETURN if echo2 is null so ping()/other calls never receive a nil reference; apply the same validation pattern to the other occurrence referenced around lines 289-301.TAO/tests/Transport_Idle_Timeout/run_test.pl (1)
149-191:⚠️ Potential issue | 🟠 MajorStop both servers on multi-server setup failures.
Once
server1has started, theserver2spawn / IOR wait / file-transfer failure branches only kill the process that directly failed. The other server can stay alive and contaminate the next scenario. Please funnel those early returns through a shared cleanup helper.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@TAO/tests/Transport_Idle_Timeout/run_test.pl` around lines 149 - 191, The failure branches after server1 has started only kill the individual failing server (e.g., branches checking $server2_status, WaitForFileTimed on $ior_file2, GetFile/PutFile for $ior_file2) leaving the other server ($SV1/$server1) running; add a shared cleanup helper (e.g., cleanup_servers or stop_all_servers) that always stops and waits for both $SV1 and $SV2 (kill + TimedWait) and call this helper before every early return in all error branches after server1 starts (refer to symbols $SV1, $SV2, $server1, $server2, $client, $ior_file1, $ior_file2 and the existing Spawn/WaitForFileTimed/GetFile/PutFile checks to locate where to insert the calls).TAO/tao/Transport.cpp (1)
988-1025:⚠️ Potential issue | 🔴 CriticalVersion the idle timer state;
idle_timer_id_alone is still racy.
handle_idle_timeout()still ignoresactand unconditionally clearsidle_timer_id_, whileschedule_idle_timer()andcancel_idle_timer()mutate the same field from other paths. A dequeued stale timeout can still clobber a newer timer or reschedule after reuse, leaving an orphaned reactor entry behind this embedded handler. Please serialize this state and gate callbacks/cancels with a generation token.Also applies to: 2944-2987
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@TAO/tao/Transport.cpp` around lines 988 - 1025, The idle timer removal is racy because handle_idle_timeout() ignores the act parameter and unconditionally clears idle_timer_id_, letting stale timeouts clobber newer timers; fix by introducing a generation token (e.g. a member like idle_timer_generation_ of an atomic integer type) that schedule_idle_timer() increments and embeds into the reactor act, have cancel_idle_timer() increment the generation to invalidate outstanding timers, and modify handle_idle_timeout(void const* act) to cast and compare the incoming generation token against idle_timer_generation_ before clearing idle_timer_id_ or rescheduling; only clear or act on the timer when the generations match so stale dequeued timers are ignored.
🧹 Nitpick comments (4)
TAO/tao/Transport_Cache_Manager_T.h (1)
130-132: Consider documenting return semantics for the new purge API.A short contract note here will make callers safer and reduce ambiguity.
Suggested doc-only update
- /// Purge the entry from the Cache Map only when the entry is purgable + /// Purge from the cache only when the entry is currently purgable. + /// Returns the underlying purge result when purged; otherwise returns + /// a non-success code for no-op/not-purgable paths. int purge_entry_when_purgable (HASH_MAP_ENTRY *& entry);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@TAO/tao/Transport_Cache_Manager_T.h` around lines 130 - 132, Add a short contract comment above the declaration of purge_entry_when_purgable(HASH_MAP_ENTRY *& entry) that documents its return semantics: explicitly state what numeric values mean (e.g., success vs. not-purgable vs. error), which error codes are returned (or that it returns -1 on failure), and whether callers must free/handle the entry on each case; align wording with existing conventions used by other cache APIs in this file. This makes callers like those invoking purge_entry_when_purgable and any code inspecting HASH_MAP_ENTRY behavior unambiguous about handling return values and ownership.TAO/tao/Resource_Factory.h (1)
257-258: Document the new interface contract at declaration site.Please add a short API comment describing units and disable semantics to prevent divergent implementations.
Suggested doc-only update
- virtual int transport_idle_timeout () const = 0; + /// Idle transport timeout in seconds; 0 disables idle-timeout purging. + virtual int transport_idle_timeout () const = 0;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@TAO/tao/Resource_Factory.h` around lines 257 - 258, Add an API comment above the pure virtual method transport_idle_timeout() in Resource_Factory.h describing the contract: state the units (e.g., seconds or milliseconds), what a return value of 0 or negative means (e.g., disables timeout), whether the value is a hard or soft limit, whether callers must convert units, and any thread-safety/ownership expectations; update the declaration site comment so all implementers use the same semantics when overriding transport_idle_timeout().TAO/tao/Transport.h (1)
1126-1133: Consider usingstd::atomic<long>foridle_timer_id_for clarity and consistency.The
idle_timer_id_member follows the same access pattern asflush_timer_id_, with reads and writes from different thread contexts:
schedule_idle_timer()(called from app thread) writes the timer IDcancel_idle_timer()(called from destructor/close_connection) reads and writeshandle_idle_timeout()(called from reactor thread) writes -1While ACE's
reactor->cancel_timer()is thread-safe and the existing pattern withflush_timer_id_works reliably in practice, usingstd::atomic<long>would make the thread-safe intent explicit and avoid potential platform-specific issues with non-atomiclongwrites. This would be consistent with modern C++ practices, though it is not required for correctness given ACE's guarantees.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@TAO/tao/Transport.h` around lines 1126 - 1133, Replace the plain long idle_timer_id_ with std::atomic<long> to make the thread-safety intent explicit and match the access pattern used by flush_timer_id_; update its declaration (idle_timer_id_) and initialization to use std::atomic<long>{-1}, and ensure any reads/writes in schedule_idle_timer(), cancel_idle_timer(), and handle_idle_timeout() use the atomic (load/store or simple assignment) semantics; also add the required <atomic> include if not already present.TAO/tests/Transport_Idle_Timeout/client.cpp (1)
52-68: Add await_for_cache_size()helper next tosleep_with_reactor().The timeout checks below sample
transport_cache().current_size()only once after fixed sleeps, butTAO/tao/Transport_Cache_Manager_T.inl:175-177returns the raw map count and the idle purge runs asynchronously. A poll-until-deadline helper here would make TC-1/TC-3/TC-4 much less timing-sensitive and can be reused byTAO/tests/Transport_Idle_Timeout/Echo_i.cpp.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@TAO/tests/Transport_Idle_Timeout/client.cpp` around lines 52 - 68, Add a new helper function wait_for_cache_size( CORBA::ORB_ptr orb, size_t expected_size, int timeout_seconds ) next to sleep_with_reactor() that repeatedly polls transport_cache().current_size() until it equals (or is <=/>= as appropriate for the test) expected_size or a deadline is reached; inside the loop call orb->perform_work(ACE_Time_Value(0,50000)) in the same 50ms slices used by sleep_with_reactor(), compute the deadline using ACE_OS::gettimeofday() + ACE_Time_Value(timeout_seconds), and return a boolean indicating success so tests TC-1/TC-3/TC-4 can call wait_for_cache_size(...) instead of sampling once; ensure the helper accounts for Transport_Cache_Manager_T.inl returning raw map counts and keep the helper signature and behavior reusable from TAO/tests/Transport_Idle_Timeout/Echo_i.cpp.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@TAO/tao/default_resource.cpp`:
- Around line 486-493: The code increments curarg and then dereferences
argv[curarg] even when curarg >= argc, causing an OOB access and it silently
accepts invalid/negative timeouts via ACE_OS::atoi; change the parsing in the
-ORBTransportIdleTimeout branch (symbols: transport_idle_timeout_,
report_option_value_error, ACE_OS::atoi, argv, curarg) to first verify curarg+1
< argc (or check curarg after increment) before reading argv[curarg], then parse
the value with a numeric-aware routine (e.g. strtol/ACE_OS::strtol) and validate
that the conversion consumed the whole string and produced a non-negative value;
on any failure call report_option_value_error("-ORBTransportIdleTimeout",
argv[curarg]) (or the appropriate safe message when argv[curarg] is unavailable)
instead of assigning the bad value.
In `@TAO/tao/Transport_Idle_Timer.h`:
- Around line 4-5: Fix the typo in the documentation comment in
Transport_Idle_Timer.h: change "transport_idl_timeout" to
"transport_idle_timeout" in the comment describing the reactor timer (ensure the
corrected identifier is used in the comment near the Transport_Idle_Timer
class/struct declaration); if the misspelling appears elsewhere in comments or
docs in the same file, update those occurrences as well to match the correct
name.
---
Duplicate comments:
In `@TAO/tao/Transport.cpp`:
- Around line 988-1025: The idle timer removal is racy because
handle_idle_timeout() ignores the act parameter and unconditionally clears
idle_timer_id_, letting stale timeouts clobber newer timers; fix by introducing
a generation token (e.g. a member like idle_timer_generation_ of an atomic
integer type) that schedule_idle_timer() increments and embeds into the reactor
act, have cancel_idle_timer() increment the generation to invalidate outstanding
timers, and modify handle_idle_timeout(void const* act) to cast and compare the
incoming generation token against idle_timer_generation_ before clearing
idle_timer_id_ or rescheduling; only clear or act on the timer when the
generations match so stale dequeued timers are ignored.
In `@TAO/tests/Transport_Idle_Timeout/client_multiple.cpp`:
- Around line 105-126: Make the second server IOR mandatory and validate the
narrowed reference before use: update parse_args to treat -l as required by
checking ior2 != nullptr and returning an error if missing, and in the client
initialization where echo2 is obtained via narrowing (the echo2 narrow call),
add a nil-check after narrowing and fail fast with a clear ACE_ERROR_RETURN if
echo2 is null so ping()/other calls never receive a nil reference; apply the
same validation pattern to the other occurrence referenced around lines 289-301.
In `@TAO/tests/Transport_Idle_Timeout/run_test.pl`:
- Around line 149-191: The failure branches after server1 has started only kill
the individual failing server (e.g., branches checking $server2_status,
WaitForFileTimed on $ior_file2, GetFile/PutFile for $ior_file2) leaving the
other server ($SV1/$server1) running; add a shared cleanup helper (e.g.,
cleanup_servers or stop_all_servers) that always stops and waits for both $SV1
and $SV2 (kill + TimedWait) and call this helper before every early return in
all error branches after server1 starts (refer to symbols $SV1, $SV2, $server1,
$server2, $client, $ior_file1, $ior_file2 and the existing
Spawn/WaitForFileTimed/GetFile/PutFile checks to locate where to insert the
calls).
---
Nitpick comments:
In `@TAO/tao/Resource_Factory.h`:
- Around line 257-258: Add an API comment above the pure virtual method
transport_idle_timeout() in Resource_Factory.h describing the contract: state
the units (e.g., seconds or milliseconds), what a return value of 0 or negative
means (e.g., disables timeout), whether the value is a hard or soft limit,
whether callers must convert units, and any thread-safety/ownership
expectations; update the declaration site comment so all implementers use the
same semantics when overriding transport_idle_timeout().
In `@TAO/tao/Transport_Cache_Manager_T.h`:
- Around line 130-132: Add a short contract comment above the declaration of
purge_entry_when_purgable(HASH_MAP_ENTRY *& entry) that documents its return
semantics: explicitly state what numeric values mean (e.g., success vs.
not-purgable vs. error), which error codes are returned (or that it returns -1
on failure), and whether callers must free/handle the entry on each case; align
wording with existing conventions used by other cache APIs in this file. This
makes callers like those invoking purge_entry_when_purgable and any code
inspecting HASH_MAP_ENTRY behavior unambiguous about handling return values and
ownership.
In `@TAO/tao/Transport.h`:
- Around line 1126-1133: Replace the plain long idle_timer_id_ with
std::atomic<long> to make the thread-safety intent explicit and match the access
pattern used by flush_timer_id_; update its declaration (idle_timer_id_) and
initialization to use std::atomic<long>{-1}, and ensure any reads/writes in
schedule_idle_timer(), cancel_idle_timer(), and handle_idle_timeout() use the
atomic (load/store or simple assignment) semantics; also add the required
<atomic> include if not already present.
In `@TAO/tests/Transport_Idle_Timeout/client.cpp`:
- Around line 52-68: Add a new helper function wait_for_cache_size(
CORBA::ORB_ptr orb, size_t expected_size, int timeout_seconds ) next to
sleep_with_reactor() that repeatedly polls transport_cache().current_size()
until it equals (or is <=/>= as appropriate for the test) expected_size or a
deadline is reached; inside the loop call
orb->perform_work(ACE_Time_Value(0,50000)) in the same 50ms slices used by
sleep_with_reactor(), compute the deadline using ACE_OS::gettimeofday() +
ACE_Time_Value(timeout_seconds), and return a boolean indicating success so
tests TC-1/TC-3/TC-4 can call wait_for_cache_size(...) instead of sampling once;
ensure the helper accounts for Transport_Cache_Manager_T.inl returning raw map
counts and keep the helper signature and behavior reusable from
TAO/tests/Transport_Idle_Timeout/Echo_i.cpp.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: ea8eeb7d-18d4-47fb-acff-b9202ec3eb24
📒 Files selected for processing (28)
ACE/ace/Event_Handler_Handle_Timeout_Upcall.cppTAO/bin/tao_orb_tests.lstTAO/docs/Options.htmlTAO/tao/Cache_Entries_T.inlTAO/tao/Resource_Factory.hTAO/tao/Transport.cppTAO/tao/Transport.hTAO/tao/Transport_Cache_Manager_T.cppTAO/tao/Transport_Cache_Manager_T.hTAO/tao/Transport_Cache_Manager_T.inlTAO/tao/Transport_Connector.cppTAO/tao/Transport_Idle_Timer.cppTAO/tao/Transport_Idle_Timer.hTAO/tao/Transport_Timer.cppTAO/tao/Transport_Timer.hTAO/tao/default_resource.cppTAO/tao/default_resource.hTAO/tao/tao.mpcTAO/tests/Transport_Idle_Timeout/Echo_i.cppTAO/tests/Transport_Idle_Timeout/Echo_i.hTAO/tests/Transport_Idle_Timeout/Idle_Transport_Timeout.mpcTAO/tests/Transport_Idle_Timeout/client.cppTAO/tests/Transport_Idle_Timeout/client_multiple.cppTAO/tests/Transport_Idle_Timeout/run_test.plTAO/tests/Transport_Idle_Timeout/server.cppTAO/tests/Transport_Idle_Timeout/svc.confTAO/tests/Transport_Idle_Timeout/svc_disabled.confTAO/tests/Transport_Idle_Timeout/test.idl
* TAO/tao/Transport_Idle_Timer.h:
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
TAO/tao/Transport.cpp (2)
988-1026:⚠️ Potential issue | 🔴 CriticalSerialize or version
idle_timer_id_.
idle_timer_id_is still mutated fromhandle_idle_timeout(),schedule_idle_timer(), andcancel_idle_timer()without synchronization. A timeout that is already dequeued can still run aftercancel_idle_timer()clears the id and then re-arm itself against a closing transport.TAO/tao/Transport_Idle_Timer.cpp:16-28only protects an in-flight callback; it does not make stale reactor entries safe.Also applies to: 2944-2988
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@TAO/tao/Transport.cpp` around lines 988 - 1026, handle_idle_timeout() mutates idle_timer_id_ without synchronization and can race with schedule_idle_timer() and cancel_idle_timer(), allowing stale reactor callbacks to re-arm or close the wrong transport; fix by serializing access to idle_timer_id_ (and associated state) via either a mutex protecting reads/writes or a version/token scheme: add a small atomic/version field incremented by schedule_idle_timer() and cancel_idle_timer(), capture that version when scheduling the reactor callback, and in handle_idle_timeout() compare the captured version to the current version before clearing idle_timer_id_ or proceeding to reschedule/close; ensure all accesses to idle_timer_id_ and the new version/token are performed atomically or under the same lock so stale callbacks are no-ops.
2950-2963:⚠️ Potential issue | 🟠 MajorDon't treat
schedule_timer()failure as a successful arm.The null-reactor case is covered now, but
schedule_timer()returning-1still falls through to the normal debug path. That silently disables idle expiry and logs a bogus timer id.🛠️ Suggested hardening
ACE_Time_Value const tv (static_cast<time_t> (timeout_sec)); - this->idle_timer_id_= reactor->schedule_timer (std::addressof(this->transport_idle_timer_), nullptr, tv); + this->idle_timer_id_ = + reactor->schedule_timer (std::addressof (this->transport_idle_timer_), + nullptr, + tv); + if (this->idle_timer_id_ == -1) + { + if (TAO_debug_level > 0) + { + TAOLIB_ERROR ((LM_ERROR, + ACE_TEXT ("TAO (%P|%t) - Transport[%d]::schedule_idle_timer, ") + ACE_TEXT ("schedule_timer failed\n"), + this->id ())); + } + return; + } if (TAO_debug_level > 6) {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@TAO/tao/Transport.cpp` around lines 2950 - 2963, schedule_timer() failure is currently treated as success; after calling reactor->schedule_timer(...) check the returned value stored in this->idle_timer_id_ for -1 and handle it as an error: log an error (using TAOLIB_DEBUG or process logger consistent with TAO_debug_level) indicating schedule_timer failed and do not proceed with the "scheduled idle timer" debug path or assume the timer is armed; ensure transport_idle_timer_ is not considered active when idle_timer_id_ == -1 and leave/assign idle_timer_id_ to a sentinel (e.g. -1) so later logic can detect the unarmed state (reference symbols: reactor, schedule_timer, idle_timer_id_, transport_idle_timer_, TAO_debug_level).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@TAO/tao/Transport.cpp`:
- Around line 566-571: The idle timer is being armed too early (post_open()),
never cleared (update_transport()), and schedule_idle_timer() refuses to re-arm
if an older id is pending, so move timer start to the actual idle transition:
remove or disable immediate arming in post_open(), ensure update_transport()
clears/cancels any pending idle timer when activity occurs, and modify the
make_idle() success path (where
transport_cache_manager().make_idle(this->cache_map_entry_) returns 0) to reset
any pending timer id or cancel existing timers and then call
schedule_idle_timer() so the idle countdown always starts from the real
transition to idle (update code in functions post_open(), update_transport(),
schedule_idle_timer(), and the make_idle success branch that currently calls
schedule_idle_timer()).
---
Duplicate comments:
In `@TAO/tao/Transport.cpp`:
- Around line 988-1026: handle_idle_timeout() mutates idle_timer_id_ without
synchronization and can race with schedule_idle_timer() and cancel_idle_timer(),
allowing stale reactor callbacks to re-arm or close the wrong transport; fix by
serializing access to idle_timer_id_ (and associated state) via either a mutex
protecting reads/writes or a version/token scheme: add a small atomic/version
field incremented by schedule_idle_timer() and cancel_idle_timer(), capture that
version when scheduling the reactor callback, and in handle_idle_timeout()
compare the captured version to the current version before clearing
idle_timer_id_ or proceeding to reschedule/close; ensure all accesses to
idle_timer_id_ and the new version/token are performed atomically or under the
same lock so stale callbacks are no-ops.
- Around line 2950-2963: schedule_timer() failure is currently treated as
success; after calling reactor->schedule_timer(...) check the returned value
stored in this->idle_timer_id_ for -1 and handle it as an error: log an error
(using TAOLIB_DEBUG or process logger consistent with TAO_debug_level)
indicating schedule_timer failed and do not proceed with the "scheduled idle
timer" debug path or assume the timer is armed; ensure transport_idle_timer_ is
not considered active when idle_timer_id_ == -1 and leave/assign idle_timer_id_
to a sentinel (e.g. -1) so later logic can detect the unarmed state (reference
symbols: reactor, schedule_timer, idle_timer_id_, transport_idle_timer_,
TAO_debug_level).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: efe46e7c-1341-4604-b16b-6d5b41fcf2a9
📒 Files selected for processing (2)
TAO/tao/Transport.cppTAO/tao/Transport_Idle_Timer.h
✅ Files skipped from review due to trivial changes (1)
- TAO/tao/Transport_Idle_Timer.h
* TAO/tao/Transport.cpp:
* TAO/tao/Transport_Cache_Manager_T.cpp:
There was a problem hiding this comment.
♻️ Duplicate comments (1)
TAO/tao/Transport.cpp (1)
190-190:⚠️ Potential issue | 🔴 CriticalThe idle timer still has a stale-callback/lifetime race.
handle_idle_timeout()still clearsidle_timer_id_unconditionally and ignoresact, whileschedule_idle_timer(),cancel_idle_timer(),close_connection(), and the destructor all mutate the same state from different threads. An older timeout that has already been dequeued can still resetidle_timer_id_to-1after a newer timer is armed, which makes the replacement timer impossible to cancel.TAO/tao/Transport_Idle_Timer.cpp:11-28only takes a transport reference after dequeue, so that orphaned callback can later dispatch through the embeddedtransport_idle_timer_after the transport is destroyed. Please version the idle-timer state and gate both callback and cancel paths with that token under synchronization.Also applies to: 341-342, 989-1010, 2944-2994
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@TAO/tao/Transport.cpp` at line 190, The idle-timer has a race where stale callbacks can reset idle_timer_id_ and outlive the Transport; update the implementation by introducing a monotonically increasing generation/token (e.g., idle_timer_gen_) protected by the same mutex used around schedule_idle_timer(), cancel_idle_timer(), close_connection(), and the destructor, incrementing the token whenever a new timer is armed or when cancelling; store the current token with each scheduled timer and make handle_idle_timeout() check the dequeued timer's token against idle_timer_gen_ under the mutex before mutating idle_timer_id_ or touching transport_idle_timer_, and likewise have cancel_idle_timer() validate the token before clearing state so callbacks and cancels are gated by the same generation to prevent stale callbacks from clearing/reusing newer timers or accessing destroyed Transport.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@TAO/tao/Transport.cpp`:
- Line 190: The idle-timer has a race where stale callbacks can reset
idle_timer_id_ and outlive the Transport; update the implementation by
introducing a monotonically increasing generation/token (e.g., idle_timer_gen_)
protected by the same mutex used around schedule_idle_timer(),
cancel_idle_timer(), close_connection(), and the destructor, incrementing the
token whenever a new timer is armed or when cancelling; store the current token
with each scheduled timer and make handle_idle_timeout() check the dequeued
timer's token against idle_timer_gen_ under the mutex before mutating
idle_timer_id_ or touching transport_idle_timer_, and likewise have
cancel_idle_timer() validate the token before clearing state so callbacks and
cancels are gated by the same generation to prevent stale callbacks from
clearing/reusing newer timers or accessing destroyed Transport.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: a41e376a-c6ca-4611-bcfa-53db29d74456
📒 Files selected for processing (2)
TAO/tao/Transport.cppTAO/tao/Transport_Cache_Manager_T.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
- TAO/tao/Transport_Cache_Manager_T.cpp
* ACE/ace/WFMO_Reactor.cpp:
* TAO/tao/Resource_Factory.h:
* TAO/tao/Transport_Cache_Manager_T.inl:
* TAO/NEWS:
|
@Mergifyio backport ace6tao2 |
❌ No backport have been createdDetails
Git reported the following error: |
* TAO/tests/Transport_Idle_Timeout/Echo_i.cpp:
* TAO/tests/Transport_Idle_Timeout/Echo_i.h:
* TAO/tests/Transport_Idle_Timeout/Idle_Transport_Timeout.mpc:
* TAO/tests/Transport_Idle_Timeout/client.cpp:
* TAO/tests/Transport_Idle_Timeout/client_multiple.cpp:
* TAO/tests/Transport_Idle_Timeout/run_test.pl:
* TAO/tests/Transport_Idle_Timeout/server.cpp:
* TAO/tests/Transport_Idle_Timeout/svc.conf:
* TAO/tests/Transport_Idle_Timeout/svc_disabled.conf:
* TAO/tests/Transport_Idle_Timeout/test.idl:
Added.
* ACE/ace/Event_Handler_Handle_Timeout_Upcall.cpp:
* ACE/ace/WFMO_Reactor.cpp:
* TAO/NEWS:
* TAO/bin/tao_orb_tests.lst:
* TAO/docs/Options.html:
* TAO/tao/Cache_Entries_T.inl:
* TAO/tao/Resource_Factory.h:
* TAO/tao/Transport.cpp:
* TAO/tao/Transport.h:
* TAO/tao/Transport_Cache_Manager_T.cpp:
* TAO/tao/Transport_Cache_Manager_T.h:
* TAO/tao/Transport_Cache_Manager_T.inl:
* TAO/tao/Transport_Connector.cpp:
* TAO/tao/Transport_Timer.cpp:
* TAO/tao/Transport_Timer.h:
* TAO/tao/default_resource.cpp:
* TAO/tao/default_resource.h:
* TAO/tao/tao.mpc:
[ace6tao2] Backport #2523, add `-ORBTransportIdleTimeout` to purge idle transports
Add new
-ORBTransportIdleTimeoutto purge idl connections after the specified number of secondsSummary by CodeRabbit
New Features
Documentation
Tests
Behavior