Skip to content

Add -ORBTransportIdleTimeout to purge idle transports#2523

Merged
jwillemsen merged 50 commits intoDOCGroup:masterfrom
jwillemsen:jwi-idletransport
Apr 8, 2026
Merged

Add -ORBTransportIdleTimeout to purge idle transports#2523
jwillemsen merged 50 commits intoDOCGroup:masterfrom
jwillemsen:jwi-idletransport

Conversation

@jwillemsen
Copy link
Copy Markdown
Member

@jwillemsen jwillemsen commented Apr 3, 2026

Add new -ORBTransportIdleTimeout to purge idl connections after the specified number of seconds

Summary by CodeRabbit

  • New Features

    • Configurable transport idle timeout (-ORBTransportIdleTimeout) to automatically close and purge idle connections after the specified seconds (0 disables).
  • Documentation

    • Configuration reference updated to document the new idle-timeout option and its behavior.
  • Tests

    • New regression test suite for idle-timeout scenarios added and integrated into automated test runs.
  • Behavior

    • Improved idle-timer handling to more reliably schedule, cancel, and act on idle transport timeouts.

    * TAO/tao/Transport_Idle_Timer.cpp:
    * TAO/tao/Transport_Idle_Timer.h:
      Added.

    * TAO/tao/Cache_Entries_T.inl:
    * TAO/tao/Transport.cpp:
    * TAO/tao/Transport.h:
    * TAO/tao/Transport_Cache_Manager_T.h:
    * TAO/tao/Transport_Cache_Manager_T.inl:
    * TAO/tao/tao.mpc:
    * TAO/tao/Transport.h:
    * TAO/tao/Transport_Idle_Timer.h:
    * TAO/tao/Transport_Timer.cpp:
    * TAO/tao/Transport_Timer.h:
    * TAO/tao/Transport_Cache_Manager_T.inl:
    * TAO/tao/default_resource.cpp:
    * TAO/tao/default_resource.h:
    * TAO/tao/Resource_Factory.h:
    * TAO/tao/Transport.cpp:
    * TAO/tao/Transport.h:
    * TAO/tao/Transport.cpp:
    * TAO/tao/Transport.h:
    * TAO/tao/Transport_Idle_Timer.cpp:
    * TAO/tao/Transport_Idle_Timer.h:
    * TAO/tao/Transport.cpp:
    * TAO/tao/Transport.h:
    * TAO/tao/Transport_Cache_Manager_T.cpp:
    * TAO/tao/Transport_Connector.cpp:
    * TAO/tao/Transport_Idle_Timer.h:
    * TAO/tests/Transport_Idle_Timeout/Echo_i.cpp:
    * TAO/tests/Transport_Idle_Timeout/client.cpp:
    * TAO/tests/Transport_Idle_Timeout/run_test.pl:
    * TAO/tests/Transport_Idle_Timeout/server.cpp:
    * TAO/tests/Transport_Idle_Timeout/svc.conf:
    * TAO/tests/Transport_Idle_Timeout/svc_disabled.conf:
    * TAO/tao/Transport.cpp:
    * TAO/bin/tao_orb_tests.lst:
    * TAO/tests/Transport_Idle_Timeout/Echo_i.cpp:
    * TAO/tests/Transport_Idle_Timeout/Echo_i.h:
    * TAO/tests/Transport_Idle_Timeout/client.cpp:
    * TAO/tests/Transport_Idle_Timeout/test.idl:
    * TAO/tao/Transport.cpp:
    * TAO/tao/Transport_Idle_Timer.cpp:
    * TAO/tests/Transport_Idle_Timeout/run_test.pl:
    * TAO/docs/Options.html:
    * TAO/tao/Transport_Idle_Timer.h:
    * TAO/tests/Transport_Idle_Timeout/Echo_i.h:
    * TAO/tests/Transport_Idle_Timeout/svc.conf:
    * TAO/tests/Transport_Idle_Timeout/svc_disabled.conf:
    * TAO/tests/Transport_Idle_Timeout/test.idl:
    * TAO/tao/Transport_Idle_Timer.h:
    * TAO/tests/Transport_Idle_Timeout/Echo_i.cpp:
    * TAO/tests/Transport_Idle_Timeout/Echo_i.h:
    * TAO/tests/Transport_Idle_Timeout/client.cpp:
    * TAO/tests/Transport_Idle_Timeout/test.idl:
    * TAO/tests/Transport_Idle_Timeout/Echo_i.cpp:
    * TAO/tests/Transport_Idle_Timeout/Echo_i.h:
    * TAO/tests/Transport_Idle_Timeout/client.cpp:
    * TAO/tests/Transport_Idle_Timeout/test.idl:
    * TAO/tests/Transport_Idle_Timeout/svc_disabled.conf:
      Deleted.

    * TAO/tests/Transport_Idle_Timeout/run_test.pl:
    * TAO/tests/Transport_Idle_Timeout/svc_disabled.conf:
      Added.

    * TAO/tests/Transport_Idle_Timeout/Echo_i.cpp:
    * TAO/tests/Transport_Idle_Timeout/Echo_i.h:
    * TAO/tests/Transport_Idle_Timeout/client.cpp:
    * TAO/tests/Transport_Idle_Timeout/run_test.pl:
    * TAO/tests/Transport_Idle_Timeout/svc.conf:
    * TAO/tests/Transport_Idle_Timeout/test.idl:
…d multiple gets too complicated for a unit test

    * TAO/tests/Transport_Idle_Timeout/client_multiple.cpp:
      Added.

    * TAO/tests/Transport_Idle_Timeout/Echo_i.cpp:
    * TAO/tests/Transport_Idle_Timeout/Idle_Transport_Timeout.mpc:
    * TAO/tests/Transport_Idle_Timeout/client.cpp:
    * TAO/tests/Transport_Idle_Timeout/run_test.pl:
    * TAO/tests/Transport_Idle_Timeout/svc.conf:
    * TAO/tests/Transport_Idle_Timeout/svc_disabled.conf:
    * TAO/tests/Transport_Idle_Timeout/Echo_i.cpp:
    * TAO/tests/Transport_Idle_Timeout/Echo_i.h:
    * TAO/tests/Transport_Idle_Timeout/client.cpp:
    * TAO/tests/Transport_Idle_Timeout/client_multiple.cpp:
    * TAO/tests/Transport_Idle_Timeout/run_test.pl:
    * TAO/tests/Transport_Idle_Timeout/test.idl:
    * TAO/tao/Transport_Cache_Manager_T.cpp:
    * TAO/tao/Transport.cpp:
    * TAO/tao/Transport_Cache_Manager_T.cpp:
    * TAO/tao/Transport_Cache_Manager_T.h:
    * TAO/tao/Transport.cpp:
    * TAO/tests/Transport_Idle_Timeout/Echo_i.cpp:
    * TAO/tests/Transport_Idle_Timeout/client_multiple.cpp:
    * TAO/tests/Transport_Idle_Timeout/run_test.pl:
    * TAO/tao/Transport.cpp:
    * TAO/tao/Transport_Cache_Manager_T.cpp:
    * TAO/tao/Transport_Cache_Manager_T.h:
    * TAO/tao/Transport_Cache_Manager_T.inl:
    * TAO/tao/Transport_Connector.cpp:
@jwillemsen jwillemsen linked an issue Apr 3, 2026 that may be closed by this pull request
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 3, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds a transport idle-timeout feature: new Transport_Idle_Timer type, timer wiring in TAO_Transport (schedule/cancel/handler), cache-manager conditional purge path, Resource_Factory API + option parsing, build/test registrations, and a regression test suite covering enabled, disabled, and multi-server scenarios.

Changes

Cohort / File(s) Summary
New idle timer type
TAO/tao/Transport_Idle_Timer.h, TAO/tao/Transport_Idle_Timer.cpp
Introduce TAO::Transport_Idle_Timer (ACE_Event_Handler) that holds a TAO_Transport*, manages transport refcount around timeout dispatch, and delegates to TAO_Transport::handle_idle_timeout.
Transport integration (heavy)
TAO/tao/Transport.h, TAO/tao/Transport.cpp
Add idle-timer state (idle_timer_id_, transport_idle_timer_), public handlers handle_idle_timeout/cancel_idle_timer, private scheduling helpers, wire timer lifecycle into close/post_open/make_idle, reschedule/purge logic on timer expiry, and additional debug logging.
Cache manager purge & null modernizations
TAO/tao/Transport_Cache_Manager_T.h, TAO/tao/Transport_Cache_Manager_T.inl, TAO/tao/Transport_Cache_Manager_T.cpp
Add purge_entry_when_purgable() API/impl to purge only if entry is purgable; call entry->item().transport()->cancel_idle_timer() when taking an idle entry; replace literal 0 with nullptr and minor formatting/locking refinements.
Cache entries minor fix
TAO/tao/Cache_Entries_T.inl
Initialize/assign transport_ with nullptr instead of 0 in constructors and relinquish_transport(); trivial whitespace cleanup.
Resource factory & defaults
TAO/tao/Resource_Factory.h, TAO/tao/default_resource.h, TAO/tao/default_resource.cpp, TAO/docs/Options.html
Add pure-virtual transport_idle_timeout() to TAO_Resource_Factory; default factory parses/stores -ORBTransportIdleTimeout and exposes accessor; docs updated to document the new option.
Existing transport timer namespace change
TAO/tao/Transport_Timer.h, TAO/tao/Transport_Timer.cpp
Move existing transport timer type into namespace TAO and rename to TAO::Transport_Timer (qualification/name changes only).
Build & test registration
TAO/tao/tao.mpc, TAO/bin/tao_orb_tests.lst
Add Transport_Idle_Timer.cpp/.h to TAO build and register TAO/tests/Transport_Idle_Timeout/run_test.pl in test list.
New regression test suite
TAO/tests/Transport_Idle_Timeout/test.idl, TAO/tests/Transport_Idle_Timeout/Echo_i.h, TAO/tests/Transport_Idle_Timeout/Echo_i.cpp, TAO/tests/Transport_Idle_Timeout/server.cpp, TAO/tests/Transport_Idle_Timeout/client.cpp, TAO/tests/Transport_Idle_Timeout/client_multiple.cpp, TAO/tests/Transport_Idle_Timeout/run_test.pl, TAO/tests/Transport_Idle_Timeout/Idle_Transport_Timeout.mpc, TAO/tests/Transport_Idle_Timeout/svc.conf, TAO/tests/Transport_Idle_Timeout/svc_disabled.conf
Add IDL/servant, server/clients, MPC project, svc.conf variants, and Perl driver exercising idle-timeout behavior (enabled, disabled, and multi-server scenarios).
Minor formatting/locals
TAO/tao/Transport_Connector.cpp, ACE/ace/Event_Handler_Handle_Timeout_Upcall.cpp
Small formatting and local-type changes (0nullptr, intbool, braces/whitespace adjustments).

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client/App
    participant Cache as Transport Cache Manager
    participant Trans as TAO_Transport
    participant Timer as Transport Idle Timer
    participant Reactor as ORB Reactor

    Client->>Cache: find_transport()
    activate Cache
    Cache->>Trans: entry->item().transport()->cancel_idle_timer()
    activate Trans
    Trans->>Reactor: cancel scheduled idle timer
    deactivate Trans
    Cache-->>Client: return transport
    deactivate Cache

    Client->>Trans: use transport, then make_idle()
    activate Trans
    Trans->>Cache: notify entry idle (mark ENTRY_IDLE_AND_PURGABLE)
    Cache-->>Trans: confirmed idle/purgable
    Trans->>Reactor: schedule idle timer (timeout seconds)
    deactivate Trans

    Note over Reactor: time passes

    Reactor->>Timer: handle_timeout(current_time)
    activate Timer
    Timer->>Trans: add_reference()
    Timer->>Trans: Trans.handle_idle_timeout(current_time)
    activate Trans
    Trans->>Cache: purge_entry_when_purgable(entry)
    Cache-->>Trans: purged? (yes/no)
    alt purged
        Trans->>Trans: close_connection()
    else not purged
        Trans->>Reactor: schedule idle timer (reschedule)
    end
    Trans->>Timer: remove_reference()
    deactivate Trans
    deactivate Timer
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • Use null, override, default #2473 — overlaps transport/cache/timer changes (e.g., Transport_Cache_Manager_T., Transport_Timer., default_resource.cpp); strong code-level relation.

Poem

🐰 A hop, a tick, a quiet chime,
Timers hush the roads of time,
Caches shrink where silence grew,
Tests arrive to prove what's true,
I nibble code and hum a rhyme.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 30.43% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely describes the main feature being added: a new ORB option -ORBTransportIdleTimeout for purging idle transports after a specified timeout period, which directly aligns with the core changes across all modified files.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@jwillemsen jwillemsen changed the title Add ORBTransportIdleTimeout to purge idle transports Add -ORBTransportIdleTimeout to purge idle transports Apr 7, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
TAO/tao/Transport.cpp (1)

988-1026: ⚠️ Potential issue | 🔴 Critical

Stale idle callbacks can still clobber a newer timer.

TAO/tao/Transport_Cache_Manager_T.cpp:273-371 cancels the timer on cache reuse, but schedule_idle_timer() still passes nullptr as ACT and handle_idle_timeout() still ignores act. Combined with the unsynchronized reads/writes of idle_timer_id_, an already-dequeued old callback can clear the ID for a newer timer and then reschedule or close the transport while it is busy. This still needs versioned timer state before idle_timer_id_ is mutated.

Verify that idle_timer_id_ is written from multiple paths and that there is no generation/version check guarding stale callbacks. Expected result: matches in schedule_idle_timer(), cancel_idle_timer(), handle_idle_timeout(), and the cache-reuse path, but no idle_timer_gen_-style state.

#!/bin/bash
set -euo pipefail

rg -n -C3 '\bidle_timer_id_\b|\bidle_timer_gen_\b|handle_idle_timeout\s*\(|schedule_idle_timer\s*\(|cancel_idle_timer\s*\(' \
  TAO/tao/Transport.cpp \
  TAO/tao/Transport.h \
  TAO/tao/Transport_Cache_Manager_T.cpp \
  TAO/tao/Transport_Idle_Timer.cpp

Also applies to: 2941-2983

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@TAO/tao/Transport.cpp`:
- Around line 566-571: post_open() is exposing the cache entry as
ENTRY_IDLE_AND_PURGABLE but the idle timer is being armed here instead of via
make_idle(), which can leave the initial idle window untracked; update the code
so the transition is routed through
transport_cache_manager().make_idle(cache_map_entry_) (or modify make_idle()) to
arm the idle timer internally rather than calling schedule_idle_timer() locally
— ensure make_idle() is responsible for calling schedule_idle_timer() when it
returns success (result == 0) for ENTRY_IDLE_AND_PURGABLE so the initial idle
transition uses the same -ORBTransportIdleTimeout path.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 43cb5897-8084-4754-82d2-fd253ef62bd2

📥 Commits

Reviewing files that changed from the base of the PR and between 7c944d4 and 1f9f833.

📒 Files selected for processing (5)
  • TAO/docs/Options.html
  • TAO/tao/Transport.cpp
  • TAO/tao/Transport.h
  • TAO/tao/Transport_Cache_Manager_T.h
  • TAO/tao/Transport_Cache_Manager_T.inl
✅ Files skipped from review due to trivial changes (1)
  • TAO/docs/Options.html
🚧 Files skipped from review as they are similar to previous changes (2)
  • TAO/tao/Transport_Cache_Manager_T.inl
  • TAO/tao/Transport.h

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
TAO/tao/Transport.cpp (2)

2876-2877: ⚠️ Potential issue | 🟠 Major

Initial idle transition still bypasses make_idle().

Line 2877 sets ENTRY_IDLE_AND_PURGABLE directly, but timer arming now lives in make_idle() (Line 566-Line 571). This still leaves the first idle window untracked for never-reused transports.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TAO/tao/Transport.cpp` around lines 2876 - 2877, The transport entry is being
moved to ENTRY_IDLE_AND_PURGABLE by directly calling
transport_cache_manager().set_entry_state(this->cache_map_entry_,
TAO::ENTRY_IDLE_AND_PURGABLE) which bypasses make_idle() (which arms the idle
timer); instead use the existing make_idle() logic to perform the transition so
the timer is armed for first-time idle entries — i.e., replace the direct
set_entry_state call (or call make_idle() immediately when creating/marking the
entry) so make_idle() handles the state change and timer arming for
cache_map_entry_ via transport_cache_manager()/make_idle().

989-1001: ⚠️ Potential issue | 🔴 Critical

Idle-timer state is still racy; stale callbacks can corrupt timer lifecycle.

Line 989 ignores act, and Lines 1000, 2949, and 2979 read/write idle_timer_id_ across different threads without serialization. A stale timeout/cancel can clear the id for a newer timer, leaving orphaned reactor timers and unsafe callback lifetimes.

Suggested direction
- int
- TAO_Transport::handle_idle_timeout (const ACE_Time_Value &, const void */*act*/)
+ int
+ TAO_Transport::handle_idle_timeout (const ACE_Time_Value &, const void *act)
{
+  // decode generation from act and ignore stale callbacks
+  // guard idle timer state updates under handler_lock_ (or equivalent)
}

void
TAO_Transport::schedule_idle_timer ()
{
+  // serialize with handler_lock_
+  // increment generation
+  // optionally cancel previous timer id before scheduling new one
+  // pass generation token via act
}

void
TAO_Transport::cancel_idle_timer ()
{
+  // serialize with handler_lock_
+  // clear/cancel only current generation
}

Also applies to: 2941-2980

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TAO/tao/Transport.cpp` around lines 989 - 1001, The idle-timer handling is
racy: TAO_Transport::handle_idle_timeout ignores the act and unsafely
reads/writes idle_timer_id_ across threads; change the logic to validate the
callback token before clearing or updating idle_timer_id_ (e.g., compare the
incoming act/token to the transport's current timer identifier or a per-timer
opaque token), and protect access to idle_timer_id_ with a synchronization
mechanism (either use an atomic compare-exchange on idle_timer_id_ or a mutex
used by all timer start/cancel paths such as the methods that set/clear
idle_timer_id_ and in handle_idle_timeout); only clear/cancel the reactor timer
when the token matches to prevent a stale callback from clearing a newer timer
and leaking/reactor-callback lifetime corruption.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@TAO/tao/Transport.cpp`:
- Around line 2947-2950: The schedule_idle_timer() call currently dereferences
ACE_Reactor* reactor obtained from this->orb_core_->reactor() and blindly
assigns reactor->schedule_timer(...) to this->idle_timer_id_, which can crash if
reactor is null or hide failures when schedule_timer returns -1; update the
method to (1) check reactor for nullptr before calling schedule_timer and skip
scheduling if null, (2) call reactor->schedule_timer(...) and check the return
value for -1, (3) on failure log an appropriate message and set
this->idle_timer_id_ to an invalid sentinel (e.g. -1) or leave it unchanged, and
(4) ensure you still use this->transport_idle_timer_ as the handler; reference
symbols: schedule_idle_timer (method), reactor (ACE_Reactor*),
this->orb_core_->reactor(), reactor->schedule_timer(), this->idle_timer_id_,
this->transport_idle_timer_.

---

Duplicate comments:
In `@TAO/tao/Transport.cpp`:
- Around line 2876-2877: The transport entry is being moved to
ENTRY_IDLE_AND_PURGABLE by directly calling
transport_cache_manager().set_entry_state(this->cache_map_entry_,
TAO::ENTRY_IDLE_AND_PURGABLE) which bypasses make_idle() (which arms the idle
timer); instead use the existing make_idle() logic to perform the transition so
the timer is armed for first-time idle entries — i.e., replace the direct
set_entry_state call (or call make_idle() immediately when creating/marking the
entry) so make_idle() handles the state change and timer arming for
cache_map_entry_ via transport_cache_manager()/make_idle().
- Around line 989-1001: The idle-timer handling is racy:
TAO_Transport::handle_idle_timeout ignores the act and unsafely reads/writes
idle_timer_id_ across threads; change the logic to validate the callback token
before clearing or updating idle_timer_id_ (e.g., compare the incoming act/token
to the transport's current timer identifier or a per-timer opaque token), and
protect access to idle_timer_id_ with a synchronization mechanism (either use an
atomic compare-exchange on idle_timer_id_ or a mutex used by all timer
start/cancel paths such as the methods that set/clear idle_timer_id_ and in
handle_idle_timeout); only clear/cancel the reactor timer when the token matches
to prevent a stale callback from clearing a newer timer and
leaking/reactor-callback lifetime corruption.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 43cb5897-8084-4754-82d2-fd253ef62bd2

📥 Commits

Reviewing files that changed from the base of the PR and between 7c944d4 and 1f9f833.

📒 Files selected for processing (5)
  • TAO/docs/Options.html
  • TAO/tao/Transport.cpp
  • TAO/tao/Transport.h
  • TAO/tao/Transport_Cache_Manager_T.h
  • TAO/tao/Transport_Cache_Manager_T.inl
✅ Files skipped from review due to trivial changes (1)
  • TAO/docs/Options.html
🚧 Files skipped from review as they are similar to previous changes (2)
  • TAO/tao/Transport_Cache_Manager_T.h
  • TAO/tao/Transport.h

Comment thread TAO/tao/Transport.cpp Outdated
    * TAO/tao/Transport.cpp:
    * TAO/tao/Transport_Cache_Manager_T.cpp:
    * TAO/tao/Transport_Cache_Manager_T.inl:
    * TAO/tao/Transport.cpp:
    * ACE/ace/Event_Handler_Handle_Timeout_Upcall.cpp:
    * TAO/tests/Transport_Idle_Timeout/Echo_i.cpp:
    * TAO/tests/Transport_Idle_Timeout/Echo_i.h:
    * TAO/tests/Transport_Idle_Timeout/client.cpp:
    * TAO/tests/Transport_Idle_Timeout/client_multiple.cpp:
    * TAO/tests/Transport_Idle_Timeout/run_test.pl:
    * TAO/tests/Transport_Idle_Timeout/test.idl:
@jwillemsen
Copy link
Copy Markdown
Member Author

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 7, 2026

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
TAO/tests/Transport_Idle_Timeout/run_test.pl (1)

149-191: ⚠️ Potential issue | 🟠 Major

Stop both servers on every multi-server setup failure.

Once server1 is running, these early-return branches still only kill the process that directly failed. The sibling server can stay alive and contaminate the next scenario.

Suggested cleanup pattern
 sub run_multiple_scenario {
     my ($label, $svc_conf, $extra_client_args) = `@_`;
+    my ($sv1_started, $sv2_started) = (0, 0);

     print "\n### $label ###\n";
@@
     my $SV1 = $server1->CreateProcess (
         "server",
         "-ORBdebuglevel $debug_level -ORBVerboseLogging 1 -ORBSvcConf $svc_conf -o $server1_ior"
     );
     my $SV2 = $server2->CreateProcess (
         "server",
         "-ORBdebuglevel $debug_level -ORBVerboseLogging 1 -ORBSvcConf $svc_conf -o $server2_ior"
     );
+    my $cleanup_servers = sub {
+        $SV1->Kill (); $SV1->TimedWait (1) if $sv1_started;
+        $SV2->Kill (); $SV2->TimedWait (1) if $sv2_started;
+    };
@@
     if ($server1_status != 0) {
         print STDERR "ERROR: server1 Spawn returned $server1_status\n";
         return 1;
     }
+    $sv1_started = 1;
@@
     if ($server2_status != 0) {
         print STDERR "ERROR: server2 Spawn returned $server2_status\n";
+        $cleanup_servers->();
         return 1;
     }
+    $sv2_started = 1;
@@
-        $SV1->Kill (); $SV1->TimedWait (1);
+        $cleanup_servers->();
         return 1;

Apply the same $cleanup_servers->() replacement to the later WaitForFileTimed, GetFile, and PutFile failure branches too.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TAO/tests/Transport_Idle_Timeout/run_test.pl` around lines 149 - 191, The
failure branches after starting servers only kill the directly-failed server
(e.g., the WaitForFileTimed, GetFile, PutFile error handlers referencing
$SV1/$SV2), leaving the sibling server running; update those branches to call
the shared cleanup routine (invoke $cleanup_servers->() or the existing cleanup
function used elsewhere) instead of directly calling $SV1->Kill/$SV2->Kill and
TimedWait so both servers are stopped on any multi-server setup failure (apply
this to the WaitForFileTimed, GetFile, and PutFile failure blocks that reference
$server1/$server2, $SV1/$SV2).
🧹 Nitpick comments (1)
TAO/tests/Transport_Idle_Timeout/run_test.pl (1)

91-106: Use a per-scenario status instead of the global $status.

After the first failure, both helpers return non-zero on every later happy path because they mutate and return the script-wide flag. The final exit code is still right, but the helper result no longer describes the scenario that just ran.

Minimal refactor
 sub run_scenario {
     my ($label, $svc_conf, $extra_client_args) = `@_`;
+    my $scenario_status = 0;
@@
     if ($client_status != 0) {
         print STDERR "ERROR: client returned $client_status\n";
-        $status = 1;
+        $scenario_status = 1;
     }
@@
     if ($server_exit != 0) {
         print STDERR "ERROR: server returned $server_exit\n";
-        $status = 1;
+        $scenario_status = 1;
     }
@@
-    return $status;
+    return $scenario_status;
 }

Mirror the same change in run_multiple_scenario.

Also applies to: 200-223

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TAO/tests/Transport_Idle_Timeout/run_test.pl` around lines 91 - 106, The
helper currently mutates and returns the script-global $status, so after one
failure subsequent scenario helper calls always return non-zero; change each
helper (the block that checks $client_status and $server_exit in run_test.pl,
and the analogous code in run_multiple_scenario) to use a local per-scenario
variable (e.g., my $scenario_status = 0) instead of the global $status: set
$scenario_status = 1 when $client_status != 0 or $server_exit != 0, delete files
as before, and return $scenario_status; repeat the same refactor for the other
occurrence (the code region around lines 200-223 / run_multiple_scenario) so
each scenario result reflects only that scenario.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@TAO/tests/Transport_Idle_Timeout/client_multiple.cpp`:
- Around line 105-126: The code currently treats the second server IOR (ior2) as
optional; update parse_args to require and validate -l by adding a nil/empty
check for ior2 and return an error if missing, and also add a runtime check
after converting the IOR to an object/narrowing (the echo2 variable) to ensure
the narrowed servant is not nil and fail-fast with a clear error message; modify
the parse_args function to enforce ior2 presence and update the place where
echo2 is obtained to validate the result and exit on failure.

---

Duplicate comments:
In `@TAO/tests/Transport_Idle_Timeout/run_test.pl`:
- Around line 149-191: The failure branches after starting servers only kill the
directly-failed server (e.g., the WaitForFileTimed, GetFile, PutFile error
handlers referencing $SV1/$SV2), leaving the sibling server running; update
those branches to call the shared cleanup routine (invoke $cleanup_servers->()
or the existing cleanup function used elsewhere) instead of directly calling
$SV1->Kill/$SV2->Kill and TimedWait so both servers are stopped on any
multi-server setup failure (apply this to the WaitForFileTimed, GetFile, and
PutFile failure blocks that reference $server1/$server2, $SV1/$SV2).

---

Nitpick comments:
In `@TAO/tests/Transport_Idle_Timeout/run_test.pl`:
- Around line 91-106: The helper currently mutates and returns the script-global
$status, so after one failure subsequent scenario helper calls always return
non-zero; change each helper (the block that checks $client_status and
$server_exit in run_test.pl, and the analogous code in run_multiple_scenario) to
use a local per-scenario variable (e.g., my $scenario_status = 0) instead of the
global $status: set $scenario_status = 1 when $client_status != 0 or
$server_exit != 0, delete files as before, and return $scenario_status; repeat
the same refactor for the other occurrence (the code region around lines 200-223
/ run_multiple_scenario) so each scenario result reflects only that scenario.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f17d0686-b922-4458-a73b-7218c2d5ca41

📥 Commits

Reviewing files that changed from the base of the PR and between 6c7ef47 and 52835b6.

📒 Files selected for processing (7)
  • ACE/ace/Event_Handler_Handle_Timeout_Upcall.cpp
  • TAO/tests/Transport_Idle_Timeout/Echo_i.cpp
  • TAO/tests/Transport_Idle_Timeout/Echo_i.h
  • TAO/tests/Transport_Idle_Timeout/client.cpp
  • TAO/tests/Transport_Idle_Timeout/client_multiple.cpp
  • TAO/tests/Transport_Idle_Timeout/run_test.pl
  • TAO/tests/Transport_Idle_Timeout/test.idl
✅ Files skipped from review due to trivial changes (2)
  • ACE/ace/Event_Handler_Handle_Timeout_Upcall.cpp
  • TAO/tests/Transport_Idle_Timeout/Echo_i.cpp
🚧 Files skipped from review as they are similar to previous changes (2)
  • TAO/tests/Transport_Idle_Timeout/test.idl
  • TAO/tests/Transport_Idle_Timeout/client.cpp

Comment thread TAO/tests/Transport_Idle_Timeout/client_multiple.cpp
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (3)
TAO/tests/Transport_Idle_Timeout/client_multiple.cpp (1)

105-126: ⚠️ Potential issue | 🟠 Major

Require and validate the second server IOR.

client_multiple passes echo2 into every scenario, but -l is still treated as optional and only echo is nil-checked. If the second IOR is missing or echo2 narrows to nil, the test may fail in non-obvious ways when ping() is called with a nil reference.

🔧 Fail fast on a missing or invalid second endpoint
 static int
 parse_args (int argc, ACE_TCHAR *argv[])
 {
   ACE_Get_Opt get_opts (argc, argv, ACE_TEXT ("k:l:t:n:d"));
   // ...
       default:
         ACE_ERROR_RETURN ((LM_ERROR,
-            ACE_TEXT ("Usage: client -k <ior> [-l <ior>] [-t <sec>] [-n <N>] [-d]\n")),
+            ACE_TEXT ("Usage: client_multiple -k <ior1> -l <ior2> [-t <sec>] [-n <N>] [-d]\n")),
             -1);
       }
-  if (ior == nullptr)
+  if (ior == nullptr || ior2 == nullptr)
     ACE_ERROR_RETURN ((LM_ERROR,
-        ACE_TEXT ("client: -k <IOR> is required\n")), -1);
+        ACE_TEXT ("client_multiple: -k <IOR1> and -l <IOR2> are required\n")), -1);
   return 0;
 }

And add nil check after narrowing:

       Test::Echo_var echo = Test::Echo::_narrow (obj.in ());
-      Test::Echo_var echo2 = Test::Echo::_narrow (obj2.in ());;
+      Test::Echo_var echo2 = Test::Echo::_narrow (obj2.in ());

       if (CORBA::is_nil (echo.in ()))
         ACE_ERROR_RETURN ((LM_ERROR,
                            ACE_TEXT ("Narrow to Test::Echo failed\n")), 1);
+      if (CORBA::is_nil (echo2.in ()))
+        ACE_ERROR_RETURN ((LM_ERROR,
+                           ACE_TEXT ("Narrow to second Test::Echo failed\n")), 1);

Also applies to: 289-301

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TAO/tests/Transport_Idle_Timeout/client_multiple.cpp` around lines 105 - 126,
Make the second server IOR mandatory and validate the narrowed reference before
use: update parse_args to treat -l as required by checking ior2 != nullptr and
returning an error if missing, and in the client initialization where echo2 is
obtained via narrowing (the echo2 narrow call), add a nil-check after narrowing
and fail fast with a clear ACE_ERROR_RETURN if echo2 is null so ping()/other
calls never receive a nil reference; apply the same validation pattern to the
other occurrence referenced around lines 289-301.
TAO/tests/Transport_Idle_Timeout/run_test.pl (1)

149-191: ⚠️ Potential issue | 🟠 Major

Stop both servers on multi-server setup failures.

Once server1 has started, the server2 spawn / IOR wait / file-transfer failure branches only kill the process that directly failed. The other server can stay alive and contaminate the next scenario. Please funnel those early returns through a shared cleanup helper.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TAO/tests/Transport_Idle_Timeout/run_test.pl` around lines 149 - 191, The
failure branches after server1 has started only kill the individual failing
server (e.g., branches checking $server2_status, WaitForFileTimed on $ior_file2,
GetFile/PutFile for $ior_file2) leaving the other server ($SV1/$server1)
running; add a shared cleanup helper (e.g., cleanup_servers or stop_all_servers)
that always stops and waits for both $SV1 and $SV2 (kill + TimedWait) and call
this helper before every early return in all error branches after server1 starts
(refer to symbols $SV1, $SV2, $server1, $server2, $client, $ior_file1,
$ior_file2 and the existing Spawn/WaitForFileTimed/GetFile/PutFile checks to
locate where to insert the calls).
TAO/tao/Transport.cpp (1)

988-1025: ⚠️ Potential issue | 🔴 Critical

Version the idle timer state; idle_timer_id_ alone is still racy.

handle_idle_timeout() still ignores act and unconditionally clears idle_timer_id_, while schedule_idle_timer() and cancel_idle_timer() mutate the same field from other paths. A dequeued stale timeout can still clobber a newer timer or reschedule after reuse, leaving an orphaned reactor entry behind this embedded handler. Please serialize this state and gate callbacks/cancels with a generation token.

Also applies to: 2944-2987

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TAO/tao/Transport.cpp` around lines 988 - 1025, The idle timer removal is
racy because handle_idle_timeout() ignores the act parameter and unconditionally
clears idle_timer_id_, letting stale timeouts clobber newer timers; fix by
introducing a generation token (e.g. a member like idle_timer_generation_ of an
atomic integer type) that schedule_idle_timer() increments and embeds into the
reactor act, have cancel_idle_timer() increment the generation to invalidate
outstanding timers, and modify handle_idle_timeout(void const* act) to cast and
compare the incoming generation token against idle_timer_generation_ before
clearing idle_timer_id_ or rescheduling; only clear or act on the timer when the
generations match so stale dequeued timers are ignored.
🧹 Nitpick comments (4)
TAO/tao/Transport_Cache_Manager_T.h (1)

130-132: Consider documenting return semantics for the new purge API.

A short contract note here will make callers safer and reduce ambiguity.

Suggested doc-only update
-    /// Purge the entry from the Cache Map only when the entry is purgable
+    /// Purge from the cache only when the entry is currently purgable.
+    /// Returns the underlying purge result when purged; otherwise returns
+    /// a non-success code for no-op/not-purgable paths.
     int purge_entry_when_purgable (HASH_MAP_ENTRY *& entry);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TAO/tao/Transport_Cache_Manager_T.h` around lines 130 - 132, Add a short
contract comment above the declaration of
purge_entry_when_purgable(HASH_MAP_ENTRY *& entry) that documents its return
semantics: explicitly state what numeric values mean (e.g., success vs.
not-purgable vs. error), which error codes are returned (or that it returns -1
on failure), and whether callers must free/handle the entry on each case; align
wording with existing conventions used by other cache APIs in this file. This
makes callers like those invoking purge_entry_when_purgable and any code
inspecting HASH_MAP_ENTRY behavior unambiguous about handling return values and
ownership.
TAO/tao/Resource_Factory.h (1)

257-258: Document the new interface contract at declaration site.

Please add a short API comment describing units and disable semantics to prevent divergent implementations.

Suggested doc-only update
-  virtual int transport_idle_timeout () const = 0;
+  /// Idle transport timeout in seconds; 0 disables idle-timeout purging.
+  virtual int transport_idle_timeout () const = 0;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TAO/tao/Resource_Factory.h` around lines 257 - 258, Add an API comment above
the pure virtual method transport_idle_timeout() in Resource_Factory.h
describing the contract: state the units (e.g., seconds or milliseconds), what a
return value of 0 or negative means (e.g., disables timeout), whether the value
is a hard or soft limit, whether callers must convert units, and any
thread-safety/ownership expectations; update the declaration site comment so all
implementers use the same semantics when overriding transport_idle_timeout().
TAO/tao/Transport.h (1)

1126-1133: Consider using std::atomic<long> for idle_timer_id_ for clarity and consistency.

The idle_timer_id_ member follows the same access pattern as flush_timer_id_, with reads and writes from different thread contexts:

  • schedule_idle_timer() (called from app thread) writes the timer ID
  • cancel_idle_timer() (called from destructor/close_connection) reads and writes
  • handle_idle_timeout() (called from reactor thread) writes -1

While ACE's reactor->cancel_timer() is thread-safe and the existing pattern with flush_timer_id_ works reliably in practice, using std::atomic<long> would make the thread-safe intent explicit and avoid potential platform-specific issues with non-atomic long writes. This would be consistent with modern C++ practices, though it is not required for correctness given ACE's guarantees.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TAO/tao/Transport.h` around lines 1126 - 1133, Replace the plain long
idle_timer_id_ with std::atomic<long> to make the thread-safety intent explicit
and match the access pattern used by flush_timer_id_; update its declaration
(idle_timer_id_) and initialization to use std::atomic<long>{-1}, and ensure any
reads/writes in schedule_idle_timer(), cancel_idle_timer(), and
handle_idle_timeout() use the atomic (load/store or simple assignment)
semantics; also add the required <atomic> include if not already present.
TAO/tests/Transport_Idle_Timeout/client.cpp (1)

52-68: Add a wait_for_cache_size() helper next to sleep_with_reactor().

The timeout checks below sample transport_cache().current_size() only once after fixed sleeps, but TAO/tao/Transport_Cache_Manager_T.inl:175-177 returns the raw map count and the idle purge runs asynchronously. A poll-until-deadline helper here would make TC-1/TC-3/TC-4 much less timing-sensitive and can be reused by TAO/tests/Transport_Idle_Timeout/Echo_i.cpp.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TAO/tests/Transport_Idle_Timeout/client.cpp` around lines 52 - 68, Add a new
helper function wait_for_cache_size( CORBA::ORB_ptr orb, size_t expected_size,
int timeout_seconds ) next to sleep_with_reactor() that repeatedly polls
transport_cache().current_size() until it equals (or is <=/>= as appropriate for
the test) expected_size or a deadline is reached; inside the loop call
orb->perform_work(ACE_Time_Value(0,50000)) in the same 50ms slices used by
sleep_with_reactor(), compute the deadline using ACE_OS::gettimeofday() +
ACE_Time_Value(timeout_seconds), and return a boolean indicating success so
tests TC-1/TC-3/TC-4 can call wait_for_cache_size(...) instead of sampling once;
ensure the helper accounts for Transport_Cache_Manager_T.inl returning raw map
counts and keep the helper signature and behavior reusable from
TAO/tests/Transport_Idle_Timeout/Echo_i.cpp.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@TAO/tao/default_resource.cpp`:
- Around line 486-493: The code increments curarg and then dereferences
argv[curarg] even when curarg >= argc, causing an OOB access and it silently
accepts invalid/negative timeouts via ACE_OS::atoi; change the parsing in the
-ORBTransportIdleTimeout branch (symbols: transport_idle_timeout_,
report_option_value_error, ACE_OS::atoi, argv, curarg) to first verify curarg+1
< argc (or check curarg after increment) before reading argv[curarg], then parse
the value with a numeric-aware routine (e.g. strtol/ACE_OS::strtol) and validate
that the conversion consumed the whole string and produced a non-negative value;
on any failure call report_option_value_error("-ORBTransportIdleTimeout",
argv[curarg]) (or the appropriate safe message when argv[curarg] is unavailable)
instead of assigning the bad value.

In `@TAO/tao/Transport_Idle_Timer.h`:
- Around line 4-5: Fix the typo in the documentation comment in
Transport_Idle_Timer.h: change "transport_idl_timeout" to
"transport_idle_timeout" in the comment describing the reactor timer (ensure the
corrected identifier is used in the comment near the Transport_Idle_Timer
class/struct declaration); if the misspelling appears elsewhere in comments or
docs in the same file, update those occurrences as well to match the correct
name.

---

Duplicate comments:
In `@TAO/tao/Transport.cpp`:
- Around line 988-1025: The idle timer removal is racy because
handle_idle_timeout() ignores the act parameter and unconditionally clears
idle_timer_id_, letting stale timeouts clobber newer timers; fix by introducing
a generation token (e.g. a member like idle_timer_generation_ of an atomic
integer type) that schedule_idle_timer() increments and embeds into the reactor
act, have cancel_idle_timer() increment the generation to invalidate outstanding
timers, and modify handle_idle_timeout(void const* act) to cast and compare the
incoming generation token against idle_timer_generation_ before clearing
idle_timer_id_ or rescheduling; only clear or act on the timer when the
generations match so stale dequeued timers are ignored.

In `@TAO/tests/Transport_Idle_Timeout/client_multiple.cpp`:
- Around line 105-126: Make the second server IOR mandatory and validate the
narrowed reference before use: update parse_args to treat -l as required by
checking ior2 != nullptr and returning an error if missing, and in the client
initialization where echo2 is obtained via narrowing (the echo2 narrow call),
add a nil-check after narrowing and fail fast with a clear ACE_ERROR_RETURN if
echo2 is null so ping()/other calls never receive a nil reference; apply the
same validation pattern to the other occurrence referenced around lines 289-301.

In `@TAO/tests/Transport_Idle_Timeout/run_test.pl`:
- Around line 149-191: The failure branches after server1 has started only kill
the individual failing server (e.g., branches checking $server2_status,
WaitForFileTimed on $ior_file2, GetFile/PutFile for $ior_file2) leaving the
other server ($SV1/$server1) running; add a shared cleanup helper (e.g.,
cleanup_servers or stop_all_servers) that always stops and waits for both $SV1
and $SV2 (kill + TimedWait) and call this helper before every early return in
all error branches after server1 starts (refer to symbols $SV1, $SV2, $server1,
$server2, $client, $ior_file1, $ior_file2 and the existing
Spawn/WaitForFileTimed/GetFile/PutFile checks to locate where to insert the
calls).

---

Nitpick comments:
In `@TAO/tao/Resource_Factory.h`:
- Around line 257-258: Add an API comment above the pure virtual method
transport_idle_timeout() in Resource_Factory.h describing the contract: state
the units (e.g., seconds or milliseconds), what a return value of 0 or negative
means (e.g., disables timeout), whether the value is a hard or soft limit,
whether callers must convert units, and any thread-safety/ownership
expectations; update the declaration site comment so all implementers use the
same semantics when overriding transport_idle_timeout().

In `@TAO/tao/Transport_Cache_Manager_T.h`:
- Around line 130-132: Add a short contract comment above the declaration of
purge_entry_when_purgable(HASH_MAP_ENTRY *& entry) that documents its return
semantics: explicitly state what numeric values mean (e.g., success vs.
not-purgable vs. error), which error codes are returned (or that it returns -1
on failure), and whether callers must free/handle the entry on each case; align
wording with existing conventions used by other cache APIs in this file. This
makes callers like those invoking purge_entry_when_purgable and any code
inspecting HASH_MAP_ENTRY behavior unambiguous about handling return values and
ownership.

In `@TAO/tao/Transport.h`:
- Around line 1126-1133: Replace the plain long idle_timer_id_ with
std::atomic<long> to make the thread-safety intent explicit and match the access
pattern used by flush_timer_id_; update its declaration (idle_timer_id_) and
initialization to use std::atomic<long>{-1}, and ensure any reads/writes in
schedule_idle_timer(), cancel_idle_timer(), and handle_idle_timeout() use the
atomic (load/store or simple assignment) semantics; also add the required
<atomic> include if not already present.

In `@TAO/tests/Transport_Idle_Timeout/client.cpp`:
- Around line 52-68: Add a new helper function wait_for_cache_size(
CORBA::ORB_ptr orb, size_t expected_size, int timeout_seconds ) next to
sleep_with_reactor() that repeatedly polls transport_cache().current_size()
until it equals (or is <=/>= as appropriate for the test) expected_size or a
deadline is reached; inside the loop call
orb->perform_work(ACE_Time_Value(0,50000)) in the same 50ms slices used by
sleep_with_reactor(), compute the deadline using ACE_OS::gettimeofday() +
ACE_Time_Value(timeout_seconds), and return a boolean indicating success so
tests TC-1/TC-3/TC-4 can call wait_for_cache_size(...) instead of sampling once;
ensure the helper accounts for Transport_Cache_Manager_T.inl returning raw map
counts and keep the helper signature and behavior reusable from
TAO/tests/Transport_Idle_Timeout/Echo_i.cpp.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: ea8eeb7d-18d4-47fb-acff-b9202ec3eb24

📥 Commits

Reviewing files that changed from the base of the PR and between 2e72456 and 52835b6.

📒 Files selected for processing (28)
  • ACE/ace/Event_Handler_Handle_Timeout_Upcall.cpp
  • TAO/bin/tao_orb_tests.lst
  • TAO/docs/Options.html
  • TAO/tao/Cache_Entries_T.inl
  • TAO/tao/Resource_Factory.h
  • TAO/tao/Transport.cpp
  • TAO/tao/Transport.h
  • TAO/tao/Transport_Cache_Manager_T.cpp
  • TAO/tao/Transport_Cache_Manager_T.h
  • TAO/tao/Transport_Cache_Manager_T.inl
  • TAO/tao/Transport_Connector.cpp
  • TAO/tao/Transport_Idle_Timer.cpp
  • TAO/tao/Transport_Idle_Timer.h
  • TAO/tao/Transport_Timer.cpp
  • TAO/tao/Transport_Timer.h
  • TAO/tao/default_resource.cpp
  • TAO/tao/default_resource.h
  • TAO/tao/tao.mpc
  • TAO/tests/Transport_Idle_Timeout/Echo_i.cpp
  • TAO/tests/Transport_Idle_Timeout/Echo_i.h
  • TAO/tests/Transport_Idle_Timeout/Idle_Transport_Timeout.mpc
  • TAO/tests/Transport_Idle_Timeout/client.cpp
  • TAO/tests/Transport_Idle_Timeout/client_multiple.cpp
  • TAO/tests/Transport_Idle_Timeout/run_test.pl
  • TAO/tests/Transport_Idle_Timeout/server.cpp
  • TAO/tests/Transport_Idle_Timeout/svc.conf
  • TAO/tests/Transport_Idle_Timeout/svc_disabled.conf
  • TAO/tests/Transport_Idle_Timeout/test.idl

Comment thread TAO/tao/default_resource.cpp
Comment thread TAO/tao/Transport_Idle_Timer.h Outdated
    * TAO/tao/Transport_Idle_Timer.h:
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
TAO/tao/Transport.cpp (2)

988-1026: ⚠️ Potential issue | 🔴 Critical

Serialize or version idle_timer_id_.

idle_timer_id_ is still mutated from handle_idle_timeout(), schedule_idle_timer(), and cancel_idle_timer() without synchronization. A timeout that is already dequeued can still run after cancel_idle_timer() clears the id and then re-arm itself against a closing transport. TAO/tao/Transport_Idle_Timer.cpp:16-28 only protects an in-flight callback; it does not make stale reactor entries safe.

Also applies to: 2944-2988

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TAO/tao/Transport.cpp` around lines 988 - 1026, handle_idle_timeout() mutates
idle_timer_id_ without synchronization and can race with schedule_idle_timer()
and cancel_idle_timer(), allowing stale reactor callbacks to re-arm or close the
wrong transport; fix by serializing access to idle_timer_id_ (and associated
state) via either a mutex protecting reads/writes or a version/token scheme: add
a small atomic/version field incremented by schedule_idle_timer() and
cancel_idle_timer(), capture that version when scheduling the reactor callback,
and in handle_idle_timeout() compare the captured version to the current version
before clearing idle_timer_id_ or proceeding to reschedule/close; ensure all
accesses to idle_timer_id_ and the new version/token are performed atomically or
under the same lock so stale callbacks are no-ops.

2950-2963: ⚠️ Potential issue | 🟠 Major

Don't treat schedule_timer() failure as a successful arm.

The null-reactor case is covered now, but schedule_timer() returning -1 still falls through to the normal debug path. That silently disables idle expiry and logs a bogus timer id.

🛠️ Suggested hardening
       ACE_Time_Value const tv (static_cast<time_t> (timeout_sec));
-      this->idle_timer_id_= reactor->schedule_timer (std::addressof(this->transport_idle_timer_), nullptr, tv);
+      this->idle_timer_id_ =
+        reactor->schedule_timer (std::addressof (this->transport_idle_timer_),
+                                 nullptr,
+                                 tv);
+      if (this->idle_timer_id_ == -1)
+        {
+          if (TAO_debug_level > 0)
+            {
+              TAOLIB_ERROR ((LM_ERROR,
+                          ACE_TEXT ("TAO (%P|%t) - Transport[%d]::schedule_idle_timer, ")
+                          ACE_TEXT ("schedule_timer failed\n"),
+                          this->id ()));
+            }
+          return;
+        }
 
       if (TAO_debug_level > 6)
         {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TAO/tao/Transport.cpp` around lines 2950 - 2963, schedule_timer() failure is
currently treated as success; after calling reactor->schedule_timer(...) check
the returned value stored in this->idle_timer_id_ for -1 and handle it as an
error: log an error (using TAOLIB_DEBUG or process logger consistent with
TAO_debug_level) indicating schedule_timer failed and do not proceed with the
"scheduled idle timer" debug path or assume the timer is armed; ensure
transport_idle_timer_ is not considered active when idle_timer_id_ == -1 and
leave/assign idle_timer_id_ to a sentinel (e.g. -1) so later logic can detect
the unarmed state (reference symbols: reactor, schedule_timer, idle_timer_id_,
transport_idle_timer_, TAO_debug_level).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@TAO/tao/Transport.cpp`:
- Around line 566-571: The idle timer is being armed too early (post_open()),
never cleared (update_transport()), and schedule_idle_timer() refuses to re-arm
if an older id is pending, so move timer start to the actual idle transition:
remove or disable immediate arming in post_open(), ensure update_transport()
clears/cancels any pending idle timer when activity occurs, and modify the
make_idle() success path (where
transport_cache_manager().make_idle(this->cache_map_entry_) returns 0) to reset
any pending timer id or cancel existing timers and then call
schedule_idle_timer() so the idle countdown always starts from the real
transition to idle (update code in functions post_open(), update_transport(),
schedule_idle_timer(), and the make_idle success branch that currently calls
schedule_idle_timer()).

---

Duplicate comments:
In `@TAO/tao/Transport.cpp`:
- Around line 988-1026: handle_idle_timeout() mutates idle_timer_id_ without
synchronization and can race with schedule_idle_timer() and cancel_idle_timer(),
allowing stale reactor callbacks to re-arm or close the wrong transport; fix by
serializing access to idle_timer_id_ (and associated state) via either a mutex
protecting reads/writes or a version/token scheme: add a small atomic/version
field incremented by schedule_idle_timer() and cancel_idle_timer(), capture that
version when scheduling the reactor callback, and in handle_idle_timeout()
compare the captured version to the current version before clearing
idle_timer_id_ or proceeding to reschedule/close; ensure all accesses to
idle_timer_id_ and the new version/token are performed atomically or under the
same lock so stale callbacks are no-ops.
- Around line 2950-2963: schedule_timer() failure is currently treated as
success; after calling reactor->schedule_timer(...) check the returned value
stored in this->idle_timer_id_ for -1 and handle it as an error: log an error
(using TAOLIB_DEBUG or process logger consistent with TAO_debug_level)
indicating schedule_timer failed and do not proceed with the "scheduled idle
timer" debug path or assume the timer is armed; ensure transport_idle_timer_ is
not considered active when idle_timer_id_ == -1 and leave/assign idle_timer_id_
to a sentinel (e.g. -1) so later logic can detect the unarmed state (reference
symbols: reactor, schedule_timer, idle_timer_id_, transport_idle_timer_,
TAO_debug_level).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: efe46e7c-1341-4604-b16b-6d5b41fcf2a9

📥 Commits

Reviewing files that changed from the base of the PR and between 52835b6 and 1f9bf7a.

📒 Files selected for processing (2)
  • TAO/tao/Transport.cpp
  • TAO/tao/Transport_Idle_Timer.h
✅ Files skipped from review due to trivial changes (1)
  • TAO/tao/Transport_Idle_Timer.h

Comment thread TAO/tao/Transport.cpp
    * TAO/tao/Transport.cpp:
    * TAO/tao/Transport_Cache_Manager_T.cpp:
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
TAO/tao/Transport.cpp (1)

190-190: ⚠️ Potential issue | 🔴 Critical

The idle timer still has a stale-callback/lifetime race.

handle_idle_timeout() still clears idle_timer_id_ unconditionally and ignores act, while schedule_idle_timer(), cancel_idle_timer(), close_connection(), and the destructor all mutate the same state from different threads. An older timeout that has already been dequeued can still reset idle_timer_id_ to -1 after a newer timer is armed, which makes the replacement timer impossible to cancel. TAO/tao/Transport_Idle_Timer.cpp:11-28 only takes a transport reference after dequeue, so that orphaned callback can later dispatch through the embedded transport_idle_timer_ after the transport is destroyed. Please version the idle-timer state and gate both callback and cancel paths with that token under synchronization.

Also applies to: 341-342, 989-1010, 2944-2994

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TAO/tao/Transport.cpp` at line 190, The idle-timer has a race where stale
callbacks can reset idle_timer_id_ and outlive the Transport; update the
implementation by introducing a monotonically increasing generation/token (e.g.,
idle_timer_gen_) protected by the same mutex used around schedule_idle_timer(),
cancel_idle_timer(), close_connection(), and the destructor, incrementing the
token whenever a new timer is armed or when cancelling; store the current token
with each scheduled timer and make handle_idle_timeout() check the dequeued
timer's token against idle_timer_gen_ under the mutex before mutating
idle_timer_id_ or touching transport_idle_timer_, and likewise have
cancel_idle_timer() validate the token before clearing state so callbacks and
cancels are gated by the same generation to prevent stale callbacks from
clearing/reusing newer timers or accessing destroyed Transport.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@TAO/tao/Transport.cpp`:
- Line 190: The idle-timer has a race where stale callbacks can reset
idle_timer_id_ and outlive the Transport; update the implementation by
introducing a monotonically increasing generation/token (e.g., idle_timer_gen_)
protected by the same mutex used around schedule_idle_timer(),
cancel_idle_timer(), close_connection(), and the destructor, incrementing the
token whenever a new timer is armed or when cancelling; store the current token
with each scheduled timer and make handle_idle_timeout() check the dequeued
timer's token against idle_timer_gen_ under the mutex before mutating
idle_timer_id_ or touching transport_idle_timer_, and likewise have
cancel_idle_timer() validate the token before clearing state so callbacks and
cancels are gated by the same generation to prevent stale callbacks from
clearing/reusing newer timers or accessing destroyed Transport.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: a41e376a-c6ca-4611-bcfa-53db29d74456

📥 Commits

Reviewing files that changed from the base of the PR and between 1f9bf7a and 6fde616.

📒 Files selected for processing (2)
  • TAO/tao/Transport.cpp
  • TAO/tao/Transport_Cache_Manager_T.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
  • TAO/tao/Transport_Cache_Manager_T.cpp

    * ACE/ace/WFMO_Reactor.cpp:
    * TAO/tao/Resource_Factory.h:
    * TAO/tao/Transport_Cache_Manager_T.inl:
    * TAO/tao/Transport_Cache_Manager_T.cpp:
@jwillemsen jwillemsen merged commit 2eaaa97 into DOCGroup:master Apr 8, 2026
37 of 38 checks passed
@jwillemsen jwillemsen deleted the jwi-idletransport branch April 8, 2026 10:34
@jwillemsen
Copy link
Copy Markdown
Member Author

@Mergifyio backport ace6tao2

@mergify
Copy link
Copy Markdown

mergify bot commented Apr 8, 2026

backport ace6tao2

❌ No backport have been created

Details
  • Backport to branch ace6tao2 failed

Git reported the following error:

error: commit 94df85a50de3de5ed4fa7e36fb7769412c4d0ba5 is a merge but no -m option was given.
fatal: cherry-pick failed

jwillemsen added a commit to jwillemsen/ATCD that referenced this pull request Apr 8, 2026
    * TAO/tests/Transport_Idle_Timeout/Echo_i.cpp:
    * TAO/tests/Transport_Idle_Timeout/Echo_i.h:
    * TAO/tests/Transport_Idle_Timeout/Idle_Transport_Timeout.mpc:
    * TAO/tests/Transport_Idle_Timeout/client.cpp:
    * TAO/tests/Transport_Idle_Timeout/client_multiple.cpp:
    * TAO/tests/Transport_Idle_Timeout/run_test.pl:
    * TAO/tests/Transport_Idle_Timeout/server.cpp:
    * TAO/tests/Transport_Idle_Timeout/svc.conf:
    * TAO/tests/Transport_Idle_Timeout/svc_disabled.conf:
    * TAO/tests/Transport_Idle_Timeout/test.idl:
      Added.

    * ACE/ace/Event_Handler_Handle_Timeout_Upcall.cpp:
    * ACE/ace/WFMO_Reactor.cpp:
    * TAO/NEWS:
    * TAO/bin/tao_orb_tests.lst:
    * TAO/docs/Options.html:
    * TAO/tao/Cache_Entries_T.inl:
    * TAO/tao/Resource_Factory.h:
    * TAO/tao/Transport.cpp:
    * TAO/tao/Transport.h:
    * TAO/tao/Transport_Cache_Manager_T.cpp:
    * TAO/tao/Transport_Cache_Manager_T.h:
    * TAO/tao/Transport_Cache_Manager_T.inl:
    * TAO/tao/Transport_Connector.cpp:
    * TAO/tao/Transport_Timer.cpp:
    * TAO/tao/Transport_Timer.h:
    * TAO/tao/default_resource.cpp:
    * TAO/tao/default_resource.h:
    * TAO/tao/tao.mpc:
jwillemsen added a commit that referenced this pull request Apr 10, 2026
[ace6tao2] Backport #2523, add `-ORBTransportIdleTimeout` to purge idle transports
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

Extend TAO with idle connection purging based on timeout

1 participant