[core][DO NOT MERGE] Profiling RDT stuff #62599
Sparks0219 wants to merge 1 commit into ray-project:master from
Conversation
Signed-off-by: Joshua Lee <[email protected]>
Code Review
This pull request adds performance profiling instrumentation for Ray Direct Transport (RDT) over NIXL. It inserts timing measurements along key paths — serialization, object put/get, and NIXL-specific transfer operations — and adds a new profiling script, rdt_profile.py. Review feedback identifies a thread-safety issue with the global timing dictionary and notes that certain metrics should be accumulated rather than overwritten so that batch operations report their cumulative overhead. A correction is also suggested so the profiling script excludes count-based metrics from duration summations.
```python
from ray.widgets import Template
from ray.widgets.util import repr_with_fallback


_rdt_profile_timings = {}
```
The global dictionary _rdt_profile_timings is not thread-safe. Ray workers often run multiple threads (e.g., the main task-execution thread plus background IO or system threads). Concurrent modifications can cause race conditions, or a RuntimeError if one thread iterates over the dictionary while another mutates it. Since this dictionary is used for profiling, unsynchronized access can also produce corrupted or interleaved timing data.
```python
_rdt_profile_timings["B_object_ref_reducer"] = (
    _time.perf_counter() - _reducer_start
)
```
The key "B_object_ref_reducer" is overwritten for every ObjectRef that gets serialized. If a collection (such as a list) containing multiple ObjectRefs is passed to ray.put, this timing will reflect only the duration of the last individual reduction rather than the cumulative time for the entire collection. An additive update captures the total overhead.
Suggested change:

```python
_rdt_profile_timings["B_object_ref_reducer"] = (
    _rdt_profile_timings.get("B_object_ref_reducer", 0)
    + (_time.perf_counter() - _reducer_start)
)
```

```python
overall_end = time.perf_counter()


# Compute overhead totals
sender_overhead = sum(v for k, v in sender_timings.items() if "iterations" not in k)
```
The overhead calculation includes metrics that are counts rather than durations. While print_timings correctly treats keys containing "count" as non-durations, this summation filters only for "iterations". Any count-based metric added to the timings dictionary will therefore skew the overhead percentage.
Suggested change:

```diff
- sender_overhead = sum(v for k, v in sender_timings.items() if "iterations" not in k)
+ sender_overhead = sum(v for k, v in sender_timings.items() if "iterations" not in k and "count" not in k)
```
Profiling stuff