[core][DO NOT MERGE] Profiling RDT stuff #62599
Sparks0219 wants to merge 1 commit into ray-project:master from
Conversation
Signed-off-by: Joshua Lee <[email protected]>
Code Review
This pull request adds performance profiling instrumentation for Ray Direct Transport (RDT) over NIXL. It inserts timing measurements along key paths — serialization, object put/get, and NIXL-specific transfer operations — and adds a new profiling script, rdt_profile.py. Review feedback identifies a thread-safety issue with the global timing dictionary and notes that certain metrics should be accumulated rather than overwritten so that batch operations report their cumulative overhead. A correction is also suggested so the profiling script excludes count-based metrics from duration summations.
```python
from ray.widgets import Template
from ray.widgets.util import repr_with_fallback


_rdt_profile_timings = {}
```
The global dictionary _rdt_profile_timings is not thread-safe. Ray workers often run multiple threads (e.g., the main task-execution thread plus background IO or system threads). Concurrent modifications can cause race conditions, or a RuntimeError if one thread iterates over the dictionary while another mutates it. Since this dictionary is used for profiling, unsynchronized access can also produce corrupted or interleaved timing data.
```python
_rdt_profile_timings["B_object_ref_reducer"] = (
    _time.perf_counter() - _reducer_start
)
```
The key "B_object_ref_reducer" is overwritten for every ObjectRef that gets serialized. If a collection (such as a list) containing multiple ObjectRefs is passed to ray.put, this timing will reflect only the duration of the last individual reduction rather than the cumulative time for the entire collection. An additive update captures the total overhead.
Suggested change:

```python
_rdt_profile_timings["B_object_ref_reducer"] = (
    _rdt_profile_timings.get("B_object_ref_reducer", 0)
    + (_time.perf_counter() - _reducer_start)
)
```

```python
overall_end = time.perf_counter()


# Compute overhead totals
sender_overhead = sum(v for k, v in sender_timings.items() if "iterations" not in k)
```
The overhead calculation includes metrics that are counts rather than durations. While print_timings correctly treats keys containing "count" as non-durations, this summation filters only for "iterations". Any count-based metric added to the timings dictionary will therefore skew the overhead percentage.
Suggested change:

```diff
- sender_overhead = sum(v for k, v in sender_timings.items() if "iterations" not in k)
+ sender_overhead = sum(v for k, v in sender_timings.items() if "iterations" not in k and "count" not in k)
```
Profiling stuff