Skip to content

[AIPROFCOMP-154] Refactor marker naming to include full pytorch operator hierarchy#3011

Draft
ggottipa-amd wants to merge 27 commits intousers/ggottipa/move-trace-processing-to-analyzefrom
users/ggottipa-amd/roctx-hierarchical-markers
Draft

[AIPROFCOMP-154] Refactor marker naming to include full pytorch operator hierarchy#3011
ggottipa-amd wants to merge 27 commits intousers/ggottipa/move-trace-processing-to-analyzefrom
users/ggottipa-amd/roctx-hierarchical-markers

Conversation

@ggottipa-amd
Copy link
Contributor

Motivation

Changing inject_roctx.py to save complete hierarchy in the marker name at the time of region creation. Enhancing the interception coverage for torch operations.

Technical Details

JIRA ID

AIPROFCOMP-154

Test Plan

Test Result

Submission Checklist

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the ROCTX marker injection system to include full PyTorch operator hierarchy in marker names. The changes enable tracking of the complete call stack by maintaining a global marker stack that builds hierarchical marker names using "/" separators.

Changes:

  • Introduces a global marker_stack list to track hierarchical call paths
  • Adds dispatcher-level interception for torch._C and torchvision._C operations
  • Implements auto-wrapping functions to instrument all methods in PyTorch submodules and external libraries (timm, transformers, lmdb, spacy)
  • Updates all existing wrapper functions to build and push hierarchical marker names
  • Adds instrument_all_torch_ops() function to wrap torch.ops operations

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@ggottipa-amd ggottipa-amd force-pushed the users/ggottipa-amd/roctx-hierarchical-markers branch from c76043c to 4394c75 Compare February 3, 2026 07:30
@ggottipa-amd ggottipa-amd changed the base branch from develop to users/ggottipa/move-trace-processing-to-analyze February 3, 2026 09:44
@ggottipa-amd ggottipa-amd marked this pull request as draft February 4, 2026 10:43
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Feb 4, 2026
@ggottipa-amd ggottipa-amd requested a review from Copilot February 4, 2026 12:11
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 25 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 119 to 132
def tv_dispatch_call_with_roctx(*args, **kwargs):
op_name = str(args[0]) if args else "vision_op"
full_marker_name = "/".join(marker_stack + [f"torchvision::{op_name}"])
marker_stack.append(f"torchvision::{op_name}")
rangePush(full_marker_name)
try:
return original_tv_dispatch_call(*args, **kwargs)
finally:
rangePop()
marker_stack.pop()
Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to dispatch_call_with_roctx, this torchvision dispatcher wrapper doesn't use context_stack, creating inconsistent marker name format compared to other wrappers. Markers will lack the ":#..." context suffix expected by the parsing logic in utils.py.

Copilot uses AI. Check for mistakes.
Comment on lines +229 to +244
# # Avoid double-wrapping
# if hasattr(meth, "_is_roctx_wrapped"):
# continue
# def make_method_wrapper(orig_meth, cname, mname):
# def wrapper(self, *args, **kwargs):
# full_marker_name = "/".join(marker_stack + [f"{prefix}.{cname}.{mname}"])
# marker_stack.append(f"{prefix}.{cname}.{mname}")
# rangePush(full_marker_name)
# try:
# return orig_meth(self, *args, **kwargs)
# finally:
# rangePop()
# marker_stack.pop()
# wrapper._is_roctx_wrapped = True
# return wrapper
# try:
# setattr(class_obj, meth_name, make_method_wrapper(meth, name, meth_name))
# except Exception:
# pass # Some built-in methods can't be set

# def auto_wrap_all_submodules(parent_module, prefix):
# # Wrap the parent module itself
# auto_wrap_class_methods(parent_module, prefix)
# # Iterate over all submodules
# if hasattr(parent_module, "__path__"):
# for module_info in pkgutil.walk_packages(parent_module.__path__, prefix + "."):
# try:
# submod = importlib.import_module(module_info.name)
# auto_wrap_class_methods(submod, module_info.name)
# except Exception:
# pass # Some submodules may not import cleanly

Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment appears to contain commented-out code.

Suggested change
# # Avoid double-wrapping
# if hasattr(meth, "_is_roctx_wrapped"):
# continue
# def make_method_wrapper(orig_meth, cname, mname):
# def wrapper(self, *args, **kwargs):
# full_marker_name = "/".join(marker_stack + [f"{prefix}.{cname}.{mname}"])
# marker_stack.append(f"{prefix}.{cname}.{mname}")
# rangePush(full_marker_name)
# try:
# return orig_meth(self, *args, **kwargs)
# finally:
# rangePop()
# marker_stack.pop()
# wrapper._is_roctx_wrapped = True
# return wrapper
# try:
# setattr(class_obj, meth_name, make_method_wrapper(meth, name, meth_name))
# except Exception:
# pass # Some built-in methods can't be set
# def auto_wrap_all_submodules(parent_module, prefix):
# # Wrap the parent module itself
# auto_wrap_class_methods(parent_module, prefix)
# # Iterate over all submodules
# if hasattr(parent_module, "__path__"):
# for module_info in pkgutil.walk_packages(parent_module.__path__, prefix + "."):
# try:
# submod = importlib.import_module(module_info.name)
# auto_wrap_class_methods(submod, module_info.name)
# except Exception:
# pass # Some submodules may not import cleanly
# (Commented-out legacy wrapper and auto_wrap_all_submodules code removed;
# refer to version control history if needed.)

Copilot uses AI. Check for mistakes.
Comment on lines +241 to +256
# marker_stack.pop()
# wrapper._is_roctx_wrapped = True
# return wrapper
# try:
# setattr(class_obj, meth_name, make_method_wrapper(meth, name, meth_name))
# except Exception:
# pass # Some built-in methods can't be set

# def auto_wrap_all_submodules(parent_module, prefix):
# # Wrap the parent module itself
# auto_wrap_class_methods(parent_module, prefix)
# # Iterate over all submodules
# if hasattr(parent_module, "__path__"):
# for module_info in pkgutil.walk_packages(parent_module.__path__, prefix + "."):
# try:
# submod = importlib.import_module(module_info.name)
# auto_wrap_class_methods(submod, module_info.name)
# except Exception:
# pass # Some submodules may not import cleanly

# # Save original sys.argv before auto-wrapping to prevent torch submodules from parsing workload arguments
# _saved_argv = sys.argv
# sys.argv = [sys.argv[0]] if sys.argv else ["inject_roctx.py"]

# # Auto-wrap all submodules of relevant libraries (taken from PyTorch examples)
# auto_wrap_all_submodules(torch, "torch")
# try:
# import torchvision
# auto_wrap_all_submodules(torchvision, "torchvision")
# except Exception:
# pass
# try:
Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment appears to contain commented-out code.

Suggested change
# marker_stack.pop()
# wrapper._is_roctx_wrapped = True
# return wrapper
# try:
# setattr(class_obj, meth_name, make_method_wrapper(meth, name, meth_name))
# except Exception:
# pass # Some built-in methods can't be set
# def auto_wrap_all_submodules(parent_module, prefix):
# # Wrap the parent module itself
# auto_wrap_class_methods(parent_module, prefix)
# # Iterate over all submodules
# if hasattr(parent_module, "__path__"):
# for module_info in pkgutil.walk_packages(parent_module.__path__, prefix + "."):
# try:
# submod = importlib.import_module(module_info.name)
# auto_wrap_class_methods(submod, module_info.name)
# except Exception:
# pass # Some submodules may not import cleanly
# # Save original sys.argv before auto-wrapping to prevent torch submodules from parsing workload arguments
# _saved_argv = sys.argv
# sys.argv = [sys.argv[0]] if sys.argv else ["inject_roctx.py"]
# # Auto-wrap all submodules of relevant libraries (taken from PyTorch examples)
# auto_wrap_all_submodules(torch, "torch")
# try:
# import torchvision
# auto_wrap_all_submodules(torchvision, "torchvision")
# except Exception:
# pass
# try:
# NOTE: Auto-wrapping utilities for ROCTX injection were previously implemented here.
# They have been intentionally removed to avoid maintaining large blocks of commented-out code.
# If auto-wrapping behavior is required in the future, it should be reintroduced as active,
# well-documented code rather than preserved in comments.
# try:

Copilot uses AI. Check for mistakes.
Comment on lines +303 to 290
# # Restore original sys.argv after all auto-wrapping is complete
# sys.argv = _saved_argv


Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment appears to contain commented-out code.

Suggested change
# # Restore original sys.argv after all auto-wrapping is complete
# sys.argv = _saved_argv
# NOTE: Restoration of the original sys.argv after auto-wrapping is intentionally
# disabled. If needed in the future, reintroduce this behavior explicitly where
# command-line arguments are managed.

Copilot uses AI. Check for mistakes.
Comment on lines 150 to 151
except Exception:
pass
Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'except' clause does nothing but pass and there is no explanatory comment.

Suggested change
except Exception:
pass
except Exception as e:
console_warning(
"Failed to patch torch.distributed.all_reduce; leaving original implementation unmodified."
)
console_warning(str(e))

Copilot uses AI. Check for mistakes.
Comment on lines +189 to +174
except Exception:
pass
Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'except' clause does nothing but pass and there is no explanatory comment.

Suggested change
except Exception:
pass
except Exception as exc:
console_warning("torch trace", f"Failed to wrap torch.cuda.set_device with ROCTX: {exc}")

Copilot uses AI. Check for mistakes.
@ggottipa-amd ggottipa-amd force-pushed the users/ggottipa-amd/roctx-hierarchical-markers branch from 924d43b to 115acfb Compare February 4, 2026 13:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant