Multi-rank profiling with MPI Communication #3013

abchoudh-amd · 2026-02-02T14:30:18Z

Motivation

Technical Details

Added MPI-aware laplace equation
Added test case for MPI aware workload

JIRA ID

AIPROFCOMP-132

Test Plan

Run ctests
[ X] Run MPI aware workload

Test Result

[X ] Ctests passes
[X ] MPI aware workload can be profiled

Submission Checklist

[X ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Copilot

Pull request overview

Adds multi-rank (MPI) awareness to ROCm Compute Profiler outputs and tests, including a new MPI-enabled Laplace sample workload and a new --output-directory mechanism with placeholder substitution.

Changes:

Introduces --output-directory (with %hostname%, %gpumodel%, %rank%, %env{...}%) and MPI-rank-based output directory isolation.
Adds MPI-enabled Laplace sample build/test path plus new multi-rank pytest coverage.
Updates docs/changelog to describe the new output directory behavior and MPI multi-rank profiling.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 11 comments.

Show a summary per file

File	Description
projects/rocprofiler-compute/src/rocprof_compute_base.py	Implements output-directory placeholder replacement, rank detection integration, and refactors machine spec generation/load flow.
projects/rocprofiler-compute/src/utils/utils.py	Adds rank/env placeholder helpers (`get_rank`, `replace_rank`, `replace_env`).
projects/rocprofiler-compute/src/argparser.py	Adds `--output-directory` CLI option and deprecates `--path/--subpath` in help text.
projects/rocprofiler-compute/src/roofline.py	Adds a setter to update roofline workload directory at post-processing time.
projects/rocprofiler-compute/src/rocprof_compute_soc/soc_base.py	Updates roofline post-processing to use the (potentially rewritten) workload directory.
projects/rocprofiler-compute/sample/laplace_eqn.hip	Adds optional MPI support and halo-exchange scaffolding for an MPI-aware sample workload.
projects/rocprofiler-compute/tests/conftest.py	Updates test harness to use `--output-directory` and introduces knobs for skipping `--name` and using default output dir behavior.
projects/rocprofiler-compute/tests/test_profile_general.py	Adds output-directory placeholder tests and multi-rank MPI tests (including MPI-aware Laplace case).
projects/rocprofiler-compute/tests/CMakeLists.txt	Builds `mpi_aware_laplace_eqn` when MPI is available.
projects/rocprofiler-compute/CMakeLists.txt	Adds an MPI-driven ctest for multi-rank pytest execution.
projects/rocprofiler-compute/requirements-test.txt	Adds `mpi4py` and `pytest-mpi` as test dependencies.
projects/rocprofiler-compute/pyproject.toml	Adds `multi_rank` pytest marker.
projects/rocprofiler-compute/docs/how-to/use.rst	Documents new MPI-rank default output behavior and deprecation of `--path/--subpath`.
projects/rocprofiler-compute/docs/how-to/profile/mode.rst	Adds detailed output-directory configuration and multi-rank profiling documentation.
projects/rocprofiler-compute/CHANGELOG.md	Records new `--output-directory`, MPI rank detection, and deprecations.

Comments suppressed due to low confidence (1)

projects/rocprofiler-compute/src/rocprof_compute_soc/soc_base.py:738

Overridden method signature does not match call, where it is passed too many arguments. Overriding method function demarcate.wrap_function matches the call.
Overridden method signature does not match call, where it is passed too many arguments. Overriding method function demarcate.wrap_function matches the call.
Overridden method signature does not match call, where it is passed too many arguments. Overriding method function demarcate.wrap_function matches the call.
Overridden method signature does not match call, where it is passed too many arguments. Overriding method function demarcate.wrap_function matches the call.
Overridden method signature does not match call, where it is passed too many arguments. Overriding method function demarcate.wrap_function matches the call.
Overridden method signature does not match call, where it is passed too many arguments. Overriding method function demarcate.wrap_function matches the call.
Overridden method signature does not match call, where it is passed too many arguments. Overriding method function demarcate.wrap_function matches the call.
Overridden method signature does not match call, where it is passed too many arguments. Overriding method function demarcate.wrap_function matches the call.
Overridden method signature does not match call, where it is passed too few arguments. Overriding method function demarcate.wrap_function matches the call.
Overridden method signature does not match call, where it is passed too few arguments. Overriding method function demarcate.wrap_function matches the call.
Overridden method signature does not match call, where it is passed too few arguments. Overriding method function demarcate.wrap_function matches the call.
Overridden method signature does not match call, where it is passed too few arguments. Overriding method function demarcate.wrap_function matches the call.
Overridden method signature does not match call, where it is passed too few arguments. Overriding method function demarcate.wrap_function matches the call.
Overridden method signature does not match call, where it is passed too few arguments. Overriding method function demarcate.wrap_function matches the call.
Overridden method signature does not match call, where it is passed too few arguments. Overriding method function demarcate.wrap_function matches the call.
Overridden method signature does not match call, where it is passed too few arguments. Overriding method function demarcate.wrap_function matches the call.
Overridden method signature does not match call, where it is passed too few arguments. Overriding method function demarcate.wrap_function matches the call.
Overridden method signature does not match call, where it is passed too few arguments. Overriding method function demarcate.wrap_function matches the call.
Overridden method signature does not match call, where it is passed too few arguments. Overriding method function demarcate.wrap_function matches the call.
Overridden method signature does not match call, where it is passed too few arguments. Overriding method function demarcate.wrap_function matches the call.
Overridden method signature does not match call, where it is passed too few arguments. Overriding method function demarcate.wrap_function matches the call.
Overridden method signature does not match call, where it is passed too few arguments. Overriding method function demarcate.wrap_function matches the call.
Overridden method signature does not match call, where it is passed too few arguments. Overriding method function demarcate.wrap_function matches the call.
Overridden method signature does not match call, where it is passed too few arguments. Overriding method function demarcate.wrap_function matches the call.
Overridden method signature does not match call, where it is passed too few arguments. Overriding method function demarcate.wrap_function matches the call.
Overridden method signature does not match call, where it is passed too few arguments. Overriding method function demarcate.wrap_function matches the call.
Overridden method signature does not match call, where it is passed too few arguments. Overriding method function demarcate.wrap_function matches the call.
Overridden method signature does not match call, where it is passed too few arguments. Overriding method function demarcate.wrap_function matches the call.
Overridden method signature does not match call, where it is passed too few arguments. Overriding method function demarcate.wrap_function matches the call.
Overridden method signature does not match call, where it is passed too few arguments. Overriding method function demarcate.wrap_function matches the call.
Overridden method signature does not match call, where it is passed too few arguments. Overriding method function demarcate.wrap_function matches the call.
Overridden method signature does not match call, where it is passed too few arguments. Overriding method function demarcate.wrap_function matches the call.
Overridden method signature does not match call, where it is passed too few arguments. Overriding method function demarcate.wrap_function matches the call.

    def analysis_setup(self, roofline_parameters: Optional[dict[str, Any]]) -> None:

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-03T17:13:18Z

projects/rocprofiler-compute/CMakeLists.txt

+if(MPI_CXX_COMPILER)
+    message(STATUS "MPI CXX compiler found: ${MPI_CXX_COMPILER}")
+else()
+    message(FATAL_ERROR "MPI CXX compiler not found. MPI tests will be skipped.")
+endif()


CMake now hard-fails configuration when MPI isn’t found, but the message says "MPI tests will be skipped". This makes all builds require MPI even when tests aren’t being built/run. Suggest making MPI optional (e.g., find_package(MPI QUIET ...) and only adding MPI tests/targets when found) and/or guarding this behind the project’s test option so non-MPI environments can still configure/build.

Copilot · 2026-02-03T17:13:18Z

projects/rocprofiler-compute/tests/test_profile_general.py

+        # Check with hostname
+        hostname = "test_node"
+        with patch("socket.gethostname", return_value=hostname):
+            workload_base_dir = test_utils.get_output_dir(param_id="hostname")
+            workload_dir = os.path.join(workload_base_dir, "%hostname%")


This test’s hostname/gpumodel cases assume no MPI-rank env vars are set; however on HPC schedulers SLURM_PROCID/etc can be present even for non-MPI pytest runs, which would make the output-directory logic append /<rank> and break these assertions. Consider clearing the rank-related env vars at the start of the test (before the hostname/gpumodel checks) to make it environment-independent.

Copilot · 2026-02-03T17:13:18Z

projects/rocprofiler-compute/docs/how-to/profile/mode.rst

+subdirectory named by its rank to avoid output collisions.
+
+Example Usage
+-----


In reStructuredText, the underline length should be at least as long as the section title. Here Example Usage is underlined with only 5 dashes (-----), which can trigger Sphinx/RST formatting warnings/errors. Use an underline at least the length of the title (e.g., -------------) or switch to a consistent subsection adornment style used elsewhere in the doc.

Suggested change

-----

-------------

Copilot · 2026-02-03T17:13:19Z

projects/rocprofiler-compute/src/rocprof_compute_base.py

+                print(f"Path: {self.__args.path}")
+                print(f"Name: {self.__args.name}")
+                print(f"Output_directory: {self.__args.output_directory}")


The print(f"Path/Name/Output_directory...") lines look like leftover debug statements and can pollute CLI output and test logs. Prefer console_debug(...) (or remove entirely) so user-facing output stays clean, especially when followed by console_error/console_warning.

Suggested change

print(f"Path: {self.__args.path}")

print(f"Name: {self.__args.name}")

print(f"Output_directory: {self.__args.output_directory}")

console_debug(f"Path: {self.__args.path}")

console_debug(f"Name: {self.__args.name}")

console_debug(f"Output_directory: {self.__args.output_directory}")

Copilot · 2026-02-03T17:13:19Z

projects/rocprofiler-compute/sample/laplace_eqn.hip

    for (int j = 0; j < N; ++j) {
-      float x = 2.0f * (float)i / (float)N - 1.0f;
      float y = 2.0f * (float)j / (float)N - 1.0f;
+#endif // USE_MPI
+      float x = 2.0f * (float)i / (float)N - 1.0f;
      U[i * N + j] = 1.0f / (1.0f + x * x + y * y); // Initial guess


In MPI mode you allocate arrays with an extra halo (ny = N + 2) and initialize() loops j=1..N, but indexing still uses U[i * N + j] (stride N). This doesn’t match the allocated leading dimension and will overlap rows / misplace the intended halo. Use a leading-dimension variable (e.g., ld = USE_MPI ? (N + 2) : N) and index as U[i * ld + j] (and initialize halo cells intentionally).

Copilot · 2026-02-03T17:13:19Z

projects/rocprofiler-compute/src/rocprof_compute_base.py

+        self.generate_machine_specs()
+
        self.sanitize()


__init__() now calls self.generate_machine_specs() unconditionally. generate_machine_specs() executes ROCm/amd-smi probing when sysinfo is not provided, which makes offline analyze workflows (analyzing an existing workload on a non-ROCm host) fail early. Consider deferring machine-spec generation until profile mode, or using workload sysinfo.csv during analyze mode instead of probing local hardware.

Copilot · 2026-02-03T17:13:20Z

projects/rocprofiler-compute/sample/laplace_eqn.hip

+    HIP_CHECK(hipMemcpy(h_send_left[i], &d_U_olds[i][n_sizes[i]],
+                        n_sizes[i] * sizeof(float), hipMemcpyDeviceToHost));
+    HIP_CHECK(hipMemcpy(h_send_right[i], &d_U_olds[i][n_sizes[i] * (n_sizes[i] + 1)],
+                        n_sizes[i] * sizeof(float), hipMemcpyDeviceToHost));
+  }


The halo exchange copies contiguous buffers starting at offsets like &d_U_olds[i][n_sizes[i]] and &d_U_olds[i][n_sizes[i] * (n_sizes[i] + 1)]. For a 2D domain partitioned in one dimension, true halo boundaries are typically strided and require packing/unpacking (or hipMemcpy2D) using the correct leading dimension. As written, the exchanged data does not correspond to the stated neighboring-halo regions.

Copilot · 2026-02-03T17:13:20Z

projects/rocprofiler-compute/sample/laplace_eqn.hip

+  const int ny_small = n_small + 2;
+  const int ny_mid = n_mid + 2;
+  const int ny_large = n_large + 2;
+#else
+  const int ny_small = n_small;


In MPI mode you increase ny_* by 2 (halo) and later allocate/copy nx * ny elements, but the kernel launches still pass N and the kernel indexes an N x N domain. Unless the kernel is updated to use the halo/leading dimension, these extra halo elements (and the MPI halo exchanges) won’t influence the stencil computation.

Copilot · 2026-02-03T17:13:20Z

projects/rocprofiler-compute/pyproject.toml

    "iteration_multiplexing_stochastic",
    "noise_clamp",
    "torch_ops",
+    "multi_rank",
 ]


The new tests use @pytest.mark.mpi, but mpi is not listed in tool.pytest.ini_options.markers. This will raise PytestUnknownMarkWarning (or fail under strict marker settings). Add "mpi" to the markers list alongside multi_rank.

Copilot · 2026-02-03T17:13:21Z

projects/rocprofiler-compute/src/rocprof_compute_base.py


+    @demarcate
+    def load_soc_specs(self, sysinfo: Optional[dict] = None) -> None:
+        """Load OmniSoC instance for RocProfCompute run"""


load_soc_specs(self, sysinfo=...) no longer uses the sysinfo argument and instead relies on self.__mspec generated from the current host. In analyze mode, run_analysis() passes a sysinfo dict from sysinfo.csv (rocprof_compute_base.py:575), so this can load the wrong SoC (or fail on a machine without ROCm). Restore behavior by regenerating self.__mspec from sysinfo when provided (e.g., generate_machine_specs(self.__args, sysinfo)) before selecting the SoC module.

Suggested change

"""Load OmniSoC instance for RocProfCompute run"""

"""Load OmniSoC instance for RocProfCompute run

If sysinfo is provided (e.g., in analyze mode from sysinfo.csv),

regenerate the MachineSpecs from that data instead of using the

current host's characteristics.

"""

if sysinfo is not None:

# Regenerate machine specs based on the provided sysinfo rather than

# the current host. This is important for analyze mode where we may

# be running on a different machine than the one that produced the data.

self.__mspec = generate_machine_specs(self.__args, sysinfo)

abchoudh-amd and others added 30 commits January 27, 2026 23:17

Modified path parsing

eb0d5ae

Update --output-directory flag in conftest

1c4db1a

Output directory parmeterization

501f285

Test output path

b659ddf

Detect rank as default

1e1af17

Use console_error instead of sys exit

cce39a5

Updated conftest

e2f86c0

Added tests

c42e5c5

Use rank if MPI detected

3a51737

Updated Changelog

501d9ca

Fixed roofline in profiling mode

a79c233

WIP test

9e18477

Fixed test

bda1c4c

Fixed test

262b270

Fixed conftest

2647c0a

Fixed formatting

6690b7d

Multi rank profiling test

372e02a

Doc update

7db01de

Test update

ca70bf8

Doc update

0a39657

Fixed test

f991397

Updated docs

aaf097c

Updated test

cde51a9

Fixed roofline workload dir

e9dac70

Updated comment

4c7a620

CoPilot suggestions

15d9eba

Fixed typo

c73e05c

Fixed whitespace

9031aee

Added mult_rank marker for pytest

a888fcf

Removed MPI aware workload test

68fd474

abchoudh-amd added 7 commits February 2, 2026 04:23

Fixed cleaning test dir

0b1fc6e

Fixed formatting

9e903b5

Fixed tests

b94beb9

Fixed tests

f05eb45

Fixed formatting

537d005

Updated docs

e8a919d

Added multi rank ctest

0f7e4f8

github-actions bot added documentation Improvements or additions to documentation project: rocprofiler-compute labels Feb 2, 2026

systems-assistant bot added the organization: ROCm label Feb 2, 2026

abchoudh-amd force-pushed the users/abchoudh/mpi-communication-aware-bug branch from 183d90e to 7c80599 Compare February 3, 2026 08:12

Fix multi-rank test

7e435df

abchoudh-amd force-pushed the users/abchoudh/mpi-communication-aware-bug branch from c16b11f to c953de3 Compare February 3, 2026 12:57

Removed pytest-mpi and mpi4py dependencies

0eaf5f2

abchoudh-amd marked this pull request as ready for review February 3, 2026 16:56

abchoudh-amd requested review from a team and prbasyal-amd as code owners February 3, 2026 16:56

Copilot AI review requested due to automatic review settings February 3, 2026 16:56

Copilot started reviewing on behalf of abchoudh-amd February 3, 2026 16:57 View session

Back to original

5f03499

abchoudh-amd force-pushed the users/abchoudh/mpi-communication-aware-bug branch from d9ac823 to 643c0c2 Compare February 3, 2026 17:02

abchoudh-amd added 6 commits February 3, 2026 11:02

Parallel Laplace equation

c637e67

Laplace equation comments

3826ebd

Create separate MPI aware laplace test

12df779

Removed GPU-GPU comm

f849ccf

Added mpi comm test

3d41fdb

Added test for MPI communication

05e66ca

abchoudh-amd force-pushed the users/abchoudh/mpi-communication-aware-bug branch from 643c0c2 to 05e66ca Compare February 3, 2026 17:02

Copilot AI reviewed Feb 3, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multi-rank profiling with MPI Communication #3013

Multi-rank profiling with MPI Communication #3013

Uh oh!

abchoudh-amd commented Feb 2, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Feb 3, 2026

Uh oh!

Copilot AI Feb 3, 2026

Uh oh!

Copilot AI Feb 3, 2026

Uh oh!

Copilot AI Feb 3, 2026

Uh oh!

Copilot AI Feb 3, 2026

Uh oh!

Copilot AI Feb 3, 2026

Uh oh!

Copilot AI Feb 3, 2026

Uh oh!

Copilot AI Feb 3, 2026

Uh oh!

Copilot AI Feb 3, 2026

Uh oh!

Copilot AI Feb 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

-        """Load OmniSoC instance for RocProfCompute run"""
+        """Load OmniSoC instance for RocProfCompute run
+        If sysinfo is provided (e.g., in analyze mode from sysinfo.csv),
+        regenerate the MachineSpecs from that data instead of using the
+        current host's characteristics.
+        """
+        if sysinfo is not None:
+            # Regenerate machine specs based on the provided sysinfo rather than
+            # the current host. This is important for analyze mode where we may
+            # be running on a different machine than the one that produced the data.
+            self.__mspec = generate_machine_specs(self.__args, sysinfo)

Multi-rank profiling with MPI Communication #3013

Are you sure you want to change the base?

Multi-rank profiling with MPI Communication #3013

Uh oh!

Conversation

abchoudh-amd commented Feb 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Technical Details

JIRA ID

Test Plan

Test Result

Submission Checklist

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

abchoudh-amd commented Feb 2, 2026 •

edited

Loading