@jayhawk-commits
Collaborator

No description provided.

@jayhawk-commits jayhawk-commits merged commit 1b17d23 into develop Apr 7, 2025
@jayhawk-commits jayhawk-commits deleted the joseph/repoName branch April 7, 2025 17:30
systems-assistant bot pushed a commit that referenced this pull request Jul 17, 2025
Rename to ROCm Systems Profiler (rocprof-sys)
systems-assistant bot pushed a commit that referenced this pull request Jul 22, 2025
Create rocm_ci_caller.yml init file to call shared workflow
systems-assistant bot pushed a commit that referenced this pull request Jul 22, 2025
systems-assistant bot pushed a commit that referenced this pull request Jul 22, 2025
systems-assistant bot pushed a commit that referenced this pull request Jul 22, 2025
Create kws_caller.yml and rocm_ci_caller.yml
systems-assistant bot pushed a commit that referenced this pull request Jul 22, 2025
Enabling per PR based KWS check and PSDB check
jayhawk-commits pushed a commit that referenced this pull request Aug 5, 2025
jayhawk-commits pushed a commit that referenced this pull request Aug 5, 2025
ammallya pushed a commit that referenced this pull request Aug 6, 2025
Rename to ROCm Systems Profiler (rocprof-sys)

[ROCm/rocprofiler-systems commit: f3c699e]
jayhawk-commits pushed a commit that referenced this pull request Aug 8, 2025
Create rocm_ci_caller.yml init file to call shared workflow

[ROCm/rocm_smi_lib commit: bb122ef]
jayhawk-commits pushed a commit that referenced this pull request Aug 11, 2025
Create kws_caller.yml and rocm_ci_caller.yml

[ROCm/rocminfo commit: fad2fcd]
jayhawk-commits pushed a commit that referenced this pull request Aug 11, 2025
jayhawk-commits pushed a commit that referenced this pull request Aug 11, 2025
Enabling per PR based KWS check and PSDB check

[ROCm/ROCR-Runtime commit: d70d3fb]
kcossett-amd added a commit to kcossett-amd/rocm-systems that referenced this pull request Oct 16, 2025
Co-authored-by: Pratik Basyal <[email protected]>
dgaliffiAMD added a commit that referenced this pull request Oct 21, 2025
…ument to avoid instrumenting around C "main" wrapper (#1322)

* Add check for Fortran main

* Comment change

* MAIN__ -> Fortran main

* Cray Compiler comment change

* Add changelog and troubleshooting comments

* Improve CHANGELOG.md message

* Change CHANGELOG msg to be in 7.2.0

* Apply review change #1

Co-authored-by: Pratik Basyal <[email protected]>

* Apply review change #2

Co-authored-by: Pratik Basyal <[email protected]>

* Apply review change #3

Co-authored-by: Pratik Basyal <[email protected]>

---------

Co-authored-by: Pratik Basyal <[email protected]>
Co-authored-by: David Galiffi <[email protected]>
ggottipa-amd pushed a commit that referenced this pull request Oct 31, 2025
ammallya pushed a commit that referenced this pull request Nov 17, 2025
The bug was reproduced like this.

In terminal #1, run command:
sudo amd-smi ras --cper --gpu 6 --severity all --folder /tmp/cper_dump --follow 

In terminal #2, inject errors:
while true; do sudo amdgpuras -b 7 -s 1 -m 6 -t 2; sleep 2; done

Terminal #1 starts dumping the CPER entries it captures. After 20 entries have been captured, open terminal #3 and run the same command as in terminal #1:
sudo amd-smi ras --cper --gpu 6 --severity all --folder /tmp/cper_dump --follow

Terminal #3 produces no output, even while terminal #1 continues capturing and printing entries.

The fix:

More than 20 CPER entries are already available in the GPU buffer by the time the command runs in terminal #3. Because that command starts capturing from the beginning and passes only 20 buffers to copy entries into, the C++ API returns a code indicating that more data is available.

The Python CLI should not treat this as an error, but should continue to print what the API returned.
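
The handling described above can be sketched roughly as follows. This is a minimal illustration, not the amd-smi implementation: `Status`, `fetch_cper_entries`, and `print_available_entries` are hypothetical names, and only the 20-buffer limit is taken from the description.

```python
from enum import Enum, auto


class Status(Enum):
    """Hypothetical status codes standing in for what the C++ API returns."""
    SUCCESS = auto()
    MORE_DATA = auto()  # buffers were filled, but further entries remain
    ERROR = auto()


def fetch_cper_entries(available, cursor, buffer_count=20):
    """Hypothetical stand-in for the API call: copy up to buffer_count entries."""
    chunk = available[cursor:cursor + buffer_count]
    new_cursor = cursor + len(chunk)
    status = Status.MORE_DATA if new_cursor < len(available) else Status.SUCCESS
    return status, chunk, new_cursor


def print_available_entries(available, buffer_count=20):
    """Drain entries, treating MORE_DATA as a normal result instead of an error."""
    printed = []
    cursor = 0
    while True:
        status, chunk, cursor = fetch_cper_entries(available, cursor, buffer_count)
        if status is Status.ERROR:
            raise RuntimeError("CPER query failed")
        # Both SUCCESS and MORE_DATA carry valid entries, so print them either way.
        for entry in chunk:
            print(entry)
            printed.append(entry)
        if status is Status.SUCCESS:
            return printed
```

The point of the sketch is that only a genuine error status stops the output; a "more data available" status still carries entries that should be printed.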

---------

Signed-off-by: Oosman Saeed <[email protected]>
ammallya pushed a commit that referenced this pull request Nov 18, 2025
[ROCm/amdsmi commit: 5b95d22]
ammallya pushed a commit that referenced this pull request Nov 21, 2025
[ROCm/amdsmi commit: 5b95d22]
@jayhawk-commits jayhawk-commits mentioned this pull request Jan 5, 2026
8 tasks
jharryma pushed a commit that referenced this pull request Jan 7, 2026
#2349)

* [RDC] Optimize RDC counter sampling with greedy packing algorithm (#1590)

* Optimize RDC counter sampling with greedy packing algorithm

This change significantly reduces the number of rocprofiler-sdk sample calls
by implementing a greedy packing algorithm that groups multiple counters into
the minimal number of hardware profiles.

Key improvements:
- Implement greedy packing algorithm to combine counters into minimal profiles
- Add ProfileSet structure to manage packed counter configurations
- Cache packed profile sets for reuse across queries
- Group telemetry field requests by GPU for bulk processing
- Reduce sample calls by ~35% (from 100 to 65 for typical workloads)

Performance impact:
- 13 counters now packed into 3 profiles (77% compression)
- Reduces overhead from profile creation and context switching
- More efficient utilization of hardware counter resources

Implementation details:
- Added create_profiles_for_counters() using greedy algorithm
- Added sample_counters_with_packing() for bulk sampling
- Modified telemetry layer to use rocp_lookup_bulk()
- Preserves all field transformations and special handling

Testing shows successful packing with expected performance gains.
No functional changes to external APIs or behavior.
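
As an illustration of the idea, a greedy first-fit packing could look like the sketch below. It is written in Python for brevity and is not the code from this commit: `pack_counters`, the block names, and the per-block capacities are all hypothetical, and it assumes the only constraint is a fixed number of counter slots per hardware block in each profile.

```python
from collections import Counter


def pack_counters(counters, block_capacity):
    """Greedy first-fit packing of counters into profiles.

    counters: list of (counter_name, hardware_block) pairs.
    block_capacity: dict mapping hardware_block -> slots available per profile.
    Returns a list of profiles, each a list of counter names.
    """
    profiles = []  # counter names per profile
    usage = []     # per-profile Counter of slots used in each block
    for name, block in counters:
        if block_capacity.get(block, 0) < 1:
            raise ValueError(f"no capacity for block {block!r}")
        for names, used in zip(profiles, usage):
            if used[block] < block_capacity[block]:
                # First existing profile with a free slot in this block wins.
                names.append(name)
                used[block] += 1
                break
        else:
            # No existing profile has room in this block: open a new one.
            profiles.append([name])
            usage.append(Counter({block: 1}))
    return profiles


# Made-up example: 8 counters on block "A" (4 slots) and 5 on block "B" (2 slots).
example = [(f"A{i}", "A") for i in range(8)] + [(f"B{i}", "B") for i in range(5)]
packed = pack_counters(example, {"A": 4, "B": 2})
```

With these invented capacities, the 13 counters end up in 3 profiles, so one sampling pass needs 3 sample calls instead of 13. Greedy first-fit does not guarantee the minimum number of profiles in general, but it is cheap and its result can be cached and reused.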

Co-Authored-By: Ben Welton <[email protected]>

* Address PR review feedback

This commit addresses all review comments from the initial PR:

1. Fix division by zero risk in debug logging
   - Added check for empty counters vector before calculating compression ratio
   - Avoids potential division by zero when logging profile creation stats

2. Improve thread safety for statistics tracking
   - Changed static uint64_t to std::atomic<uint64_t> for thread-safe counters
   - Prevents race conditions in multi-threaded sampling scenarios

3. Remove unused variable
   - Removed unused profile_index variable that was incremented but never used
   - Cleaned up dead code

4. Clean up code formatting
   - Removed extra blank lines for consistency
   - Applied formatting fixes across modified files

5. Refactor code duplication between rocp_lookup and rocp_lookup_bulk
   - Created apply_field_transformation() helper function
   - Eliminates ~70 lines of duplicated switch statement logic
   - Centralizes field transformation logic in single location
   - Makes future maintenance easier

6. Document non-rocprofiler metrics handling
   - Added comments explaining how bulk lookup handles special cases
   - Clarifies that non-profiler fields like KFD_ID are handled in transformation

All changes maintain backward compatibility and pass compilation.
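
The guard from item 1 can be illustrated with a short Python sketch. The function name is hypothetical, and the ratio is assumed to be defined as one minus profiles divided by counters, which is consistent with the "13 counters into 3 profiles (77%)" figure quoted earlier.

```python
def compression_ratio(num_counters, num_profiles):
    """Fraction of sample calls saved by packing, or None when there are no counters."""
    if num_counters == 0:
        # Guard: avoid dividing by zero when the counter list is empty.
        return None
    return 1.0 - num_profiles / num_counters
```

Returning `None` for an empty input lets the logging code skip the statistic rather than report a meaningless value.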

Co-Authored-By: Ben Welton <[email protected]>

---------

Co-authored-by: Ben Welton <[email protected]>
Co-authored-by: Adam Pryor <[email protected]>

* [rdc] maintain counter cache per agent

---------

Co-authored-by: Benjamin Welton <[email protected]>
Co-authored-by: Ben Welton <[email protected]>
Co-authored-by: Mythreya <[email protected]>
Co-authored-by: chiranjeevi-amd <[email protected]>
ammallya pushed a commit that referenced this pull request Jan 21, 2026
silence warnings in functional testsuite
ammallya pushed a commit that referenced this pull request Jan 21, 2026
silence warnings in functional testsuite

[ROCm/rocshmem commit: a9f2eff]