Add multi-GPU track support #5220

Open
dreveman wants to merge 1 commit into main from dev/reveman/multiple-gpus

@dreveman (Collaborator) commented Mar 21, 2026

Add gpu_id dimension to render stage tracks in the trace processor,
matching what gpu counters already do. In the UI, extract the gpu_id
dimension and create per-GPU sub-groups within each track group when
multiple GPUs are present. This follows the pattern already used by
the GPU Frequency plugin, which handles multiple GPUs by
creating per-GPU tracks under a shared "GPU Frequency" sub-group.

Single GPU (unchanged):
  GPU
    Gpu 0 Frequency       [_____----____--]
    Counters
      CounterX            [____------__--_]
    Hardware Queues
      Queue 0             [event1] [event2]
    GPU Memory            [____-----------]

Multiple GPUs:
  GPU
    GPU Frequency
      Gpu 0 Frequency     [_____----____--]
      Gpu 1 Frequency     [___----_-------]
    Counters
      GPU 0 Counters
        CounterX          [____------__--_]
        CounterY          [__----_____--__]
      GPU 1 Counters
        CounterX          [__--------__--_]
        CounterY          [__----__-__--__]
    Hardware Queues
      GPU 0 Hardware Queues
        Queue 0           [event1] [event2]
        Queue 1           [event5] [event6]
      GPU 1 Hardware Queues
        Queue 0           [event7] [event8]
        Queue 1           [event0] [event1]
    GPU Memory            [____-----------]

Bug: #5097
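
The grouping rule described above could be sketched roughly as follows. This is only an illustration: `GpuTrack`, `TrackGroup`, and `buildGroup` are hypothetical names, not the Perfetto UI's actual types or API, and the sketch assumes every track already carries its `gpu_id` dimension as `gpuId`.

```typescript
// Hypothetical track descriptor (assumption, not a Perfetto UI type).
interface GpuTrack {
  name: string;
  gpuId: number;
}

// Hypothetical group node: either holds tracks directly or holds sub-groups.
interface TrackGroup {
  title: string;
  tracks: GpuTrack[];
  children: TrackGroup[];
}

// Build one group (e.g. "Counters"). With a single GPU the tracks sit
// directly in the group; with several GPUs each one gets its own sub-group
// named "GPU <id> <title>".
function buildGroup(title: string, tracks: GpuTrack[]): TrackGroup {
  const byGpu = new Map<number, GpuTrack[]>();
  for (const track of tracks) {
    const list = byGpu.get(track.gpuId) ?? [];
    list.push(track);
    byGpu.set(track.gpuId, list);
  }

  // Single GPU (or no tracks): layout stays unchanged.
  if (byGpu.size <= 1) {
    return {title, tracks: [...tracks], children: []};
  }

  // Multiple GPUs: one sub-group per GPU, ordered by gpu id.
  const gpuIds = [...byGpu.keys()].sort((a, b) => a - b);
  const children = gpuIds.map((id): TrackGroup => ({
    title: `GPU ${id} ${title}`,
    tracks: byGpu.get(id) ?? [],
    children: [],
  }));
  return {title, tracks: [], children};
}
```

Calling `buildGroup('Counters', tracks)` with tracks from GPUs 0 and 1 would yield the sub-groups "GPU 0 Counters" and "GPU 1 Counters", matching the layout shown in the diagram.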

@dreveman requested a review from a team as a code owner on March 21, 2026 05:18

github-actions bot commented Mar 21, 2026

@LalitMaganti (Member)

Before we do anything else, the key thing I think we need here is to introduce a new "GPU" table and "ugpu" concept. Multi-machine traces are very much a thing now, and if we're going to do this properly we should follow the example of ucpu and actually model this in a way we can support.

The only reason we didn't bother until now is that our support was very minimal and enough to get by. But now with CLs like this, I would take the time to model this properly.

@dreveman force-pushed the dev/reveman/multiple-gpus branch from e2c3372 to 21bab83 on March 21, 2026 16:06
@dreveman (Collaborator, Author)

> Before we do anything else, the key thing I think we need here is to introduce a new "GPU" table and "ugpu" concept. Multi-machine traces are very much a thing now, and if we're going to do this properly we should follow the example of ucpu and actually model this in a way we can support.
>
> The only reason we didn't bother until now is that our support was very minimal and enough to get by. But now with CLs like this, I would take the time to model this properly.

I'll take a look at ucpu. I very much want to support the multi-machine case and was planning to tackle that next but can look at what we need there first instead.

Note that I changed this a bit in the latest version of this PR. It's now just the minimal changes needed so that counters and render stages are not completely broken when you have multiple GPUs. I left the GPU Frequency logic unchanged, as that has multi-GPU support already, and I tried to make counters and render stages as consistent with it as possible. Maybe now that this is simpler we can land it before we tackle the multi-machine use case? Either way is fine with me. Something like this would be nice to have temporarily if the multi-machine use case takes some time to get right, though.


// For multi-GPU traces, create per-GPU sub-groups within each
// group (e.g., "GPU 0 Counters" inside "Counters").
if (
Member

I would not pollute the trace processor track plugin with custom code like this. If you need this, you should remove it from this plugin and add it to a new plugin (or to some existing GPU plugin) instead.

Collaborator Author

Sounds good. We already have a plugin for GPU Frequency. I'll see how things look if I add one for GPU Counters and one for GPU Hardware Queues.

@LalitMaganti (Member)

I'm fine to land something temporary if you're going to follow up with a proper multi-machine suitable solution based on ugpu.

@dreveman (Collaborator, Author)

> I'm fine to land something temporary if you're going to follow up with a proper multi-machine suitable solution based on ugpu.

I'll follow up with multi-machine support for sure, as that's critical to me. I took a brief look at ucpu and I'm not completely convinced that we need to introduce a ugpu concept. Maybe it makes sense longer term if we end up building a lot of things that need to be aware of the specific GPU, but it might be simpler to just require code that needs to be aware of it to use gpu_id and machine_id as the key. I'll try both ways and we can decide.

@LalitMaganti (Member)

The problem with having dual keys is that it's significantly less efficient, especially when doing joins between tables.
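
To make the two join shapes concrete, here is a rough in-memory sketch. It is an illustration only: the row types and function names are hypothetical, it does not reflect how the trace processor stores or joins tables, and it measures nothing. It just shows that a dual-key join has to combine and compare two columns per row, while a single-key join compares one integer.

```typescript
// Hypothetical row shapes (assumptions for illustration only).
interface GpuRow {
  ugpu: number;       // single unique key across machines
  machineId: number;  // first half of the dual key
  gpuId: number;      // second half of the dual key
  name: string;
}

interface SliceRow {
  name: string;
  ugpu: number;
  machineId: number;
  gpuId: number;
}

// Dual-key join: each lookup builds a composite key from two columns.
function joinOnDualKey(slices: SliceRow[], gpus: GpuRow[]): string[] {
  const index = new Map<string, GpuRow>();
  for (const g of gpus) index.set(`${g.machineId}:${g.gpuId}`, g);
  return slices.map((s) => {
    const gpu = index.get(`${s.machineId}:${s.gpuId}`);
    return `${s.name} -> ${gpu?.name ?? 'unknown'}`;
  });
}

// Single-key join: one integer column identifies the GPU.
function joinOnUgpu(slices: SliceRow[], gpus: GpuRow[]): string[] {
  const index = new Map<number, GpuRow>();
  for (const g of gpus) index.set(g.ugpu, g);
  return slices.map((s) => {
    const gpu = index.get(s.ugpu);
    return `${s.name} -> ${gpu?.name ?? 'unknown'}`;
  });
}
```

Both functions return the same rows for consistent input; the difference is only in how many columns each lookup has to touch.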

@dreveman (Collaborator, Author)

> The problem with having dual keys is that it's significantly less efficient, especially when doing joins between tables.

Ack, I'll focus on doing this using a new ugpu then. machine_id * 4096 + gpu_id will likely work well for this too. We just need to make sure no one uses gpu_ids greater than or equal to 4096.
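
As a minimal sketch of that encoding, assuming non-negative integer ids and the 4096 limit mentioned above (the helper names `toUgpu` and `fromUgpu` are hypothetical, not existing trace processor functions):

```typescript
// Assumed upper bound on gpu ids per machine, taken from the comment above.
const MAX_GPUS_PER_MACHINE = 4096;

// Pack (machine_id, gpu_id) into one integer key. Rejects gpu ids that
// would collide with the next machine's range.
function toUgpu(machineId: number, gpuId: number): number {
  if (!Number.isInteger(machineId) || machineId < 0) {
    throw new RangeError(`invalid machine_id: ${machineId}`);
  }
  if (!Number.isInteger(gpuId) || gpuId < 0 || gpuId >= MAX_GPUS_PER_MACHINE) {
    throw new RangeError(`gpu_id out of range: ${gpuId}`);
  }
  return machineId * MAX_GPUS_PER_MACHINE + gpuId;
}

// Recover the original pair from a packed key.
function fromUgpu(ugpu: number): {machineId: number; gpuId: number} {
  return {
    machineId: Math.floor(ugpu / MAX_GPUS_PER_MACHINE),
    gpuId: ugpu % MAX_GPUS_PER_MACHINE,
  };
}
```

For example, machine 2 with GPU 5 packs to 2 * 4096 + 5 = 8197, and `fromUgpu(8197)` gives back machine 2, GPU 5. A gpu_id of 4096 is rejected because it would be indistinguishable from GPU 0 on the next machine.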
