
Conversation

@kaixuanliu (Contributor)

No description provided.

@Rocketknight1 (Member)

cc @IlyasMoutawwakil maybe?

@kaixuanliu kaixuanliu marked this pull request as draft December 23, 2025 01:23
@kaixuanliu kaixuanliu marked this pull request as ready for review December 23, 2025 09:16
Signed-off-by: Liu, Kaixuan <[email protected]>
@kaixuanliu (Contributor, Author)

@IlyasMoutawwakil Hi, please help review when you are available, thanks!

type=str,
default="cuda",
help="Device to run benchmarks on (cuda, xpu, cpu). If not specified, will auto-detect.",
)
Review comment (Contributor):

Inconsistent: the help text says the device will be auto-detected if not specified, but the default is set to `"cuda"` rather than `None`/auto, so auto-detection never runs.
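One way to resolve the inconsistency is to default the flag to `None` and only then auto-detect. A minimal sketch, assuming a hypothetical `detect_device` helper (not part of the PR) and falling back to CPU when torch is unavailable:

```python
import argparse


def detect_device() -> str:
    # Hypothetical helper: pick the first available accelerator, else "cpu".
    try:
        import torch

        if torch.cuda.is_available():
            return "cuda"
        if hasattr(torch, "xpu") and torch.xpu.is_available():
            return "xpu"
    except ImportError:
        pass
    return "cpu"


parser = argparse.ArgumentParser()
parser.add_argument(
    "--device",
    type=str,
    default=None,  # None (not "cuda") so the help text stays truthful
    help="Device to run benchmarks on (cuda, xpu, cpu). If not specified, will auto-detect.",
)
args = parser.parse_args([])
device = args.device if args.device is not None else detect_device()
```

With `default=None`, an explicit `--device` always wins and auto-detection only fills the gap.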

self.logger = logger if logger is not None else logging.getLogger(__name__)
self.device_type = device_type

# Detect available accelerators
Review comment (Contributor):

You mean "detect the number of available accelerators"?

device_type = None
if hasattr(torch, "cuda") and torch.cuda.is_available():
device_type = "cuda"
elif hasattr(torch, "xpu") and torch.xpu.is_available():
Review comment (Contributor):

Use `is_torch_xpu_available` from transformers.
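A hedged sketch of that suggestion: transformers exposes `is_torch_xpu_available` in `transformers.utils`, which wraps the version and backend checks; the fallback below (mirroring the original `hasattr` check) is only there so the snippet runs when transformers is not installed:

```python
try:
    from transformers.utils import is_torch_xpu_available
except ImportError:
    # Fallback when transformers is unavailable: mirror the original check.
    def is_torch_xpu_available() -> bool:
        try:
            import torch

            return hasattr(torch, "xpu") and torch.xpu.is_available()
        except ImportError:
            return False


if is_torch_xpu_available():
    device_type = "xpu"
else:
    device_type = None  # fall through to the cuda/cpu branches
```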


def __init__(self) -> None:
self.device_name, self.device_memory_total = get_device_name_and_memory_total()
def __init__(self, device_type: str = "cuda") -> None:
Review comment (Contributor):

Default to `None`?

self.gpu_name, self.gpu_memory_total_gb = get_device_name_and_memory_total()
# Auto-detect device type
device_type = None
if hasattr(torch, "cuda") and torch.cuda.is_available():
Review comment (Contributor):

I suppose `torch.cuda.is_available()` is enough; `torch.cuda` always exists, so the `hasattr` check is redundant.

Args:
device_type: The type of device to query ('cuda', 'xpu', etc.)
"""
Review comment (Contributor):

Seems `device_type` is unnecessary; just detect the available device and use a `torch_accelerator_module`, like:

```python
device_type = torch.accelerator.current_accelerator().type if is_torch_accelerator_available() else "cuda"

torch_accelerator_module = getattr(torch, device_type, torch.cuda)

torch_accelerator_module.get_device_properties(0).name
...
```
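A runnable sketch of that pattern, with guards so it degrades gracefully: it assumes PyTorch >= 2.6 for the `torch.accelerator` API and falls back to `"cpu"` on older versions or when torch is absent (the bare fallback in the reviewer's one-liner assumes an accelerator machine):

```python
try:
    import torch

    # torch.accelerator landed in PyTorch 2.6; guard for older versions.
    if hasattr(torch, "accelerator") and torch.accelerator.is_available():
        device_type = torch.accelerator.current_accelerator().type
    else:
        device_type = "cpu"
    # Resolve the backend module (torch.cuda, torch.xpu, ...) by name.
    torch_accelerator_module = getattr(torch, device_type, None)
except ImportError:
    device_type, torch_accelerator_module = "cpu", None
```

This removes the per-backend `if/elif` chains: every later call site goes through `torch_accelerator_module` instead of hard-coding `torch.cuda`.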

f"Time generate done in {e2e_latency:.2f} seconds. Memory usage: {torch.cuda.memory_allocated() / 1024**2:.2f} MB"
)

# Get memory usage based on device type
Review comment (Contributor):

Just use `device_type` to get the `torch_accelerator_module`?
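That would make the hard-coded `torch.cuda.memory_allocated()` in the log line device-agnostic. A minimal sketch, assuming a hypothetical `memory_allocated_mb` helper (not in the PR) that returns 0.0 when torch or the requested backend is unavailable:

```python
def memory_allocated_mb(device_type: str) -> float:
    # Hypothetical helper: report allocated memory for the given backend
    # ("cuda", "xpu", ...) in MB, or 0.0 if it cannot be queried.
    try:
        import torch
    except ImportError:
        return 0.0
    module = getattr(torch, device_type, None)
    if module is None or not hasattr(module, "memory_allocated"):
        return 0.0
    return module.memory_allocated() / 1024**2
```

The log line would then read `f"... Memory usage: {memory_allocated_mb(device_type):.2f} MB"` regardless of backend.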

"""Profile the latency of a call to model.generate() with the given (inputs) and (max_new_tokens)."""
# Build activities list based on available devices
activities = [torch.profiler.ProfilerActivity.CPU]
if hasattr(torch, "xpu") and torch.xpu.is_available():
Review comment (Contributor):

Use `is_torch_xpu_available` from transformers here as well.

@kaixuanliu
Copy link
Contributor Author

Close this PR first.

@kaixuanliu kaixuanliu closed this Jan 6, 2026


3 participants