Replies: 8 comments 20 replies
-
I had a similar problem with my R9700. I can't help you fix it, per se, but I can tell you that I solved it by completely uninstalling ROCm and just using Vulkan (installed from kisak's repo). Vulkan is far faster than ROCm on the R9700 anyway.
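For anyone taking the Vulkan route, a minimal build sketch, assuming you have the llama.cpp source tree and the Vulkan dev packages installed (flag name per llama.cpp's CMake options):

```shell
# Build llama.cpp with the Vulkan backend instead of HIP.
# Assumes the Vulkan loader and headers (e.g. libvulkan-dev) are present.
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j
```

This sidesteps the HIP runtime entirely; the resulting llama-server picks the GPU up through the Vulkan driver (RADV or AMDVLK) instead of ROCm.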
-
I'm running dual R9700s (ASUS ProArt X870E-Creator, Ryzen 9 9900X) and hit the exact same segfault. Your card is stuck in deep sleep: sclk 1 MHz, mclk 96 MHz is basically a dead card trying to run inference. Things to try:

1. Kernel: upgrade to 6.17+ (Debian backports or manual install from kernel.org).
2. Force the performance level:
   echo high | sudo tee /sys/class/drm/card0/device/power_dpm_force_performance_level
3. Confirm mclk locks to the highest level (~1258 MHz):
   cat /sys/class/drm/card0/device/pp_dpm_mclk
4. If it's still broken, try AMDVLK as a fallback instead of RADV.

Also watch out for your Raphael iGPU (gfx1036, GPU[1]): make sure it's not interfering with power management on the discrete card.
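To see which memory-clock level is actually selected, the starred entry in pp_dpm_mclk can be pulled out with awk. A sketch, assuming the usual amdgpu sysfs format (one "level: freq" per line, active level marked with `*`) and that card0 is the discrete GPU:

```shell
# Print the active mclk level from amdgpu's pp_dpm_mclk.
# Format assumption: lines like "0: 96Mhz *"; the star marks the active level.
dpm_file="${1:-/sys/class/drm/card0/device/pp_dpm_mclk}"
awk '/\*/ { print "active level:", $1, $2 }' "$dpm_file"
```

If this keeps printing the lowest level after forcing `high`, the card never left its deep-sleep state and the segfault is likely downstream of that.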
-
apt install -t trixie-backports linux-image-amd64 linux-headers-amd64

Notice: Ignoring file 'amdgpu-install_7.2.70200-1_all.deb' in directory '/etc/apt/sources.list.d/' as it has an invalid filename extension

Will give this a try, maybe it will work.
-
Still got the segmentation fault, and it's in the HIP runtime. Switched to Vulkan and got 4043 tokens in 33 s at 119.96 t/s.
-
Followed it up to the library step, but I don't know what else to do for ROCm:

llama-server[5227] general protection fault ip:7f82fb269f4b sp:7ffc0152db10 error:0 in libamdhip64.so.7.2.70200[269f4b,7f82fb024000+41c000]
Mar 23 11:17:50 zeus kernel: [drm] Initialized amdgpu 3.64.0 for 0000:7a:00.0 on minor 1

For help, type "help".
system_info: n_threads = 16 (n_threads_batch = 16) / 32 | ROCm : NO_VMM = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
Running without SSL
Thread 1 "llama-server" received signal SIGSEGV, Segmentation fault.
[gdb thread list elided: threads 1-38, all "llama-server", no backtraces captured]
Quit anyway? (y or n) y
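To get an actual backtrace out of that crash instead of just the thread list, gdb can be driven non-interactively. A sketch using the paths from this thread (adjust to your own build and model):

```shell
# Run llama-server under gdb and dump all backtraces at the first fault.
gdb -batch -ex run -ex bt -ex 'thread apply all bt' \
    --args /opt/llama.cpp/build/bin/llama-server \
    --model /opt/Qwen3.5-35B-A3B-Q4_K_M_GGUF/Qwen3.5-35B-A3B-Q4_K_M.gguf \
    -c 4096 --no-warmup
```

The `bt` output should name the faulting frame inside libamdhip64 (or ggml's HIP code), which is far more useful in a bug report than the raw thread listing.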
-
From ChatGPT:

What changed with the cleaned loader stack: the /opt/amdgpu userspace mix is gone. So the failure is not just warmup, not just --fit, and not just the /opt/amdgpu library contamination. Your earlier gdb backtrace already showed the fault chain in libamdhip64.so.7. That means the crash is still in the HIP runtime / ggml HIP execution path, specifically around fused RMS norm / graph capture / first real graph execution, not in generic model loading.
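One quick way to confirm the loader stack really is clean is to ask the dynamic linker which HIP/ROCm libraries the binary resolves. A sketch; the llama-server path is the one from this thread, and BIN is just a shell variable introduced here:

```shell
# List the ROCm/HIP userspace libraries a binary links against, to spot
# a mixed /opt/amdgpu vs /opt/rocm stack. Prints a note if none match.
BIN="${BIN:-/opt/llama.cpp/build/bin/llama-server}"
ldd "$BIN" | grep -E 'amdhip|hsa|rocm|amdgpu' || echo "no ROCm libraries linked"
```

Every matching line should resolve under a single ROCm prefix (e.g. /opt/rocm-7.2.0/lib); any stragglers under /opt/amdgpu mean the contamination is back.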
-
I'm running llama.cpp without issues with ROCm on my R9700, but I suggest choosing a different distro; Debian support for the R9700 is probably really bad. My setup:
It just worked out of the box.
-
The Core Problem

Your GPU is gfx1201 (RDNA 4). This is extremely new silicon, and ROCm/HIP kernel support is still catching up. The segfault happens after all tensors are loaded to VRAM successfully, at the point where the first GPU compute operation executes. This strongly suggests the HIP kernels aren't working correctly for gfx1201. On top of that, you're running Qwen3.5-35B-A3B, which is a MoE + Mamba/SSM hybrid, one of the most complex model architectures. The SSM (State Space Model) kernels are relatively new in llama.cpp and may not have gfx1201 codepaths at all yet. Also worth noting: your GPU reports VMM: no, which can cause issues with large allocations.

You're defaulting to 262K context, which allocates a 5.1 GB KV cache. Try a much smaller context [command not captured]; this alone might get you past the segfault. Before debugging further, test with something basic like a Llama-3-8B-Q4_K_M or Qwen2.5-7B (non-MoE, non-Mamba). That tells you whether the problem is the model architecture or the backend.

Next, try the GFX version override [commands not captured]. This forces ROCm to treat your GPU as a different (more supported) architecture. It's a common workaround for new AMD GPUs. Results vary: it might work, it might crash differently, but it's worth testing.

If you installed a pre-built binary, it might not include gfx1201 kernels. You're on build 8368, and gfx1201 support patches may have landed after that, so pull latest and rebuild. ROCm 7.2 is very new itself; sometimes pairing a bleeding-edge GPU with a bleeding-edge driver means double the bugs. ROCm 6.3 might actually have better stability for your use case.

Your hardware is fine; this is a software support issue. gfx1201 is so new that the ROCm runtime and/or llama.cpp's HIP kernels don't fully support it yet. The Mamba/SSM compute kernels are the most likely failure point, since they're the newest code in llama.cpp. If a simple model (non-MoE, non-SSM) works with -c 4096, you've confirmed it's specifically the Qwen3.5 Mamba architecture that's broken on your GPU, and it's worth opening a GitHub issue with that finding.
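The 5.1 GB figure checks out against the numbers in the log: 262144 cells x 10 full-attention layers (every 4th of 40) x 512 (n_embd_k_gqa) x 2 bytes (f16), doubled for K and V. A quick check in the shell:

```shell
# KV cache size for the logged config, in MiB.
# cells * attention layers * n_embd_k_gqa * f16 bytes, times 2 for K + V.
echo $(( 2 * 262144 * 10 * 512 * 2 / 1024 / 1024 ))   # prints 5120
```

This matches the "llama_kv_cache: size = 5120.00 MiB" line, and shows why the cache shrinks linearly with -c: at -c 4096 it would be well under 100 MiB.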
-
hi guys,
has anyone had success with the AMD Radeon™ AI PRO R9700?
i got an ASUS one from microcenter but it looks like it has issues.
below is what i found so far.
/opt/llama.cpp/build/bin/llama-server --model /opt/Qwen3.5-35B-A3B-Q4_K_M_GGUF/Qwen3.5-35B-A3B-Q4_K_M.gguf -b 1024 -np 1 --fit off --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.00 --port 8001 --jinja --host 0.0.0.0 --no-warmup
ggml_cuda_init: found 1 ROCm devices (Total VRAM: 32624 MiB):
Device 0: AMD Radeon AI PRO R9700, gfx1201 (0x1201), VMM: no, Wave Size: 32, VRAM: 32624 MiB
build: 8368 (9e2e219) with GNU 12.2.0 for Linux x86_64
system info: n_threads = 16, n_threads_batch = 16, total_threads = 32
system_info: n_threads = 16 (n_threads_batch = 16) / 32 | ROCm : NO_VMM = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
Running without SSL
init: using 31 threads for HTTP server
start: binding port with default address family
main: loading model
srv load_model: loading model '/opt/llama-projects/Qwen3.5-35B-A3B-Q4_K_M_GGUF/Qwen3.5-35B-A3B-Q4_K_M.gguf'
llama_model_load_from_file_impl: using device ROCm0 (AMD Radeon AI PRO R9700) (0000:03:00.0) - 32548 MiB free
llama_model_loader: loaded meta data with 52 key-value pairs and 733 tensors from /opt/llama-projects/Qwen3.5-35B-A3B-Q4_K_M_GGUF/Qwen3.5-35B-A3B-Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen35moe
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.sampling.top_k i32 = 20
llama_model_loader: - kv 3: general.sampling.top_p f32 = 0.950000
llama_model_loader: - kv 4: general.sampling.temp f32 = 1.000000
llama_model_loader: - kv 5: general.name str = Qwen3.5-35B-A3B
llama_model_loader: - kv 6: general.basename str = Qwen3.5-35B-A3B
llama_model_loader: - kv 7: general.quantized_by str = Unsloth
llama_model_loader: - kv 8: general.size_label str = 35B-A3B
llama_model_loader: - kv 9: general.license str = apache-2.0
llama_model_loader: - kv 10: general.license.link str = https://huggingface.co/Qwen/Qwen3.5-3...
llama_model_loader: - kv 11: general.repo_url str = https://huggingface.co/unsloth
llama_model_loader: - kv 12: general.base_model.count u32 = 1
llama_model_loader: - kv 13: general.base_model.0.name str = Qwen3.5 35B A3B
llama_model_loader: - kv 14: general.base_model.0.organization str = Qwen
llama_model_loader: - kv 15: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen3.5-3...
llama_model_loader: - kv 16: general.tags arr[str,2] = ["unsloth", "image-text-to-text"]
llama_model_loader: - kv 17: qwen35moe.block_count u32 = 40
llama_model_loader: - kv 18: qwen35moe.context_length u32 = 262144
llama_model_loader: - kv 19: qwen35moe.embedding_length u32 = 2048
llama_model_loader: - kv 20: qwen35moe.attention.head_count u32 = 16
llama_model_loader: - kv 21: qwen35moe.attention.head_count_kv u32 = 2
llama_model_loader: - kv 22: qwen35moe.rope.dimension_sections arr[i32,4] = [11, 11, 10, 0]
llama_model_loader: - kv 23: qwen35moe.rope.freq_base f32 = 10000000.000000
llama_model_loader: - kv 24: qwen35moe.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 25: qwen35moe.expert_count u32 = 256
llama_model_loader: - kv 26: qwen35moe.expert_used_count u32 = 8
llama_model_loader: - kv 27: qwen35moe.attention.key_length u32 = 256
llama_model_loader: - kv 28: qwen35moe.attention.value_length u32 = 256
llama_model_loader: - kv 29: qwen35moe.expert_feed_forward_length u32 = 512
llama_model_loader: - kv 30: qwen35moe.expert_shared_feed_forward_length u32 = 512
llama_model_loader: - kv 31: qwen35moe.ssm.conv_kernel u32 = 4
llama_model_loader: - kv 32: qwen35moe.ssm.state_size u32 = 128
llama_model_loader: - kv 33: qwen35moe.ssm.group_count u32 = 16
llama_model_loader: - kv 34: qwen35moe.ssm.time_step_rank u32 = 32
llama_model_loader: - kv 35: qwen35moe.ssm.inner_size u32 = 4096
llama_model_loader: - kv 36: qwen35moe.full_attention_interval u32 = 4
llama_model_loader: - kv 37: qwen35moe.rope.dimension_count u32 = 64
llama_model_loader: - kv 38: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 39: tokenizer.ggml.pre str = qwen35
llama_model_loader: - kv 40: tokenizer.ggml.tokens arr[str,248320] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 41: tokenizer.ggml.token_type arr[i32,248320] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 42: tokenizer.ggml.merges arr[str,247587] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 43: tokenizer.ggml.eos_token_id u32 = 248046
llama_model_loader: - kv 44: tokenizer.ggml.padding_token_id u32 = 248055
llama_model_loader: - kv 45: tokenizer.chat_template str = {%- set image_count = namespace(value...
llama_model_loader: - kv 46: general.quantization_version u32 = 2
llama_model_loader: - kv 47: general.file_type u32 = 15
llama_model_loader: - kv 48: quantize.imatrix.file str = Qwen3.5-35B-A3B-GGUF/imatrix_unsloth....
llama_model_loader: - kv 49: quantize.imatrix.dataset str = unsloth_calibration_Qwen3.5-35B-A3B.txt
llama_model_loader: - kv 50: quantize.imatrix.entries_count u32 = 510
llama_model_loader: - kv 51: quantize.imatrix.chunks_count u32 = 80
llama_model_loader: - type f32: 301 tensors
llama_model_loader: - type q8_0: 60 tensors
llama_model_loader: - type q4_K: 165 tensors
llama_model_loader: - type q5_K: 60 tensors
llama_model_loader: - type q6_K: 67 tensors
llama_model_loader: - type mxfp4: 80 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_K - Medium
print_info: file size = 19.74 GiB (4.89 BPW)
load: 0 unused tokens
load: printing all EOG tokens:
load: - 248044 ('<|endoftext|>')
load: - 248046 ('<|im_end|>')
load: - 248063 ('<|fim_pad|>')
load: - 248064 ('<|repo_name|>')
load: - 248065 ('<|file_sep|>')
load: special tokens cache size = 33
load: token to piece cache size = 1.7581 MB
print_info: arch = qwen35moe
print_info: vocab_only = 0
print_info: no_alloc = 0
print_info: n_ctx_train = 262144
print_info: n_embd = 2048
print_info: n_embd_inp = 2048
print_info: n_layer = 40
print_info: n_head = 16
print_info: n_head_kv = 2
print_info: n_rot = 64
print_info: n_swa = 0
print_info: is_swa_any = 0
print_info: n_embd_head_k = 256
print_info: n_embd_head_v = 256
print_info: n_gqa = 8
print_info: n_embd_k_gqa = 512
print_info: n_embd_v_gqa = 512
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-06
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 0
print_info: n_expert = 256
print_info: n_expert_used = 8
print_info: n_expert_groups = 0
print_info: n_group_used = 0
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 40
print_info: rope scaling = linear
print_info: freq_base_train = 10000000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 262144
print_info: rope_yarn_log_mul = 0.0000
print_info: rope_finetuned = unknown
print_info: mrope sections = [11, 11, 10, 0]
print_info: ssm_d_conv = 4
print_info: ssm_d_inner = 4096
print_info: ssm_d_state = 128
print_info: ssm_dt_rank = 32
print_info: ssm_n_group = 16
print_info: ssm_dt_b_c_rms = 0
print_info: model type = 35B.A3B
print_info: model params = 34.66 B
print_info: general.name = Qwen3.5-35B-A3B
print_info: vocab type = BPE
print_info: n_vocab = 248320
print_info: n_merges = 247587
print_info: BOS token = 11 ','
print_info: EOS token = 248046 '<|im_end|>'
print_info: EOT token = 248046 '<|im_end|>'
print_info: PAD token = 248055 '<|vision_pad|>'
print_info: LF token = 198 'Ċ'
print_info: FIM PRE token = 248060 '<|fim_prefix|>'
print_info: FIM SUF token = 248062 '<|fim_suffix|>'
print_info: FIM MID token = 248061 '<|fim_middle|>'
print_info: FIM PAD token = 248063 '<|fim_pad|>'
print_info: FIM REP token = 248064 '<|repo_name|>'
print_info: FIM SEP token = 248065 '<|file_sep|>'
print_info: EOG token = 248044 '<|endoftext|>'
print_info: EOG token = 248046 '<|im_end|>'
print_info: EOG token = 248063 '<|fim_pad|>'
print_info: EOG token = 248064 '<|repo_name|>'
print_info: EOG token = 248065 '<|file_sep|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = true, direct_io = false)
load_tensors: offloading output layer to GPU
load_tensors: offloading 39 repeating layers to GPU
load_tensors: offloaded 41/41 layers to GPU
load_tensors: CPU_Mapped model buffer size = 272.81 MiB
load_tensors: ROCm0 model buffer size = 19939.68 MiB
..................................................................................................
common_init_result: added <|endoftext|> logit bias = -inf
common_init_result: added <|im_end|> logit bias = -inf
common_init_result: added <|fim_pad|> logit bias = -inf
common_init_result: added <|repo_name|> logit bias = -inf
common_init_result: added <|file_sep|> logit bias = -inf
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 262144
llama_context: n_ctx_seq = 262144
llama_context: n_batch = 1024
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = auto
llama_context: kv_unified = false
llama_context: freq_base = 10000000.0
llama_context: freq_scale = 1
llama_context: ROCm_Host output buffer size = 0.95 MiB
llama_kv_cache: ROCm0 KV buffer size = 5120.00 MiB
llama_kv_cache: size = 5120.00 MiB (262144 cells, 10 layers, 1/1 seqs), K (f16): 2560.00 MiB, V (f16): 2560.00 MiB
llama_memory_recurrent: ROCm0 RS buffer size = 62.81 MiB
llama_memory_recurrent: size = 62.81 MiB ( 1 cells, 40 layers, 1 seqs), R (f32): 2.81 MiB, S (f32): 60.00 MiB
sched_reserve: reserving ...
sched_reserve: Flash Attention was auto, set to enabled
sched_reserve: resolving fused Gated Delta Net support:
sched_reserve: fused Gated Delta Net (autoregressive) enabled
sched_reserve: fused Gated Delta Net (chunked) enabled
sched_reserve: ROCm0 compute buffer size = 804.02 MiB
sched_reserve: ROCm_Host compute buffer size = 520.02 MiB
sched_reserve: graph nodes = 3729
sched_reserve: graph splits = 2
sched_reserve: reserve took 77.28 ms, sched copies = 1
srv load_model: initializing slots, n_slots = 1
Segmentation fault
/opt/rocm/bin/rocminfo
ROCk module version 6.16.13 is loaded
HSA System Attributes
Runtime Version: 1.18
Runtime Ext Version: 1.15
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
XNACK enabled: NO
DMAbuf Support: YES
VMM Support: YES
==========
HSA Agents
Agent 1
Name: AMD Ryzen 9 9950X3D 16-Core Processor
Uuid: CPU-XX
Marketing Name: AMD Ryzen 9 9950X3D 16-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 49152(0xc000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 4300
BDFID: 0
Internal Node ID: 0
Compute Unit: 32
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Memory Properties:
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 261395552(0xf949460) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 261395552(0xf949460) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 261395552(0xf949460) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 4
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 261395552(0xf949460) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
Agent 2
Name: gfx1201
Uuid: GPU-b44207ff2cd402f4
Marketing Name: AMD Radeon AI PRO R9700
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 32(0x20) KB
L2: 8192(0x2000) KB
L3: 65536(0x10000) KB
Chip ID: 30033(0x7551)
ASIC Revision: 1(0x1)
Cacheline Size: 256(0x100)
Max Clock Freq. (MHz): 2350
BDFID: 768
Internal Node ID: 1
Compute Unit: 64
SIMDs per CU: 2
Shader Engines: 4
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Memory Properties:
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 2147483647(0x7fffffff)
y 65535(0xffff)
z 65535(0xffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 128
SDMA engine uCode:: 662
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 33406976(0x1fdc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1201
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 2147483647(0x7fffffff)
y 65535(0xffff)
z 65535(0xffff)
FBarrier Max Size: 32
ISA 2
Name: amdgcn-amd-amdhsa--gfx12-generic
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 2147483647(0x7fffffff)
y 65535(0xffff)
z 65535(0xffff)
FBarrier Max Size: 32
*** Done ***
rocm-smi --showuse --showclocks --showpower
============================ ROCm System Management Interface ============================
WARNING: AMD GPU device(s) is/are in a low-power state. Check power control/runtime_status
=============================== Current clock frequencies ================================
GPU[0] : dcefclk clock level: 1: (219Mhz)
GPU[0] : fclk clock level: 1: (582Mhz)
GPU[0] : mclk clock level: 0: (96Mhz)
GPU[0] : sclk clock level: 1: (59Mhz)
GPU[0] : socclk clock level: 0: (417Mhz)
GPU[0] : pcie clock level: 2 (32.0GT/s x16)
GPU[1] : mclk clock level: 0: (1800Mhz)
GPU[1] : sclk clock level: 0: (600Mhz)
GPU[1] : socclk clock level: 1: (1200Mhz)
=================================== Power Consumption ====================================
GPU[0] : Average Graphics Package Power (W): 15.0
GPU[1] : Current Socket Graphics Package Power (W): 0.012
=================================== % time GPU is busy ===================================
GPU[0] : GPU use (%): 2
GPU[1] : GPU use (%): 0
================================== End of ROCm SMI Log ===================================
echo on | sudo tee /sys/bus/pci/devices/0000:03:00.0/power/control
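After forcing power/control to "on", whether the device actually left runtime suspend can be checked directly. A sketch; the PCI address is the R9700's from the lspci output in this post:

```shell
# Runtime PM state should read "active" once power/control is "on".
cat /sys/bus/pci/devices/0000:03:00.0/power/control
cat /sys/bus/pci/devices/0000:03:00.0/power/runtime_status
```

If runtime_status still reads "suspended", the clocks in rocm-smi will stay at their floor no matter what the DPM level is set to.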
/opt/rocm/bin/rocm-smi
WARNING: AMD GPU device(s) is/are in a low-power state. Check power control/runtime_status
========================================= ROCm System Management Interface =========================================
=================================================== Concise Info ===================================================
Device Node IDs Temp Power Partitions SCLK MCLK Fan Perf PwrCap VRAM% GPU%
(DID, GUID) (Edge) (Avg) (Mem, Compute, ID)
0 1 0x7551, 32448 26.0°C 15.0W N/A, N/A, 0 1Mhz 96Mhz 29.8% auto 300.0W 0% 0%
1 2 0x13c0, 2327 38.0°C 0.012W N/A, N/A, 0 N/A 1800Mhz 0% auto N/A 63% 0%
=============================================== End of ROCm SMI Log ================================================
/opt/rocm/bin/rocm-smi --showproductname --showbus --showuniqueid
ls -l /sys/class/drm/
find /sys -path '*amdgpu*/runtime_status' -o -path '*drm/card*/device/power/runtime_status' 2>/dev/null | xargs -r -I{} sh -c 'echo === {}; cat {}'
dmesg | grep -iE 'amdgpu|kfd|drm' | tail -n 200
============================ ROCm System Management Interface ============================
WARNING: AMD GPU device(s) is/are in a low-power state. Check power control/runtime_status
======================================= Unique ID ========================================
GPU[0] : Unique ID: 0xb44207ff2cd402f4
GPU[1] : Unique ID: 0x0
======================================= PCI Bus ID =======================================
GPU[0] : PCI Bus: 0000:03:00.0
GPU[1] : PCI Bus: 0000:7A:00.0
====================================== Product Info ======================================
GPU[0] : Card Series: AMD Radeon AI PRO R9700
GPU[0] : Card Model: 0x7551
GPU[0] : Card Vendor: Advanced Micro Devices, Inc. [AMD/ATI]
GPU[0] : Card SKU: G287BP00
GPU[0] : Subsystem ID: 0x0626
GPU[0] : Device Rev: 0xc0
GPU[0] : Node ID: 1
GPU[0] : GUID: 32448
GPU[0] : GFX Version: gfx1201
GPU[1] : Card Series: AMD Radeon Graphics
GPU[1] : Card Model: 0x13c0
GPU[1] : Card Vendor: Advanced Micro Devices, Inc. [AMD/ATI]
GPU[1] : Card SKU: RAPHAEL
GPU[1] : Subsystem ID: 0x7e59
GPU[1] : Device Rev: 0xc9
GPU[1] : Node ID: 2
GPU[1] : GUID: 2327
GPU[1] : GFX Version: gfx1036
================================== End of ROCm SMI Log ===================================
hipconfig --full
HIP version: 7.2.26015-fc0010cf6a
==hipconfig
HIP_PATH :/opt/rocm-7.2.0
ROCM_PATH :/opt/rocm-7.2.0
HIP_COMPILER :clang
HIP_PLATFORM :amd
HIP_RUNTIME :rocclr
CPP_CONFIG : -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/opt/rocm-7.2.0/include -I/include
==hip-clang
HIP_CLANG_PATH :/opt/rocm-7.2.0/lib/llvm/bin
AMD clang version 22.0.0git (https://github.com/RadeonOpenCompute/llvm-project roc-7.2.0 26014 7b800a19466229b8479a78de19143dc33c3ab9b5)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-7.2.0/lib/llvm/bin
Configuration file: /opt/rocm-7.2.0/lib/llvm/bin/clang++.cfg
AMD LLVM version 22.0.0git
Optimized build.
Default target: x86_64-unknown-linux-gnu
Host CPU: znver5
Registered Targets:
amdgcn - AMD GCN GPUs
r600 - AMD GPUs HD2XXX-HD6XXX
x86 - 32-bit X86: Pentium-Pro and above
x86-64 - 64-bit X86: EM64T and AMD64
hip-clang-cxxflags :
-O3
hip-clang-ldflags :
--driver-mode=g++ -O3 --hip-link
== Environment Variables
PATH =/root/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
== Linux Kernel
Hostname :
Linux ******* 6.1.0-42-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.159-1 (2025-12-30) x86_64 GNU/Linux
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 12 (bookworm)
Release: 12
Codename: bookworm
lspci | grep -i vga
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 7551 (rev c0)
7a:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Granite Ridge [Radeon Graphics] (rev c9)