Description
Disclaimer
- I have read and understood the disclaimer.
Application version
0.5.2
System version
0.2.7
Device model
JetKVM
Extension model
None
Remote device Hardware
Minisforum UM773 Lite
Remote device OS
Ubuntu 24.04.3 LTS
Bug description
Summary
On systems with AMD GPUs, the kernel intermittently becomes unresponsive following display events. Kernel logs show repeated AMDGPU display manager workqueue stalls (dm_irq_work_func) associated with EDID read failures. In some cases this leads to a full system hang requiring a hard reboot.
Description
This issue has been observed on two separate machines with identical hardware models and the same KVM setup. On both systems, kernel logs show AMDGPU display-related errors around the time of the incident, including:
- amdgpu: [drm] *ERROR* No EDID read
- workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us
- Peripheral reset messages such as: sr ... Power-on or device reset occurred
In some occurrences, the dm_irq_work_func workqueue hog warning appears and the system recovers. In other cases, the system becomes completely unresponsive (no local input) and requires a power cycle. There is no clean shutdown recorded.
The issue appears to be triggered by display events such as monitor sleep/wake or display switching. Both systems are connected to a JetKVM, and EDID read failures are logged near the time of the stall.
This behavior suggests a deadlock or prolonged stall in the AMDGPU display manager workqueue when handling EDID or hotplug events.
Expected behavior
Display EDID or hotplug failures should be handled gracefully without prolonged kernel workqueue stalls or system hangs.
Actual behavior
The AMDGPU display manager workqueue (dm_irq_work_func) repeatedly hogs the CPU, and in some cases the system becomes fully unresponsive and must be rebooted.
Reproducibility
Intermittent, but reproduced on two identical systems.
Environment
- Distribution:
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 24.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.3 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo
- Kernel version:
$ uname -r
6.14.0-37-generic
- GPU model:
$ lspci -nnk | grep -A3 -E 'VGA|Display'
34:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt [Radeon 680M] [1002:1681] (rev 0a)
Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt [Radeon 680M] [1002:1681]
Kernel driver in use: amdgpu
Kernel modules: amdgpu
Logs
system 1
Dec 29 04:30:10 kernel: amdgpu 0000:34:00.0: [drm] *ERROR* No EDID read.
Dec 29 04:30:11 kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
Dec 29 04:30:12 kernel: sr 0:0:0:0: Power-on or device reset occurred
system 2
Jan 01 05:55:59 kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 4 times
Jan 01 05:56:00 kernel: sr 0:0:0:0: Power-on or device reset occurred
Jan 05 21:22:21 kernel: amdgpu 0000:34:00.0: [drm] *ERROR* No EDID read.
Jan 05 21:22:22 kernel: sr 0:0:0:0: Power-on or device reset occurred
Jan 06 16:44:30 kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 11 times
Jan 06 16:44:30 kernel: sr 0:0:0:0: Power-on or device reset occurred
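To check whether the workqueue warnings really cluster around EDID failures, the excerpts above can be scanned mechanically. The following is only a rough sketch, not part of the original report: the function names `classify` and `hogs_near_edid`, the 5-second window, and the `year` argument are all illustrative assumptions (syslog-style timestamps carry no year, so one has to be supplied). The regular expressions are taken from the messages quoted in this issue.

```python
import re
from datetime import datetime, timedelta

# Patterns copied from the kernel messages quoted in this report.
PATTERNS = {
    "edid": re.compile(r"\*ERROR\* No EDID read"),
    "hog": re.compile(r"dm_irq_work_func \[amdgpu\] hogged CPU"),
    "reset": re.compile(r"Power-on or device reset occurred"),
}


def classify(lines, year):
    """Return (timestamp, kind) pairs for lines matching a known pattern.

    `year` is an assumption supplied by the caller, because the
    "Dec 29 04:30:10" prefix does not include one.
    """
    events = []
    for line in lines:
        try:
            # The first 15 characters hold the "Mon DD HH:MM:SS" prefix.
            ts = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines without a parsable timestamp
        for kind, pattern in PATTERNS.items():
            if pattern.search(line):
                events.append((ts, kind))
                break
    return events


def hogs_near_edid(events, window=timedelta(seconds=5)):
    """Return timestamps of hog warnings within `window` of an EDID failure."""
    edid_times = [ts for ts, kind in events if kind == "edid"]
    return [
        ts
        for ts, kind in events
        if kind == "hog" and any(abs(ts - e) <= window for e in edid_times)
    ]


if __name__ == "__main__":
    # The three "system 1" lines from the Logs section.
    sample = [
        "Dec 29 04:30:10 kernel: amdgpu 0000:34:00.0: [drm] *ERROR* No EDID read.",
        "Dec 29 04:30:11 kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU "
        "for >10000us 4 times, consider switching to WQ_UNBOUND",
        "Dec 29 04:30:12 kernel: sr 0:0:0:0: Power-on or device reset occurred",
    ]
    events = classify(sample, year=2025)  # year chosen for illustration only
    for ts in hogs_near_edid(events):
        print("hog warning near EDID failure at", ts.time())
```

Running the same functions over a full kernel log from each machine would show whether hog warnings that have no nearby EDID error (as in the Jan 01 entry for system 2) are common, which may help narrow down the trigger.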