
Support for AMD ROCm compatible GPUs. #559

Open
cupertinomiranda wants to merge 1 commit into ostris:main from cupertinomiranda:main

Conversation

@cupertinomiranda

This patch adds support for ROCm-capable AMD GPUs to the project, Linux-only for the moment.

Hope you find the implementation sane and complete.
I have been using it for the last couple of days with great success.

[Screenshots: dashboard, job1]

@dkspwndj

dkspwndj commented Dec 2, 2025

No plans for Windows Radeon support?

@cupertinomiranda
Author

cupertinomiranda commented Dec 2, 2025

No plans for Windows Radeon support?

If by Radeon support you mean any card that has ROCm support, then the answer is yes. However, I will need some time to set up a Windows environment to try it. It would be nice if someone with a Windows setup could contribute it, since it could be a very easy patch.

It works with other low-end cards as well; it is in no way limited to the AI PRO R9700.


async function getAMDGpuStats(isWindows: boolean) {
  // Query static GPU info and live metrics in one invocation; the echoed
  // ";" marks the boundary between the two JSON documents.
  const command = 'amd-smi static --json && echo ";" && amd-smi metric --json';

This command will fail on systems with an inactive AMD iGPU in addition to an AMD discrete GPU.

For an inactive iGPU, the amd-smi metric JSON returns a gpu_data entry whose usage attribute is empty, like:

"usage": "N/A",

@ptelder

ptelder commented Dec 7, 2025

There's an issue for systems with an inactive iGPU on line 147 of ui/src/app/api/gpu/route.ts. The JSON returned for an inactive iGPU has an empty usage attribute.

Here's what the first chunk of an iGPU looks like:

"gpu_data": [
    {
        "gpu": 1,
        "usage": "N/A",
        "power": {
            "socket_power": "N/A",
            "gfx_voltage": "N/A",
            "soc_voltage": "N/A",
            "mem_voltage": "N/A",
            "throttle_status": "N/A",
            "power_management": "N/A"
        },

The missing data kills the parsing and prevents other GPUs from being detected.
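
A minimal sketch of how the parsing could tolerate such entries, assuming the gpu_data layout shown above; the AmdGpuMetric type and parseAmdGpuMetrics helper are hypothetical names for illustration, not code from this PR:

// Parse amd-smi metric output, skipping GPUs that report no metrics.
interface AmdGpuMetric {
  gpu: number;
  // amd-smi emits the string "N/A" instead of an object for inactive GPUs
  usage: Record<string, unknown> | 'N/A';
  power: Record<string, unknown> | 'N/A';
}

function parseAmdGpuMetrics(json: string): AmdGpuMetric[] {
  const parsed = JSON.parse(json) as { gpu_data: AmdGpuMetric[] };
  // Drop entries whose usage is "N/A" (e.g. an inactive iGPU) so the
  // remaining GPUs are still detected and reported.
  return parsed.gpu_data.filter((entry) => entry.usage !== 'N/A');
}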

@cupertinomiranda
Author

Thanks for the comment. I will update the patch with a revised, improved version.

@cupertinomiranda cupertinomiranda force-pushed the main branch 2 times, most recently from 3c65577 to b75d940 Compare December 14, 2025 17:48
@peyloride

Update: the latest bitsandbytes packages now include ROCm support. I was able to run with it and your changes.

@TawusGames

TawusGames commented Jan 18, 2026

After manually applying all the changes, I get an error like this when I define a new job. I couldn't figure out what the problem is.

Running 1 job
Error running job: Failed to import diffusers.models.autoencoders.autoencoder_tiny because of the following error (look up to see its traceback):
name 'logger' is not defined
Error running on_error: cannot access local variable 'job' where it is not associated with a value
========================================
Result:
 - 0 completed jobs
 - 1 failure
========================================
Traceback (most recent call last):
  File "/run/media/system/Depolama/ai-toolkit/venv/lib/python3.12/site-packages/diffusers/quantizers/torchao/torchao_quantizer.py", line 87, in _update_torch_safe_globals
    from torchao.dtypes.uintx.uint4_layout import UInt4Tensor
ModuleNotFoundError: No module named 'torchao.dtypes.uintx.uint4_layout'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/run/media/system/Depolama/ai-toolkit/venv/lib/python3.12/site-packages/diffusers/utils/import_utils.py", line 1016, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/usr/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1310, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 995, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/run/media/system/Depolama/ai-toolkit/venv/lib/python3.12/site-packages/diffusers/models/autoencoders/__init__.py", line 1, in <module>
    from .autoencoder_asym_kl import AsymmetricAutoencoderKL
  File "/run/media/system/Depolama/ai-toolkit/venv/lib/python3.12/site-packages/diffusers/models/autoencoders/autoencoder_asym_kl.py", line 22, in <module>
    from ..modeling_utils import ModelMixin
  File "/run/media/system/Depolama/ai-toolkit/venv/lib/python3.12/site-packages/diffusers/models/modeling_utils.py", line 41, in <module>
    from ..quantizers import DiffusersAutoQuantizer, DiffusersQuantizer
  File "/run/media/system/Depolama/ai-toolkit/venv/lib/python3.12/site-packages/diffusers/quantizers/__init__.py", line 16, in <module>
    from .auto import DiffusersAutoQuantizer
  File "/run/media/system/Depolama/ai-toolkit/venv/lib/python3.12/site-packages/diffusers/quantizers/auto.py", line 35, in <module>
    from .torchao import TorchAoHfQuantizer
  File "/run/media/system/Depolama/ai-toolkit/venv/lib/python3.12/site-packages/diffusers/quantizers/torchao/__init__.py", line 15, in <module>
    from .torchao_quantizer import TorchAoHfQuantizer
  File "/run/media/system/Depolama/ai-toolkit/venv/lib/python3.12/site-packages/diffusers/quantizers/torchao/torchao_quantizer.py", line 108, in <module>
    _update_torch_safe_globals()
  File "/run/media/system/Depolama/ai-toolkit/venv/lib/python3.12/site-packages/diffusers/quantizers/torchao/torchao_quantizer.py", line 93, in _update_torch_safe_globals
    logger.warning(
NameError: name 'logger' is not defined

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/run/media/system/Depolama/ai-toolkit/run.py", line 120, in <module>
    main()
  File "/run/media/system/Depolama/ai-toolkit/run.py", line 108, in main
    raise e
  File "/run/media/system/Depolama/ai-toolkit/run.py", line 95, in main
    job = get_job(config_file, args.name)
  File "/run/media/system/Depolama/ai-toolkit/toolkit/job.py", line 28, in get_job
    from jobs import ExtensionJob
  File "/run/media/system/Depolama/ai-toolkit/jobs/__init__.py", line 1, in <module>
    from .BaseJob import BaseJob
  File "/run/media/system/Depolama/ai-toolkit/jobs/BaseJob.py", line 5, in <module>
    from jobs.process import BaseProcess
  File "/run/media/system/Depolama/ai-toolkit/jobs/process/__init__.py", line 6, in <module>
    from .TrainVAEProcess import TrainVAEProcess
  File "/run/media/system/Depolama/ai-toolkit/jobs/process/TrainVAEProcess.py", line 18, in <module>
    from toolkit.image_utils import show_tensors
  File "/run/media/system/Depolama/ai-toolkit/toolkit/image_utils.py", line 14, in <module>
    from diffusers import AutoencoderTiny
  File "<frozen importlib._bootstrap>", line 1412, in _handle_fromlist
  File "/run/media/system/Depolama/ai-toolkit/venv/lib/python3.12/site-packages/diffusers/utils/import_utils.py", line 1007, in __getattr__
    value = getattr(module, name)
  File "/run/media/system/Depolama/ai-toolkit/venv/lib/python3.12/site-packages/diffusers/utils/import_utils.py", line 1006, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/run/media/system/Depolama/ai-toolkit/venv/lib/python3.12/site-packages/diffusers/utils/import_utils.py", line 1018, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import diffusers.models.autoencoders.autoencoder_tiny because of the following error (look up to see its traceback):
name 'logger' is not defined

[Screenshot: Screenshot_2026-01-18_14-24-03]

@MrDrMcCoy

@cupertinomiranda Thank you for putting this together! Unfortunately, I could not get it to work. After completing the build and running the UI, it fails like so:

[Screenshot]

With this repeating in the console:

[UI] Error fetching GPU stats: TypeError: Cannot read properties of undefined (reading 'value')
[UI]     at <unknown> (.next/server/app/api/gpu/route.js:1:3398)
[UI]     at Array.map (<anonymous>)
[UI]     at y (.next/server/app/api/gpu/route.js:1:2968)
[UI]     at async d (.next/server/app/api/gpu/route.js:1:1313)
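
For reference, a hedged sketch of the kind of guard that would avoid this crash, reading nested metric fields with optional chaining instead of a bare property access inside the map; the MetricEntry shape and toStat helper are illustrative assumptions, not the route's actual code:

// Illustrative field names only; the real amd-smi metric layout may differ.
type MetricEntry = { usage?: { gfx_activity?: { value?: number } } };

function toStat(entry: MetricEntry): number | null {
  // Optional chaining yields undefined instead of throwing when any
  // intermediate field is missing (e.g. when usage was reported as "N/A").
  return entry.usage?.gfx_activity?.value ?? null;
}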

Would you be so kind as to contribute a Dockerfile for ROCm? I have the same PRO 9700 XT card as you.

@cupertinomiranda
Author

cupertinomiranda commented Jan 22, 2026

Would you be so kind as to contribute a Dockerfile for ROCm? I have the same PRO 9700 XT card as you.

Sorry, but that is not my territory; I would not know where to start.

@cupertinomiranda cupertinomiranda force-pushed the main branch 2 times, most recently from 8fa5eba to 53e91ee Compare January 29, 2026 10:41
@cupertinomiranda
Author

This is a pull request, not an issue or a Reddit post.

Please! If you have any solutions, contribute them as a pull request on my forked repo.
Do not post random .txt files with solutions; that just guarantees nothing ever gets done.
No wonder the author of ai-toolkit is not taking any pull requests.

@tannisroot

tannisroot commented Jan 30, 2026

This is a pull request, not an issue or a Reddit post.

Please! If you have any solutions, contribute them as a pull request on my forked repo.
Do not post random .txt files with solutions; that just guarantees nothing ever gets done.
No wonder the author of ai-toolkit is not taking any pull requests.

Sorry, I didn't mean for this to become a support section; I merely provided feedback on the PR because, for me, it doesn't work out of the box due to amd-smi issues with the current instructions, and I wanted the problem fixed before the merge, since other users may hit it afterwards and burden the ai-toolkit devs.
For what it's worth, the conversation led to discovering that rocm-sdk from the TheRock repo (rocm-sdk-core, specifically) ships an amd-smi binary that works with ai-toolkit; I was able to successfully train a LoRA with it and your PR.
Also, the PR hasn't been looked at in over a month; I doubt our messages had anything to do with it not being reviewed.
But still, thank you very much for adding this change! I hope it does get merged.
