
Support for AMD ROCm compatible GPUs. #559

Open
cupertinomiranda wants to merge 1 commit into ostris:main from cupertinomiranda:main

Conversation

@cupertinomiranda

This patch adds support for ROCm-capable AMD GPUs to the project, Linux-only for the moment.

Hope you find the implementation sane and complete.
I have been using it for the last couple of days with great success.

[Screenshots: dashboard, job1]

@dkspwndj

dkspwndj commented Dec 2, 2025

No plans for Windows Radeon support?

@cupertinomiranda
Author

cupertinomiranda commented Dec 2, 2025

No plans for Windows Radeon support?

If by Radeon support you mean any card that has ROCm support, then the answer is yes. However, I will need some time to set up a Windows environment to try it. It would be nice if someone with a Windows setup could contribute it, since it could be a very easy patch.

It works with other low-end cards as well; it is in no way limited to the AI PRO R9700.


async function getAMDGpuStats(isWindows: boolean) {
  // Query static GPU info and live metrics in one invocation; the echoed
  // ";" marks the boundary between the two JSON documents.
  const command = 'amd-smi static --json && echo ";" && amd-smi metric --json';

This command will fail on systems with an inactive AMD iGPU in addition to an AMD discrete GPU.

For an inactive iGPU, the amd-smi metric JSON returns a gpu_data entry whose usage attribute is empty, like:

"usage": "N/A",

@ptelder

ptelder commented Dec 7, 2025

There's an issue for systems with an inactive iGPU on line 147 of ui/src/app/api/gpu/route.ts. The JSON returned for an inactive iGPU has an empty usage attribute.

Here's what the first chunk of an iGPU looks like:

"gpu_data": [
    {
        "gpu": 1,
        "usage": "N/A",
        "power": {
            "socket_power": "N/A",
            "gfx_voltage": "N/A",
            "soc_voltage": "N/A",
            "mem_voltage": "N/A",
            "throttle_status": "N/A",
            "power_management": "N/A"
        },

The missing data kills the parsing and prevents other GPUs from being detected.
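
A minimal sketch of how the parsing could tolerate such entries, assuming the gpu_data layout shown above; the AmdGpuMetric type and parseAmdGpuMetrics helper are hypothetical names for illustration, not code from this PR:

// Parse amd-smi metric output, skipping GPUs that report no metrics.
interface AmdGpuMetric {
  gpu: number;
  // amd-smi emits the string "N/A" instead of an object for inactive GPUs
  usage: Record<string, unknown> | 'N/A';
  power: Record<string, unknown> | 'N/A';
}

function parseAmdGpuMetrics(json: string): AmdGpuMetric[] {
  const parsed = JSON.parse(json) as { gpu_data: AmdGpuMetric[] };
  // Drop entries whose usage is "N/A" (e.g. an inactive iGPU) so the
  // remaining GPUs are still detected and reported.
  return parsed.gpu_data.filter((entry) => entry.usage !== 'N/A');
}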

@cupertinomiranda
Author

Thanks for the comment. I will update the patch with a revised, improved version.

@cupertinomiranda cupertinomiranda force-pushed the main branch 2 times, most recently from 3c65577 to b75d940 Compare December 14, 2025 17:48
@peyloride

Update: the latest bitsandbytes packages now include ROCm support. I was able to run with it and your changes.

@TawusGames

TawusGames commented Jan 18, 2026

After manually applying all the changes, I get an error like this when I define a new job. I couldn't figure out what the problem is.

Running 1 job
Error running job: Failed to import diffusers.models.autoencoders.autoencoder_tiny because of the following error (look up to see its traceback):
name 'logger' is not defined
Error running on_error: cannot access local variable 'job' where it is not associated with a value
========================================
Result:
 - 0 completed jobs
 - 1 failure
========================================
Traceback (most recent call last):
  File "/run/media/system/Depolama/ai-toolkit/venv/lib/python3.12/site-packages/diffusers/quantizers/torchao/torchao_quantizer.py", line 87, in _update_torch_safe_globals
    from torchao.dtypes.uintx.uint4_layout import UInt4Tensor
ModuleNotFoundError: No module named 'torchao.dtypes.uintx.uint4_layout'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/run/media/system/Depolama/ai-toolkit/venv/lib/python3.12/site-packages/diffusers/utils/import_utils.py", line 1016, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/usr/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1310, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 995, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/run/media/system/Depolama/ai-toolkit/venv/lib/python3.12/site-packages/diffusers/models/autoencoders/__init__.py", line 1, in <module>
    from .autoencoder_asym_kl import AsymmetricAutoencoderKL
  File "/run/media/system/Depolama/ai-toolkit/venv/lib/python3.12/site-packages/diffusers/models/autoencoders/autoencoder_asym_kl.py", line 22, in <module>
    from ..modeling_utils import ModelMixin
  File "/run/media/system/Depolama/ai-toolkit/venv/lib/python3.12/site-packages/diffusers/models/modeling_utils.py", line 41, in <module>
    from ..quantizers import DiffusersAutoQuantizer, DiffusersQuantizer
  File "/run/media/system/Depolama/ai-toolkit/venv/lib/python3.12/site-packages/diffusers/quantizers/__init__.py", line 16, in <module>
    from .auto import DiffusersAutoQuantizer
  File "/run/media/system/Depolama/ai-toolkit/venv/lib/python3.12/site-packages/diffusers/quantizers/auto.py", line 35, in <module>
    from .torchao import TorchAoHfQuantizer
  File "/run/media/system/Depolama/ai-toolkit/venv/lib/python3.12/site-packages/diffusers/quantizers/torchao/__init__.py", line 15, in <module>
    from .torchao_quantizer import TorchAoHfQuantizer
  File "/run/media/system/Depolama/ai-toolkit/venv/lib/python3.12/site-packages/diffusers/quantizers/torchao/torchao_quantizer.py", line 108, in <module>
    _update_torch_safe_globals()
  File "/run/media/system/Depolama/ai-toolkit/venv/lib/python3.12/site-packages/diffusers/quantizers/torchao/torchao_quantizer.py", line 93, in _update_torch_safe_globals
    logger.warning(
NameError: name 'logger' is not defined

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/run/media/system/Depolama/ai-toolkit/run.py", line 120, in <module>
    main()
  File "/run/media/system/Depolama/ai-toolkit/run.py", line 108, in main
    raise e
  File "/run/media/system/Depolama/ai-toolkit/run.py", line 95, in main
    job = get_job(config_file, args.name)
  File "/run/media/system/Depolama/ai-toolkit/toolkit/job.py", line 28, in get_job
    from jobs import ExtensionJob
  File "/run/media/system/Depolama/ai-toolkit/jobs/__init__.py", line 1, in <module>
    from .BaseJob import BaseJob
  File "/run/media/system/Depolama/ai-toolkit/jobs/BaseJob.py", line 5, in <module>
    from jobs.process import BaseProcess
  File "/run/media/system/Depolama/ai-toolkit/jobs/process/__init__.py", line 6, in <module>
    from .TrainVAEProcess import TrainVAEProcess
  File "/run/media/system/Depolama/ai-toolkit/jobs/process/TrainVAEProcess.py", line 18, in <module>
    from toolkit.image_utils import show_tensors
  File "/run/media/system/Depolama/ai-toolkit/toolkit/image_utils.py", line 14, in <module>
    from diffusers import AutoencoderTiny
  File "<frozen importlib._bootstrap>", line 1412, in _handle_fromlist
  File "/run/media/system/Depolama/ai-toolkit/venv/lib/python3.12/site-packages/diffusers/utils/import_utils.py", line 1007, in __getattr__
    value = getattr(module, name)
  File "/run/media/system/Depolama/ai-toolkit/venv/lib/python3.12/site-packages/diffusers/utils/import_utils.py", line 1006, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/run/media/system/Depolama/ai-toolkit/venv/lib/python3.12/site-packages/diffusers/utils/import_utils.py", line 1018, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import diffusers.models.autoencoders.autoencoder_tiny because of the following error (look up to see its traceback):
name 'logger' is not defined

[Screenshot: Screenshot_2026-01-18_14-24-03]

@MrDrMcCoy

@cupertinomiranda Thank you for putting this together! Unfortunately, I could not get it to work. After completing the build and running the UI, it fails like so:

[Screenshot]

With this repeating in the console:

[UI] Error fetching GPU stats: TypeError: Cannot read properties of undefined (reading 'value')
[UI]     at <unknown> (.next/server/app/api/gpu/route.js:1:3398)
[UI]     at Array.map (<anonymous>)
[UI]     at y (.next/server/app/api/gpu/route.js:1:2968)
[UI]     at async d (.next/server/app/api/gpu/route.js:1:1313)
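
For reference, a hedged sketch of the kind of guard that would avoid this crash, reading nested metric fields with optional chaining instead of a bare property access inside the map; the MetricEntry shape and toStat helper are illustrative assumptions, not the route's actual code:

// Illustrative field names only; the real amd-smi metric layout may differ.
type MetricEntry = { usage?: { gfx_activity?: { value?: number } } };

function toStat(entry: MetricEntry): number | null {
  // Optional chaining yields undefined instead of throwing when any
  // intermediate field is missing (e.g. when usage was reported as "N/A").
  return entry.usage?.gfx_activity?.value ?? null;
}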

Would you be so kind as to contribute a Dockerfile for ROCm? I have the same PRO 9700 XT card as you.

@cupertinomiranda
Author

cupertinomiranda commented Jan 22, 2026

Would you be so kind as to contribute a Dockerfile for ROCm? I have the same PRO 9700 XT card as you.

Sorry, but that is not my territory; I would not know where to start.

@cupertinomiranda cupertinomiranda force-pushed the main branch 2 times, most recently from 8fa5eba to 53e91ee Compare January 29, 2026 10:41
@cupertinomiranda
Author

This is a pull request, not an issue or a Reddit post.

Please! If you have any solutions, contribute them as a pull request on my forked repo.
Do not post random .txt files with solutions; that just guarantees nothing ever gets done.
No wonder the author of ai-toolkit is not taking any pull requests.

@tannisroot

tannisroot commented Jan 30, 2026

This is a pull request, not an issue or a Reddit post.

Please! If you have any solutions, contribute them as a pull request on my forked repo.
Do not post random .txt files with solutions; that just guarantees nothing ever gets done.
No wonder the author of ai-toolkit is not taking any pull requests.

Sorry, I didn't mean for this to become a support section; I merely provided feedback on the PR because, for me, it doesn't work out of the box due to amd-smi issues with the current instructions, and I wanted the problem fixed before the merge, since other users may hit it afterwards and burden the ai-toolkit devs.
For what it's worth, the conversation led to discovering that rocm-sdk from the TheRock repo (rocm-sdk-core, specifically) ships an amd-smi binary that works with ai-toolkit; I was able to successfully train a LoRA with it and your PR.
Also, the PR hasn't been looked at in over a month; I doubt our messages had anything to do with it not being reviewed.
But still, thank you very much for adding this change! I hope it does get merged.
