
Conversation

@ivanfioravanti

MPS Support on Apple Silicon devices. This should solve #127
Many changes had to be applied to make this work properly.

This commit adds full support for training AI models on Apple Silicon Macs using MPS,
including fixes for multiprocessing tensor sharing issues and UI compatibility.

Major changes:
- Created toolkit/device_utils.py with comprehensive MPS device management
- Fixed PyTorch multiprocessing issues on MPS by forcing num_workers=0 (see the sketch after this list)
- Added Apple Silicon GPU detection in UI API
- Fixed UI client-side errors with type checking and error handling
- Added Apple Silicon-optimized training configs for Flux and Z-Image
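
For context, here is a minimal sketch of the kind of device helper and dataloader guard this list describes. The function names are hypothetical and this is not the actual contents of toolkit/device_utils.py:

```python
# Hypothetical sketch of an MPS-aware device helper and dataloader guard;
# not the actual toolkit/device_utils.py from this PR.
import torch


def get_preferred_device() -> torch.device:
    """Prefer CUDA, then MPS on Apple Silicon, else fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


def dataloader_kwargs_for(device: torch.device) -> dict:
    """MPS tensors cannot be shared across dataloader worker processes
    (the "_share_filename_: only available on CPU" error), so force
    single-process data loading there."""
    if device.type == "mps":
        return {"num_workers": 0, "pin_memory": False}
    return {"num_workers": 4, "pin_memory": True}
```

The returned kwargs would then be spread into torch.utils.data.DataLoader alongside the dataset and batch size.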

Key fixes:
- Resolved the "_share_filename_: only available on CPU" error
- Fixed .toFixed() errors on string values from the MPS API
- Added defensive JSON parsing with fallbacks
- Updated all training processes to use device-specific dataloader settings
- Updated the existing job configuration from adamw8bit to adamw for MPS compatibility
- Added device-aware optimizer selection in the UI (MPS shows compatible optimizers only)
- Updated the default optimizer to use adamw on Mac and adamw8bit on CUDA
- Added backend validation to automatically convert 8-bit optimizers for MPS (see the sketch after this list)
- Fixed multiple UI components with .toFixed() errors on string values
- Added comprehensive MPS device utilities and detection
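
Likewise, a hedged sketch of what the optimizer fallback in the last few bullets could look like, given that bitsandbytes 8-bit optimizers require CUDA. The names and mapping here are illustrative, not the PR's actual validation code:

```python
# Illustrative sketch of converting 8-bit optimizer names to MPS-safe
# equivalents; not the PR's actual backend validation code.
import torch

MPS_OPTIMIZER_FALLBACKS = {
    "adamw8bit": "adamw",  # bitsandbytes 8-bit optimizers are CUDA-only
    "adam8bit": "adam",
}


def resolve_optimizer_name(name: str, device: torch.device) -> str:
    """Map an optimizer name to one that is usable on the given device."""
    if device.type == "mps":
        return MPS_OPTIMIZER_FALLBACKS.get(name.lower(), name)
    return name


def build_optimizer(name: str, params, lr: float, device: torch.device):
    name = resolve_optimizer_name(name, device)
    if name == "adamw":
        return torch.optim.AdamW(params, lr=lr)
    if name == "adam":
        return torch.optim.Adam(params, lr=lr)
    raise ValueError(f"Unsupported optimizer: {name}")
```

On CUDA the name passes through unchanged, so adamw8bit keeps working there.
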
@ivanfioravanti
Author

@jaretburkett you are the boss here, it's up to you if you want to integrate MPS support or not. For sure, from PyTorch 2.9 things have improved a lot.

@jaretburkett
Contributor

I looked over the changes and I don't see anything that looks like it would cause an issue. I'll run some tests on it to make sure. @ivanfioravanti, how much RAM on a Mac was needed to train z-image-turbo? I have a 24GB MacBook I can test on.

@ivanfioravanti
Author

Let me try today. I'll keep you posted.

@ivanfioravanti
Author

~30GB. I can grant you access to an M3 Ultra 512GB, but not until next week.

@FritzTheCatfish

Excellent. What settings are you using? I had to disable transformations to get going with PyTorch 2.7.0.

@lorand93

lorand93 commented Jan 15, 2026

Why is this not merged yet? Is it waiting on more testing/feedback?

@giovanni-amati

Can we merge this? Does it work?

@ivanfioravanti
Author

Yes, it worked. I fixed a conflict in the README file just now. @jaretburkett, how would you like to proceed?

@lorand93

lorand93 commented Jan 22, 2026

Sharing my experience: I ran it successfully on an M4 Pro with 48GB RAM and have already trained multiple LoRAs 👍

@ManasInd

@ivanfioravanti Great work on enabling support for MPS.

I tried to train a LoRA for Z-Image Turbo on an M4 Pro 64GB and got stuck on two things. It seems like something specific to my setup, so I'm sharing the details here.

  1. When the config contains a samples section, the training job gets stuck at the sample generation step itself. I could only proceed with the job after removing that part.
  2. I trained a LoRA for 1000 steps using 15 images. I tried using the LoRA in ComfyUI and mflux but do not see any influence on the image (one way to sanity-check this is sketched below).
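
Not from the thread, but one quick way to sanity-check point 2 is to confirm the saved LoRA file actually contains non-trivial weights before debugging the ComfyUI/mflux side. A hedged sketch; the output path is hypothetical:

```python
# Hedged diagnostic sketch: inspect a saved LoRA .safetensors file and
# confirm the weights are not all (near) zero. The path is hypothetical.
from safetensors.torch import load_file

state = load_file("output/my_first_lora_v1/my_first_lora_v1.safetensors")
print(f"{len(state)} tensors in file")
for key, tensor in list(state.items())[:10]:
    mean_abs = tensor.float().abs().mean().item()
    print(f"{key}  shape={tuple(tensor.shape)}  mean|w|={mean_abs:.6f}")
```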

Sharing the config I used:
job: extension
config:
  name: my_first_lora_v1
  process:
    - type: sd_trainer
      training_folder: /Volumes/ExtStorage/Data/ai-toolkit/output
      sqlite_db_path: /Volumes/ExtStorage/Data/ai-toolkit/aitk_db.db
      device: mps
      trigger_word: oilpainting
      performance_log_every: 10
      network:
        type: lora
        linear: 16
        linear_alpha: 16
      save:
        dtype: float16
        save_every: 250
        max_step_saves_to_keep: 4
        push_to_hub: false
      datasets:
        - folder_path: /Volumes/ExtStorage/Data/ai-toolkit/datasets/oilpaintingsamples
          caption_ext: .txt
          caption_data: true
      train:
        batch_size: 1
        gradient_accumulation_steps: 8
        precision: bfloat16
        mixed_precision: true
        steps: 1000
        lr: 0.0001
        optimizer: adamw
        lr_scheduler: constant
        noise_scheduler: flowmatch
        timestep_type: weighted
        guidance_scale: 1
        sample_steps: 8
        gradient_checkpointing: true
        unload_text_encoder: false
      logging:
        log_every: 1
        use_ui_logger: true
      model:
        name_or_path: Tongyi-MAI/Z-Image-Turbo
        quantize: true
        qtype: float8
        quantize_te: true
        qtype_te: float8
        arch: zimage:turbo
        low_vram: true
        model_kwargs: {}
        layer_offloading: true
        layer_offloading_text_encoder_percent: 1
        layer_offloading_transformer_percent: 1
        assistant_lora_path: ostris/zimage_turbo_training_adapter/zimage_turbo_training_adapter_v1.safetensors
        attention_slice_size: 1
        vae_slicing: true
      memory:
        enable_memory_efficient_attention: true
        use_scaled_dot_product_attention: true
      advanced:
        diff_output_preservation: false
        blank_prompt_preservation: false
meta:
  name: my_first_lora_v1
  version: '1.0'
@lorand93 As the training worked on your M4 Pro, could you share a working config that you tried on your device? That will help me identify if something is missing.
