
Add MPS (Metal) GPU acceleration for Apple Silicon #1744

Open
Maanas-Verma wants to merge 3 commits into hacksider:main from Maanas-Verma:feat/mps-gpu-acceleration

Conversation


@Maanas-Verma Maanas-Verma commented Apr 7, 2026

Summary

  • 4-5x FPS improvement on Apple Silicon Macs: from 0.8–0.9 FPS (CPU) to 3.4–4.4 FPS (MPS GPU) in live mode
  • New modules/mps_session.py: drop-in replacement for onnxruntime session that converts the ONNX model to PyTorch and runs inference on Metal Performance Shaders (MPS) GPU
  • Auto-detects Apple Silicon — no CLI flags needed, falls back to CoreML/CPU on other platforms
  • Fix black camera preview on macOS caused by fit_image_to_size crash when tkinter window reports 1x1 size before rendering
  • Fix process_frame_v2 crash: get_one_face() called with 2 args instead of 1
  • Reduce face detection det_size (640→320); the larger size caused source face detection to silently fail on certain images
  • Fix deprecated CoreML provider options for onnxruntime 1.17+
  • Force CPU for InsightFace face analyser (dynamic input shapes incompatible with CoreML)
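The auto-detection described above can be sketched roughly as follows. This is a minimal illustration, not the PR's exact code: `is_mps_available` mirrors the helper the PR exposes, and the guarded import means a missing `torch` simply disables MPS on other setups.

```python
import platform


def is_mps_available() -> bool:
    """Best-effort check: Apple Silicon macOS with a usable PyTorch MPS backend."""
    if platform.system() != "Darwin" or platform.machine() != "arm64":
        return False
    try:
        import torch  # optional dependency; absence cleanly disables MPS
    except ImportError:
        return False
    return torch.backends.mps.is_available()
```

Because every failure path returns False, callers can try MPS first and fall back to the existing CoreML/CUDA/CPU providers without any CLI flags.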

Benchmark (Tested on Apple M1 Pro)

| Branch  | FPS     | Backend                 |
| ------- | ------- | ----------------------- |
| main    | 0.8–0.9 | ONNX CPU                |
| This PR | 3.4–4.4 | PyTorch MPS (Metal GPU) |

Neural net inference alone: 1.300s → 0.117s (11x faster)

Tested on MacBook Pro with Apple M1 Pro chip, macOS, Python 3.11

Changes

  • modules/mps_session.py — new file, PyTorch MPS session with onnxruntime-compatible interface
  • modules/processors/frame/face_swapper.py — try MPS first on Apple Silicon, fix CoreML options, fix get_one_face call
  • modules/face_analyser.py — force CPU provider, reduce det_size to 320
  • modules/ui.py — guard fit_image_to_size against zero dimensions, add try/except to display loop
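The `fit_image_to_size` guard boils down to an aspect-ratio computation that refuses degenerate targets. A minimal sketch of that logic (the helper name and None-return convention are illustrative, not the PR's exact code):

```python
def fit_dimensions(src_w, src_h, max_w, max_h):
    """Return a (width, height) fitting within max_w x max_h while preserving
    aspect ratio, or None when the target is degenerate (e.g. tkinter
    reporting a 1x1 window before it has been laid out)."""
    if not max_w or not max_h or max_w <= 1 or max_h <= 1:
        return None
    ratio = min(max_w / src_w, max_h / src_h)
    new_w, new_h = int(src_w * ratio), int(src_h * ratio)
    if new_w < 1 or new_h < 1:
        return None  # a resize to <1 pixel would crash cv2.resize
    return new_w, new_h
```

When this returns None, the display loop can simply show the unscaled frame instead of crashing, which is what fixes the black preview on first open.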

New dependencies (Apple Silicon only)

  • torch — PyTorch with MPS backend
  • onnx2torch — ONNX to PyTorch model conversion

Test plan

  • Verify live mode face swap works on Apple Silicon Mac (M1 Pro)
  • Verify FPS improvement over main branch
  • Verify face swap still works on CUDA/CPU (no regression)
  • Verify camera preview doesn't show black screen on first open
  • Verify source face detection works with AI-generated portraits

🤖 Generated with Claude Code

Summary by Sourcery

Add a PyTorch MPS-backed inference path for Apple Silicon and harden face processing and UI preview behavior.

New Features:

  • Introduce an MPSSession wrapper that runs ONNX models via PyTorch on Apple MPS as a drop-in replacement for onnxruntime sessions.
  • Enable the face swapper to use MPS on Apple Silicon when available, falling back to existing CoreML/CUDA/CPU backends otherwise.

Bug Fixes:

  • Fix process_frame_v2 using get_one_face with an incorrect argument signature, preventing a runtime crash when no face map is present.
  • Prevent crashes and black preview frames by guarding fit_image_to_size and the display loop against zero or invalid widget dimensions.
  • Force the face analyser to use CPU only and reduce detection size to avoid CoreML incompatibilities and missed detections.

Enhancements:

  • Simplify CoreML execution provider options to match supported configuration on current onnxruntime versions.

Maanas-Verma and others added 2 commits April 6, 2026 20:57
- Add PyTorch MPS backend for face swapper inference (11x speedup over CPU)
- New modules/mps_session.py: drop-in replacement for onnxruntime session
  that converts ONNX model to PyTorch and runs on Metal GPU
- Fix black camera preview on macOS: guard fit_image_to_size against
  zero dimensions when tkinter window reports 1x1 before rendering
- Add try/except to display loop so transient errors don't kill it
- Fix CoreML provider options for onnxruntime 1.24+ (remove deprecated
  RequireStaticShapes, SpecializationStrategy, etc.)
- Force CPU for InsightFace face detection (dynamic shapes incompatible
  with CoreML)

Performance on Apple Silicon (face swap neural net):
  CPU:     1.300s -> MPS: 0.117s (11x faster)
  CoreML:  0.270s -> MPS: 0.117s (2.3x faster)

Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Reduce det_size from (640,640) to (320,320) in face analyser — the
  larger size paradoxically misses faces in certain images (e.g.
  AI-generated portraits), causing the swap to silently not apply
- Fix get_one_face() call in process_frame_v2 that passed 2 args
  when the function only accepts 1

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@sourcery-ai
Contributor

sourcery-ai Bot commented Apr 7, 2026

Reviewer's Guide

Adds a PyTorch MPS-backed inference path for the face swapper on Apple Silicon, with a new onnxruntime-compatible MPSSession wrapper, and fixes several stability/performance issues in face detection and UI preview handling.

Sequence diagram for face swapper backend selection with MPS fallback

```mermaid
sequenceDiagram
    actor User
    participant UI as UIModule
    participant FS as FaceSwapperModule
    participant MS as MPSSessionClass
    participant IF as InsightFaceINSwapper
    participant ORT as OnnxruntimeSession

    User->>UI: startLiveMode()
    UI->>FS: get_face_swapper()

    alt FACE_SWAPPER is None
        FS->>FS: detectAppleSilicon()
        alt AppleSiliconAndMPSAvailable
            FS->>MS: MPSSession(model_path)
            MS-->>FS: mps_session
            FS->>IF: INSwapper(model_file, session=mps_session)
            IF-->>FS: FACE_SWAPPER (MPS-backed)
        else MPSUnavailableOrError
            FS->>FS: buildProvidersConfig()
            FS->>ORT: get_model(model_path, providers_config)
            ORT-->>FS: FACE_SWAPPER (CoreML/CUDA/CPU)
        end
        FS-->>UI: FACE_SWAPPER
    else FACE_SWAPPER already cached
        FS-->>UI: FACE_SWAPPER
    end

    UI-->>User: live face swap frames (GPU or fallback)
```

Class diagram for new MPSSession and related types

```mermaid
classDiagram
    class MPSSession {
        - model_path
        - _model
        - _providers
        - _provider_options
        - _inputs
        - _outputs
        + MPSSession(model_path, providers)
        + get_inputs() _FakeIO[]
        + get_outputs() _FakeIO[]
        + get_providers() string[]
        + run(output_names, input_feed, run_options) numpy_array[]
    }

    class _FakeIO {
        + name
        + shape
        + _FakeIO(name, shape)
    }

    class MPSModuleAPI {
        + is_mps_available() bool
    }

    MPSSession "*" o-- "*" _FakeIO : uses
    MPSModuleAPI <.. MPSSession : checksAvailability
```

File-Level Changes

Introduce MPSSession as an onnxruntime-compatible PyTorch MPS backend and wire it into the face swapper on Apple Silicon.
  • Add modules/mps_session.py implementing a minimal onnxruntime.InferenceSession-compatible wrapper around a converted PyTorch model running on MPS, including input/output metadata discovery and warmup.
  • Gate MPS usage behind runtime Apple Silicon and MPS-availability checks, exposing is_mps_available() for callers.
  • Prefer MPS for the insightface INSwapper session on Apple Silicon, with graceful fallback to existing onnxruntime providers when MPS is unavailable or fails to initialize.
  Files: modules/mps_session.py, modules/processors/frame/face_swapper.py

Simplify and modernize onnxruntime provider configuration while keeping CUDA optimizations and correcting CoreML options for newer onnxruntime versions.
  • Replace the detailed CoreMLExecutionProvider option set with a minimal configuration using MLProgram and ALL compute units for Apple Silicon.
  • Retain explicit CUDAExecutionProvider configuration while leaving other providers unchanged in the fallback path.
  Files: modules/processors/frame/face_swapper.py

Fix bugs in frame processing and face analysis that caused crashes or missed detections, and adjust detection settings for better robustness.
  • Correct process_frame_v2 to call get_one_face with only the processed frame, matching the function signature.
  • Force the face analyser to use CPUExecutionProvider only, avoiding CoreML incompatibilities with dynamic input shapes.
  • Reduce the face analyser det_size from (640, 640) to (320, 320) to avoid silent source face detection failures on some images.
  Files: modules/processors/frame/face_swapper.py, modules/face_analyser.py

Harden the UI image fitting and preview pipeline to avoid crashes and black previews when the window reports invalid dimensions.
  • Guard fit_image_to_size against zero/None/small dimensions and discard resize attempts that would result in <1 pixel in any dimension.
  • Wrap the preview frame scaling and conversion logic in a try/except block to prevent a single failure from breaking the display loop.
  Files: modules/ui.py

Possibly linked issues

  • #Fix GPU usage on MPS: The PR introduces MPS GPU acceleration on Apple Silicon, directly resolving the issue of CPU-only inference and low FPS.


Contributor

@sourcery-ai sourcery-ai Bot left a comment


Hey - I've found 1 issue, and left some high level feedback:

  • The broad except Exception: pass in _display_next_frame will silently hide UI and processing errors; consider narrowing the exception type or at least logging the exception so rendering issues can be diagnosed.
  • In MPSSession.__init__, the import onnx inside the method is not guarded like the torch/onnx2torch imports; if onnx is missing this will raise at runtime on Apple Silicon—consider wrapping it in a similar try/except and cleanly disabling MPS in that case.

## Individual Comments

### Comment 1
<location path="modules/mps_session.py" line_range="75-84" />
<code_context>
+    def get_providers(self):
+        return self._providers
+
+    def run(self, output_names, input_feed, run_options=None):
+        tensors = []
+        for inp in self._inputs:
+            arr = input_feed[inp.name]
+            t = _torch.from_numpy(arr).to("mps")
+            tensors.append(t)
+
+        with _torch.no_grad():
+            out = self._model(*tensors)
+            _torch.mps.synchronize()
+
+        if isinstance(out, _torch.Tensor):
+            return [out.cpu().numpy()]
+        return [o.cpu().numpy() for o in out]
</code_context>
<issue_to_address>
**issue (bug_risk):** MPSSession.run ignores the requested output_names and always returns all outputs, which may break onnxruntime compatibility.

This implementation ignores `output_names` and always returns all model outputs in model order. Callers (e.g., `INSwapper`) that expect subset selection or ordering based on `output_names` may get incorrect results. Please map model outputs to their names and return them in the order specified by `output_names`, falling back to all outputs only when it is `None`.
</issue_to_address>
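A fix along the lines Sourcery suggests would map model outputs to their declared names and honor the requested order. A sketch as a standalone helper (names are illustrative; in the PR this would consume the metadata held in `MPSSession._outputs`):

```python
def select_outputs(outputs, declared_names, output_names=None):
    """Mimic onnxruntime's run() semantics: return outputs in the
    caller-requested order, or all outputs in model order when
    output_names is None."""
    if output_names is None:
        return list(outputs)
    by_name = dict(zip(declared_names, outputs))
    # KeyError here surfaces a bad output name, matching onnxruntime's
    # behavior of rejecting unknown outputs rather than guessing.
    return [by_name[name] for name in output_names]
```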


- Narrow except clause in _display_next_frame to specific exception
  types (cv2.error, ValueError, RuntimeError) and log the error
  instead of silently swallowing it
- Guard onnx import at module level alongside torch/onnx2torch so
  a missing onnx package cleanly disables MPS instead of crashing
- Respect output_names parameter in MPSSession.run() to return
  outputs in the caller-requested order for full onnxruntime compat

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@Maanas-Verma
Author

Hi @hacksider,
I don't have an NVIDIA GPU, so could anyone verify the CUDA/CPU path (no regression) on my behalf?
Thanks 😊

