
Add MPS (Metal) GPU acceleration for Apple Silicon #1744

Open
Maanas-Verma wants to merge 3 commits into hacksider:main from Maanas-Verma:feat/mps-gpu-acceleration

Conversation


@Maanas-Verma Maanas-Verma commented Apr 7, 2026

Summary

  • 4-5x FPS improvement on Apple Silicon Macs: from 0.8–0.9 FPS (CPU) to 3.4–4.4 FPS (MPS GPU) in live mode
  • New modules/mps_session.py: drop-in replacement for onnxruntime session that converts the ONNX model to PyTorch and runs inference on Metal Performance Shaders (MPS) GPU
  • Auto-detects Apple Silicon — no CLI flags needed, falls back to CoreML/CPU on other platforms
  • Fix black camera preview on macOS caused by fit_image_to_size crash when tkinter window reports 1x1 size before rendering
  • Fix process_frame_v2 crash: get_one_face() called with 2 args instead of 1
  • Reduce face detection det_size (640→320); the larger size caused source face detection to silently fail on certain images
  • Fix deprecated CoreML provider options for onnxruntime 1.17+
  • Force CPU for InsightFace face analyser (dynamic input shapes incompatible with CoreML)
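The auto-detection described above can be sketched roughly as follows. This is a minimal illustration, not the PR's exact code: `is_mps_available` mirrors the helper the PR exposes, and the guarded import means a missing `torch` simply disables MPS on other setups.

```python
import platform


def is_mps_available() -> bool:
    """Best-effort check: Apple Silicon macOS with a usable PyTorch MPS backend."""
    if platform.system() != "Darwin" or platform.machine() != "arm64":
        return False
    try:
        import torch  # optional dependency; absence cleanly disables MPS
    except ImportError:
        return False
    return torch.backends.mps.is_available()
```

Because every failure path returns False, callers can try MPS first and fall back to the existing CoreML/CUDA/CPU providers without any CLI flags.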

Benchmark (Tested on Apple M1 Pro)

| Branch  | FPS     | Backend                 |
| ------- | ------- | ----------------------- |
| main    | 0.8–0.9 | ONNX CPU                |
| This PR | 3.4–4.4 | PyTorch MPS (Metal GPU) |

Neural net inference alone: 1.300s → 0.117s (11x faster)

Tested on MacBook Pro with Apple M1 Pro chip, macOS, Python 3.11

Changes

  • modules/mps_session.py — new file, PyTorch MPS session with onnxruntime-compatible interface
  • modules/processors/frame/face_swapper.py — try MPS first on Apple Silicon, fix CoreML options, fix get_one_face call
  • modules/face_analyser.py — force CPU provider, reduce det_size to 320
  • modules/ui.py — guard fit_image_to_size against zero dimensions, add try/except to display loop
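The `fit_image_to_size` guard boils down to an aspect-ratio computation that refuses degenerate targets. A minimal sketch of that logic (the helper name and None-return convention are illustrative, not the PR's exact code):

```python
def fit_dimensions(src_w, src_h, max_w, max_h):
    """Return a (width, height) fitting within max_w x max_h while preserving
    aspect ratio, or None when the target is degenerate (e.g. tkinter
    reporting a 1x1 window before it has been laid out)."""
    if not max_w or not max_h or max_w <= 1 or max_h <= 1:
        return None
    ratio = min(max_w / src_w, max_h / src_h)
    new_w, new_h = int(src_w * ratio), int(src_h * ratio)
    if new_w < 1 or new_h < 1:
        return None  # a resize to <1 pixel would crash cv2.resize
    return new_w, new_h
```

When this returns None, the display loop can simply show the unscaled frame instead of crashing, which is what fixes the black preview on first open.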

New dependencies (Apple Silicon only)

  • torch — PyTorch with MPS backend
  • onnx2torch — ONNX to PyTorch model conversion

Test plan

  • Verify live mode face swap works on Apple Silicon Mac (M1 Pro)
  • Verify FPS improvement over main branch
  • Verify face swap still works on CUDA/CPU (no regression)
  • Verify camera preview doesn't show black screen on first open
  • Verify source face detection works with AI-generated portraits

🤖 Generated with Claude Code

Summary by Sourcery

Add a PyTorch MPS-backed inference path for Apple Silicon and harden face processing and UI preview behavior.

New Features:

  • Introduce an MPSSession wrapper that runs ONNX models via PyTorch on Apple MPS as a drop-in replacement for onnxruntime sessions.
  • Enable the face swapper to use MPS on Apple Silicon when available, falling back to existing CoreML/CUDA/CPU backends otherwise.

Bug Fixes:

  • Fix process_frame_v2 using get_one_face with an incorrect argument signature, preventing a runtime crash when no face map is present.
  • Prevent crashes and black preview frames by guarding fit_image_to_size and the display loop against zero or invalid widget dimensions.
  • Force the face analyser to use CPU only and reduce detection size to avoid CoreML incompatibilities and missed detections.

Enhancements:

  • Simplify CoreML execution provider options to match supported configuration on current onnxruntime versions.

Maanas-Verma and others added 2 commits April 6, 2026 20:57
- Add PyTorch MPS backend for face swapper inference (11x speedup over CPU)
- New modules/mps_session.py: drop-in replacement for onnxruntime session
  that converts ONNX model to PyTorch and runs on Metal GPU
- Fix black camera preview on macOS: guard fit_image_to_size against
  zero dimensions when tkinter window reports 1x1 before rendering
- Add try/except to display loop so transient errors don't kill it
- Fix CoreML provider options for onnxruntime 1.24+ (remove deprecated
  RequireStaticShapes, SpecializationStrategy, etc.)
- Force CPU for InsightFace face detection (dynamic shapes incompatible
  with CoreML)

Performance on Apple Silicon (face swap neural net):
  CPU:     1.300s -> MPS: 0.117s (11x faster)
  CoreML:  0.270s -> MPS: 0.117s (2.3x faster)

Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Reduce det_size from (640,640) to (320,320) in face analyser — the
  larger size paradoxically misses faces in certain images (e.g.
  AI-generated portraits), causing the swap to silently not apply
- Fix get_one_face() call in process_frame_v2 that passed 2 args
  when the function only accepts 1

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@sourcery-ai
Contributor

sourcery-ai Bot commented Apr 7, 2026

Reviewer's Guide

Adds a PyTorch MPS-backed inference path for the face swapper on Apple Silicon, with a new onnxruntime-compatible MPSSession wrapper, and fixes several stability/performance issues in face detection and UI preview handling.

Sequence diagram for face swapper backend selection with MPS fallback

```mermaid
sequenceDiagram
    actor User
    participant UI as UIModule
    participant FS as FaceSwapperModule
    participant MS as MPSSessionClass
    participant IF as InsightFaceINSwapper
    participant ORT as OnnxruntimeSession

    User->>UI: startLiveMode()
    UI->>FS: get_face_swapper()

    alt FACE_SWAPPER is None
        FS->>FS: detectAppleSilicon()
        alt AppleSiliconAndMPSAvailable
            FS->>MS: MPSSession(model_path)
            MS-->>FS: mps_session
            FS->>IF: INSwapper(model_file, session=mps_session)
            IF-->>FS: FACE_SWAPPER (MPS-backed)
        else MPSUnavailableOrError
            FS->>FS: buildProvidersConfig()
            FS->>ORT: get_model(model_path, providers_config)
            ORT-->>FS: FACE_SWAPPER (CoreML/CUDA/CPU)
        end
        FS-->>UI: FACE_SWAPPER
    else FACE_SWAPPER already cached
        FS-->>UI: FACE_SWAPPER
    end

    UI-->>User: live face swap frames (GPU or fallback)
```

Class diagram for new MPSSession and related types

```mermaid
classDiagram
    class MPSSession {
        - model_path
        - _model
        - _providers
        - _provider_options
        - _inputs
        - _outputs
        + MPSSession(model_path, providers)
        + get_inputs() _FakeIO[]
        + get_outputs() _FakeIO[]
        + get_providers() string[]
        + run(output_names, input_feed, run_options) numpy_array[]
    }

    class _FakeIO {
        + name
        + shape
        + _FakeIO(name, shape)
    }

    class MPSModuleAPI {
        + is_mps_available() bool
    }

    MPSSession "*" o-- "*" _FakeIO : uses
    MPSModuleAPI <.. MPSSession : checksAvailability
```

File-Level Changes

Introduce MPSSession as an onnxruntime-compatible PyTorch MPS backend and wire it into the face swapper on Apple Silicon.
  • Add modules/mps_session.py implementing a minimal onnxruntime.InferenceSession-compatible wrapper around a converted PyTorch model running on MPS, including input/output metadata discovery and warmup.
  • Gate MPS usage behind runtime Apple Silicon and MPS-availability checks, exposing is_mps_available() for callers.
  • Prefer MPS for the insightface INSwapper session on Apple Silicon, with graceful fallback to existing onnxruntime providers when MPS is unavailable or fails to initialize.
  Files: modules/mps_session.py, modules/processors/frame/face_swapper.py

Simplify and modernize onnxruntime provider configuration while keeping CUDA optimizations and correcting CoreML options for newer onnxruntime versions.
  • Replace the detailed CoreMLExecutionProvider option set with a minimal configuration using MLProgram and ALL compute units for Apple Silicon.
  • Retain explicit CUDAExecutionProvider configuration while leaving other providers unchanged in the fallback path.
  Files: modules/processors/frame/face_swapper.py

Fix bugs in frame processing and face analysis that caused crashes or missed detections, and adjust detection settings for better robustness.
  • Correct process_frame_v2 to call get_one_face with only the processed frame, matching the function signature.
  • Force the face analyser to use CPUExecutionProvider only, avoiding CoreML incompatibilities with dynamic input shapes.
  • Reduce the face analyser det_size from (640, 640) to (320, 320) to avoid silent source face detection failures on some images.
  Files: modules/processors/frame/face_swapper.py, modules/face_analyser.py

Harden the UI image fitting and preview pipeline to avoid crashes and black previews when the window reports invalid dimensions.
  • Guard fit_image_to_size against zero/None/small dimensions and discard resize attempts that would result in <1 pixel in any dimension.
  • Wrap the preview frame scaling and conversion logic in a try/except block to prevent a single failure from breaking the display loop.
  Files: modules/ui.py

Possibly linked issues

  • #Fix GPU usage on MPS: The PR introduces MPS GPU acceleration on Apple Silicon, directly resolving the issue of CPU-only inference and low FPS.


Contributor

@sourcery-ai sourcery-ai Bot left a comment


Hey - I've found 1 issue, and left some high level feedback:

  • The broad except Exception: pass in _display_next_frame will silently hide UI and processing errors; consider narrowing the exception type or at least logging the exception so rendering issues can be diagnosed.
  • In MPSSession.__init__, the import onnx inside the method is not guarded like the torch/onnx2torch imports; if onnx is missing this will raise at runtime on Apple Silicon—consider wrapping it in a similar try/except and cleanly disabling MPS in that case.

## Individual Comments

### Comment 1
<location path="modules/mps_session.py" line_range="75-84" />
<code_context>
+    def get_providers(self):
+        return self._providers
+
+    def run(self, output_names, input_feed, run_options=None):
+        tensors = []
+        for inp in self._inputs:
+            arr = input_feed[inp.name]
+            t = _torch.from_numpy(arr).to("mps")
+            tensors.append(t)
+
+        with _torch.no_grad():
+            out = self._model(*tensors)
+            _torch.mps.synchronize()
+
+        if isinstance(out, _torch.Tensor):
+            return [out.cpu().numpy()]
+        return [o.cpu().numpy() for o in out]
</code_context>
<issue_to_address>
**issue (bug_risk):** MPSSession.run ignores the requested output_names and always returns all outputs, which may break onnxruntime compatibility.

This implementation ignores `output_names` and always returns all model outputs in model order. Callers (e.g., `INSwapper`) that expect subset selection or ordering based on `output_names` may get incorrect results. Please map model outputs to their names and return them in the order specified by `output_names`, falling back to all outputs only when it is `None`.
</issue_to_address>
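A fix along the lines Sourcery suggests would map model outputs to their declared names and honor the requested order. A sketch as a standalone helper (names are illustrative; in the PR this would consume the metadata held in `MPSSession._outputs`):

```python
def select_outputs(outputs, declared_names, output_names=None):
    """Mimic onnxruntime's run() semantics: return outputs in the
    caller-requested order, or all outputs in model order when
    output_names is None."""
    if output_names is None:
        return list(outputs)
    by_name = dict(zip(declared_names, outputs))
    # KeyError here surfaces a bad output name, matching onnxruntime's
    # behavior of rejecting unknown outputs rather than guessing.
    return [by_name[name] for name in output_names]
```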


- Narrow except clause in _display_next_frame to specific exception
  types (cv2.error, ValueError, RuntimeError) and log the error
  instead of silently swallowing it
- Guard onnx import at module level alongside torch/onnx2torch so
  a missing onnx package cleanly disables MPS instead of crashing
- Respect output_names parameter in MPSSession.run() to return
  outputs in the caller-requested order for full onnxruntime compat

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@Maanas-Verma
Author

Hi @hacksider,
I don't have an NVIDIA GPU, so could anyone verify the CUDA/CPU path (no regression) on my behalf?
Thanks 😊

