Add MPS (Metal) GPU acceleration for Apple Silicon #1744
Open
Maanas-Verma wants to merge 3 commits into hacksider:main from
Conversation
- Add PyTorch MPS backend for face swapper inference (11x speedup over CPU)
- New modules/mps_session.py: drop-in replacement for onnxruntime session that converts ONNX model to PyTorch and runs on Metal GPU
- Fix black camera preview on macOS: guard fit_image_to_size against zero dimensions when tkinter window reports 1x1 before rendering
- Add try/except to display loop so transient errors don't kill it
- Fix CoreML provider options for onnxruntime 1.24+ (remove deprecated RequireStaticShapes, SpecializationStrategy, etc.)
- Force CPU for InsightFace face detection (dynamic shapes incompatible with CoreML)

Performance on Apple Silicon (face swap neural net):
CPU: 1.300s -> MPS: 0.117s (11x faster)
CoreML: 0.270s -> MPS: 0.117s (2.3x faster)

Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Reduce det_size from (640, 640) to (320, 320) in the face analyser — the larger size paradoxically misses faces in certain images (e.g. AI-generated portraits), causing the swap to silently not apply
- Fix get_one_face() call in process_frame_v2 that passed 2 args when the function only accepts 1

Co-Authored-By: Claude Opus 4.6 <[email protected]>
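The zero-dimension guard described in the commit above can be sketched as a pure sizing helper. `fit_dimensions` is a hypothetical name for illustration only; the real `fit_image_to_size` also performs the resize itself:

```python
def fit_dimensions(img_w, img_h, win_w, win_h):
    """Compute the scaled preview size, guarding against the degenerate
    1x1 (or unset) window size tkinter reports before its first real
    layout pass. Illustrative helper, not the actual project code."""
    if not win_w or not win_h or win_w <= 1 or win_h <= 1:
        # Window not laid out yet: keep the original size instead of
        # scaling toward zero (which produced the black preview).
        return img_w, img_h
    ratio = min(win_w / img_w, win_h / img_h)
    return max(1, int(img_w * ratio)), max(1, int(img_h * ratio))
```

With this guard, a 1280x720 frame passed a 1x1 window comes back unscaled, while a properly laid-out window scales it preserving aspect ratio.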
Contributor
Reviewer's Guide

Adds a PyTorch MPS-backed inference path for the face swapper on Apple Silicon, with a new onnxruntime-compatible MPSSession wrapper, and fixes several stability/performance issues in face detection and UI preview handling.

Sequence diagram for face swapper backend selection with MPS fallback

sequenceDiagram
actor User
participant UI as UIModule
participant FS as FaceSwapperModule
participant MS as MPSSessionClass
participant IF as InsightFaceINSwapper
participant ORT as OnnxruntimeSession
User->>UI: startLiveMode()
UI->>FS: get_face_swapper()
alt FACE_SWAPPER is None
FS->>FS: detectAppleSilicon()
alt AppleSiliconAndMPSAvailable
FS->>MS: MPSSession(model_path)
MS-->>FS: mps_session
FS->>IF: INSwapper(model_file, session=mps_session)
IF-->>FS: FACE_SWAPPER (MPS-backed)
else MPSUnavailableOrError
FS->>FS: buildProvidersConfig()
FS->>ORT: get_model(model_path, providers_config)
ORT-->>FS: FACE_SWAPPER (CoreML/CUDA/CPU)
end
FS-->>UI: FACE_SWAPPER
else FACE_SWAPPER already cached
FS-->>UI: FACE_SWAPPER
end
UI-->>User: live face swap frames (GPU or fallback)
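The selection order shown in the diagram reduces to a small decision function. This is an illustrative sketch only (`choose_backend` and its arguments are hypothetical names; the real code constructs the session inline):

```python
import platform


def choose_backend(mps_available, provider_preference):
    """Sketch of the fallback order in the sequence diagram: try the
    PyTorch MPS path on Apple Silicon, otherwise hand back the
    onnxruntime provider list (CoreML/CUDA/CPU)."""
    if (
        platform.system() == "Darwin"
        and platform.machine() == "arm64"
        and mps_available
    ):
        return "mps"
    # Fall back to whatever onnxruntime providers the caller configured.
    return provider_preference
```

On any non-Apple-Silicon host (or when MPS is unavailable) this degrades to the existing onnxruntime path, which matches the `else MPSUnavailableOrError` branch above.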
Class diagram for new MPSSession and related types

classDiagram
class MPSSession {
- model_path
- _model
- _providers
- _provider_options
- _inputs
- _outputs
+ MPSSession(model_path, providers)
+ get_inputs() _FakeIO[]
+ get_outputs() _FakeIO[]
+ get_providers() string[]
+ run(output_names, input_feed, run_options) numpy_array[]
}
class _FakeIO {
+ name
+ shape
+ _FakeIO(name, shape)
}
class MPSModuleAPI {
+ is_mps_available() bool
}
MPSSession "*" o-- "*" _FakeIO : uses
MPSModuleAPI <.. MPSSession : checksAvailability
Contributor
Hey - I've found 1 issue, and left some high-level feedback:

- The broad `except Exception: pass` in `_display_next_frame` will silently hide UI and processing errors; consider narrowing the exception type or at least logging the exception so rendering issues can be diagnosed.
- In `MPSSession.__init__`, the `import onnx` inside the method is not guarded like the `torch`/`onnx2torch` imports; if `onnx` is missing this will raise at runtime on Apple Silicon—consider wrapping it in a similar try/except and cleanly disabling MPS in that case.
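The guarded-import pattern suggested in the second point might look like this module-level sketch (names illustrative, not the project's actual code):

```python
# Import every optional MPS dependency in one guarded block, so that a
# single missing package cleanly disables the feature instead of raising
# later inside MPSSession.__init__.
try:
    import torch
    import onnx
    import onnx2torch
    MPS_AVAILABLE = torch.backends.mps.is_available()
except (ImportError, AttributeError):
    torch = onnx = onnx2torch = None
    MPS_AVAILABLE = False
```

Callers then check `MPS_AVAILABLE` before constructing the session and fall back to the onnxruntime path otherwise.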
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The broad `except Exception: pass` in `_display_next_frame` will silently hide UI and processing errors; consider narrowing the exception type or at least logging the exception so rendering issues can be diagnosed.
- In `MPSSession.__init__`, the `import onnx` inside the method is not guarded like the `torch`/`onnx2torch` imports; if `onnx` is missing this will raise at runtime on Apple Silicon—consider wrapping it in a similar try/except and cleanly disabling MPS in that case.
## Individual Comments
### Comment 1
<location path="modules/mps_session.py" line_range="75-84" />
<code_context>
+ def get_providers(self):
+ return self._providers
+
+ def run(self, output_names, input_feed, run_options=None):
+ tensors = []
+ for inp in self._inputs:
+ arr = input_feed[inp.name]
+ t = _torch.from_numpy(arr).to("mps")
+ tensors.append(t)
+
+ with _torch.no_grad():
+ out = self._model(*tensors)
+ _torch.mps.synchronize()
+
+ if isinstance(out, _torch.Tensor):
+ return [out.cpu().numpy()]
+ return [o.cpu().numpy() for o in out]
</code_context>
<issue_to_address>
**issue (bug_risk):** MPSSession.run ignores the requested output_names and always returns all outputs, which may break onnxruntime compatibility.
This implementation ignores `output_names` and always returns all model outputs in model order. Callers (e.g., `INSWapper`) that expect subset selection or ordering based on `output_names` may get incorrect results. Please map model outputs to their names and return them in the order specified by `output_names`, falling back to all outputs only when it is `None`.
</issue_to_address>
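A fix along the lines the reviewer suggests could map outputs to their names and honor the requested subset/order. `select_outputs` is a hypothetical helper name used here to isolate the logic:

```python
def select_outputs(results_by_name, output_names):
    """Subset and order model outputs per onnxruntime's
    InferenceSession.run contract: output_names=None means "all outputs
    in model order"; otherwise honor the caller's names and ordering.
    Illustrative helper, not the project's actual code."""
    if output_names is None:
        return list(results_by_name.values())
    return [results_by_name[name] for name in output_names]
```

`MPSSession.run` would build `results_by_name` by zipping `self._outputs` with the converted model's outputs before delegating to this selection step.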
- Narrow except clause in _display_next_frame to specific exception types (cv2.error, ValueError, RuntimeError) and log the error instead of silently swallowing it
- Guard onnx import at module level alongside torch/onnx2torch so a missing onnx package cleanly disables MPS instead of crashing
- Respect output_names parameter in MPSSession.run() to return outputs in the caller-requested order for full onnxruntime compat

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Author
hi @hacksider,
Summary
- modules/mps_session.py: drop-in replacement for onnxruntime session that converts the ONNX model to PyTorch and runs inference on Metal Performance Shaders (MPS) GPU
- Fix fit_image_to_size crash when the tkinter window reports 1x1 size before rendering
- Fix process_frame_v2 crash: get_one_face() called with 2 args instead of 1
- Reduce det_size (640→320) that caused source face detection to silently fail on certain images

Benchmark (Tested on Apple M1 Pro)
Neural net inference alone (vs main): 1.300s → 0.117s (11x faster)
Changes
- modules/mps_session.py — new file, PyTorch MPS session with onnxruntime-compatible interface
- modules/processors/frame/face_swapper.py — try MPS first on Apple Silicon, fix CoreML options, fix get_one_face call
- modules/face_analyser.py — force CPU provider, reduce det_size to 320
- modules/ui.py — guard fit_image_to_size against zero dimensions, add try/except to display loop

New dependencies (Apple Silicon only)
- torch — PyTorch with MPS backend
- onnx2torch — ONNX to PyTorch model conversion

Test plan
🤖 Generated with Claude Code
Summary by Sourcery
Add a PyTorch MPS-backed inference path for Apple Silicon and harden face processing and UI preview behavior.