This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
This is a model zoo of PyTorch → Core ML conversions plus matching iOS sample apps. Each model has two halves that should usually be edited together:
- A Python conversion script in `conversion_scripts/convert_*.py` that produces one or more `.mlpackage` directories.
- A SwiftUI demo app in `sample_apps/<Name>Demo/` (or `creative_apps/` for unreleased / experimental ones) that consumes those mlpackages.
The README is the public model index — the long markdown lists were intentionally moved out of per-model docs into the README, with conversion notes living in `docs/coreml_conversion_notes.md` and per-app READMEs.
- `conversion_scripts/` — one `convert_<model>.py` per model. Most scripts vendor the upstream repo as a sibling subdirectory (e.g. `conversion_scripts/MatAnyone/`) and `sys.path.insert` it before importing. They typically build a small `nn.Module` wrapper, `torch.jit.trace` it, and call `coremltools.convert`.
- `sample_apps/` — published SwiftUI demos (one Xcode project per model). Each app's `.mlpackage`s live inside the app's source directory so they get bundled at build time.
- `creative_apps/` and `creative_models/` — experimental / unreleased apps and the mlpackages they depend on.
- `converted_models/` — standalone mlpackages without a dedicated demo app.
- `docs/coreml_conversion_notes.md` — READ THIS BEFORE STARTING ANY NEW CONVERSION. It is the institutional memory for every non-obvious bug we have hit (FP16 overflow, ANE memory limits, MPS slice on singleton dim, rank-5 limit, ImageNet normalization, Swin patches, INT8 NaN sanitisation, etc.). New conversion lessons should be appended here.
- `README.md` — the public-facing model zoo index. Keep per-model entries terse; long conversion notes go into the per-app README and `docs/coreml_conversion_notes.md`, not here.
There is no project-level build system. Each conversion script is standalone Python; each iOS app is a standalone Xcode project.
```bash
# Run a conversion (from repo root). Most scripts take --help.
python conversion_scripts/convert_<model>.py --help

# Open an iOS sample app
open sample_apps/<Name>Demo/<Name>Demo.xcodeproj
```

Per CLAUDE global rules: do not run iOS builds locally — building/testing happens on the user's device. Do not commit `.mlpackage` files, `xcuserdata/`, or other build artifacts.
These patterns recur across nearly every conversion script and sample app — knowing them up front saves hours.
- Vendor the upstream repo, monkey-patch any ops coremltools cannot trace (`torch.prod`, `torch.roll`, in-place mask assignment, `nn.utils.weight_norm`, etc.) — see the table in `docs/coreml_conversion_notes.md` under "Unsupported Operations in coremltools".
- Build a small `nn.Module` wrapper that exposes the minimum sub-graph the app actually needs, and bake any preprocessing (e.g. ImageNet mean/std normalization) into the wrapper because `ImageType(scale=...)` cannot do per-channel normalization.
- `torch.jit.trace` → `coremltools.convert` with `compute_precision=ct.precision.FLOAT16` as the default. Drop to FP32 only when FP16 attention overflow is confirmed (Swin/SinSR, Stable Audio DiT).
- Verify parity against the PyTorch reference (`(coreml_out - torch_out).abs().max()` is the bare minimum) before declaring victory.
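The recipe above can be sketched end to end. This is an illustrative skeleton, not any specific script in `conversion_scripts/`: the backbone is a stand-in `Conv2d`, the shapes are arbitrary, and the `ct.convert` call is guarded because coremltools may not be installed in every environment.

```python
import torch
import torch.nn as nn

class Wrapper(nn.Module):
    """Expose only the sub-graph the app needs, with preprocessing baked in,
    since ImageType(scale=...) cannot apply per-channel mean/std."""
    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone
        # ImageNet normalization folded into the traced graph.
        self.register_buffer("mean", torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1))
        self.register_buffer("std", torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1))

    def forward(self, x):  # x in [0, 1], NCHW
        return self.backbone((x - self.mean) / self.std)

backbone = nn.Conv2d(3, 8, 3, padding=1)  # stand-in for the vendored upstream model
wrapper = Wrapper(backbone).eval()
example = torch.rand(1, 3, 64, 64)
traced = torch.jit.trace(wrapper, example)

try:
    import coremltools as ct
    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(name="image", shape=example.shape)],
        compute_precision=ct.precision.FLOAT16,  # default; FP32 only on confirmed overflow
    )
except ImportError:
    mlmodel = None  # conversion requires coremltools (macOS/Linux)

# Bare-minimum parity check (traced vs. eager here; vs. Core ML output on macOS).
diff = (traced(example) - wrapper(example)).abs().max().item()
```

On macOS the final check would compare `mlmodel.predict(...)` against `wrapper(example)` instead of the traced graph.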
For seq2seq, video, and feedback-loop models we deliberately split the network into several stateless mlpackages and keep all bookkeeping in Swift. Examples:
- Florence-2: VisionEncoder + TextEncoder + Decoder, no KV cache (seq2seq). The decoder re-runs the full sequence each step because adding KV cache doubles weight memory and complicates the Swift driver — only worth it past ~100 output tokens.
- MatAnyone: encoder + mask_encoder + read_first + read + decoder. The per-frame ring buffer (`mem_key`, `mem_shrinkage`, `mem_msk_value`, `mem_valid`, `sensory`, `obj_memory`, etc.) is owned by Swift; each mlpackage is a pure function of its inputs. See `sample_apps/MatAnyoneDemo/README.md` for the exact tensor shapes.
- Hyper-SD: text encoder + chunked UNet (2 mlpackages from Apple's `chunk_mlprogram`) + VAE decoder, driven by a Swift TCD scheduler.
When extending or debugging one of these pipelines, treat the Swift driver as the source of truth for shapes and ordering — the mlpackages are intentionally dumb.
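The driver-owns-the-state pattern can be sketched in plain Python (toy arrays, not the real tensors; the ring-buffer depth and the encoder/read bodies are illustrative). Each "mlpackage" is a pure function, and all recurrent bookkeeping lives in the driver, exactly as the Swift side does it:

```python
from collections import deque
import numpy as np

MEM_FRAMES = 4  # illustrative ring-buffer depth, not MatAnyone's real value

def encoder(frame: np.ndarray) -> np.ndarray:
    """Stand-in for the encoder mlpackage: frame -> memory key."""
    return frame.mean(axis=-1, keepdims=True)

def read(query: np.ndarray, mem_keys) -> np.ndarray:
    """Stand-in for the read mlpackage: a pure function of its inputs."""
    mem = np.stack(list(mem_keys)) if mem_keys else np.zeros((1, *query.shape))
    return query + mem.mean(axis=0)

# The driver (Swift in the real apps) owns the per-frame ring buffer.
mem_keys: deque = deque(maxlen=MEM_FRAMES)
outputs = []
for t in range(6):
    frame = np.full((2, 2, 3), float(t))
    key = encoder(frame)
    value = read(key, mem_keys)
    mem_keys.append(key)  # deque evicts the oldest entry automatically
    outputs.append(value)
```

The point of the sketch is the shape of the loop, not the math: no model holds state between calls, so any mlpackage can be swapped or re-run in isolation while the driver replays its inputs.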
Loading two 100MB CoreML models simultaneously can OOM on real devices. The pattern is sequential load → predict → copy output → release (see docs/coreml_conversion_notes.md "Multi-Model Memory Management"). Always memcpy MLMultiArray outputs out of model-internal buffers before the model goes out of scope.
- Vision transformers ≥ 768×768 must use `.cpuOnly` (ANE E5 buffer limits — Florence-2, etc.).
- FP16 Swin / DiT attention overflows on GPU/ANE → either FP32 + `.cpuOnly` or INT8 + `.cpuAndGPU`.
- Some Core ML graphs that slice a singleton dimension crash on iOS GPU with `subRange.start = -1 vs length 1` (MPS slice bug). MatAnyone's `read`/`read_first` ship `.cpuOnly` for this reason. Do not flip them to GPU without re-running the converter to drop the singleton dim.
- Most other models default to `.cpuAndNeuralEngine` or `.all`.
ANE pads rows for SIMD alignment, so `MLMultiArray.dataPointer` is not C-contiguous on `.all` / `.cpuAndNeuralEngine`. Always read via `array.strides`. Basic Pitch's broken note detection (`docs/coreml_conversion_notes.md` "Music Transcription") is the canonical example.
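The failure mode is easy to reproduce with numpy (the padding width below is made up; the real ANE alignment differs): reading a row-padded buffer as if it were C-contiguous silently shifts every row after the first, while reading through strides recovers the correct values.

```python
import numpy as np

rows, cols, padded_cols = 3, 5, 8  # each row padded out for alignment (illustrative)
backing = np.arange(rows * padded_cols, dtype=np.float32)  # the "dataPointer" contents

# Wrong: assume C-contiguity and reshape the first rows*cols elements.
wrong = backing[: rows * cols].reshape(rows, cols)

# Right: step through the buffer using the real strides, as with
# MLMultiArray.strides on the Swift side.
itemsize = backing.itemsize
right = np.lib.stride_tricks.as_strided(
    backing, shape=(rows, cols), strides=(padded_cols * itemsize, itemsize)
)
```

Row 0 matches in both views, which is exactly why this bug survives casual spot-checks; every later row is offset by the padding.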
Florence-2 uses ImageNet, SigLIP uses (0.5, 0.5), RMBG uses (0.5, 1.0), and RMBG additionally requires post-sigmoid min-max stretch. Always check the original repo's preprocessing and bake it into the wrapper.
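The per-model differences above can be captured in a small table plus two helpers. The mean/std pairs restate the text; the stretch epsilon and the broadcasting convention (channels-last) are assumptions for the sketch:

```python
import numpy as np

NORMS = {  # model -> (mean, std); scalars apply to all channels
    "florence2": ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),  # ImageNet
    "siglip":    (0.5, 0.5),
    "rmbg":      (0.5, 1.0),
}

def normalize(img, model):
    """Apply the model's (img - mean) / std, broadcasting over the channel axis."""
    mean, std = NORMS[model]
    return (img - np.asarray(mean)) / np.asarray(std)

def rmbg_postprocess(logits):
    """RMBG additionally needs sigmoid followed by a min-max stretch to [0, 1]."""
    p = 1.0 / (1.0 + np.exp(-logits))
    return (p - p.min()) / (p.max() - p.min() + 1e-8)
```

In the real conversion scripts this logic is baked into the traced wrapper rather than applied at runtime, but the numbers must come from the original repo's preprocessing either way.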
- Code comments and UI strings are English only. (Scripts and READMEs frequently reference Japanese / Chinese upstream repos, but our code stays in English.)
- Do not commit `.mlpackage` files, build artifacts, `xcuserdata/`, or anything else that bloats the repo. Models are distributed via GitHub Releases / Google Drive links from the README.
- Do not include `claude` in commit messages or as committer. Do not add Claude co-author trailers.
- Build testing is done on the user's physical device — do not attempt to run `xcodebuild` or simulator builds as part of a task.
- When adding a new conversion lesson, append it to `docs/coreml_conversion_notes.md` rather than scattering it across per-app READMEs.