feat: add MLX backend for native Apple Silicon abliteration #226

Open
overtimepog wants to merge 6 commits into p-e-w:master from overtimepog:feat/mlx-backend

Conversation

overtimepog commented on Mar 14, 2026

Summary

  • Adds a complete MLX backend enabling heretic to run abliteration natively on Apple Silicon using the MLX framework and mlx-lm
  • Supports direct use of MLX-format quantized models (e.g. 4-bit safetensors from mlx-community) without conversion, leveraging Metal GPU acceleration and unified memory
  • Auto-detects MLX models when backend = "auto" (default), or can be explicitly set with --backend mlx

Changes

New files

  • src/heretic/mlx_model.py (~660 lines) — Full MLX model backend implementing the same interface as the PyTorch Model class:
    • Model loading via mlx_lm.load()
    • Text generation with greedy decoding
    • Hidden state extraction via custom layer-by-layer forward pass
    • Log-probability extraction for KL divergence evaluation
    • Abliteration via dequantize→modify→requantize (handles both regular Linear and MoE SwitchLinear/QuantizedSwitchLinear)
    • Weight reset by restoring saved originals
    • Model saving (safetensors + config + tokenizer)
    • Streaming chat for interactive testing
  • tests/test_mlx_model.py — 31 tests (6 unit + 25 integration) covering all operations

Modified files

  • config.py — Added Backend enum (AUTO/PYTORCH/MLX) with default AUTO
  • main.py — Added resolve_backend(), create_model() factory, MLX device detection, and MLX-aware save/upload flows
  • utils.py — Added MLX memory reporting (get_peak_memory, get_active_memory)
  • pyproject.toml — Added [mlx] optional dependency group, pytest marker config
  • config.default.toml — Documented the new backend setting
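The resolve_backend() flow described above could look roughly like the sketch below. The detection heuristic is an assumption for illustration, not the PR's actual logic: MLX exports produced by mlx_lm.convert typically carry a "quantization" block in config.json, which makes a plausible auto-detection signal.

```python
from enum import Enum
from pathlib import Path
import json

class Backend(Enum):
    AUTO = "auto"
    PYTORCH = "pytorch"
    MLX = "mlx"

def resolve_backend(backend: Backend, model_path: str) -> Backend:
    """Pick a concrete backend for AUTO. Hypothetical heuristic:
    treat a local model directory whose config.json contains a
    "quantization" block (as written by mlx_lm.convert) as MLX."""
    if backend is not Backend.AUTO:
        return backend
    config = Path(model_path) / "config.json"
    if config.is_file():
        cfg = json.loads(config.read_text())
        if "quantization" in cfg:
            return Backend.MLX
    return Backend.PYTORCH
```

Non-local model IDs (e.g. Hugging Face repo names) fall through to the PyTorch default under this sketch; the real implementation may detect MLX repos differently.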

Design decisions

  • Torch at the boundary: MLX arrays internally, converted to torch.Tensor at the interface boundary so evaluator.py and main.py work unchanged
  • No PEFT dependency for MLX: Direct weight modification with saved originals for reset (instead of LoRA adapters)
  • Dequantize→modify→requantize for abliteration on quantized weights, processing one module at a time to limit memory spikes
  • Zero changes to existing PyTorch code path: The Backend.PYTORCH path is completely unchanged
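The dequantize→modify→requantize step can be sketched as follows. This is a minimal illustration, not the PR's code: a toy symmetric per-tensor quantizer stands in for MLX's grouped affine quantization (mx.dequantize / mx.quantize in the real backend), and numpy stands in for MLX arrays. The modification itself is the standard abliteration projection W' = (I − rrᵀ)W, removing the refusal direction r from the layer's output space.

```python
import numpy as np

def abliterate_quantized(w_q, scale, refusal_dir, bits=8):
    """Dequantize -> project out refusal direction -> requantize.

    w_q:         integer weight matrix, shape (out, in)
    scale:       per-tensor dequantization scale (toy scheme)
    refusal_dir: direction of shape (out,) to remove
    """
    w = w_q.astype(np.float32) * scale                    # dequantize
    r = refusal_dir / np.linalg.norm(refusal_dir)
    w -= np.outer(r, r @ w)                               # W' = (I - r r^T) W
    new_scale = np.abs(w).max() / (2 ** (bits - 1) - 1)   # requantize
    w_q_new = np.round(w / new_scale).astype(np.int8)
    return w_q_new, new_scale
```

Because requantization happens immediately after the projection, only one module's full-precision weights are live at a time, which is what bounds the memory spike.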

MoE support

Tested with Qwen3-Coder-30B-A3B-Instruct-MLX-4bit (128 experts, 8 active per token, 48 layers). Handles:

  • SwitchGLU / QuantizedSwitchLinear stacked expert weights
  • Per-expert abliteration via mx.einsum for efficient vectorized computation
  • Both global and per-layer refusal directions
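The per-expert einsum formulation can be sketched like this (again with numpy standing in for mx.einsum; the array layout (num_experts, out, in) for stacked expert weights is an assumption). All experts are projected in one vectorized operation instead of a Python-level loop:

```python
import numpy as np

def abliterate_experts(expert_weights, refusal_dir):
    """Remove refusal_dir from every expert's output space at once.

    expert_weights: stacked MoE weights, shape (num_experts, out, in)
    refusal_dir:    direction of shape (out,)
    """
    r = refusal_dir / np.linalg.norm(refusal_dir)
    # r^T W_e for all experts simultaneously -> (num_experts, in)
    proj = np.einsum("o,eoi->ei", r, expert_weights)
    # W_e' = W_e - r (r^T W_e), broadcast across the expert axis
    return expert_weights - np.einsum("o,ei->eoi", r, proj)
```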

Test plan

  • Unit tests pass (6/6) — Backend enum, Settings field, MLXModel interface, model detection
  • Integration tests pass (25/25) — Loading, layer introspection, generation, residuals, logprobs, abliteration, reset, chat, save
  • Full suite was run 3 consecutive times, with 31/31 passing each time
  • Verify on a non-Apple system that the PyTorch path is unaffected (MLX imports are lazy)
  • Test with additional MLX model architectures (Llama, Mistral, Phi, etc.)

Usage

# Install with MLX support
pip install 'heretic-llm[mlx]'

# Auto-detect MLX model format
heretic /path/to/mlx-model

# Or explicitly specify backend
heretic --backend mlx /path/to/mlx-model

🤖 Generated with Claude Code

Adds a complete MLX backend that enables heretic to run abliteration
natively on Apple Silicon using the MLX framework and mlx-lm library.
This allows direct use of MLX-format quantized models (e.g. 4-bit
safetensors) without conversion, taking advantage of Metal GPU
acceleration and unified memory.

Key changes:
- New `MLXModel` class (`mlx_model.py`) implementing the full model
  interface: loading, generation, hidden state extraction, logprob
  computation, abliteration via dequantize-modify-requantize, weight
  reset, model saving, and streaming chat
- `Backend` enum (AUTO/PYTORCH/MLX) in config with auto-detection
  of MLX model format
- Backend resolution and model factory in main.py with MLX device
  detection and memory reporting
- MLX optional dependency group: `pip install heretic-llm[mlx]`
- 31 tests (unit + integration) covering all operations including
  MoE architecture support (tested with Qwen3-Coder-30B-A3B MoE)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the heretic tool by introducing a native MLX backend for Apple Silicon. This integration allows users with Apple Silicon devices to run abliteration tasks with MLX-optimized models, benefiting from Metal GPU acceleration and unified memory architecture. The changes include automatic backend detection, a new MLX model implementation that mirrors the existing PyTorch interface, and robust support for quantized and Mixture-of-Experts models, all while ensuring the existing PyTorch functionality remains untouched.

Highlights

  • MLX Backend Integration: A complete MLX backend was added, enabling native abliteration on Apple Silicon using the MLX framework and mlx-lm.
  • Quantized Model Support: The system now directly supports MLX-format quantized models (e.g., 4-bit safetensors) without conversion, leveraging Metal GPU acceleration and unified memory.
  • Automatic Backend Detection: The backend setting now defaults to "auto", which detects MLX models, or can be explicitly set to "mlx".
  • Interface Compatibility: The new MLX backend implements the same public interface as the existing PyTorch Model class, ensuring transparent usage for evaluator.py and main.py.
  • Comprehensive Testing: The MLX backend is supported by 31 new tests (6 unit, 25 integration) covering loading, generation, abliteration, and other core functionalities.
  • MoE Support: The MLX backend includes support for Mixture-of-Experts (MoE) models, such as Qwen3-Coder-30B-A3B-Instruct-MLX-4bit, handling stacked expert weights and per-expert abliteration.


Changelog
  • config.default.toml
    • Documented the new backend configuration option, including "auto", "pytorch", and "mlx" choices.
  • pyproject.toml
    • Added mlx and mlx-lm to an optional [mlx] dependency group and configured a pytest marker for integration tests.
  • src/heretic/config.py
    • Defined a Backend enum (AUTO, PYTORCH, MLX) and integrated it into the Settings model.
  • src/heretic/main.py
    • Implemented resolve_backend for automatic or explicit backend selection, a create_model factory, MLX device information display, and adapted model saving/uploading logic for MLX.
  • src/heretic/mlx_model.py
    • Added a new module containing the MLXModel class, which provides MLX-native implementations for model loading, text generation, hidden state extraction, log-probability calculation, abliteration (dequantize->modify->requantize), weight reset, and model saving.
  • src/heretic/utils.py
    • Extended memory reporting to include MLX-specific peak and active memory usage.
  • tests/test_mlx_model.py
    • Added a new test file with unit tests for backend configuration and interface, and integration tests for MLX model loading, layer introspection, generation, residuals, log-probabilities, abliteration, reset, chat streaming, and saving.
Activity
  • The author, overtimepog, initiated this pull request to add MLX backend support.
  • The pull request includes a comprehensive test suite with 6 unit tests and 25 integration tests, all of which are reported as passing.
  • The author has confirmed that the full test suite passed 3 consecutive times.
  • Remaining tasks include verifying the PyTorch path on non-Apple systems and testing with additional MLX model architectures.

@gemini-code-assist

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

overtimepog and others added 5 commits March 13, 2026 20:42
- Fix LM head projection for models with tie_word_embeddings=true
  (e.g. Llama, SmolLM) that reuse the embedding matrix instead of
  having a separate lm_head layer
- Add 14 PyTorch fallback tests verifying MLX imports are lazy and
  the PyTorch code path is unaffected when MLX is unavailable
- Tested with SmolLM2-135M-Instruct (LlamaForCausalLM, dense) in
  addition to Qwen3-Coder-30B-A3B (Qwen3MoeForCausalLM, MoE)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…Backend

The existing code uses `backend` for the JournalFileBackend object.
The new resolve_backend() return value was also named `backend`,
causing it to be overwritten before create_model() was called.
Renamed to `model_backend` to avoid the conflict.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace full-tensor dequantization with per-expert processing:
each of the 128 experts is dequantized, modified, and requantized
individually, keeping peak memory bounded. Reduces abliteration
memory spike from 16+ GB (OOM on 32GB systems) to ~5 GB.

Also fix deprecated mx.metal.* calls to use mx.get_peak_memory()
and mx.get_active_memory() directly.

Adds memory budget integration tests (test_mlx_memory.py).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Monkey-patch mx.metal.device_info/get_peak_memory/get_active_memory
to their current mx.* equivalents before mlx_lm uses them, silencing
C++ deprecation warnings that bypass Python's warnings module.

Add run_mlx.sh convenience script for running heretic with MLX backend.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use mlx_lm.batch_generate() for response generation instead of
  sequential per-prompt mlx_lm.generate() calls
- Add _forward_logits_only() that uses the model's optimized __call__
  for logprob computation instead of the slower layer-by-layer pass
- These two changes eliminate the biggest bottlenecks in the
  abliteration pipeline (prefix check, evaluation, refusal counting)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>