feat: add MLX backend for native Apple Silicon abliteration by overtimepog · Pull Request #226 · p-e-w/heretic

overtimepog · 2026-03-14T00:27:21Z

Summary

Adds a complete MLX backend enabling heretic to run abliteration natively on Apple Silicon using the MLX framework and mlx-lm
Supports direct use of MLX-format quantized models (e.g. 4-bit safetensors from mlx-community) without conversion, leveraging Metal GPU acceleration and unified memory
Auto-detects MLX models when backend = "auto" (default), or can be explicitly set with --backend mlx

Changes

New files

src/heretic/mlx_model.py (~660 lines) — Full MLX model backend implementing the same interface as the PyTorch Model class:
- Model loading via mlx_lm.load()
- Text generation with greedy decoding
- Hidden state extraction via custom layer-by-layer forward pass
- Log-probability extraction for KL divergence evaluation
- Abliteration via dequantize→modify→requantize (handles both regular Linear and MoE SwitchLinear/QuantizedSwitchLinear)
- Weight reset by restoring saved originals
- Model saving (safetensors + config + tokenizer)
- Streaming chat for interactive testing
tests/test_mlx_model.py — 31 tests (6 unit + 25 integration) covering all operations

Modified files

config.py — Added Backend enum (AUTO/PYTORCH/MLX) with default AUTO
main.py — Added resolve_backend(), create_model() factory, MLX device detection, and MLX-aware save/upload flows
utils.py — Added MLX memory reporting (get_peak_memory, get_active_memory)
pyproject.toml — Added [mlx] optional dependency group, pytest marker config
config.default.toml — Documented the new backend setting

Design decisions

Torch at the boundary: MLX arrays internally, converted to torch.Tensor at the interface boundary so evaluator.py and main.py work unchanged
No PEFT dependency for MLX: Direct weight modification with saved originals for reset (instead of LoRA adapters)
Dequantize→modify→requantize for abliteration on quantized weights, processing one module at a time to limit memory spikes
Zero changes to existing PyTorch code path: The Backend.PYTORCH path is completely unchanged

MoE support

Tested with Qwen3-Coder-30B-A3B-Instruct-MLX-4bit (128 experts, 8 active per token, 48 layers). Handles:

SwitchGLU / QuantizedSwitchLinear stacked expert weights
Per-expert abliteration via mx.einsum for efficient vectorized computation
Both global and per-layer refusal directions

Test plan

Unit tests pass (6/6) — Backend enum, Settings field, MLXModel interface, model detection
Integration tests pass (25/25) — Loading, layer introspection, generation, residuals, logprobs, abliteration, reset, chat, save
Full suite run 3 consecutive times with 31/31 passing each time
Verify on a non-Apple system that the PyTorch path is unaffected (MLX imports are lazy)
Test with additional MLX model architectures (Llama, Mistral, Phi, etc.)

Usage

# Install with MLX support
pip install 'heretic-llm[mlx]'

# Auto-detect MLX model format
heretic /path/to/mlx-model

# Or explicitly specify backend
heretic --backend mlx /path/to/mlx-model

🤖 Generated with Claude Code

Adds a complete MLX backend that enables heretic to run abliteration natively on Apple Silicon using the MLX framework and mlx-lm library. This allows direct use of MLX-format quantized models (e.g. 4-bit safetensors) without conversion, taking advantage of Metal GPU acceleration and unified memory. Key changes: - New `MLXModel` class (`mlx_model.py`) implementing the full model interface: loading, generation, hidden state extraction, logprob computation, abliteration via dequantize-modify-requantize, weight reset, model saving, and streaming chat - `Backend` enum (AUTO/PYTORCH/MLX) in config with auto-detection of MLX model format - Backend resolution and model factory in main.py with MLX device detection and memory reporting - MLX optional dependency group: `pip install heretic-llm[mlx]` - 31 tests (unit + integration) covering all operations including MoE architecture support (tested with Qwen3-Coder-30B-A3B MoE) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

gemini-code-assist · 2026-03-14T00:27:40Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the heretic tool by introducing a native MLX backend for Apple Silicon. This integration allows users with Apple Silicon devices to run abliteration tasks with MLX-optimized models, benefiting from Metal GPU acceleration and unified memory architecture. The changes include automatic backend detection, a new MLX model implementation that mirrors the existing PyTorch interface, and robust support for quantized and Mixture-of-Experts models, all while ensuring the existing PyTorch functionality remains untouched.

Highlights

MLX Backend Integration: A complete MLX backend was added, enabling native abliteration on Apple Silicon using the MLX framework and mlx-lm.
Quantized Model Support: The system now directly supports MLX-format quantized models (e.g., 4-bit safetensors) without conversion, leveraging Metal GPU acceleration and unified memory.
Automatic Backend Detection: The backend setting now defaults to "auto", which detects MLX models, or can be explicitly set to "mlx".
Interface Compatibility: The new MLX backend implements the same public interface as the existing PyTorch Model class, ensuring transparent usage for evaluator.py and main.py.
Comprehensive Testing: The MLX backend is supported by 31 new tests (6 unit, 25 integration) covering loading, generation, abliteration, and other core functionalities.
MoE Support: The MLX backend includes support for Mixture-of-Experts (MoE) models, such as Qwen3-Coder-30B-A3B-Instruct-MLX-4bit, handling stacked expert weights and per-expert abliteration.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog

config.default.toml
- Documented the new backend configuration option, including "auto", "pytorch", and "mlx" choices.
pyproject.toml
- Added mlx and mlx-lm to an optional [mlx] dependency group and configured a pytest marker for integration tests.
src/heretic/config.py
- Defined a Backend enum (AUTO, PYTORCH, MLX) and integrated it into the Settings model.
src/heretic/main.py
- Implemented resolve_backend for automatic or explicit backend selection, a create_model factory, MLX device information display, and adapted model saving/uploading logic for MLX.
src/heretic/mlx_model.py
- Added a new module containing the MLXModel class, which provides MLX-native implementations for model loading, text generation, hidden state extraction, log-probability calculation, abliteration (dequantize->modify->requantize), weight reset, and model saving.
src/heretic/utils.py
- Extended memory reporting to include MLX-specific peak and active memory usage.
tests/test_mlx_model.py
- Added a new test file with unit tests for backend configuration and interface, and integration tests for MLX model loading, layer introspection, generation, residuals, log-probabilities, abliteration, reset, chat streaming, and saving.

Activity

The author, overtimepog, initiated this pull request to add MLX backend support.
The pull request includes a comprehensive test suite with 6 unit tests and 25 integration tests, all of which are reported as passing.
The author has confirmed that the full test suite passed 3 consecutive times.
Remaining tasks include verifying the PyTorch path on non-Apple systems and testing with additional MLX model architectures.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist · 2026-03-14T00:27:52Z

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

- Fix LM head projection for models with tie_word_embeddings=true (e.g. Llama, SmolLM) that reuse the embedding matrix instead of having a separate lm_head layer - Add 14 PyTorch fallback tests verifying MLX imports are lazy and the PyTorch code path is unaffected when MLX is unavailable - Tested with SmolLM2-135M-Instruct (LlamaForCausalLM, dense) in addition to Qwen3-Coder-30B-A3B (Qwen3MoeForCausalLM, MoE) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…Backend The existing code uses `backend` for the JournalFileBackend object. The new resolve_backend() return value was also named `backend`, causing it to be overwritten before create_model() was called. Renamed to `model_backend` to avoid the conflict. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace full-tensor dequantization with per-expert processing: each of the 128 experts is dequantized, modified, and requantized individually, keeping peak memory bounded. Reduces abliteration memory spike from 16+ GB (OOM on 32GB systems) to ~5 GB. Also fix deprecated mx.metal.* calls to use mx.get_peak_memory() and mx.get_active_memory() directly. Adds memory budget integration tests (test_mlx_memory.py). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Monkey-patch mx.metal.device_info/get_peak_memory/get_active_memory to their current mx.* equivalents before mlx_lm uses them, silencing C++ deprecation warnings that bypass Python's warnings module. Add run_mlx.sh convenience script for running heretic with MLX backend. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Use mlx_lm.batch_generate() for response generation instead of sequential per-prompt mlx_lm.generate() calls - Add _forward_logits_only() that uses the model's optimized __call__ for logprob computation instead of the slower layer-by-layer pass - These two changes eliminate the biggest bottlenecks in the abliteration pipeline (prefix check, evaluation, refusal counting) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

overtimepog and others added 5 commits March 13, 2026 20:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add MLX backend for native Apple Silicon abliteration#226

feat: add MLX backend for native Apple Silicon abliteration#226
overtimepog wants to merge 6 commits intop-e-w:masterfrom
overtimepog:feat/mlx-backend

overtimepog commented Mar 14, 2026 •

edited

Loading

Uh oh!

gemini-code-assist bot commented Mar 14, 2026

Uh oh!

gemini-code-assist bot commented Mar 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

overtimepog commented Mar 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

New files

Modified files

Design decisions

MoE support

Test plan

Usage

Uh oh!

gemini-code-assist bot commented Mar 14, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot commented Mar 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

overtimepog commented Mar 14, 2026 •

edited

Loading