bindings/python: free-threaded Python (3.14t) support #2041

Merged: ArthurZucker merged 11 commits into main from freethreaded-python-support on Apr 27, 2026


@ArthurZucker ArthurZucker commented Apr 27, 2026

Adds dedicated 3.14t support to the Python bindings without breaking the regular CPython API surface.
The release workflow was updated, but we did not specify "use_gil", so it would be a pointless release.

Key changes:

  • Wrap PyTokenizer's inner Tokenizer in std::sync::RwLock.

  • Each #[pymodule] is now declared as #[pymodule(gil_used = false)]

  • Promote abi3 from a hardcoded pyo3 dep-feature to a project-level cargo feature (default on). Allows building without abi3 on free-threaded Python (maturin develop --no-default-features --features ext-module) — abi3 / limited API is not available under free-threading.

  • Add bindings/python/tests/test_freethreaded.py: stress tests racing N encoders against M setters on the same Tokenizer. All pass on 3.14t (4/4) and regular 3.14 (3 pass + 1 skip for the 3.14t-specific GIL check).

Building 3.14t wheels:

maturin develop --release --no-default-features --features ext-module

Adds dedicated 3.14t support to the python bindings without breaking
the regular CPython API surface.

Key changes:

- Wrap PyTokenizer's inner Tokenizer in std::sync::RwLock<Tokenizer>.
  Setters take &self + write guard; readers take a read guard. This
  removes the per-pyclass `&mut self` borrow check that races under
  free-threaded Python (`RuntimeError: Already borrowed`) and replaces
  it with a stdlib RwLock that admits concurrent encode operations
  while serializing mutations.

- Each #[pymodule] is now declared as
  `#[cfg_attr(Py_GIL_DISABLED, pymodule(gil_used = false))]` /
  `#[cfg_attr(not(Py_GIL_DISABLED), pymodule)]`. 3.14t builds opt
  into Py_MOD_GIL_NOT_USED so importing tokenizers does not re-enable
  the GIL; regular CPython behaviour is unchanged.

- Add bindings/python/build.rs calling pyo3_build_config::use_pyo3_cfgs().
  PyO3 detects free-threaded Python and emits Py_GIL_DISABLED on its
  own crate, but cargo's rustc-cfg directives don't propagate to
  dependents — use_pyo3_cfgs re-emits them so our cfg_attr fires.

- Promote `abi3` from a hardcoded pyo3 dep-feature to a project-level
  cargo feature (default on). Allows building without abi3 on
  free-threaded Python (`maturin develop --no-default-features
  --features ext-module`) — abi3 / limited API is not available
  under free-threading.

- Add bindings/python/docs/free-threading-audit.md walking through
  every mutation surface (single-field setter, top-level swap,
  compound mutation, sequence components, trainer-during-train,
  encode hot path) with verdicts and audit-trail references.

- Add bindings/python/tests/test_freethreaded.py: stress tests racing
  N encoders against M setters on the same Tokenizer. All pass on
  3.14t (4/4) and regular 3.14 (3 pass + 1 skip for the 3.14t-specific
  GIL check).

- Update README and __init__.py docstring describing the 3.14t
  behaviour and the documented compound-mutation caveat
  (`tokenizer.post_processor.special_tokens = X` is two Python steps,
  not atomic — same class as `dict[k]=v` racing `dict.clear()`).

Building 3.14t wheels:

  maturin develop --release --no-default-features --features ext-module

Regular CPython wheels are unchanged — keep `default-features = true`.
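
The locking scheme and the stress test described above can be sketched with the standard library alone. This is a minimal illustration, not the code from the PR: `Tokenizer`, `SharedTokenizer`, `max_len`, and `race` are hypothetical stand-ins, and the toy `encode` only splits on whitespace. It shows setters taking `&self` plus a write guard, readers taking a read guard, and N encoder threads racing M setter threads on one shared value.

```rust
use std::sync::RwLock;
use std::thread;

// Hypothetical stand-in for the real Tokenizer: one mutable setting.
struct Tokenizer {
    max_len: usize,
}

impl Tokenizer {
    // Read-only "encode": split on whitespace and truncate to max_len.
    fn encode(&self, text: &str) -> Vec<String> {
        text.split_whitespace()
            .take(self.max_len)
            .map(str::to_owned)
            .collect()
    }
}

// Wrapper in the spirit of PyTokenizer: every method takes &self.
struct SharedTokenizer {
    inner: RwLock<Tokenizer>,
}

impl SharedTokenizer {
    fn new(max_len: usize) -> Self {
        Self { inner: RwLock::new(Tokenizer { max_len }) }
    }

    // Readers hold a read guard, so many encodes can run at once.
    // unwrap() panics if the lock is poisoned.
    fn encode(&self, text: &str) -> Vec<String> {
        self.inner.read().unwrap().encode(text)
    }

    // Setters hold a write guard, so mutations are serialized.
    fn set_max_len(&self, max_len: usize) {
        self.inner.write().unwrap().max_len = max_len;
    }
}

// Race `encoders` reader threads against `setters` writer threads and
// return the largest output length any reader observed.
fn race(encoders: usize, setters: usize) -> usize {
    let tok = SharedTokenizer::new(2);
    let tok = &tok;
    thread::scope(|s| {
        for i in 0..setters {
            s.spawn(move || {
                for _ in 0..1_000 {
                    tok.set_max_len(1 + i % 3); // always in 1..=3
                }
            });
        }
        let readers: Vec<_> = (0..encoders)
            .map(|_| {
                s.spawn(move || {
                    (0..1_000)
                        .map(|_| tok.encode("a b c d e").len())
                        .max()
                        .unwrap_or(0)
                })
            })
            .collect();
        readers
            .into_iter()
            .map(|h| h.join().unwrap())
            .max()
            .unwrap_or(0)
    })
}
```

Calling `race(4, 2)` should finish without panicking, and every length a reader sees stays between 1 and 3 because each `encode` runs against one consistent snapshot of `max_len`.
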
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ArthurZucker and others added 2 commits April 27, 2026 07:50
Drop the cfg_attr(Py_GIL_DISABLED, …) gating on every #[pymodule].
PyO3 0.28's `pymodule(gil_used = false)` emits the Py_mod_gil slot
only when the target Python recognizes it (3.13+); on older Python
versions the slot is simply not emitted. So always declaring
`gil_used = false` is a no-op on 3.10–3.12, the right thing on 3.13,
and the load-bearing thing on 3.14t.

Verified by building a single abi3 wheel and importing it on stock
CPython 3.10 / 3.11 / 3.12 / 3.13 (all clean: import + setter work)
and re-running the 3.14t stress suite (still 4/4 passing, GIL stays
off as before).
@ArthurZucker

/benchmark

Mechanical: run `cargo fmt` over tokenizer.rs (the .read().unwrap() /
.write().unwrap() chains it produced were too long for one line) and
`ruff format` over test_freethreaded.py.

No behavioural change. 3.14t stress suite still 4/4 passing; abi3
wheel still imports cleanly on 3.10–3.13.

The .pyi stub regeneration from `make style` is intentionally NOT
included — the current pipeline emits stubs without docstrings (the
`stub.py` enrichment step the README documents isn't actually wired
up), so re-running it on this branch would shrink every .pyi by ~80%
and lose all the inline doc text. Pre-existing issue, separate PR.
@github-actions

Python Benchmark Results

Commit: ff6841844e1509fc17b2ef949f18e46f9d391e14

Python Benchmarks

@github-actions

Rust Benchmark Results

Commit: ff6841844e1509fc17b2ef949f18e46f9d391e14

Rust Benchmarks

ArthurZucker and others added 6 commits April 27, 2026 15:01
Output of `cargo run --manifest-path ./tools/stub-gen/Cargo.toml`
against the current FT-aware build.
Two related changes that fix the silent docstring-stripping in the
stub generation pipeline, plus the regenerated stubs.

1. bindings/python/Cargo.toml: pin pyo3 back to 0.28.2 (was 0.28.3).
   The Makefile's `make style` injects a `[patch.crates-io]` block in
   `.cargo/config.toml` pointing pyo3 at git rev 2ba9cda5 (which is
   pyo3 0.28.2 with the introspection metadata pyo3-introspection
   needs to read docstrings out of the cdylib). Cargo only honours a
   patch when the requested version matches, so requiring 0.28.3 in
   our deps caused cargo to silently ignore the patch — the cdylib
   then built against vanilla 0.28.3 from crates.io, with no
   docstring metadata for pyo3-introspection to find.

2. tools/stub-gen/src/main.rs: walk the introspected module and
   abort if no docstrings are present anywhere. The previous
   behaviour was to write out 7 docstring-less stubs and exit
   successfully, which only got noticed when the .pyi diff in a
   PR was -2800 lines. The new check fails loudly with a pointer
   at `[patch.crates-io]` drift, which is the root cause when this
   regresses.

3. py_src/tokenizers/*.pyi: regenerated against the patched build,
   so the docstring contents are back in.
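
The fail-loudly check from item 2 could look roughly like the sketch below. It is an illustration under assumptions, not the code in `tools/stub-gen/src/main.rs`: the `Item` tree, `item`, `coverage`, and `check` are hypothetical names, and the real tool walks whatever structure pyo3-introspection returns.

```rust
// Hypothetical introspection tree: a module, class, or function with an
// optional docstring and nested members.
struct Item {
    name: String,
    doc: Option<String>,
    children: Vec<Item>,
}

// Small constructor to keep the examples short.
fn item(name: &str, doc: Option<&str>, children: Vec<Item>) -> Item {
    Item {
        name: name.to_owned(),
        doc: doc.map(str::to_owned),
        children,
    }
}

// Walk the tree and return (items with a non-empty docstring, total items).
fn coverage(node: &Item) -> (usize, usize) {
    let own = usize::from(node.doc.as_deref().is_some_and(|d| !d.trim().is_empty()));
    node.children
        .iter()
        .map(coverage)
        .fold((own, 1), |(d, t), (cd, ct)| (d + cd, t + ct))
}

// Refuse to continue when nothing in the tree carries a docstring,
// instead of writing out empty stubs and exiting successfully.
fn check(root: &Item) -> Result<String, String> {
    let (documented, total) = coverage(root);
    if documented == 0 {
        return Err(format!(
            "no docstrings found in `{}` ({} items); check for [patch.crates-io] drift",
            root.name, total
        ));
    }
    Ok(format!(
        "Docstring coverage: {documented}/{total} items carry a docstring"
    ))
}
```

A caller would print the `Ok` summary and write the stubs, or print the `Err` message and exit with a non-zero status.
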
The 3.14t job in `python.yml` was hitting `SystemError: init function
of tokenizers returned uninitialized object` because the install step
ran `pip install -e .[dev]`, which goes through maturin's PEP 660
editable path and keeps the `abi3` cargo feature on regardless of the
target interpreter. Free-threaded Python can't load an abi3 extension
(no limited API), so the resulting .so failed to initialize.

Fix the install + test steps to detect free-threading and switch
build/test behavior:

- Install: use `maturin develop --release` directly. On a GIL-enabled
  interpreter, defaults are fine (abi3 on). On free-threaded, pass
  `--no-default-features --features ext-module` so the abi3 cargo
  feature is dropped and the resulting wheel is `cp314t`-tagged
  rather than abi3-tagged.

- Run tests: `make test` runs `cargo test --no-default-features`
  which uses pyo3's `auto-initialize` and links libpython. Free-
  threaded Python on the macOS runner doesn't ship libpython3.14t
  in the framework path, so on 3.14t we run only `make test-py` and
  skip the cargo half.

- Makefile: split `test` into `test-py` (just pytest) and `test-rs`
  (cargo test); keep the original `test` target as `test-py + test-rs`
  for parity. Lets CI pick the appropriate subset per interpreter
  without duplicating the test command line.

Verified locally on 3.14t: 195 pytest items pass (4 new
test_freethreaded.py stress tests included). The 2 documentation-test
failures are a pre-existing truncated `tokenizer-wiki.json` fixture
issue, unrelated to this PR.
`make check-style` runs the stub-gen tool, which calls
`maturin develop --release` with the default cargo features (abi3 on)
and then imports the cdylib for introspection. abi3 extensions can't
load on free-threaded Python, so on 3.14t the import fails with the
familiar `SystemError: init function of tokenizers returned
uninitialized object`.

Style checks (rustfmt, ruff, ty, stub-gen) are matrix-invariant, so
gate to a single canonical combo (ubuntu-latest + 3.14) — avoids the
3.14t failure and also drops 3 redundant runs from the matrix.
…pply

Even with the dependency requirement at "0.28.2" (i.e. ^0.28.2),
cargo's resolver picks the highest matching version on crates.io —
0.28.3 — and the patched git source at rev 2ba9cda5 has manifest
version 0.28.2, so the patch's source-version pair doesn't match the
resolved 0.28.3 and cargo emits:

    warning: patch `pyo3 v0.28.2 (...)` was not used in the crate graph

The Makefile's `cargo update` doesn't downgrade — it only refreshes
within the existing requirement. Pinning exactly (`=0.28.2`) forces
the resolver to that version, which then matches the patch's source.

Switched the three pyo3 dep entries:
  pyo3                 0.28.2 -> =0.28.2
  pyo3-ffi             0.28   -> =0.28.2
  pyo3-build-config    0.28   -> =0.28.2  (build-dep)

Verified: `make style` now shows
    Docstring coverage: 188/483 items carry a docstring
with no "patch was not used" warning, and the regenerated stubs are
docstring-rich. 3.14t stress suite still 4/4 passing.
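
The version mismatch described above can be reproduced with a toy model. This is not cargo's resolver. It assumes, as the commit message describes, that the highest matching crates.io version is selected and that the patch only takes effect when that version equals the patch's manifest version. `Req`, `resolve`, and `patch_used` are hypothetical names.

```rust
// (major, minor, patch); tuples compare lexicographically.
type Version = (u64, u64, u64);

enum Req {
    Caret(Version), // "0.28.2" in Cargo.toml means ^0.28.2
    Exact(Version), // "=0.28.2"
}

// Exclusive upper bound of a caret requirement: the left-most non-zero
// component may not change.
fn caret_upper_bound((major, minor, patch): Version) -> Version {
    if major > 0 {
        (major + 1, 0, 0)
    } else if minor > 0 {
        (0, minor + 1, 0)
    } else {
        (0, 0, patch + 1)
    }
}

impl Req {
    fn matches(&self, v: Version) -> bool {
        match *self {
            Req::Exact(r) => v == r,
            Req::Caret(r) => v >= r && v < caret_upper_bound(r),
        }
    }
}

// Toy resolver: pick the highest available version that satisfies `req`.
fn resolve(req: &Req, available: &[Version]) -> Option<Version> {
    available.iter().copied().filter(|&v| req.matches(v)).max()
}

// In this model the patch is used only if the resolved version equals
// the version in the patched source's manifest.
fn patch_used(req: &Req, available: &[Version], patch: Version) -> bool {
    resolve(req, available) == Some(patch)
}
```

With 0.28.2 and 0.28.3 both published, the caret requirement resolves to 0.28.3 and the 0.28.2 patch goes unused, while the exact requirement resolves to 0.28.2 and the patch applies.
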
@ArthurZucker ArthurZucker marked this pull request as ready for review April 27, 2026 07:55

@McPatate McPatate left a comment

the only thing I'm mad about is that you unwrap everywhere, other than that lgtm! 🔥

Comment thread bindings/python/src/decoders.rs Outdated
Comment thread bindings/python/src/tokenizer.rs Outdated
Comment thread bindings/python/src/tokenizer.rs Outdated
Comment thread bindings/python/src/tokenizer.rs Outdated
Two follow-ups on the PyTokenizer locking work, per review.

1. Wrap the inner tokenizer in Arc<RwLock<…>> instead of just RwLock<…>.
   Restores the cheap Clone semantics the pre-RwLock PyTokenizer had:
   `clone()` is now a refcount bump rather than a deep copy of the
   entire Tokenizer (model, normalizer, post-processor, etc.). Matches
   how component wrappers elsewhere in the bindings already share
   their inner state.

2. Stop unwrapping lock acquisitions; propagate errors instead. Add
   `read_inner()` / `write_inner()` helpers that map a poisoned RwLock
   to a `PyException` and return `PyResult<RwLock*Guard>`. Every call
   site goes through them with `?`, including the one in decoders.rs
   used by `step_decode_stream`.

Methods that previously returned a plain type and now use one of the
helpers were widened to `PyResult<T>` accordingly. PyO3 treats `T`
and `PyResult<T>` identically on the Python side, so there's no
public API change — just an explicit failure path for the (rare)
case of lock poisoning, instead of an opaque process panic.
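
A std-only sketch of the helper pattern follows. The names `read_inner` and `write_inner` come from the commit message; everything else is assumed for illustration. In particular, `LockError` stands in for the `PyException` the bindings raise, `Result<_, LockError>` stands in for `PyResult`, and `poison` exists only to trigger the failure path.

```rust
use std::sync::{Arc, RwLock, RwLockReadGuard, RwLockWriteGuard};

// Hypothetical stand-in for the Python exception type.
#[derive(Debug, PartialEq)]
struct LockError(String);

// Hypothetical stand-in for the real Tokenizer.
struct Tokenizer {
    max_len: usize,
}

struct SharedTokenizer {
    inner: Arc<RwLock<Tokenizer>>,
}

impl SharedTokenizer {
    fn new(max_len: usize) -> Self {
        Self { inner: Arc::new(RwLock::new(Tokenizer { max_len })) }
    }

    // Map a poisoned lock to an error value instead of panicking.
    fn read_inner(&self) -> Result<RwLockReadGuard<'_, Tokenizer>, LockError> {
        self.inner
            .read()
            .map_err(|e| LockError(format!("tokenizer lock poisoned: {e}")))
    }

    fn write_inner(&self) -> Result<RwLockWriteGuard<'_, Tokenizer>, LockError> {
        self.inner
            .write()
            .map_err(|e| LockError(format!("tokenizer lock poisoned: {e}")))
    }

    // A getter that used to return usize is widened to a Result.
    fn max_len(&self) -> Result<usize, LockError> {
        Ok(self.read_inner()?.max_len)
    }

    fn set_max_len(&self, max_len: usize) -> Result<(), LockError> {
        self.write_inner()?.max_len = max_len;
        Ok(())
    }
}

// Poison the lock by panicking on another thread while it holds the
// write guard. The panic message on stderr is expected.
fn poison(tok: &SharedTokenizer) {
    let inner = Arc::clone(&tok.inner);
    let _ = std::thread::spawn(move || {
        let _guard = inner.write().unwrap();
        panic!("simulated failure while mutating");
    })
    .join();
}
```

After `poison(&tok)`, both `tok.max_len()` and `tok.set_max_len(1)` return `Err` rather than bringing the process down, which is the explicit failure path the commit describes.
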

Verified: 190 regular tests pass on CPython 3.14, 4/4 stress tests
pass on 3.14t. The Arc::clone is observable as a faster `t = clone(t)`
for any caller that does it.
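
Why the clone becomes cheap can be seen in a small std-only sketch. The types and method names here are hypothetical, and `unwrap()` is used on the lock only to keep the example short. Cloning the wrapper copies the `Arc` pointer and bumps its reference count; the `Tokenizer` itself is never copied.

```rust
use std::sync::{Arc, RwLock};

// Hypothetical stand-in for the real Tokenizer.
struct Tokenizer {
    vocab: Vec<String>,
}

// Deriving Clone clones the Arc, not the Tokenizer behind it.
#[derive(Clone)]
struct SharedTokenizer {
    inner: Arc<RwLock<Tokenizer>>,
}

impl SharedTokenizer {
    fn new(vocab: &[&str]) -> Self {
        let vocab = vocab.iter().map(|t| t.to_string()).collect();
        Self { inner: Arc::new(RwLock::new(Tokenizer { vocab })) }
    }

    // Number of handles currently pointing at the same Tokenizer.
    fn handle_count(&self) -> usize {
        Arc::strong_count(&self.inner)
    }

    // True when both handles point at the same allocation.
    fn shares_state_with(&self, other: &Self) -> bool {
        Arc::ptr_eq(&self.inner, &other.inner)
    }

    fn vocab_len(&self) -> usize {
        self.inner.read().unwrap().vocab.len()
    }

    fn add_token(&self, token: &str) {
        self.inner.write().unwrap().vocab.push(token.to_owned());
    }
}
```

In this sketch a mutation made through one handle is visible through every other handle, since they all share one `RwLock`. Whether and where the bindings expose that sharing to Python callers is not something this PR text states.
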
@ArthurZucker ArthurZucker force-pushed the freethreaded-python-support branch from cd7c0b2 to b652b1a Compare April 27, 2026 09:10
@ArthurZucker ArthurZucker merged commit decd8e0 into main Apr 27, 2026
35 checks passed
@ArthurZucker ArthurZucker deleted the freethreaded-python-support branch April 27, 2026 09:20