Transformers.js V4: Native WebGPU EP, repo restructuring, and more! #1382

xenova · 2025-07-31T03:16:21Z

This is the official, long-awaited PR that introduces Transformers.js V4.

See benchmarks

https://huggingface.co/onnx-community/all-MiniLM-L6-v2-ONNX:

https://huggingface.co/onnx-community/bge-base-en-v1.5-ONNX:

Improved repository formatting & consistency
- Organize modeling code into separate folders (./src/models/), grouped by model type -- models.js is getting pretty large!
- Formatting will happen later to keep the number of changes between v3 and v4 lower.
Documentation improvements and updates (particularly for older models).
Fix typing issues
React native support
New models:
- Chatterbox. Closes Add support for Chatterbox #1434
- Mamba/Mamba2 (e.g., GraniteMoeHybrid)
- LFM2-MoE
- HunYuanDenseV1
- Apertus
- Olmo3
- GPT-OSS
- FalconH1
- Youtu-LLM

Other issues:

* ONNX Runtime improvements (experimental native webgpu; fix iOS) (#1231) * customize the wasm paths * update implementation * allow using 'webgpu' in nodejs binding * update version of onnxruntime-node * Upgrade onnxruntime-web to same version as onnxruntime-node * Update list of supported devices --------- Co-authored-by: Joshua Lochner <[email protected]> * customize the wasm paths (#1250) * customize the wasm paths * update implementation * [internal] Add is_decoder option to session retrieval for preferred output location * Update tests * Formatting * Bump ort versions * Bump onnxruntime-node version * Bump versions * Bump ORT versions * Bump versions * Only check webgpu fp16 for non-node environments * Fix * Assume node supports webgpu * Update ORT node support comment * Relax test strictness * Update conversion script versions * Downgrade onnxslim * cleanup * Update package-lock.json * Update onnxruntime versions * Update post-build script * Use built-in session release function * Call garbage collection after each tokenizer test * Do not double-throw error * Fix race-condition in build process with file removal * Update versions * Bump jinja version * [version] Update to 3.6.3 * Bump jinja version to support new features * [version] Update to 3.6.3 * Add support for LFM2 models (#1367) * Use prefix in lfm2 output location (#1369) * Update package-lock.json * Run `npm audit fix` * Add special tokens in text-generation pipeline if tokenizer requires (#1370) * Add special tokens in text-generation pipeline if tokenizer requires * Fix logits processors tests * Update bundles.test.js * Update comment * Formatting * Add support for ModernBERT Decoder (#1371) * Use from/to buffer instead of string Actually fixes #1343 * Add support for Voxtral (#1373) * Support longform voxtral processing (#1375) * [version] Update to 3.7.0 * Add support for Arcee (#1377) * Optimize tensor.slice() (#1381) * Optimize tensor.slice() The performance of executing `tensor.slice()` is super poor, especially for the 'logits' tensor with large dimensions. ``` const logits = outputs.logits.slice(null, -1, null);` ``` This is because currently implementation of the `slice` method manually iterates through each element and calculate indices which is a big time consuming if the tensor shape is large. For cases like `slice(null, -1, null)`, where the slicing operation is contiguous along certain dimensions, which can be optimized by bulk copy by using `TypeArray.subarray()` and `TypeArray.set()`. * nit * Add a few more tensor slice unit tests --------- Co-authored-by: Joshua Lochner <[email protected]> --------- Co-authored-by: Yulong Wang <[email protected]> Co-authored-by: Wanming Lin <[email protected]>

* suppress console.error while creating InferenceSession * changed console suppress if not one of the misleading errors * set default logSeverityLevel and also match the ONNX_WEB.env.logLevel * indentation * small fix * some clean-up * Apply suggestions from code review Co-authored-by: Joshua Lochner <[email protected]> * added LOG_LEVELS to the top of the file --------- Co-authored-by: Joshua Lochner <[email protected]>

#1471) * added wasm cache * some refactoring of the hub.js and caching of the wasm factory * fixed comment * added string as cache return * fixes after review * Only return if match is found * Return response even if cache doesn't exist Don't throw error if we can't open cache or load file from cache, but we are able to make the request. --------- Co-authored-by: Joshua Lochner <[email protected]> Co-authored-by: Joshua Lochner <[email protected]>

xmcp · 2025-12-17T05:01:48Z

Hi @xenova ,

The two benchmark figures in this PR show pretty impressive speed improvement. Is this v4 vs v3? Or should we do something to the onnx file to achieve that speedup?

I tried the https://huggingface.co/Xenova/bge-small-zh-v1.5 model with the v4 branch + WebGPU FP16, but did not observe notable performance improvements over v3 at any batch size. So I am curious if I missed some steps. E.g., should I run the convert.py on the base BAAI/bge-small-zh-v1.5 model to make a new "optimized" version for that?

* added blob check before cahing wasm or mjs file * added propper handling for absolute/relative URLs * clean up * removed unneeded url check * use isValidUrl

* added esuild * fixed stream and stream/promises import * changes after review * Delete webpack.config.js * Bump esbuild version --------- Co-authored-by: Joshua Lochner <[email protected]>

* started refactoring * started refactoring * started refactoring * added model class files * added model class files * added model class files * all model classes in their own files * refactored PreTrainedModel * refactoring done, lets fix bugs * added model-registry * removed dev file * changed casing * refactored MODEL_TYPE_CONFIG * fixed tests * small refactoring * moved model loader to its own file * fixed ts errors * big structure refactoring * fixed build * renamed _base/pre-trained-model.js and _base/output.js * small casing changes * Update src/models/ernie4_5/modeling_ernie4_5.js Co-authored-by: Joshua Lochner <[email protected]> * refactored models/utils.js * fixed double MODEL_FOR_ definitions with registerTaskMappings helper * auto/image_processing_auto.js export * auto/image_processing_auto.js export * Improve model mapping setup * Fix LlavaPreTrainedModel * Move llava_onevision to separate files * Add missing exports * Update jinja version * Fix default class mapping * Simplify registerTaskMappings * Update registry.js * Formatting in src/models * Formatting in src * Move model-specific ModelOutput to respective modeling files * Final cleanup * Cleanup model exports * Fix Tensor type re-export * Clean up registry exports * Cleanup * Simplify loadResourceFile * Use positional arguments for repo id and filename * Update global library exports * Remove ts-expect-error * Formatting * let -> const --------- Co-authored-by: Joshua Lochner <[email protected]> Co-authored-by: Joshua Lochner <[email protected]>

xenova · 2026-01-14T22:08:37Z

Hi @xmcp 👋 I've made an optimized export for that model at https://huggingface.co/onnx-community/bge-small-zh-v1.5-ONNX, which -- if you use and run on webgpu -- you should see performance gains for!

* Remove legacy tokenizer tests * Update utils.test.js * Update unit tests * Refactor tokenizers.js * Update streamers.js * Update imports * Initial tokenization migration * Remove unnecessary index.js files * Do not export PretrainedMixin * Refactor models.js/tokenizers.js * Update imports * Update more imports * Update folder structure and imports * Delete tokenization_code_xlm_roberta.js * Fix import path * Fix typos * Update import path

* Remove legacy tokenizer tests * Update utils.test.js * Update unit tests * Refactor tokenizers.js * Update streamers.js * Update imports * Initial tokenization migration * Remove unnecessary index.js files * Do not export PretrainedMixin * Refactor models.js/tokenizers.js * Update imports * Update more imports * Update folder structure and imports * Delete tokenization_code_xlm_roberta.js * Add support for FalconH1

* Setup test case * Update audio-classification.js * Update text-to-audio pipeline: implementation and types * Implement tensor repeat and tile operations * Optimize randn implementation * Update automatic-speech-recognition.js * Update question-answering.js * Update image-classification.js * Update image-to-image.js * Update text-classification.js * Update depth-estimation.js * Update background-removal.js * Update image-segmentation JSDoc * Fix DQA JSDoc * Remove unused type * Update ObjectDetectionPipeline types * Update Text2TextGenerationPipeline types (and subclasses) * Update TokenClassificationPipeline types * Update image-to-text.js * Remove useless constructors * Update zero-shot pipeline types * Update image-segmentation.js * Create pipeline tests for type checking * Update tsconfig.json * Update onnx.js * Use defined types * Support passing speaker embeddings tensor directly

* switched to pnpm workspaces * updated github actions * added comments * Update tensor.js * Formatting * Update tsconfig.json * Update tsconfig.json * fixed circular reference error in pipelines/zero-shot-audio-classification.js * Post-tsconfig updates * Move transformers.js docs to package folder * Move additional tests * JSDoc update * Version bumps * Update incorrect test * Update test_modeling_musicgen.js * Update test_modeling_musicgen.js * Update test_modeling_musicgen.js * fixed broken symlink * fixes after review * Remove old conversion scripts Users should use onnxruntime-genai or optimum directly * Update .prettierrc * Formatting * Update readme/docs * Move build scripts to parent folder * Remove unused tests * Remove old compare function * Fix JSDoc * Update generate.js * Update inline descriptions * Bump versions * Update node imports * Add module header to FileCache.js * JSDoc updates * Update tensor.js * Move prettier config to package.json key * Update FileCache.js * Remove unused import * Remove non-existent file include * Prefer non-default exports * Update doc module exports * Update docs generation script * merged tsconfigs and added contributing.md * Update path_to_docs * Formatting * Formatting * Formatting * Update prettier usage * Remove <code> tags from headers * Swap docs-preview and docs-build commands * ONNXRUNTIME_NODE_INSTALL=skip for doc-builder * Update buildAll.mjs * Update index --------- Co-authored-by: Joshua Lochner <[email protected]>

HuggingFaceDocBuilderDev · 2026-01-29T04:03:11Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

huggingface/transformers#43166

xenova and others added 30 commits December 23, 2024 14:10

Move adaptive retrieval demo

1f1c645

Move code completion demo

363aefa

Move florence-2 demo

6c96600

Move depth anything demo

c4b3a76

Move tokenizer playground demo

29bdc5b

Move node audio processing demo

98958ff

Move remove background web demo

ac5b758

Move webgpu whisper demo

b3ac89e

Move depth estimation video demo

48b46d6

Move vanilla js demo

a361b38

Move node demo

1634399

Move whisper word timestamps demo

4f7b7f4

Move musicgen web demo

b0ac200

Move cross encoder demo

4ccb405

Move text-to-speech client demo

bc006e5

Move video object detection demo

7cdea27

Move video background removal demo

b053db2

Move Segment Anything demo

eacdbd1

Move zero-shot classification demo

e890716

Move browser extension template

1ee3d46

Move semantic image search demo

56297b0

Move semantic audio search demo

97a3c48

Move webgpu embedding benchmark demo

0bf2fec

Merge branch 'main' into move-examples

c033537

Merge branch 'main' into move-examples

e20167a

Merge branch 'main' into v4

140f106

Upgrade sharp version

bf12225

Bump versions

5e2c942

Delete gh-pages.yml

9f9f6f2

xenova and others added 6 commits December 7, 2025 23:44

Add support for Olmo3

91b44eb

Add support for GPT OSS

cb6c8b1

Default logging logic improvements

45c83eb

Add support for Chatterbox Turbo

aab2326

nico-martin and others added 4 commits December 21, 2025 17:57

Cache wasm file blob fix (#1489)

b4a44aa

* added blob check before cahing wasm or mjs file * added propper handling for absolute/relative URLs * clean up * removed unneeded url check * use isValidUrl

[v4] Switch build system to esbuild (#1466)

760cc66

* added esuild * fixed stream and stream/promises import * changes after review * Delete webpack.config.js * Bump esbuild version --------- Co-authored-by: Joshua Lochner <[email protected]>

Add support for HunYuanDenseV1

698dbd2

nico-martin mentioned this pull request Jan 20, 2026

Where is v4? #1500

Open

xenova and others added 4 commits January 21, 2026 12:10

xenova marked this pull request as ready for review January 29, 2026 03:34

xenova added 3 commits January 28, 2026 22:57

Merge branch 'main' into v4

b9d0d4a

Post-merge fixes

906c43f

Formatting

49c378f

xenova added 8 commits January 28, 2026 23:22

Remove duplicate pnpm versions

644376f

Set ONNXRUNTIME_NODE_INSTALL env var in actions

6a38ad5

Use inline ONNXRUNTIME_NODE_INSTALL for PR workflows

87f5682

Set padding_side if set in tokenizer config

3ae83a8

Bump tokenizers.js to v0.1.1

841401d

Add support for Youtu-LLM

34f7777

huggingface/transformers#43166

Add support for LFM2-MoE

c4d35bb

Bump jinja.js

7a03906

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Transformers.js V4: Native WebGPU EP, repo restructuring, and more! #1382

Transformers.js V4: Native WebGPU EP, repo restructuring, and more! #1382

xenova commented Jul 31, 2025 •

edited

Loading

Uh oh!

xmcp commented Dec 17, 2025 •

edited

Loading

Uh oh!

xenova commented Jan 14, 2026

Uh oh!

HuggingFaceDocBuilderDev commented Jan 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Transformers.js V4: Native WebGPU EP, repo restructuring, and more! #1382

Are you sure you want to change the base?

Transformers.js V4: Native WebGPU EP, repo restructuring, and more! #1382

Conversation

xenova commented Jul 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

xmcp commented Dec 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

xenova commented Jan 14, 2026

Uh oh!

HuggingFaceDocBuilderDev commented Jan 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

xenova commented Jul 31, 2025 •

edited

Loading

xmcp commented Dec 17, 2025 •

edited

Loading