
Relaxed SIMD support #6151

Merged
marxin merged 20 commits into main from relaxed-simd on Feb 17, 2026

@syrusakbary
Member

Added support for Relaxed SIMD in Cranelift and LLVM compilers

Copilot AI review requested due to automatic review settings February 1, 2026 19:34
Contributor

Copilot AI left a comment

Pull request overview

This PR adds support for the WebAssembly Relaxed SIMD proposal to Wasmer, implementing the feature in both the Cranelift and LLVM compiler backends.

Changes:

  • Added comprehensive test suite for Relaxed SIMD operations including relaxed min/max, multiply-add, lane selection, dot products, swizzle, truncation, and Q15 multiply-round
  • Implemented Relaxed SIMD operator support in both LLVM and Cranelift compilers
  • Added test infrastructure to support "either" assertions for non-deterministic relaxed SIMD results and enabled the relaxed_simd feature flag
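The "either" assertions mentioned above can be pictured with a minimal sketch. The real logic lives in tests/lib/wast/src/wast.rs and is not shown in this thread, so the helper name `matches_either` and the sample values are assumptions made purely for illustration: a result passes if it equals at least one of the acceptable alternatives.

```python
def matches_either(actual, alternatives):
    """Return True if `actual` equals at least one acceptable result."""
    return any(actual == expected for expected in alternatives)


# Hypothetical relaxed-SIMD outcome: an engine may produce either lane vector.
acceptable = [[0, 0, 0, 0], [7, 7, 7, 7]]

print(matches_either([7, 7, 7, 7], acceptable))  # one of the alternatives
print(matches_either([9, 9, 9, 9], acceptable))  # none of the alternatives
```

A plain equality assertion would reject one of the two conforming engines, which is why a separate assertion form is needed for relaxed instructions.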

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tests/wast/spec/proposals/relaxed-simd/*.wast | Seven test files containing comprehensive spec tests for all relaxed SIMD operations |
| tests/lib/wast/src/wast.rs | Added support for "either" assertions to handle non-deterministic relaxed SIMD behavior |
| tests/ignores.txt | Added test ignores for singlepass (no SIMD support) and cranelift+riscv64 (no SIMD on riscv) |
| tests/compilers/wast.rs | Added relaxed_simd feature flag detection and enablement |
| build.rs | Registered relaxed-simd test directory for test generation |
| lib/compiler-llvm/src/translator/code.rs | Implemented all relaxed SIMD operators in LLVM backend (min/max, madd/nmadd, laneselect, dot products, swizzle, truncation, q15mulr) |
| lib/compiler-cranelift/src/translator/code_translator.rs | Implemented all relaxed SIMD operators in Cranelift backend with type_of function updates |

Contributor

@marxin marxin left a comment

I’ve just read the WebAssembly (WA) proposal (1) and noticed that the primary motivation for the extension is to allow more relaxed semantics for already existing WA instructions. For example, for the Operator::I8x16Swizzle instruction we currently clamp indices to the valid range (0–15) before generating the extract/insert sequence. In the relaxed variant, however, we can emit more aggressive code, because out-of-range indices may yield any of the results the spec permits; this often means fewer assembly instructions. This is essentially the core motivation behind Relaxed SIMD, and we should aim for implementations that are as fast as possible.

Aside from the newly introduced fused multiply-add (and negative multiply-add) instructions, all other instructions mainly serve as optimization opportunities, since their output semantics are intentionally more permissive.
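To illustrate why fused multiply-add is observably different from a separate multiply and add, here is a small sketch. It uses Python doubles rather than f32 lanes, and the helper names `madd_fused` and `madd_unfused` are hypothetical; the fused result is modelled with exact rational arithmetic followed by a single rounding.

```python
from fractions import Fraction


def madd_fused(a, b, c):
    """a*b + c computed exactly, then rounded once (FMA-style)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def madd_unfused(a, b, c):
    """a*b is rounded first, then the addition is rounded again."""
    return a * b + c


# The exact product is 1 - 2**-60, which rounds to 1.0 as a double.
a, b, c = 1.0 + 2.0**-30, 1.0 - 2.0**-30, -1.0

print(madd_fused(a, b, c))    # the tiny exact residue survives
print(madd_unfused(a, b, c))  # the residue is lost when the product is rounded
```

As far as I understand the proposal, a relaxed multiply-add is allowed to return either of these two results, so tests for it need to accept both.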

It would also be beneficial to include the IMPLEMENTATION_DEFINED_ONE_OF documentation entry from the spec for each instruction. This would better document the expected "contract" and make the relaxed behavior more explicit. For example:

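def relaxed_i8x16_swizzle(a: i8x16, s: i8x16):
    # Spec-style pseudocode: i8x16 and IMPLEMENTATION_DEFINED_ONE_OF are
    # notation from the proposal, not Python names.
    # Each index s[i] is read as an unsigned byte (0..255).
    result = [0] * 16  # preallocate so result[i] can be assigned
    for i in range(16):
        if s[i] < 16:
            result[i] = a[s[i]]
        elif s[i] < 128:
            result[i] = IMPLEMENTATION_DEFINED_ONE_OF(0, a[s[i] % 16])
        else:
            result[i] = 0
    return result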
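
A runnable version of the same contract may help when comparing against the strict instruction. This is only a sketch: the `wrap_out_of_range` flag is a hypothetical way to model which of the two permitted behaviours an engine picks for indices 16..127, and lanes are plain Python integers.

```python
def swizzle_strict(a, s):
    """Strict i8x16.swizzle: any index >= 16 selects 0."""
    return [a[i] if i < 16 else 0 for i in s]


def swizzle_relaxed(a, s, wrap_out_of_range):
    """Relaxed swizzle; the flag models the engine's choice for 16..127."""
    result = []
    for i in s:
        if i < 16:
            result.append(a[i])
        elif i < 128:
            # Either 0 or the lane selected by the low four bits is allowed.
            result.append(a[i % 16] if wrap_out_of_range else 0)
        else:
            result.append(0)
    return result


a = list(range(100, 116))
s = [0, 15, 16, 17, 127, 128, 255] + [0] * 9

print(swizzle_strict(a, s))
print(swizzle_relaxed(a, s, wrap_out_of_range=False))
print(swizzle_relaxed(a, s, wrap_out_of_range=True))
```

With the flag off, the relaxed model agrees with the strict instruction on every input; with it on, only the lanes indexed 16..127 differ.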

Footnotes

  1. https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md

ashvardanian added a commit to ashvardanian/SimSIMD that referenced this pull request Feb 2, 2026
With new FMA instructions in Relaxed WASM SIMD,
we can implement fast dot products and spatial metrics
for USearch in the browser & on edge.

Wasmer is about to gain support:
wasmerio/wasmer#6151
@marxin
Contributor

marxin commented Feb 16, 2026

I made a few updates. Most notably, all newly added instructions now have x86_64 SIMD equivalents, allowing us to emit them efficiently (most WA instructions map 1:1 to x86_64 instructions). I also fixed the detection of the relaxed_simd feature in WASM files, which was previously broken.

@syrusakbary now it's time for your review of my changes

@marxin marxin requested a review from theduke February 16, 2026 08:35
@syrusakbary
Member Author

I can't self-approve (since I created the PR), but feel free to approve it yourself and merge, @marxin

@ashvardanian

Thank you @syrusakbary & @marxin for implementing this!

I'm trying to leverage Relaxed SIMD in the upcoming bottom-up redesign of SimSIMD, one of my most widely used FOSS projects, and I've noticed that Wasmtime and other WASM runtimes may not be fully utilizing the underlying hardware potential.

Namely, they sometimes provide x86 SSE and Arm NEON paths, but don’t always leverage newer AVX-512 instructions or support RISC-V for emerging platforms. Even in Wasmtime’s version of Cranelift, it seems that dot_i8x16_i7x16_add_s maps to pmaddubsw + pmaddwd + paddd on x86, when it could be just a single vpdpbusd. I'd love to see these and other optimizations land in Wasmer, and I'm open to helping 🤗
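
For reference, here is a sketch of what the dot-product instruction computes when the second operand stays within 0..127, which is the range where, to my understanding, all conforming implementations agree. The function mirrors the instruction name used above; the `to_i32` helper and the sample inputs are hypothetical. Both the three-instruction sequence and the single VNNI instruction would have to reproduce this result.

```python
def to_i32(x):
    """Wrap an integer to signed 32-bit two's complement."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def dot_i8x16_i7x16_add_s(a, b, c):
    """a: 16 signed bytes, b: 16 values in 0..127, c: 4 i32 accumulators."""
    assert len(a) == 16 and len(b) == 16 and len(c) == 4
    assert all(-128 <= x <= 127 for x in a)
    # Values of b with the top bit set are where engines may disagree,
    # so this sketch rejects them instead of picking a behaviour.
    assert all(0 <= y <= 127 for y in b), "outside the deterministic range"
    return [
        to_i32(c[lane] + sum(a[4 * lane + k] * b[4 * lane + k] for k in range(4)))
        for lane in range(4)
    ]


a = [1, -2, 3, -4] * 4
b = [10, 20, 30, 40] * 4
c = [100, 0, -1, 5]
print(dot_i8x16_i7x16_add_s(a, b, c))
```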


PS: Maybe cvtps2dq should be replaced with cvttps2dq?

@marxin
Contributor

marxin commented Feb 17, 2026

> I'm trying to leverage Relaxed SIMD in the upcoming bottom-up redesign of SimSIMD, one of my most widely used FOSS projects, and I've noticed that Wasmtime and other WASM runtimes may not be fully utilizing the underlying hardware potential.

Glad to hear there's a real-world use case that will stress-test the WebAssembly SIMD extension. Frankly speaking, even though I implemented the entire Relaxed SIMD proposal with native instructions (at least on LLVM and x86_64), I bet there are other opportunities when it comes to native instructions and the SIMD extension. Feel free to file issues if you spot suboptimal emitted code.

> PS: Maybe cvtps2dq should be replaced with cvttps2dq?

Good point, applied the suggestion.
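
The difference between the two conversions is the rounding mode: cvtps2dq rounds according to the current rounding mode (round-half-to-even by default), while cvttps2dq truncates toward zero, which is what a truncation instruction needs. A small sketch for in-range inputs, with hypothetical helper names and scalar Python floats standing in for f32 lanes:

```python
import math


def convert_round_nearest_even(x):
    """Models cvtps2dq under the default rounding mode (in-range inputs)."""
    return round(x)  # Python's round() also rounds halves to even


def convert_truncate(x):
    """Models cvttps2dq: drop the fractional part (in-range inputs)."""
    return math.trunc(x)


samples = [2.5, 2.7, -2.7, 3.5]
print([convert_round_nearest_even(x) for x in samples])
print([convert_truncate(x) for x in samples])
```

Out-of-range and NaN inputs are deliberately left out of this sketch, since that is where the relaxed semantics apply.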

@marxin marxin enabled auto-merge (squash) February 17, 2026 08:29
Contributor

@marxin marxin left a comment

A bit selfish, but I am going to approve my changes ;)

@marxin marxin merged commit d1f3214 into main Feb 17, 2026
75 checks passed
@marxin marxin deleted the relaxed-simd branch February 17, 2026 09:17