What's broken
I have an Intel Xeon E5-2609 (Sandy Bridge, 2012 — has AVX and SSE4.2, no AVX2 and no FMA). The published lancedb wheel embeds lance, and `import lancedb` crashes immediately:

```
$ python -c "import lancedb"
[1] Illegal instruction (core dumped)
```
A friend hit the same thing on an AMD FX-7500 (Steamroller, 2014 — has AVX + FMA, no AVX2). Both CPUs are pre-Haswell on the AVX2 timeline.
The cause is that the workspace `.cargo/config.toml` compiles with `target-cpu=haswell` + `target-feature=+avx2,+fma,+f16c`, which bakes AVX2 and FMA into all of the compiled code — both the explicit SIMD kernels and any auto-vectorized loop in plain Rust. The existing runtime SIMD dispatch in `lance-core::utils::cpu::SIMD_SUPPORT` never gets a chance to run; the binary traps on its first AVX2 instruction at load time.
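To make the failure mode concrete, here is a small illustrative sketch (my own example, not lance code). `cfg!(target_feature = ...)` is resolved at compile time and is unconditionally true in a binary built with `target-cpu=haswell`, while `is_x86_feature_detected!` probes the CPU that is actually running. When the whole crate is compiled for Haswell, the compiler may emit AVX2 in any function, so a runtime check by itself cannot protect a pre-Haswell host:

```rust
// Illustrative sketch only (not lance code), targeting x86_64.
#[cfg(target_arch = "x86_64")]
fn report_avx2() {
    // Resolved at compile time: always true in a binary built with
    // target-cpu=haswell, regardless of the CPU it later runs on.
    let baked_in = cfg!(target_feature = "avx2");

    // Resolved at run time: probes CPUID on the executing host.
    let host_has_it = is_x86_feature_detected!("avx2");

    println!("compiled assuming avx2: {baked_in}, host supports avx2: {host_has_it}");
}

fn main() {
    #[cfg(target_arch = "x86_64")]
    report_avx2();
}
```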
Why it's worth fixing
The neighboring libraries in any data-science user's import path don't have this problem. On the same Sandy Bridge box:
| Library | What it ships | What we saw |
| --- | --- | --- |
| pyarrow | `runtime_info.simd_level == 'avx'` | Imports cleanly at AVX tier |
| numpy | `baseline=X86_V2`, AVX2/AVX-512 listed under "not found" | Imports cleanly at V2 baseline |
| lancedb (embeds lance) | heavy AVX2 + AVX-512 instructions, no runtime guard | `Illegal instruction (core dumped)` |
A user who can `import numpy as np; import pyarrow as pa` cannot necessarily `import lancedb`. Lance is the outlier in the trio.
Affected hardware
Anything pre-Haswell on the AVX2 timeline:
- Intel: Sandy Bridge (2011), Ivy Bridge (2012), Westmere (2010), Nehalem (2008)
- AMD: Bulldozer / Piledriver (2011-2012), Steamroller (2014, has FMA but no AVX2 — e.g. FX-7500)
Modern data-center hosts are all AVX2 or better, so this isn't blocking production. It does block lance on workstations, homelabs, older laptops, and any environment where someone is using lance alongside numpy and pyarrow expecting parity with how those libraries handle the hardware.
The fix
The implementation:
- Lowers the workspace x86_64 baseline from `target-cpu=haswell` to `target-cpu=x86-64-v2` (matches numpy's published-wheel baseline — Nehalem-class)
- Adds runtime SIMD dispatch with 5-tier coverage (scalar / AVX / AVX+FMA / AVX2+FMA / AVX-512) to the f32/f64 hot distance kernels in lance-linalg
- Uses the same dispatch shape lance already uses for its u8 distance kernels (`dot_u8.rs`, `cosine_u8.rs`, `l2_u8.rs`) and for the f16/bf16 paths in `norm_l2.rs` — no new external dependencies (a minimal sketch of this shape follows the list)
- Adds a `lance.simd_info()` Python API that mirrors `pyarrow.runtime_info()` so users can verify which tier dispatch picked on their host
- Adds a `qemu-x86_64 -cpu Nehalem` CI job so any future SIGILL leak fails CI before shipping
- The existing AVX2 path is preserved as one of the per-tier kernels — modern-CPU compiled output is unchanged from today, so no regression by construction; the only execution change is for hosts that today SIGILL, which now land on the AVX or scalar tier instead
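For readers unfamiliar with the pattern, here is the dispatch shape in miniature, under the assumption of an x86-64-v2 compile baseline. It is illustrative only; the function names and auto-vectorized bodies are invented for this example and are not the lance-linalg kernels. The point is that `#[target_feature]` confines the wider instructions to the annotated function, so shipping an AVX2 kernel in the binary cannot by itself trap an older CPU; only calling it without the CPUID guard could.

```rust
// Minimal sketch of per-tier runtime dispatch, assuming an x86-64-v2 baseline.
// Illustrative only; not the actual lance-linalg kernels.
#[cfg(target_arch = "x86_64")]
mod dot {
    /// Baseline kernel: compiled with only the x86-64-v2 feature set,
    /// safe on any host the wheel claims to support.
    fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
        a.iter().zip(b).map(|(x, y)| x * y).sum()
    }

    /// AVX2+FMA kernel: the attribute lets the compiler use those instructions
    /// inside this function only, so merely linking it does not trap a
    /// pre-Haswell CPU; calling it unguarded would.
    #[target_feature(enable = "avx2,fma")]
    unsafe fn dot_avx2_fma(a: &[f32], b: &[f32]) -> f32 {
        // Auto-vectorized under the wider feature set; real kernels use intrinsics.
        a.iter().zip(b).fold(0.0, |acc, (x, y)| x.mul_add(*y, acc))
    }

    /// Chooses a tier based on what the running CPU actually supports.
    pub fn dot(a: &[f32], b: &[f32]) -> f32 {
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            // SAFETY: guarded by the runtime CPUID checks above.
            unsafe { dot_avx2_fma(a, b) }
        } else {
            dot_scalar(a, b)
        }
    }
}

fn main() {
    #[cfg(target_arch = "x86_64")]
    {
        let a = vec![1.0_f32, 2.0, 3.0];
        println!("dot = {}", dot::dot(&a, &a));
    }
}
```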
Side benefit — the same workspace config change automatically fixes lance Java JNI users (the JNI build inherits the workspace baseline; no separate config there).
PR (on my fork, not upstream): tobocop2#2. Per-kernel design rationale, asm evidence, and bench methodology are in the PR description.
Verified end-to-end on the failing hardware
On my Intel Xeon E5-2609 (Sandy Bridge — same CPU class the published wheel SIGILLs on):

```
$ grep -m1 'model name' /proc/cpuinfo
model name : Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz
$ grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep -E '^(sse4_2|avx|avx2|fma|avx512f)$' | sort -u
avx
sse4_2
```
Pre-fix — install the published wheel, observe SIGILL:

```
$ pip install lancedb
$ python -c "import lancedb"
[1] Illegal instruction (core dumped)
```
Post-fix — build the wheel from this branch (via tobocop2/lancedb#2, which embeds tobocop2/lance#2), install, run a vector-search round-trip:

```
$ git clone -b fix/runtime-simd-pre-haswell https://github.com/tobocop2/lancedb.git
$ cd lancedb/python
$ maturin build --release
$ pip install ./target/wheels/lancedb-*.whl
$ python <<'PY'
import lancedb, tempfile, pyarrow as pa
schema = pa.schema([pa.field("vec", pa.list_(pa.float32(), 3))])
with tempfile.TemporaryDirectory() as d:
    t = lancedb.connect(d).create_table("t",
        pa.Table.from_pylist([{"vec": [1.0, 2.0, 3.0]}], schema=schema))
    print(t.search([1.0, 2.0, 3.0]).limit(1).to_arrow().to_pylist())
PY
[{'vec': [1.0, 2.0, 3.0], '_distance': 0.0}]
```
Import succeeds, table create + vector search succeed. Runtime dispatch picks the AVX tier; the AVX2/AVX-512 kernels stay dormant on this CPU and the binary doesn't trap.
I would really appreciate this. I'm working on https://github.com/tobocop2/lilbee and I'm invested in this project; supporting older architectures would make both of our projects accessible to much more hardware, and I'd love to see that happen.
To be transparent: the PRs are on my fork (not opened upstream) because this isn't my domain of expertise and the implementation is AI-generated. I verified it works end-to-end on the failing hardware (the Sandy Bridge box above), but wanted to be upfront about that.
I'd be happy to open these PRs upstream if that makes sense, and to incorporate any feedback.