@janbridley (Contributor) commented Jan 22, 2026

Description

Add a new cell-list-based nearest neighbor search, which is significantly faster than the previous AABBQuery. Note that this is a fundamentally different architecture from LinkCell, which is extremely slow compared to both alternatives.

TODOs:

  • Fix on Windows
  • Clean up and lint
  • Request review

Architecture

This neighbor list is based on a spatially sorted linear memory region, with cells adjacent in the X direction contiguous in memory. We defer construction until the user attempts a query, allowing us to choose the optimal cell width for a given lookup. For num_nearest lookups, we estimate the cell width from the density of the system, with an empirically determined scale factor for performance. Our construction guarantees that a single layer of ghost cells contains every necessary ghost particle, which is optimal for performance in r_max queries. For num_nearest queries no such guarantee is possible, so we fall back to wrapping neighbor particles if we need to look outside the first shell of cells.
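To make the two ideas above concrete, here is a minimal pure-Python sketch of (1) a linear cell index with X contiguous and (2) a density-based cell-width heuristic for num_nearest queries. The function names and the `scale` placeholder are hypothetical; the real implementation is C++ with an empirically tuned factor.

```python
def cell_index(ix, iy, iz, nx, ny):
    """Linear index into the cell array: cells adjacent in X are
    contiguous in memory, so X-direction sweeps are cache-friendly."""
    return ix + nx * (iy + ny * iz)

def knn_cell_width(num_nearest, n_points, box_volume, scale=1.0):
    """Estimate a cell width so that, for a uniform system, one cell
    holds roughly num_nearest points.  `scale` stands in for the
    empirically determined performance factor described above."""
    density = n_points / box_volume
    # Side length of a cube expected to contain ~num_nearest points.
    return scale * (num_nearest / density) ** (1.0 / 3.0)
```

With this layout, iterating a 3x1x1 strip of neighbor cells in X touches three consecutive blocks of the sorted particle array.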

Performance

For performance, I have two benchmarks: one based on constructing a full neighbor list in Python (cq.toNeighborList()), and a more representative test based on computing the RDF of a system with a single bin. The latter benchmark aims to exercise freud's internal use of neighbor lists in NeighborComputeFunctional, and is what the text output below measures. Note that generating the random systems and computing the RDF itself takes ~25-30% of the runtime of this benchmark, so the percentage improvements reported are underestimates of the query speedup.
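The underestimate can be quantified: if a fixed fraction of the benchmark runtime is overhead (system generation plus RDF accumulation) that neither backend speeds up, the improvement attributable to the query alone is the measured improvement divided by the non-overhead fraction. A small worked sketch (the function name is hypothetical):

```python
def query_only_improvement(measured, overhead_frac):
    """Back out the query-only speedup from an overall measurement.

    measured:      fractional overall improvement, e.g. 0.50 for +50%
    overhead_frac: fraction of baseline runtime that is fixed overhead

    With T = overhead + query, the measured improvement is
    (Q - Q') / T, while the query-only improvement is
    (Q - Q') / Q = measured / (1 - overhead_frac).
    """
    return measured / (1.0 - overhead_frac)

# Example: a measured +50% with ~25% fixed overhead implies roughly a
# +67% improvement in the query itself.
```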

Note that, because of the way we handle ghosts, the largest performance improvements are realized only in ball queries. kNN is still faster than AABBQuery, but by ~20% rather than 60%+.
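The wrapping fallback used when a kNN query must look past the first shell of cells amounts to the standard minimum-image convention. A one-dimensional sketch for an orthorhombic box (the lightly sheared boxes in the benchmarks need the full triclinic transform; `min_image` is a hypothetical name):

```python
def min_image(dx, box_length):
    """Wrap a 1D separation into [-L/2, L/2) via the minimum-image
    convention, so distances to out-of-box neighbors are computed
    without materializing extra ghost particles."""
    return dx - box_length * round(dx / box_length)
```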

In the benchmarks below I also test against vesin, which is the fastest nearest neighbor library I've found. The results are not fully comparable, however, as vesin does not go through the freud toNeighborList function that dominates the runtime.

OSX M1 Pro (Python 3.13)

Benchmarks for uniform random systems in random, lightly sheared boxes. r_cut=1.5 and rho=0.5.

============================================================
PERCENTAGE IMPROVEMENT: RDF (CellQuery vs AABBQuery)
============================================================

N = 1,000 particles:
  Serial:   +44.0% (AABB: 1.122ms -> Cell: 0.628ms)
  Parallel: +42.3% (AABB: 0.338ms -> Cell: 0.195ms)

N = 2,000 particles:
  Serial:   +49.1% (AABB: 2.305ms -> Cell: 1.174ms)
  Parallel: +45.0% (AABB: 0.685ms -> Cell: 0.377ms)

N = 4,000 particles:
  Serial:   +53.1% (AABB: 5.007ms -> Cell: 2.348ms)
  Parallel: +55.1% (AABB: 1.305ms -> Cell: 0.586ms)

N = 8,000 particles:
  Serial:   +53.7% (AABB: 11.093ms -> Cell: 5.133ms)
  Parallel: +48.7% (AABB: 2.672ms -> Cell: 1.371ms)

N = 16,000 particles:
  Serial:   +53.3% (AABB: 23.478ms -> Cell: 10.958ms)
  Parallel: +59.0% (AABB: 5.807ms -> Cell: 2.381ms)

N = 32,000 particles:
  Serial:   +56.5% (AABB: 43.443ms -> Cell: 18.909ms)
  Parallel: +64.2% (AABB: 11.455ms -> Cell: 4.097ms)

Average improvement across all particle counts:
  Serial:   +51.6%
  Parallel: +52.4%
============================================================

============================================================
PERCENTAGE IMPROVEMENT: k-NN RDF (CellQuery vs AABBQuery)
============================================================

N = 1,000 particles:
  Serial:   +6.0% (AABB: 2.736ms -> Cell: 2.573ms)
  Parallel: +3.4% (AABB: 0.670ms -> Cell: 0.647ms)

N = 2,000 particles:
  Serial:   +4.1% (AABB: 5.683ms -> Cell: 5.452ms)
  Parallel: +7.7% (AABB: 1.444ms -> Cell: 1.332ms)

N = 4,000 particles:
  Serial:   +10.4% (AABB: 11.966ms -> Cell: 10.719ms)
  Parallel: +13.7% (AABB: 2.933ms -> Cell: 2.530ms)

N = 8,000 particles:
  Serial:   +11.8% (AABB: 24.377ms -> Cell: 21.499ms)
  Parallel: +21.7% (AABB: 5.526ms -> Cell: 4.327ms)

N = 16,000 particles:
  Serial:   +14.4% (AABB: 50.328ms -> Cell: 43.062ms)
  Parallel: +20.7% (AABB: 10.548ms -> Cell: 8.369ms)

N = 32,000 particles:
  Serial:   +12.8% (AABB: 99.519ms -> Cell: 86.752ms)
  Parallel: +24.0% (AABB: 20.818ms -> Cell: 15.825ms)

Average improvement across all particle counts:
  Serial:   +9.9%
  Parallel: +15.2%
============================================================


Purdue Anvil (-n 8)

============================================================
PERCENTAGE IMPROVEMENT: RDF (CellQuery vs AABBQuery)
============================================================

N = 1,000 particles:
  Serial:   +52.6% (AABB: 1.936ms -> Cell: 0.917ms)
  Parallel: +41.2% (AABB: 0.418ms -> Cell: 0.246ms)

N = 2,000 particles:
  Serial:   +53.8% (AABB: 4.228ms -> Cell: 1.953ms)
  Parallel: +45.9% (AABB: 0.891ms -> Cell: 0.481ms)

N = 4,000 particles:
  Serial:   +55.6% (AABB: 8.794ms -> Cell: 3.903ms)
  Parallel: +48.5% (AABB: 1.751ms -> Cell: 0.901ms)

N = 8,000 particles:
  Serial:   +59.7% (AABB: 17.038ms -> Cell: 6.873ms)
  Parallel: +50.6% (AABB: 3.196ms -> Cell: 1.580ms)

N = 16,000 particles:
  Serial:   +61.1% (AABB: 36.066ms -> Cell: 14.044ms)
  Parallel: +60.6% (AABB: 8.019ms -> Cell: 3.158ms)

N = 32,000 particles:
  Serial:   +60.5% (AABB: 78.146ms -> Cell: 30.855ms)
  Parallel: +63.3% (AABB: 18.246ms -> Cell: 6.698ms)

Average improvement across all particle counts:
  Serial:   +57.2%
  Parallel: +51.7%
============================================================

============================================================
PERCENTAGE IMPROVEMENT: k-NN RDF (CellQuery vs AABBQuery)
============================================================

N = 1,000 particles:
  Serial:   +26.0% (AABB: 5.864ms -> Cell: 4.341ms)
  Parallel: +0.1% (AABB: 0.849ms -> Cell: 0.849ms)

N = 2,000 particles:
  Serial:   +18.3% (AABB: 10.633ms -> Cell: 8.689ms)
  Parallel: +5.8% (AABB: 1.734ms -> Cell: 1.634ms)

N = 4,000 particles:
  Serial:   +4.6% (AABB: 18.370ms -> Cell: 17.533ms)
  Parallel: +12.7% (AABB: 3.515ms -> Cell: 3.069ms)

N = 8,000 particles:
  Serial:   +5.9% (AABB: 37.618ms -> Cell: 35.386ms)
  Parallel: -12.8% (AABB: 5.855ms -> Cell: 6.604ms)

N = 16,000 particles:
  Serial:   +7.4% (AABB: 77.515ms -> Cell: 71.810ms)
  Parallel: +18.1% (AABB: 16.315ms -> Cell: 13.358ms)

N = 32,000 particles:
  Serial:   +11.3% (AABB: 188.828ms -> Cell: 167.549ms)
  Parallel: +14.3% (AABB: 31.584ms -> Cell: 27.060ms)

Average improvement across all particle counts:
  Serial:   +12.2%
  Parallel: +6.4%
============================================================

Comments

freud's toNeighborList is extremely slow for systems of reasonable size (<100k particles), mainly due to overhead in the TBB parallel loop. This is true more generally: many parallel loops in freud incur performance costs for small-ish systems. This is not a surprise, but it does indicate an opportunity for more performance in the future. Although I don't recall where I saw the figures, I have seen commentary that bs_thread_pool (which we use in SPATULA) has much lower overhead for a similar work-stealing paradigm. We do use more of TBB's machinery throughout freud, but it's worth considering.

Secondly, freud's lazy evaluation of neighbors makes evaluation of certain order parameters relatively inefficient: we interleave the pair-bond calculations (which involve a fair amount of branching and indirection) with what are otherwise fairly dense calculations. This is most notable in fast order parameters like nematic and BOOD, but is true to a lesser extent for environment and density order parameters as well.

Note that this cell list is optimized for uniform, dense systems, which is a common pattern within the glotzerlab but perhaps not more generally. AABBQuery will be faster for spatially inhomogeneous data, although the linear layout of our memory avoids common problems with low-density simulations. Because we never rebuild neighbor lists in freud, low-occupancy bins can be stored as efficiently as full ones, and empty bins can be skipped entirely in the spatial sort.
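A pure-Python sketch of that storage scheme: points are counting-sorted by cell id into one contiguous array, and only occupied cells get an (offset, count) entry, so empty and low-occupancy cells cost nothing extra. The real implementation is C++ over flat arrays; the names here are hypothetical.

```python
from collections import defaultdict

def build_sorted_cells(points, cell_of_point):
    """Sort point indices by cell into one contiguous array (CSR-style).

    Returns:
        order:  point indices, spatially sorted so each cell's members
                are contiguous in memory
        starts: cell id -> (offset, count) into `order`; empty cells
                simply never appear
    """
    buckets = defaultdict(list)
    for idx, pt in enumerate(points):
        buckets[cell_of_point(pt)].append(idx)
    order, starts = [], {}
    for cell in sorted(buckets):          # skip empty cells entirely
        starts[cell] = (len(order), len(buckets[cell]))
        order.extend(buckets[cell])
    return order, starts
```

Querying a cell is then a single slice of `order`, regardless of how full the cell is.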

There is a wide variety of (reasonably) modern literature on neighbor list calculation. GROMACS advocates for a blocked, tree-based neighbor list similar to the current AABBQuery, with a few extra tweaks for SIMD between the particles themselves. I tested this as well, but the pattern does not fit freud's lazy evaluation well, and the performance was not competitive for reasonable particle counts. There is also research on novel neighbor-finding methods for (1) spatially inhomogeneous data (SNN, an approach I really like, but it degenerates to all-pairs in the uniform case, so it is not useful for crystals) and (2) kNN queries, which do not translate to ball queries as efficiently as the current code translates in the other direction.

Motivation and Context

Resolves: #???

How Has This Been Tested?

Tests extending the existing NeighborQueryTest class have been implemented, and a variety of random systems have also been tested offline.

Checklist:
