-
Notifications
You must be signed in to change notification settings - Fork 52
Feat/celllist #1389
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
janbridley
wants to merge
114
commits into
main
Choose a base branch
from
feat/celllist
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Feat/celllist #1389
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
Add a new cell list nearest neighbor search, which is significantly faster than the previous AABBQuery. Note that this is a fundamentally different architecture from LinkCell, which is extremely slow compared to both alternatives.
TODOs:
Architecture
This neighbor list is based on a spatially sorted linear memory region, with cells adjacent in the X direction contiguous in memory. We defer construction until the user attempts a query, allowing us to choose the optimal cell width for a given lookup. For
num_nearestlookups, we estimate the cell width based on the density of the system, with an empirically determined scale factor for performance. Our choice of construction guarantees we generate every necessary ghost particle in a single layer of ghost cells, which is optimal for performance inr_maxqueries. Fornum_nearestqueries, no such guarantee is possible and so we fall back to wrapping neighbor particles should we need to look outside the first shell of cells.Performance
For performance, I have two benchmarks -- one based on constructing a full neighbor list in python (
cq.toNeighborList()) and a more representative test based on computing the RDF of a system with a single bin. This latter benchmark aims to test freud's internal use of lists inNeighborComputeFunctional, and is what the text outputs of the code measure performance for. Note that generating random systems and computing the RDF itself takes ~25-30% of the runtime of this benchmark, so the % change performance numbers are underestimates.Note that, because of the way we handle ghosts, the largest performance improvements are only realized in ball queries. kNN is still faster than AABBQuery, but by a factor of ~20% rather than 60%+.
Note in the benchmarks below, I test against vesin, which is the fastest nearest neighbor library I've found. Note the results are not fully comparable, however, as they do not use the freud
toNeighborListfunction that dominates the runtime.OSX M1 Pro (Python 3.13)
Benchmarks for uniform random systems in random, lightly sheared boxes. r_cut=1.5 and rho=0.5.
Purdue Anvil,
-n 8Comments
freud's
toNeighborListis extremely slow for systems of reasonable size (<100k particles), mainly due to overhead in the tbb parallel loop. This is true more generally, with many parallel loops in freud incurring performance costs for small-ish systems. This is not a surprise, but does indicate an opportunity for more performance in the future -- although I don't recall where I saw these figures, I have seen commentary that bs_thread_pool (which we use in SPATULA) has much lower overhead for a similar work-stealing paradigm. We do use more of TBB's machinery throughout freud, but it's worth considering.Secondly, freud's lazy evaluation of neighbors makes evaluation of certain order parameters relatively inefficient. The issue is that we interleave the pair bond calculations (which have a fair amount of branching and indirection) with what are otherwise fairly dense calculations. This is most notable in fast order parameters like nematic and BOOD, but is true to a lesser extent for
environmentanddensityOPs as well.Note that this cell list is optimized for uniform, dense systems which is a common pattern within the glotzerlab but perhaps not more generally. AABBQuery will be faster for spatially inhomogeneous data, although we avoid common problems with low-density simulations due to the linear layout of our memory. Because we never rebuild neighbor lists in freud, low-occupancy bins can be stored as efficiently as larger ones, and empty bins can be skipped in the spatial sort entirely.
There is a wide variety of (reasonably) modern literature on neighbor list calculation. GROMACS advocates for a blocked tree-based neighbor list similar to the current AABBQuery, with a few extra tweaks for SIMD between the particles themselves. I tested this as well, but the pattern does not fit freud's lazy evaluation well and the performance was not competitive for reasonable particle counts. There is also research on novel neighbor finding methods for (1) spatially inhomogeneous data (SNN, an approach I really like, but it degenerates to all pairs in the uniform case so not useful for crystals) and (2) kNN queries, which do not translate as efficiently to ball queries as the current code in the other direction.
Motivation and Context
Resolves: #???
How Has This Been Tested?
Tests extending the existing
NeighborQueryTestclass have been implemented, with a variety of random systems also tested offline.Checklist: