Improve recast performance for .ply files #10

Open
dilirity wants to merge 6 commits into r5valkyrie:S16-S21-MERGE from dilirity:improve/recast-performance-ply

@dilirity dilirity commented Mar 29, 2026

Summary

Optimize PLY mesh loading, parallelize shared pipeline stages (normals, BVH, rendering), and harden the PLY parser against malformed files. Builds on the OBJ performance work merged in #13.

What changed

PLY loader rewrite (MeshLoaderPly.cpp)

  • Replaced std::ifstream with single fread into a memory buffer -- eliminates millions of tiny I/O calls
  • Bulk memcpy for vertex data (binary layout matches rdVec3D)
  • Validated header parsing with strtoll and INT_MAX bounds
  • Pre-allocation bounds check: rejects files where claimed vertex/face counts exceed actual file size before allocating
  • Face index validation: every triangle index checked against [0, m_vertCount) before use
  • Integer overflow protection on all size calculations (long long arithmetic, m_triCount <= INT_MAX/3 guard)
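
The validation steps above could look roughly like the sketch below. The helper names (`parseCount`, `plyCountsFit`, `faceIndicesValid`) and the assumed binary layout (12 bytes per vertex, 13 bytes per triangular face: one count byte plus three 32-bit indices) are illustrative assumptions, not code taken from MeshLoaderPly.cpp. `parseCount` also assumes the header token has already been isolated as a NUL-terminated string.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper: parse a decimal count from an isolated header token.
// Rejects empty input, trailing junk, negatives, and values above INT_MAX.
static bool parseCount(const char* text, int& out)
{
    errno = 0;
    char* end = nullptr;
    const long long v = std::strtoll(text, &end, 10);
    if (end == text || *end != '\0') return false; // no digits, or trailing junk
    if (errno == ERANGE) return false;             // did not fit in long long
    if (v < 0 || v > INT_MAX) return false;        // outside the int range we store
    out = static_cast<int>(v);
    return true;
}

// Hypothetical helper: check that the claimed counts can fit in the bytes
// left after the header, before anything is allocated.
// Assumes 3 floats per vertex and 1 count byte + 3 int32 indices per face.
static bool plyCountsFit(int vertCount, int triCount, long long bodyBytes)
{
    if (vertCount < 0 || triCount < 0 || bodyBytes < 0) return false;
    if (triCount > INT_MAX / 3) return false;      // triCount * 3 must fit in int
    const long long vertBytes = static_cast<long long>(vertCount) * 12;
    const long long faceBytes = static_cast<long long>(triCount) * 13;
    return vertBytes + faceBytes <= bodyBytes;
}

// Hypothetical helper: every index must lie in [0, vertCount).
static bool faceIndicesValid(const int* tris, int triCount, int vertCount)
{
    for (long long i = 0; i < static_cast<long long>(triCount) * 3; ++i)
        if (tris[i] < 0 || tris[i] >= vertCount) return false;
    return true;
}
```

Doing the size arithmetic in `long long` keeps the products well below overflow even when both counts are near `INT_MAX`, so the comparison against the file size is safe on 32-bit and 64-bit builds alike.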

Parallel normal calculation (MeshLoaderObj.cpp, MeshLoaderPly.cpp)

  • Chunk-based work distribution (4096 triangles per chunk) with std::atomic counter
  • Spawns hardware_concurrency - 1 worker threads writing to non-overlapping output ranges
  • Applied to both OBJ and PLY loaders
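
A minimal sketch of the atomic chunk pattern described above, under some assumptions: the function name `calcNormalsParallel` and its signature are hypothetical, the calling thread is assumed to take part in the work alongside the spawned workers, and `triCount` is assumed to have already passed the `INT_MAX/3` guard so the index arithmetic cannot overflow.

```cpp
#include <algorithm>
#include <atomic>
#include <cmath>
#include <thread>
#include <vector>

// Sketch: one unit normal per triangle, work handed out in fixed-size chunks.
// verts: xyz per vertex; tris: three vertex indices per triangle;
// normals: preallocated output, xyz per triangle.
static void calcNormalsParallel(const float* verts, const int* tris,
                                int triCount, float* normals)
{
    const int kChunk = 4096;      // triangles claimed per grab
    std::atomic<int> next{0};     // first triangle of the next unclaimed chunk

    auto worker = [&]() {
        for (;;) {
            // Each fetch_add hands this thread a range no other thread gets,
            // so writes to `normals` never overlap.
            const int begin = next.fetch_add(kChunk, std::memory_order_relaxed);
            if (begin >= triCount) return;
            const int end = std::min(begin + kChunk, triCount);
            for (int i = begin; i < end; ++i) {
                const float* a = &verts[tris[i * 3 + 0] * 3];
                const float* b = &verts[tris[i * 3 + 1] * 3];
                const float* c = &verts[tris[i * 3 + 2] * 3];
                const float e0[3] = {b[0] - a[0], b[1] - a[1], b[2] - a[2]};
                const float e1[3] = {c[0] - a[0], c[1] - a[1], c[2] - a[2]};
                float* n = &normals[i * 3];
                n[0] = e0[1] * e1[2] - e0[2] * e1[1];
                n[1] = e0[2] * e1[0] - e0[0] * e1[2];
                n[2] = e0[0] * e1[1] - e0[1] * e1[0];
                const float len = std::sqrt(n[0] * n[0] + n[1] * n[1] + n[2] * n[2]);
                if (len > 0.0f) { n[0] /= len; n[1] /= len; n[2] /= len; }
            }
        }
    };

    const unsigned hw = std::thread::hardware_concurrency(); // may report 0
    const unsigned extra = hw > 1 ? hw - 1 : 0;
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < extra; ++t) pool.emplace_back(worker);
    worker();                      // the calling thread works too
    for (auto& th : pool) th.join();
}
```

Because chunks are claimed on demand rather than assigned up front, a thread that lands on cheap triangles simply comes back for more, which keeps the cores evenly loaded without any locking.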

Parallel BVH construction (ChunkyTriMesh.cpp)

  • Replaced qsort + C comparators with std::sort + inlined lambdas
  • std::execution::par for sorting arrays >= 32K items
  • Parallel bounds computation across all cores (same atomic chunk pattern)
  • Parallel tree subdivision: pre-computes subtree node counts, spawns left/right on separate threads for the top 4 levels
  • maxNodes guard prevents overlap if pre-computed counts exceed allocation
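
The subtree pre-computation is what makes parallel subdivision safe: with a median split, the number of nodes under a range depends only on its item count, so the index of the right child is known before the left subtree is built. The sketch below illustrates the idea on a simplified 1D tree. `Item`, `Node`, `countNodes`, `subdivide`, and `buildTree` are hypothetical names, not the ChunkyTriMesh.cpp code, and here the left child runs on a new thread while the current thread handles the right child.

```cpp
#include <algorithm>
#include <thread>
#include <vector>

struct Item { float key; int id; };  // 1D stand-in for a triangle's bounds
struct Node { float lo, hi; int first, count; bool leaf; };

// Nodes created by the median-split recursion for n items. Depends only on n,
// so it can be computed before any sorting happens.
static int countNodes(int n, int perLeaf)
{
    if (n <= perLeaf) return 1;
    const int half = n / 2;
    return 1 + countNodes(half, perLeaf) + countNodes(n - half, perLeaf);
}

static void subdivide(Item* items, int imin, int imax, int perLeaf,
                      Node* nodes, int nodeIdx, int depth)
{
    const int n = imax - imin;
    Node& node = nodes[nodeIdx];
    node.first = imin;
    node.count = n;
    if (n <= perLeaf) {               // leaf: bounds come straight from the items
        node.leaf = true;
        node.lo = node.hi = items[imin].key;
        for (int i = imin + 1; i < imax; ++i) {
            node.lo = std::min(node.lo, items[i].key);
            node.hi = std::max(node.hi, items[i].key);
        }
        return;
    }
    node.leaf = false;
    std::sort(items + imin, items + imax,
              [](const Item& a, const Item& b) { return a.key < b.key; });
    const int half = n / 2;
    const int leftIdx = nodeIdx + 1;
    // Right subtree starts right after every node the left subtree will use.
    const int rightIdx = leftIdx + countNodes(half, perLeaf);
    if (depth < 4) {                  // only the top levels spawn threads
        std::thread left(subdivide, items, imin, imin + half, perLeaf,
                         nodes, leftIdx, depth + 1);
        subdivide(items, imin + half, imax, perLeaf, nodes, rightIdx, depth + 1);
        left.join();
    } else {
        subdivide(items, imin, imin + half, perLeaf, nodes, leftIdx, depth + 1);
        subdivide(items, imin + half, imax, perLeaf, nodes, rightIdx, depth + 1);
    }
    node.lo = std::min(nodes[leftIdx].lo, nodes[rightIdx].lo);
    node.hi = std::max(nodes[leftIdx].hi, nodes[rightIdx].hi);
}

// Returns an empty vector if the tree would not fit in maxNodes.
static std::vector<Node> buildTree(std::vector<Item>& items, int perLeaf, int maxNodes)
{
    const int n = static_cast<int>(items.size());
    if (n == 0 || perLeaf < 1) return {};
    const int needed = countNodes(n, perLeaf);
    if (needed > maxNodes) return {}; // guard before any thread is spawned
    std::vector<Node> nodes(needed);
    subdivide(items.data(), 0, n, perLeaf, nodes.data(), 0, 0);
    return nodes;
}
```

The two children sort disjoint item ranges and write disjoint node ranges, so no synchronization is needed beyond the `join` before the parent reads its children's bounds.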

VBO rendering (Editor.cpp, EditorInterfaces.cpp)

  • Cached display list vertex data uploaded to GPU via VBOs (glBufferData with GL_DYNAMIC_DRAW)
  • Each frame binds and draws from GPU memory instead of transferring from CPU RAM
  • GL buffer extension function pointers loaded via SDL_GL_GetProcAddress with null validation at startup
  • VBOs properly created, re-uploaded on cache dirty, and deleted in destructor

Navmesh cache invalidation (Editor.cpp, Editor_SoloMesh.cpp, Editor_TempObstacles.cpp, NavMeshPruneTool.cpp)

  • Added invalidateNavMeshCache() calls after mesh rebuild, navmesh load, traverse link rebuild, and static pathing rebuild

How to test

  1. Open the NavEditor
  2. Load a large PLY mesh (1GB+)
  3. Verify the mesh loads and renders at interactive frame rates (expect 60-90 FPS depending on GPU)
  4. Build a navmesh and verify it renders correctly
  5. Toggle "NavMesh" checkbox off/on -- cache should update without stalls
  6. Click "Rebuild Static Pathing Data" -- navmesh display should refresh
  7. Try loading a small/malformed PLY file -- should fail gracefully without crashing

@dilirity dilirity force-pushed the improve/recast-performance-ply branch from 6176f94 to 8f2a798 Compare April 3, 2026 19:56
@dilirity dilirity changed the base branch from main to S16-S21-MERGE April 3, 2026 19:56
@dilirity dilirity force-pushed the improve/recast-performance-ply branch from 8f2a798 to 790bbaf Compare April 3, 2026 20:09
dilirity added 2 commits April 3, 2026 23:37
Rewrite PLY loader to read the entire file in one fread with bulk
memcpy for vertices instead of millions of tiny ifstream reads.

Parallelize normal calculation across all cores for both OBJ and
PLY loaders. Parallelize BVH bounds computation in chunky tri mesh
builder. Replace qsort with std::sort (inlined comparators) and
use std::execution::par for large arrays. Parallelize BVH
subdivision by pre-computing subtree sizes and recursing left/right
on separate threads for the top 4 levels.

Upload cached display list vertex data to GPU via VBOs (glBufferData
with GL_STATIC_DRAW) so each frame just binds and draws from GPU
memory instead of transferring from CPU RAM every frame. Load GL
buffer extension function pointers via SDL_GL_GetProcAddress.
@dilirity dilirity force-pushed the improve/recast-performance-ply branch 2 times, most recently from 029def1 to 5caac7f Compare April 3, 2026 20:47
dilirity added 4 commits April 3, 2026 23:59
Guard parallel BVH subdivision against maxNodes overflow by validating
pre-computed node counts fit before spawning threads. Use GL_DYNAMIC_DRAW
instead of GL_STATIC_DRAW for VBO uploads since display list data can
change on cache invalidation. Validate GL extension function pointers
on startup and exit gracefully if VBO support is unavailable.

Replace atoi with strtol for vertex/face count parsing to detect
overflow and malformed values. Use long long instead of ssize_t for
file size and vertex byte calculations to avoid truncation on 32-bit.
@dilirity dilirity marked this pull request as ready for review April 3, 2026 21:33