Skip to content

Embree geometry instancing for surface element BVH setup#130

Draft
Waqar-ukaea wants to merge 3 commits intoxdg-org:mainfrom
Waqar-ukaea:embree-instancing
Draft

Embree geometry instancing for surface element BVH setup#130
Waqar-ukaea wants to merge 3 commits intoxdg-org:mainfrom
Waqar-ukaea:embree-instancing

Conversation

@Waqar-ukaea
Copy link
Copy Markdown
Collaborator

@Waqar-ukaea Waqar-ukaea commented Jun 19, 2025

Started work on switching over to embree instances to hopefully reduce the memory cost for BVHs on shared surfaces (to close issue #129). This branch seems to be working fine for ray intersections but point queries are not working. Currently the memory footprint has not been reduced at all from these changes

As far as I can tell the idea is that we create an rtcGeometry of type RTC_GEOMETRY_TYPE_INSTANCE and attach it to an rtcScene for each unique surface and then that rtcScene can be attached as an instance to our volume rtcScene TLAS. The changes I've made so far seem to be working for ray intersections but not point queries. It's possible that ray intersections automatically descend down into the BLAS instance from the TLAS but point queries maybe don't? I did make an attempt to try and do this descent manually but it didn't fix the failing tests so I've commented it out in my latest commit.

The tests in test_closest.cpp and test_normal.cpp are both failing due to xdg->closest() not returning the expected value:

TEST_CASE("Test Mesh Mock")
{
  std::shared_ptr<MeshManager> mm = std::make_shared<MeshMock>();
  mm->init(); // this should do nothing, but its good practice to call it
  std::shared_ptr<XDG> xdg = std::make_shared<XDG>(mm);
  xdg->prepare_raytracer();

  MeshID volume = mm->volumes()[0];

  Position origin {0.0, 0.0, 0.0};
  double nearest_distance {0.0};

  xdg->closest(volume, origin, nearest_distance);
  REQUIRE_THAT(nearest_distance, Catch::Matchers::WithinAbs(2.0, 1e-6));

Test result:

/home/waqar/xdg/tests/test_closest.cpp:31: FAILED:
  REQUIRE_THAT( nearest_distance, Catch::Matchers::WithinAbs(2.0, 1e-6) )
with expansion:
  7.0 is within 0.000001 of 2.0

Other changes:

  • Looking through some embree examples/tutorials they always seem to call rtcReleaseGeometry on an rtcGeometry after its been attached to an rtcScene but I notice that we hadn't been doing that previously so I don't know if this is strictly required or not.
  • register_surface() is now create_surface_instance() and returns an rtcScene (to be attached as an instance to the volume) rather than an rtcGeometry.
  • Similar changes to some of the internal ray tracer mappings.
  • I changed the way we're creating primitive_ids to be handled entirely in the create_surface_instance().

Useful links:

@Waqar-ukaea Waqar-ukaea linked an issue Jun 19, 2025 that may be closed by this pull request
@Waqar-ukaea
Copy link
Copy Markdown
Collaborator Author

Waqar-ukaea commented Sep 2, 2025

Latest commit is essentially me having started fresh with the workflow changes to creation of Embree BVHs which have been recently merged in main. I.e making the new create_surface_tree() -> register_surface() workflow make use of Embree's instancing system. This does result in some nice simplification of the logic within these functions which is good to see. The RTCIntersectionFunc works with the switch to instancing without needing any further chages which means ray_fire() and point_in_volume() calls are producing consistent results with the previous BVH construction implementation. Also means that the particle-sim miniapp produces consistent results too.

The main caveat however is still that I haven't been able to figure out how to get RTCPointQueries to work with the instancing in place. I'm seeing the same incorrect results I saw when I first refactored EmbreeRayTracer to make use of Embree instancing. See the comment above for more details: https://github.com/pshriwise/xdg/pull/130#issue-3160962007

@Waqar-ukaea
Copy link
Copy Markdown
Collaborator Author

Waqar-ukaea commented Oct 24, 2025

I have managed to resolve the issues I was having with rtcPointQuery and now have a full implementation of Embree instances working with XDG. The problem was me not properly de-referencing the new hierarchy of structures since instancing adds another layer.

I'm not actually seeing any difference in memory usage in the instancing :/

But maybe that does actually make sense. Previously we were creating creating one set of RTCGeometry and surface_user_data per surface and attaching them to multiple RTCScenes (volume TLASes). Now with instances, I am still doing the same thing and re-using the RTCGeometry/user data for each surface but instead I am actually creating an RTCScene (BLAS) for each surface and then having the volume TLAS instance those BLASes.

So if anything memory should be about the same or slightly higher because there are now more, smaller BVHs created for any given model. Because I think we have essentially gone from a few large per-volume BVHs to many small per-surface BVHs + ligher TLAS BVHs per-volume.

Whilst we don't get any memory gains, we are essentially getting a finer level of modularity and the potential for extra control over filtering (i.e culling surfaces via masks).

@Waqar-ukaea
Copy link
Copy Markdown
Collaborator Author

Waqar-ukaea commented Oct 24, 2025

Copied over from #129 (comment) - A description on how we may be able to use instancing for finer grained control over surface culling.

Another benefit of switching to per-surface instancing is that we can use instance masks to pre-filter which surface instances participate in a query. This lets us drop most instances before we enter their BLAS, reducing TLAS work for surface-scoped queries (for example xdg::surface_normal()).

Bucketed masks

Embree exposes a 32-but geometry mask. So we could use that to bucket surfaces into 32 possible groups:

// Build time (on the instance geometry):
uint32_t bucket = surfaceId % 32;
uint32_t bit = 1u << bucket;
rtcSetGeometryMask(inst, bit);

Then at ray query time set:

// Ray query time:
RTCRayHit rh{};
uint_32_t bucket = (surfaceId % 32);
rh.ray.mask = 1u << bucket ; // select the bucket for this surface
rtcIntersect(scene, &rh);

This prunes ~31/32 of instances up-front for that query, so if a volume TLAS has many surfaces, the traversal visits far fewer instance leaves. It won’t be exactly a 32× speedup but I think it will still be meaningful.

If we ever need to perform a global surface queries (using a TLAS that instantiates all surfaces), the benefit is more obvious because we have a single TLAS with many surfaces. For example, with 3200 surfaces, bucketing reduces candidates to ~100 (≈1/32) for a surface-specific query.

I'm not sure if this kind of query will ever be required but geometry masking seems like a cheap easy way to get some free speedups without any extra cost if we do end up merging this PR.

Embree Point Queries

It might not actually be possible for this to work with Embree Point Queries. I need to do some more reading but It's not clear whether or not they support geometry masking in the same way the standard ray traversal algorithms do.

Vulkan/GPRT

In our GPRT implementation we actually already make use of explicit BLAS instancing. Though I haven't currently made use of geometry masking at all. I also think that in Vulkan geometry masks are limited to 8bits which means we can only cull a maximum of 7/8 buckets before BVH traversal.

@Waqar-ukaea
Copy link
Copy Markdown
Collaborator Author

Waqar-ukaea commented Oct 24, 2025

It might not actually be possible for this to work with Embree Point Queries.

Looking through the Embree docs, it seems that PointQueries doesn't support masking unfortunately :/

So that does unfortunately put this PR in a slightly weird spot. We already perform volume scoped ray queries (ray_fire and point_in_volume) anyway which won't actually benefit from geometry masking since we don't want to exclude any surfaces in those BVH traversals.

I do still think having instances in place is probably the more modern/conventional way to approach BVH setup. It is inherently more modular and provides additional flexibility in extra levers which we might be able to use in the future.

The PR does also include some nice simplification of the create_surface_tree() and register_surface() methods which are used to setup the BVHs. There is now a much clearer separation of concerns between the two methods:

  • register_surface() --> Explicitly creates surface BLAS
  • create_surface_tree() --> Creates volume TLAS which contains instances of those surface BLASes

EDIT - To clarify, this is only for surface element BVH setup. I have not touched volumetric elements. I could do so in this PR if preferred.

@Waqar-ukaea Waqar-ukaea changed the title Embree geometry instancing Embree geometry instancing for surface element BVH setup Oct 24, 2025
@Waqar-ukaea Waqar-ukaea added the Embree Embree specific changes which have no effect on the existing GPRT interface label Nov 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Embree Embree specific changes which have no effect on the existing GPRT interface

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Switching to an instance based system for embree BVHs

2 participants