WebGPURenderer: Add compute shader bounds check#33186

Open
marcofugaro wants to merge 3 commits into mrdoob:dev from marcofugaro:compute-bounds-check

Conversation

marcofugaro (Contributor) commented Mar 15, 2026

Related issue: Closes #33185

Description

See the issue for details; this PR adds an internal WGSL bounds check:

// bounds check
if ( instanceIndex >= nodeInstanceCount ) {
	return;
}

nodeInstanceCount is an overridable constant which by default equals the maximum value representable by a u32 (0xFFFFFFFFu).
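For context, here is a minimal sketch in plain JavaScript (not three.js source) of why the guard is needed at all: a GPU dispatch always launches whole workgroups, so the number of shader invocations is the requested count rounded up to a multiple of the workgroup size, and the surplus invocations past the count are what the bounds check skips.

```javascript
// Sketch: how many invocations actually run for a given element count.
// A dispatch launches whole workgroups, never partial ones, so the
// invocation count is the element count rounded up to a workgroup multiple.
function invocationCount( count, workgroupSize ) {

	const workgroups = Math.ceil( count / workgroupSize );
	return workgroups * workgroupSize;

}

// With a typical workgroup size of 64, .compute( 42 ) actually launches
// 64 invocations; the 22 surplus ones are the reason for the guard.
console.log( invocationCount( 42, 64 ) );
```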

github-actions bot commented Mar 15, 2026

📦 Bundle size

Full ESM build, minified and gzipped.

              Before (min / gzip)   After (min / gzip)    Diff
WebGL         359.97 / 85.54 kB     359.97 / 85.54 kB     +0 B / +0 B
WebGPU        630.72 / 175.04 kB    631.01 / 175.17 kB    +292 B / +128 B
WebGPU Nodes  628.84 / 174.75 kB    629.13 / 174.88 kB    +292 B / +129 B

🌳 Bundle size after tree-shaking

Minimal build including a renderer, camera, empty scene, and dependencies.

              Before (min / gzip)   After (min / gzip)    Diff
WebGL         492.17 / 120.12 kB    492.17 / 120.12 kB    +0 B / +0 B
WebGPU        703.46 / 189.94 kB    703.75 / 190.06 kB    +292 B / +125 B
WebGPU Nodes  652.69 / 177.33 kB    652.98 / 177.45 kB    +292 B / +122 B

gkjohnson (Collaborator) commented Mar 16, 2026

This is not something that should be automatically done for the user in my opinion. There's workgroup size, dispatch size, and now I see there's a "count" value? I'm actually a bit confused as to how these are supposed to work together. How is a single dimensional "count" value supposed to work with the higher dimensional dispatch and workgroup size values?

Edit: Perhaps this is more of a comment on why this "count" value seems to already exist for compute kernels. There already seem to be so many different ways to do the same thing in TSL, and many of them do not seem to interoperate cleanly.

  if ( pipeline === undefined ) {

- 	pipeline = new ComputePipeline( cacheKey, stageCompute );
+ 	pipeline = new ComputePipeline( cacheKey, stageCompute, typeof computeNode.count === 'number' ? computeNode.count : null );
Collaborator:
I think we can skip this verification; computeNode.count should be a number or null.

marcofugaro (Contributor, Author) replied Mar 16, 2026:

Done in bc98b02

@sunag sunag added this to the r184 milestone Mar 16, 2026
+ globalId.z * ( ${ workgroupSizeX } * numWorkgroups.x ) * ( ${ workgroupSizeY } * numWorkgroups.y );
// bounds check
if ( instanceIndex >= nodeInstanceCount ) {
Collaborator:

Do you think we can remove this conditional if computeNode.count is null?

The count is not necessarily fixed, either; maybe we need to consider a uniform for this.

marcofugaro (Contributor, Author) replied Mar 16, 2026:

> Do you think we can remove this conditional if computeNode.count is null?

Done in 002482d. The 0xFFFFFFFFu branch is now unreachable; I left it in just for safety.

> The count is not necessarily fixed too; maybe we need to consider a uniform for this.

I don't have the full picture: does it make sense for it to change after the ComputePipeline has been initialized? I used a constant to take advantage of pipeline specialization at compile time, since the bounds check then becomes dead code the compiler can eliminate.
Let me know if I should use a uniform instead.
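For reference, a minimal sketch of the specialization mechanism being discussed (the constant name follows this PR; the fragment itself is illustrative, not the actual three.js generated source). WGSL pipeline-overridable constants are declared with `override` and given a value at pipeline creation:

```
// Hypothetical sketch: an overridable constant with a "no limit" default.
override nodeInstanceCount : u32 = 0xFFFFFFFFu;

// Inside the kernel, surplus invocations bail out early:
// if ( instanceIndex >= nodeInstanceCount ) { return; }
```

On the API side, the value would be supplied through the `constants` record of the compute stage descriptor in `device.createComputePipeline()`, which lets the driver specialize the pipeline and drop the guard entirely when it can never trigger.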

marcofugaro (Contributor, Author) commented Mar 16, 2026

@gkjohnson what do you think would be a better solution? The count logic is not exposed to the user; it is internal to three.js. It is just the value the user passes to the .compute() call.

The workgroups logic is not exposed to the basic user either. As that user, I'd expect .compute( 42 ) to run exactly 42 times.
Personally I found out about this issue because the last item of my buffer would somehow be corrupted, as I explained in #33185.
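One plausible model of that corruption, sketched in plain JavaScript (assuming WebGPU-style robust buffer access, where an out-of-bounds store may be clamped into the buffer rather than discarded): the surplus invocations past the requested count all land on the last valid element.

```javascript
// Illustrative model, not three.js code: simulate a 1-D dispatch over a
// buffer with robust-access clamping of out-of-bounds writes.
function runKernel( buffer, count, workgroupSize, guarded ) {

	// whole workgroups are launched, so invocations >= count
	const invocations = Math.ceil( count / workgroupSize ) * workgroupSize;

	for ( let i = 0; i < invocations; i ++ ) {

		// the bounds check added by this PR
		if ( guarded && i >= count ) continue;

		// robust access: out-of-bounds stores clamp to the last element
		const index = Math.min( i, buffer.length - 1 );
		buffer[ index ] = i;

	}

	return buffer;

}

const unguarded = runKernel( new Array( 100 ).fill( 0 ), 100, 64, false );
const guarded = runKernel( new Array( 100 ).fill( 0 ), 100, 64, true );
// unguarded[ 99 ] is stomped by the surplus invocations; guarded[ 99 ] is 99
```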

gkjohnson (Collaborator) commented Mar 17, 2026

> what do you think would be a better solution?

I think better documentation may be the first step, but everything needed to run and length-check a compute shader is already available.

> The workgroups logic is not exposed to the basic user either.

Can you explain what you mean by this? You can set the workgroup size like so:

// set the work group size for the shader
const kernel = fn.computeKernel( [ x, y, z ] );

The work group size is associated with the shader itself with a compute kernel function annotation meaning a new shader needs to be generated when changing it, so it makes sense for this to be associated with the TSL fn kernel rather than anywhere else. And it can be accessed via kernel.workgroupSize.

One primary issue with setting "count" on the kernel is that it doesn't actually do what it says. The workgroup size * dispatch size is actually what dictates the number of threads that get run, not this variable. Looking at the code it would seem more appropriate to call this variable a "maxThreadCount" since it will (imo confusingly) truncate the number of threads that was actually requested by the user using the dispatch syntax:

renderer.computeAsync( kernel, [ dispatch_x, dispatch_y, dispatch_z ] );

Having a thread count associated with the kernel itself does not make sense because it's common to use a compute kernel for processing multiple different buffers meaning you want to issue it with different dispatch counts. So now if you had set this "maxThreadCount" value for the kernel and then switch the data being processed to a differently sized buffer it will not work as expected.

Users should be writing their kernels to be resilient to the data sizes they are processing. It's possible to interrogate the dimensions of the buffers you're writing to and guard against writing outside the bounds of the buffer:

const fn = wgslFn( /* wgsl */`
	fn compute( globalId: vec3u ) -> void {
		let size = textureDimensions( tex );
		if ( globalId.x >= size.x || globalId.y >= size.y ) {
			return;
		}

		textureStore( tex, globalId.xy, vec4( 0, 0, 0, 1 ) );
	}
` )( { globalId } );

Now the kernel is resilient to the size of the buffer being written to and you can still take advantage of the performance benefits of a more optimally set workGroupSize. If "count" were to truncate my compute shader here it may be doing so in a way that isn't aligned with my function definition.

--

If you're not interested in tuning workgroup size for your use case then you can set the work group size to "1" and then dispatch based directly on the size of your buffer. Then you have no risk of overrunning the bounds of your data if you don't want to add a guard:

const kernel = fn( { data } ).computeKernel( [ 1, 1, 1 ] );
renderer.compute( kernel, [ data.length, 1, 1 ] );
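By contrast, dispatching with a tuned workgroup size means rounding the workgroup count up, which is exactly what reintroduces the need for an in-shader guard. The arithmetic can be sketched as (hypothetical helper, not a three.js API):

```javascript
// Sketch: number of workgroups needed to cover elementCount invocations
// when each workgroup runs workgroupSize threads.
function dispatchSize( elementCount, workgroupSize ) {

	return Math.ceil( elementCount / workgroupSize );

}

// e.g. renderer.compute( kernel, [ dispatchSize( data.length, 64 ), 1, 1 ] );
// 100 elements at workgroup size 64 needs 2 workgroups = 128 invocations,
// so the kernel must guard against the 28 surplus threads.
```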

I wasn't aware of this "count" value, but based on the fact that it's inappropriately shaped as a single dimension, this seems vestigial from before 2d or 3d compute kernels were able to be run at all (see #31393). In my opinion this should be removed or at the least rethought significantly.



Development

Successfully merging this pull request may close these issues.

WebGPURenderer: Compute shaders lack bounds checking in the shader

3 participants