Skip to content

Fix a warning with subgroup size control#2476

Open
xoofx wants to merge 1 commit intoKhronosGroup:mainfrom
xoofx:fix-warning-requiredSubgroupSizeStages
Open

Fix a warning with subgroup size control#2476
xoofx wants to merge 1 commit intoKhronosGroup:mainfrom
xoofx:fix-warning-requiredSubgroupSizeStages

Conversation

@xoofx
Copy link
Copy Markdown

@xoofx xoofx commented Mar 26, 2025

Hello there ☺️

This PR is fixing a warning when trying to use VK_EXT_subgroup_size_control (even If I keep the default 32 size). Today, if this feature is enabled (while supported), it will generate the following warning:

[Error] Validation - vkCreateShaderModule(): SPIR-V Capability VulkanMemoryModel was declared, but one of the following requirements is required (VkPhysicalDeviceVulkan12Features::vulkanMemoryModel).
The Vulkan spec states: If pCode is a pointer to SPIR-V code, and pCode declares any of the capabilities listed in the SPIR-V Environment appendix, one of the corresponding requirements must be satisfied (https://vulkan.lunarg.com/doc/view/1.4.309.0/mac/antora/spec/latest/chapters/shaders.html#VUID-VkShaderModuleCreateInfo-pCode-08740)
[Error] Validation - (VK_OBJECT_TYPE_SHADER_MODULE) vkCreateComputePipelines(): pCreateInfos[0].stage SPIR-V (VK_SHADER_STAGE_COMPUTE_BIT) is not in requiredSubgroupSizeStages (VkShaderStageFlags(0)).
The Vulkan spec states: If a VkPipelineShaderStageRequiredSubgroupSizeCreateInfo structure is included in the pNext chain, the subgroupSizeControl feature must be enabled, and stage must be a valid bit specified in requiredSubgroupSizeStages (https://vulkan.lunarg.com/doc/view/1.4.309.0/mac/antora/spec/latest/chapters/pipelines.html#VUID-VkPipelineShaderStageCreateInfo-pNext-02755)

I'm not entirely sure, but I would thought that it is only relevant for compute shaders. So that's what this PR is adding (instead of 0).

Thanks!

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Mar 26, 2025

CLA assistant check
All committers have signed the CLA.

@cdavis5e
Copy link
Copy Markdown
Collaborator

The problem is that we don't support VkPipelineShaderStageRequiredSubgroupSizeCreateInfo. At all. This change will indeed silence the validation failure, but any extension struct you pass will continue to be ignored.

The reason RequiredSubgroupSize isn't implemented is that the threadExecutionWidth of a compute pipeline in Metal cannot yet be set--no such property exists on the MTLComputePipelineDescriptor class. I don't know if Apple will ever support actually setting the threadExecutionWidth. The best you can do right now is assert that all subgroups must be filled out (i.e. with VK_PIPELINE_SHADER_STAGE_CREATE_REQUIRE_FULL_SUBGROUPS_BIT_EXT)--which Metal does allow you to declare.

@xoofx
Copy link
Copy Markdown
Author

xoofx commented Mar 26, 2025

The problem is that we don't support VkPipelineShaderStageRequiredSubgroupSizeCreateInfo. At all. This change will indeed silence the validation failure, but any extension struct you pass will continue to be ignored.

Fair enough, but then, should MoltenVk still report the support of this extension then?

image

Edit: Glad to make a PR to remove that feature support otherwise.

@squidbus
Copy link
Copy Markdown
Collaborator

squidbus commented May 5, 2025

There's no indication I could find that requiredSubgroupSizeStages cannot be 0; the application must simply respect whatever bits it may or may not have set before using the structure in a chain. There are still other useful aspects of the extension such as being able to advertise the range of possible subgroups and control whether subgroups can vary or must be fully launched, even if MoltenVK does not support providing a specific size for any stage. MoltenVK is also not the only implementation to set this to 0 according to the GPU info database.

@cdavis5e
Copy link
Copy Markdown
Collaborator

cdavis5e commented May 5, 2025

I was actually disappointed that I couldn't implement actual subgroup size control--especially after I learned that Direct3D 12 has it now as well. I begged Apple to support it, but they've been less than forthcoming. I can't blame them: they may be a big company, but even they have limited engineering resources, and they can't address every one of our complaints. Plus, this would only be relevant on AMD hardware, which to my knowledge is the only hardware Apple supports where the threadExecutionWidth/SIMD-group size can vary, and their focus, for obvious and understandable reasons, is on their own GPUs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants