[PyTorch] Introduce semantic quantizer roles #2620
negvet wants to merge 15 commits into NVIDIA:main
Conversation
Signed-off-by: Evgeny <etsykunov@nvidia.com>
Greptile Summary: This PR introduces semantic quantizer roles via the `QuantizerRole` dataclass. Key changes:
Benefits:
Confidence Score: 5/5
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TB
subgraph "Module/Operation Layer"
Module["Module (Linear, GroupedLinear, etc.)"]
Module -->|implements| GetRoles["get_quantizer_roles()"]
GetRoles -->|returns| RoleList["List[QuantizerRole]"]
end
subgraph "QuantizerRole Dataclass"
RoleList --> Role["QuantizerRole(frozen=True)"]
Role --> ModType["module_type: str<br/>(e.g., 'linear', 'grouped_linear')"]
Role --> TensType["tensor_type: str<br/>(e.g., 'input', 'weight', 'grad_output')"]
Role --> Name["name: str<br/>(e.g., 'qkv', 'fc1')"]
Role --> Helper["is_gemm(): bool"]
end
subgraph "Recipe State Creation"
RoleList --> RecipeState["RecipeState.create(recipe, roles=...)"]
RecipeState --> CustomState["CustomRecipeState"]
CustomState --> MakeQ["make_quantizers()"]
end
subgraph "Quantizer Factory"
MakeQ --> Factory["qfactory(role: QuantizerRole)"]
Factory --> InspectRole["Inspect role.tensor_type,<br/>role.module_type, role.name"]
InspectRole --> ReturnQ["Return appropriate<br/>Quantizer instance"]
end
Factory -.->|example| CurrentScaling["current_scaling_ref_quantizer_factory:<br/>E5M2 for grad_*, E4M3 otherwise"]
Factory -.->|example| NVFP4["nvfp4_ref_rht_2d_quantizer_factory:<br/>16x16 tiles for GEMM weights,<br/>1x16 with RHT otherwise"]
Last reviewed commit: a86fdad
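The flowchart above can be condensed into a minimal sketch: a frozen dataclass with the three fields from the diagram, plus a role-driven factory. This is a hedged reconstruction, not the PR's actual code; the `GEMM_MODULE_TYPES` contents and the returned format strings are assumptions taken from the diagram's examples.

```python
from dataclasses import dataclass

# Assumed membership set; the real PR may define this differently.
GEMM_MODULE_TYPES = frozenset({"linear", "grouped_linear", "layernorm_linear"})

@dataclass(frozen=True)
class QuantizerRole:
    """Semantic identity of one quantizer slot emitted by a module."""
    module_type: str  # e.g. "linear", "grouped_linear"
    tensor_type: str  # e.g. "input", "weight", "grad_output"
    name: str = ""    # user-facing module name, e.g. "qkv", "fc1"

    def is_gemm(self) -> bool:
        # Whether this role belongs to a GEMM-based module.
        return self.module_type in GEMM_MODULE_TYPES

def current_scaling_ref_quantizer_factory(role: QuantizerRole) -> str:
    # Per the flowchart example: E5M2 for gradient tensors, E4M3 otherwise.
    # Returns a format name here instead of a real Quantizer instance.
    return "E5M2" if role.tensor_type.startswith("grad_") else "E4M3"
```

The point of the design is visible in the factory: it dispatches on the role's semantic fields rather than on the quantizer's position in a list.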
timmoon10 left a comment:
Overall this design is quite clean and generalizable.
position : str
    Module-internal sub-slot. For modules that fuse multiple sequential operations,
    e.g. `LayerNormMLP` has "fc1" and "fc2" sub-slots.
    Empty string for simple modules.
I feel `name` and `position` are redundant. I see how `position` is basically just there to accommodate `LayerNormMLP`, but I'm uneasy about designing just for that (especially since it's not used publicly in Megatron-LM or Megatron-Bridge).
Instead of contorting `QuantizerRole` to work with `LayerNormMLP`, how about we contort `LayerNormMLP`? Instead of the module having a single `name`, it could have `fc1_name` and `fc2_name`.
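A hypothetical sketch of the reviewer's alternative: drop the `position` field entirely and let `LayerNormMLP` carry two name attributes, emitting plain roles for each fused GEMM. The class body and the emitted tensor types here are illustrative assumptions, not the PR's implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QuantizerRole:
    module_type: str
    tensor_type: str
    name: str = ""

class LayerNormMLP:
    """Toy stand-in for the fused module; only the role-emission logic is shown."""

    def __init__(self, fc1_name: str = "fc1", fc2_name: str = "fc2"):
        # Two names instead of one name plus a `position` sub-slot.
        self.fc1_name = fc1_name
        self.fc2_name = fc2_name

    def get_quantizer_roles(self) -> list[QuantizerRole]:
        # One role set per fused GEMM, distinguished only by `name`.
        return [
            QuantizerRole("layernorm_mlp", tensor_type, name)
            for name in (self.fc1_name, self.fc2_name)
            for tensor_type in ("input", "weight", "grad_output")
        ]
```

With this shape, a factory never needs to know the module fuses two GEMMs; it just sees six ordinary roles with distinct names.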
transformer_engine/pytorch/custom_recipes/quantization_nvfp4.py (outdated; resolved)
```python
    base = [
        QuantizerRole(module_type="linear", tensor_type="input", name=name),
        QuantizerRole(module_type="linear", tensor_type="weight", name=name),
        QuantizerRole(module_type="linear", tensor_type="output", name=name),
    ]
else:
    base = [
        QuantizerRole(module_type="linear", tensor_type="grad_output", name=name),
        QuantizerRole(module_type="linear", tensor_type="grad_input", name=name),
    ]
```
"output" and "grad_input" roles don't make sense. In reality, we are implicitly assuming that the tensor will be consumed by another linear-like layer.
Suggested change:

```diff
     base = [
         QuantizerRole(module_type="linear", tensor_type="input", name=name),
         QuantizerRole(module_type="linear", tensor_type="weight", name=name),
-        QuantizerRole(module_type="linear", tensor_type="output", name=name),
+        QuantizerRole(module_type="linear", tensor_type="input", name=name),
     ]
 else:
     base = [
         QuantizerRole(module_type="linear", tensor_type="grad_output", name=name),
-        QuantizerRole(module_type="linear", tensor_type="grad_input", name=name),
+        QuantizerRole(module_type="linear", tensor_type="grad_output", name=name),
     ]
```
Alternatively, if we want to use the output in FP8 DPA, the right role would be `module_type="dpa"` and `tensor_type="input"`. We should probably make this configurable. I kind of like that this design is exposing the hidden assumptions we've been making.
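The configurability the comment asks for could look like the sketch below: the producing module builds its output role from the consumer's point of view, defaulting to "input of the next linear layer" and switching to a DPA input role when requested. The function name and the `consumer` parameter are made up for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QuantizerRole:
    module_type: str
    tensor_type: str
    name: str = ""

def output_role(name: str, consumer: str = "linear") -> QuantizerRole:
    # A forward output is quantized for whoever consumes it next. There is no
    # free-standing "output" role: the tensor is always somebody's input.
    if consumer == "dpa":
        # Output feeds FP8 dot-product attention.
        return QuantizerRole(module_type="dpa", tensor_type="input", name=name)
    # Default hidden assumption: output feeds another linear-like layer.
    return QuantizerRole(module_type="linear", tensor_type="input", name=name)
```

This makes the previously implicit assumption (outputs are consumed by linear layers) an explicit, overridable default.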
```python
assert counts["input"] == 1
assert counts["weight"] == 1
assert counts["output"] == 1
assert counts["grad_output"] == 1
assert counts["grad_input"] == 1
```
Suggested change:

```diff
-assert counts["input"] == 1
-assert counts["weight"] == 1
-assert counts["output"] == 1
-assert counts["grad_output"] == 1
-assert counts["grad_input"] == 1
+assert counts["input"] == 2
+assert counts["weight"] == 1
+assert counts["output"] == 0
+assert counts["grad_output"] == 2
+assert counts["grad_input"] == 0
```
```python
def is_gemm(self) -> bool:
    """Whether this role belongs to a GEMM-based module."""
    return self.module_type in self.GEMM_MODULE_TYPES
```
I think this is baking in assumptions about what formats are similar (our recent experiences with grouped tensors makes me wonder if the requirements for "linear" and "grouped_linear" will diverge in the future), and it's also not giving us that much convenience.
Suggested change (remove the helper):

```diff
-    def is_gemm(self) -> bool:
-        """Whether this role belongs to a GEMM-based module."""
-        return self.module_type in self.GEMM_MODULE_TYPES
```
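The alternative this review implies can be sketched as inlining the membership check at the factory call site, so each recipe decides for itself which module types it treats alike (leaving room for "linear" and "grouped_linear" to diverge later). The factory name and format strings are assumptions borrowed from the NVFP4 example in the flowchart.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QuantizerRole:
    module_type: str
    tensor_type: str
    name: str = ""

def nvfp4_factory(role: QuantizerRole) -> str:
    # The factory spells out its own grouping instead of relying on a shared
    # is_gemm() helper baked into the dataclass.
    if role.module_type in ("linear", "grouped_linear") and role.tensor_type == "weight":
        return "16x16 tiles"
    return "1x16 with RHT"
```

If grouped-tensor requirements later diverge, only this factory's tuple changes; the dataclass stays a pure identifier.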
Description
Introducing semantic quantizer roles, e.g. `linear:input`, `layernorm_linear:grad_output`. They are emitted by the module/op and consumed through `RecipeState.create(recipe, roles=...)`, so that the right quantizers can be constructed without relying on an index in a list. Currently used only by `CustomRecipe`, but can be extended to all recipes. Also extendable to arbitrary operations, e.g. `dpa:qkv` and `dpa:s` (scores) for attention.
Type of change
Changes
Please list the changes introduced in this PR:
Checklist: