Support 32x32 scaling for weights in MXFP8 weight quantization kernel #4185

@danielvegamyhre

Description

32x32 scaling is more performant because it avoids an extra scale calculation in the backward pass. It also improves accuracy: with a single scale shared per block, there can be no params that underflow in the forward pass (and thus do not contribute to the output) yet do not underflow in the backward pass (and thus still receive a gradient).
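To illustrate the idea (not the actual kernel from this PR), here is a minimal NumPy sketch of 32x32 block scaling: one power-of-two scale is computed per 32x32 block of the weight, so quantizing `w` and `w.T` produces transposed-but-identical scales, and the same params underflow in forward and backward. The function name, `E4M3_MAX` constant, and scale formula are illustrative assumptions, not taken from the torchao implementation.

```python
import numpy as np

BLOCK = 32
E4M3_MAX = 448.0  # largest finite magnitude representable in float8 e4m3

def quantize_mxfp8_32x32(w):
    """Hypothetical sketch: one shared power-of-two scale per 32x32 block.

    Because the scale depends only on the block (not on row vs. column
    orientation), quantizing w and w.T yields consistent scales, so no
    param underflows in one pass but not the other.
    """
    rows, cols = w.shape
    assert rows % BLOCK == 0 and cols % BLOCK == 0
    # View the weight as a grid of 32x32 blocks and take amax per block.
    blocks = w.reshape(rows // BLOCK, BLOCK, cols // BLOCK, BLOCK)
    amax = np.abs(blocks).max(axis=(1, 3))  # shape (rows//32, cols//32)
    # Power-of-two scale chosen so amax lands near the e4m3 max.
    exp = np.floor(np.log2(np.maximum(amax, 2.0 ** -127))) - np.floor(
        np.log2(E4M3_MAX)
    )
    scale = 2.0 ** exp
    # Broadcast the per-block scale elementwise and simulate quantization
    # by clamping to the e4m3 range (real kernel would cast to float8).
    scale_full = np.repeat(np.repeat(scale, BLOCK, axis=0), BLOCK, axis=1)
    q = np.clip(w / scale_full, -E4M3_MAX, E4M3_MAX)
    return q, scale
```

With this blocking, `quantize_mxfp8_32x32(w.T)` returns the transpose of the scales from `quantize_mxfp8_32x32(w)`, which is the consistency property the description relies on.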
