Open
Labels
awq, enhancement, good first issue, good follow-up issue, keep-open
Description
Summary
Similar to SmoothQuant, AWQ determines and applies smoothing scales. This process is driven by a set of model-specific mappings that indicate which layers' activations to smooth. Once smoothing is complete, the modifier runs its own code to determine optimal quantization scales and zero-points, duplicating logic that is already present in the QuantizationModifier.
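Smoothing can be separated from quantization because it leaves the layer's output unchanged: multiplying each input channel of the weight by a scale and dividing the matching activation channel by the same scale cancels out. The sketch below illustrates only that identity in plain Python. The helper names (linear, apply_smoothing) and the numbers are made up for illustration and are not part of the AWQModifier; in practice the activation division is typically folded into the preceding layer rather than applied at runtime.

```python
import math


def linear(weight, x):
    """Compute y = W @ x, where each row of `weight` is one output channel."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def apply_smoothing(weight, x, scales):
    """Scale weight input channels up and activation channels down by `scales`.

    Hypothetical helper for illustration only, not an llm-compressor function.
    """
    smoothed_weight = [[w * s for w, s in zip(row, scales)] for row in weight]
    smoothed_x = [xi / s for xi, s in zip(x, scales)]
    return smoothed_weight, smoothed_x


# Toy layer: 2 output channels, 3 input channels (values are arbitrary).
weight = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
x = [4.0, 0.1, -2.0]
scales = [2.0, 0.5, 4.0]  # one smoothing scale per input channel

baseline = linear(weight, x)
smoothed_weight, smoothed_x = apply_smoothing(weight, x, scales)
smoothed = linear(smoothed_weight, smoothed_x)

# The two outputs agree up to floating-point error.
outputs_match = all(
    math.isclose(a, b, rel_tol=1e-9) for a, b in zip(baseline, smoothed)
)
```

Because the smoothed layer computes the same function, whichever modifier runs afterwards can quantize the smoothed weights without needing to know how the scales were chosen.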
The scope of this task is to simplify this modifier by:
- Removing all the code responsible for generating the quantization scales and zero-points. AWQ, like SmoothQuant, should have the ability to be used in a stackable manner with other quantization modifiers such as the GPTQModifier and QuantizationModifier.
  Example 1:
    recipe = [
        AWQModifier(...),
        QuantizationModifier(...)
    ]
  Example 2:
    recipe = [
        AWQModifier(...),
        GPTQModifier(...)
    ]
  Example 3:
    recipe = [
        AWQModifier(...),
        QuIPModifier(
            rotations=["v", "u"], transform_block_size=128, transform_type="hadamard"
        ),
        QuantizationModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
    ]
- Simplifying the modifier. This can be done by targeting key functionality that is quite complex, such as _compute_best_scale
Implementation Steps
- Remove quantization scale and zero-point generation code and ensure AWQ can be used in a stackable manner with other quantization modifiers. Validate this through additional test scripts.
- Make other simplifications within the modifier to reduce its complexity
- Update all examples and tests using the AWQModifier to now use AWQModifier stacked with the QuantizationModifier. Validate that performance and accuracy remain the same
- As the AWQModifier will still require quantization arguments to generate the means, potentially add a recipe validation step to validate that these arguments match the quantization arguments passed to the subsequent quantization modifier, such as GPTQModifier or QuantizationModifier
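The recipe validation step in the last item could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the library's API: QuantArgs, ModifierStub and validate_recipe are hypothetical stand-ins, and the three fields compared (num_bits, group_size, symmetric) are only an assumed subset of what a real check would cover.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class QuantArgs:
    """Hypothetical stand-in for a modifier's weight quantization arguments."""
    num_bits: int
    group_size: int
    symmetric: bool


@dataclass
class ModifierStub:
    """Hypothetical stand-in for a recipe entry; not an llm-compressor class."""
    name: str
    quant_args: Optional[QuantArgs] = None  # None: modifier does not quantize


def validate_recipe(recipe: List[ModifierStub]) -> List[str]:
    """Return mismatch messages; an empty list means the recipe is consistent."""
    errors = []
    for index, modifier in enumerate(recipe):
        if modifier.name != "AWQModifier":
            continue
        # Look for the first later modifier that carries quantization args,
        # skipping entries (such as transforms) that do not quantize.
        following = next(
            (m for m in recipe[index + 1:] if m.quant_args is not None), None
        )
        if following is None:
            errors.append("AWQModifier is not followed by a quantization modifier")
        elif following.quant_args != modifier.quant_args:
            errors.append(
                f"AWQModifier args {modifier.quant_args} do not match "
                f"{following.name} args {following.quant_args}"
            )
    return errors


# Assumed example settings: 4-bit vs 8-bit weights with group size 128.
w4 = QuantArgs(num_bits=4, group_size=128, symmetric=True)
w8 = QuantArgs(num_bits=8, group_size=128, symmetric=True)

consistent = [ModifierStub("AWQModifier", w4), ModifierStub("QuantizationModifier", w4)]
mismatched = [ModifierStub("AWQModifier", w4), ModifierStub("GPTQModifier", w8)]
```

Returning a list of messages rather than raising on the first mismatch is a design choice made for this sketch; it lets a caller report every inconsistency in a recipe at once.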