Commit 29b84c1
[Common] Fix NVFP4 tuned-kernel numerics (#2639)
* Fixed scaling-factor computation for FP32 to match the reference implementation.
Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
* Uncommented the tuned kernel path
Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 94ba75d commit 29b84c1
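The diff body is not reproduced on this page, so the exact fix is not visible here. For orientation only, the sketch below shows the two-level scaling commonly described for NVFP4: a per-tensor FP32 scale plus one scale per 16-element block. The function names are hypothetical and are not taken from Transformer Engine. Rounding the block scale to FP8 E4M3 is deliberately omitted, and the zero-amax fallback is an assumption.

```python
FP4_E2M1_MAX = 6.0    # largest finite magnitude representable in FP4 E2M1
FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3
BLOCK_SIZE = 16       # number of NVFP4 elements that share one block scale


def global_encode_scale(tensor_amax: float) -> float:
    """Per-tensor FP32 scale mapping the tensor amax onto the FP8 x FP4 range.

    Hypothetical helper; the fallback of 1.0 for an all-zero tensor is an
    assumption made for this sketch.
    """
    if tensor_amax == 0.0:
        return 1.0
    return (FP8_E4M3_MAX * FP4_E2M1_MAX) / tensor_amax


def block_decode_scale(block: list[float], s_global: float) -> float:
    """Scale stored for one block, before rounding to FP8 E4M3 (omitted here)."""
    assert len(block) <= BLOCK_SIZE
    block_amax = max(abs(x) for x in block)
    scale = (block_amax / FP4_E2M1_MAX) * s_global
    # Saturate so the stored value stays inside the E4M3 range.
    return min(scale, FP8_E4M3_MAX)


s_global = global_encode_scale(12.0)
full_range_block = [12.0] + [0.0] * 15
quarter_range_block = [3.0, -1.5] + [0.0] * 14
print(s_global)
print(block_decode_scale(full_range_block, s_global))
print(block_decode_scale(quarter_range_block, s_global))
```

In a scheme like this, a tuned kernel and a reference implementation can disagree in the last bits if they order the FP32 multiplications and divisions differently, which is one plausible reason a fix of this kind would be needed to match the reference.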
File tree
2 files changed: +22 -7 lines changed
- transformer_engine/common/cast/nvfp4
  - specialized
Lines changed: 4 additions & 4 deletions
@@ -1168,10 +1168,10 @@
Lines 1171-1174 replaced (4 removed, 4 added).
Lines changed: 18 additions & 3 deletions
@@ -163,9 +163,24 @@
Lines 166-168 replaced by new lines 166-183 (3 removed, 18 added).