Conversation
Signed-off-by: Pawel Gadzinski <[email protected]>
Greptile Summary

This PR introduces a new `DumpTensors` debug feature for saving tensors (and their quantized internals) to disk for offline analysis.

Confidence Score: 3/5

Important Files Changed
Sequence Diagram

sequenceDiagram
participant User
participant debug_api
participant DumpTensors
participant TensorLogger
participant _get_quantized_internals
participant Disk
User->>debug_api: inspect_tensor(layer_name, tensor_name, iteration, tensor, rowwise_qt, columnwise_qt)
debug_api->>DumpTensors: inspect_tensor(config, ...)
DumpTensors->>DumpTensors: validate rowwise == columnwise (or one is None)
DumpTensors->>DumpTensors: resolve quantized_tensor (rowwise ?? columnwise)
DumpTensors->>TensorLogger: ensure_initialized(root_log_dir)
TensorLogger-->>DumpTensors: ready
alt dump_hp=True and tensor not None
DumpTensors->>DumpTensors: dump_dict["high_precision"] = tensor
end
alt dump_quant=True and quantized_tensor not None
DumpTensors->>DumpTensors: dump_dict["quantized"] = quantized_tensor
opt dump_quantized_internals=True
DumpTensors->>_get_quantized_internals: _get_quantized_internals(quantized_tensor)
note over _get_quantized_internals: dispatch by type:<br/>Float8Tensor → data, scale_inv<br/>Float8BlockwiseQTensor → rowwise/columnwise data+scales<br/>MXFP8Tensor → data + float8_e8m0fnu scales<br/>NVFP4Tensor → packed data + unpacked FP4 values + float8_e4m3fn scales + amax
_get_quantized_internals-->>DumpTensors: internals dict
DumpTensors->>DumpTensors: dump_dict.update(internals)
end
end
DumpTensors->>TensorLogger: save_tensor(dump_dict, layer_name, tensor_name, iteration)
TensorLogger->>TensorLogger: sanitize names, build filepath
TensorLogger->>Disk: torch.save(dump_dict, "{layer}_{tensor}_iter_{iter:06d}.pt")
Disk-->>TensorLogger: saved
TensorLogger-->>DumpTensors: done
DumpTensors-->>debug_api: log success message
Last reviewed commit: b78d36f
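The validation and resolution step near the top of the diagram (the rowwise and columnwise quantized tensors must match, or one may be `None`, with `??` denoting a fallback) can be sketched in plain Python. `resolve_quantized` is a hypothetical name for illustration, not the actual helper in the PR:

```python
def resolve_quantized(rowwise_qt, columnwise_qt):
    """Pick the quantized tensor to dump (hypothetical sketch of the
    validate/resolve step in the sequence diagram above)."""
    if rowwise_qt is not None and columnwise_qt is not None:
        # Both usages present: they must refer to the same quantized data.
        if rowwise_qt is not columnwise_qt:
            raise ValueError("rowwise and columnwise quantized tensors must match")
    # "rowwise ?? columnwise": prefer rowwise, fall back to columnwise.
    return rowwise_qt if rowwise_qt is not None else columnwise_qt
```

If both arguments are `None`, the result is `None` and the quantized branch of the dump is simply skipped, matching the `alt` guards in the diagram.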
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: Paweł Gadziński <[email protected]>
/te-ci pytorch
Description
This PR introduces a new debug feature focused on offline analysis of tensors.
The motivation is to make it easier to inspect and analyze intermediate tensors outside of runtime, especially during quantization debugging.
The new `DumpTensors` feature allows saving:

- high-precision tensors,
- quantized tensors,
- quantized-tensor internals (data, scales, and per-format extras such as `amax` for NVFP4).

A key piece of context is that some quantization metadata (notably scale-related values in NVFP4 paths) is stored in compact, performance-oriented formats (`uint8`-backed representations), which are efficient but hard to analyze directly. To improve offline usability, the dump path converts these values into appropriate floating-point dtypes for easier interpretation.
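As a concrete example of why the raw bytes are hard to read: the MXFP8 scales mentioned in the diagram use `float8_e8m0fnu`, an exponent-only format with no sign or mantissa bits. Assuming the standard bias of 127, each stored byte decodes to a pure power of two (a minimal sketch, ignoring the NaN encoding at `0xFF`):

```python
def decode_e8m0(byte: int) -> float:
    # float8_e8m0fnu holds only an 8-bit biased exponent (bias 127),
    # so a stored byte b represents the scale 2 ** (b - 127).
    return 2.0 ** (byte - 127)

print(decode_e8m0(127))  # 1.0  (exponent 0)
print(decode_e8m0(130))  # 8.0  (exponent 3)
```

Without such a conversion, a dump would show only opaque `uint8` values like `130`, which is exactly the readability problem the floating-point conversion in the dump path addresses.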
Type of change
Changes
Please list the changes introduced in this PR:
- Added `transformer_engine.debug.features.dump_tensors.DumpTensors` with an `inspect_tensor` hook, including an option (`dump_quantized_internals`) to dump quantized internals for FP8/FP8-blockwise/MXFP8/NVFP4 tensor types.
- Added tests in `tests/pytorch/debug/test_log.py` for the DumpTensors sanity flow.
- Documented `DumpTensors` in `docs/debug/3_api_features.rst`.

Checklist
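For reference, the on-disk naming scheme shown in the sequence diagram (`{layer}_{tensor}_iter_{iter:06d}.pt`) can be reproduced with a small sketch. `build_dump_filename` and the sanitization regex are illustrative assumptions, not the PR's exact implementation:

```python
import re

def build_dump_filename(layer_name: str, tensor_name: str, iteration: int) -> str:
    # Replace filesystem-unsafe characters in the names, then apply the
    # "{layer}_{tensor}_iter_{iter:06d}.pt" pattern from the diagram.
    sanitize = lambda s: re.sub(r"[^\w.\-]", "_", s)
    return f"{sanitize(layer_name)}_{sanitize(tensor_name)}_iter_{iteration:06d}.pt"

print(build_dump_filename("decoder.layers.0", "activation", 12))
# decoder.layers.0_activation_iter_000012.pt
```

Zero-padding the iteration to six digits keeps the dumped `.pt` files lexicographically sorted by training step, which makes offline inspection of a run straightforward.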