[pyTorch] Replace the make_empty implementation to use C++ implementation #2666
ptrendx wants to merge 9 commits into NVIDIA:main
Conversation
/te-ci L1 pytorch

/te-ci L1 pytorch
```python
    """Construct quantized tensor with uninitialized data"""
    raise NotImplementedError(
        f"{self.__class__.__name__} class does not implement make_empty function, "
        "required for construction of uninitialized quantized tensor"
    )
```
This clear NotImplementedError is helpful for custom quantizers that do not override make_empty().
Now, if a custom quantizer does not have make_empty(), it will fail at the C++ convert_quantizer, because there is no registered C++ converter. The C++ failure with NVTE_ERROR("Unexpected type for quantizer") is not as clear as the NotImplementedError.
What about making the C++ error clearer, or better yet, adding a check at the base Quantizer.make_empty:
```python
def make_empty(...):
    if getattr(self, "custom", False):
        raise NotImplementedError(
            f"{self.__class__.__name__} does not implement make_empty"
        )
    # ... existing C++ path ...
```
known quantizers Signed-off-by: Przemek Tredak <[email protected]>
for more information, see https://pre-commit.ci
Signed-off-by: Przemek Tredak <[email protected]>
Signed-off-by: Przemek Tredak <[email protected]>
Force-pushed from 98f9681 to 6be430a
Signed-off-by: Przemek Tredak <[email protected]>
/te-ci pytorch L1
Greptile Summary

This PR unifies the QuantizedTensor creation path by using the C++ create_tensor implementation. Key changes:

Issue found:

Confidence Score: 2/5
Sequence Diagram

```mermaid
sequenceDiagram
    participant PY as Python caller
    participant QBase as Quantizer.make_empty (Python base)
    participant TEX as tex.create_empty_quantized_tensor (C++)
    participant QCPP as QuantizerCPP::create_tensor (C++)
    participant AT as at::empty / PyTorch allocator
    PY->>QBase: make_empty(shape, dtype, device, pin_memory)
    QBase->>QBase: normalize device (str → torch.device)
    QBase->>TEX: create_empty_quantized_tensor(quantizer, shape, dtype, device, pin_memory)
    TEX->>QCPP: quantizer_cpp->create_tensor(shape, te_dtype, device, pin_memory)
    QCPP->>AT: at::empty(shape, opts.device(device).pinned_memory(pin_memory))
    AT-->>QCPP: allocated tensor(s)
    QCPP-->>TEX: (TensorWrapper, py::object)
    TEX-->>QBase: py::object (QuantizedTensor)
    QBase->>QBase: result.requires_grad_(True) if requires_grad
    QBase-->>PY: QuantizedTensor
```

Last reviewed commit: 4abf5a8
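The call flow in the sequence above can be sketched in pure Python. This is an illustrative mock only: `FakeQuantizedTensor` and the stubbed `create_empty_quantized_tensor` stand in for the real Transformer Engine C++ binding and are not part of its API.

```python
# Pure-Python sketch of the make_empty flow described in the diagram.
# FakeQuantizedTensor and create_empty_quantized_tensor are stand-ins,
# not the real Transformer Engine types.

class FakeQuantizedTensor:
    def __init__(self, shape, dtype, device, pin_memory):
        self.shape = tuple(shape)
        self.dtype = dtype
        self.device = device
        self.pin_memory = pin_memory
        self.requires_grad = False

    def requires_grad_(self, flag=True):
        # Mirrors torch.Tensor.requires_grad_, which returns self.
        self.requires_grad = flag
        return self

def create_empty_quantized_tensor(quantizer, shape, dtype, device, pin_memory):
    # Stand-in for the C++ binding tex.create_empty_quantized_tensor.
    return FakeQuantizedTensor(shape, dtype, device, pin_memory)

def make_empty(quantizer, shape, *, dtype="float32", device="cuda",
               pin_memory=False, requires_grad=False):
    # The Python base class delegates allocation to C++ ...
    result = create_empty_quantized_tensor(quantizer, shape, dtype,
                                           device, pin_memory)
    # ... and applies requires_grad afterwards, from Python.
    if requires_grad:
        result.requires_grad_(True)
    return result

t = make_empty(None, (2, 3), requires_grad=True)
print(t.shape, t.requires_grad)
```

The point of the sketch is the split of responsibilities: allocation happens behind the binding, while the `requires_grad` flag is applied in a second step on the Python side, which is the detail the review comments below discuss.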
```python
        pin_memory,
    )
    if requires_grad:
        result.requires_grad_(True)
```
Doing this in C++ itself might be faster, since we are going to call the QuantizedTensor constructor with the requires_grad argument anyway. Calling this from Python for a custom quantized tensor has severe Python overheads.
But I see it can get quite complicated, since we might have to change the create_tensor API to accept the requires_grad argument.
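The alternative the reviewer describes can be sketched as follows. This is a hypothetical Python mock, not the actual create_tensor API: the point is that threading requires_grad through the allocation call removes the follow-up requires_grad_ call from Python.

```python
# Sketch of passing requires_grad down into the allocation call itself.
# Tensor and create_tensor are illustrative stand-ins; in the real C++
# path this would roughly correspond to at::empty with a TensorOptions
# that has requires_grad set.

class Tensor:
    def __init__(self, shape, requires_grad=False):
        self.shape = tuple(shape)
        self.requires_grad = requires_grad

def create_tensor(shape, dtype, device, pin_memory, requires_grad=False):
    # Allocation and flag setting happen in one step.
    return Tensor(shape, requires_grad=requires_grad)

def make_empty(shape, *, requires_grad=False):
    # Single call; no separate result.requires_grad_(True) from Python.
    return create_tensor(shape, "float32", "cuda", False, requires_grad)

t = make_empty((4, 8), requires_grad=True)
print(t.requires_grad)
```

The trade-off is exactly the one noted above: the create_tensor signature (and every C++ quantizer implementing it) would have to grow a requires_grad parameter.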
Signed-off-by: Przemek Tredak <[email protected]>
/te-ci pytorch
for more information, see https://pre-commit.ci
Signed-off-by: Przemek Tredak <[email protected]>
for more information, see https://pre-commit.ci
/te-ci pytorch
Additional Comments (1)

A new local variable shadows the function parameter. This creates two problems:

Fix: Remove the shadowing local variable and use the parameter directly.
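The shadowing problem flagged above can be shown with a small standalone example. The function names and the pin_memory parameter here are hypothetical, chosen only to mirror the kind of signature in this PR; the real identifier was not preserved in the review text.

```python
# Illustration of a local variable shadowing a parameter: the caller's
# argument is silently ignored. Names are hypothetical.

def make_empty_buggy(shape, pin_memory=False):
    pin_memory = False  # BUG: local assignment hides the parameter
    return {"shape": tuple(shape), "pin_memory": pin_memory}

def make_empty_fixed(shape, pin_memory=False):
    # Fix: use the parameter directly, no local reassignment.
    return {"shape": tuple(shape), "pin_memory": pin_memory}

print(make_empty_buggy((2,), pin_memory=True)["pin_memory"])  # False
print(make_empty_fixed((2,), pin_memory=True)["pin_memory"])  # True
```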
Description
This PR unifies the implementation of QuantizedTensor creation by using the C++ implementation of create_tensor.
Type of change
Changes
Please list the changes introduced in this PR:
Checklist: