🐛 Describe the bug
TL;DR
- We should use the safer data_ptr accessing APIs. The newer template APIs perform additional checks. If possible, use tensor.mutable_data_ptr<T>() and tensor.const_data_ptr<T>(). Stop using (scalar_t*)tensor.data_ptr().
- Stop using char*. Either use int* or use the allocator directly for buffer access.
1. Use template data_ptr API for tensor
1.1 Existing code usage and its problem
Existing code uses raw pointers like this:
// non-template usage. Avoid using this
kernel_call(
    (scalar_t*)tensor.data_ptr());
The above usage doesn't check whether the tensor is initialized. When the tensor's storage is not initialized (for example, a FakeTensor), this data_ptr is null; the kernel may then write through a pointer to non-initialized memory, causing a page fault.
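To make the failure mode concrete, here is a minimal standalone sketch (my own illustration, not code from this repo) that uses a meta-device tensor as a stand-in for a FakeTensor whose storage is never materialized:
#include <ATen/ATen.h>
#include <iostream>

int main() {
  // Meta tensors carry shape/dtype metadata but no allocated data,
  // similar to a FakeTensor during tracing.
  at::Tensor t = at::empty({8}, at::TensorOptions().device(at::kMeta));
  std::cout << std::boolalpha
            << "has_storage: " << t.unsafeGetTensorImpl()->has_storage() << "\n"
            << "storage_initialized: "
            << t.unsafeGetTensorImpl()->storage_initialized() << "\n";
  // A raw (scalar_t*)t.data_ptr() cast at this point would hand a kernel
  // a null pointer; the first store through it page-faults.
  return 0;
}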
1.2 template data_ptr
The template data_ptr performs additional checks, such as has_storage() and storage_initialized(). Please refer to PyTorch's TensorImpl.h for details.
In view of that, migrate old usages to the safer form:
// template usage. Not encouraged.
input_.data_ptr<scalar_t>();
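For illustration, a minimal sketch (my own, not from the issue) showing the template accessor refusing an unallocated tensor instead of returning null; the meta device again stands in for FakeTensor:
#include <ATen/ATen.h>
#include <iostream>

int main() {
  at::Tensor t = at::empty({8}, at::TensorOptions().device(at::kMeta));
  try {
    // Checked accessor: verifies the storage before handing out a pointer.
    float* p = t.data_ptr<float>();
    (void)p;
  } catch (const c10::Error& e) {
    // Fails loudly here instead of page-faulting inside a kernel later.
    std::cout << "rejected: " << e.what_without_backtrace() << "\n";
  }
  return 0;
}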
1.3 New APIs: mutable_data_ptr() and const_data_ptr()
PyTorch has introduced new APIs to supersede tensor.data_ptr(). Their usage is the same as the template data_ptr, and their meaning is just what the names suggest: const_data_ptr<T>() for read-only access, mutable_data_ptr<T>() for writable access.
// Use this
input_.const_data_ptr<scalar_t>();
input_.mutable_data_ptr<scalar_t>();
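Beyond the storage checks, const_data_ptr() buys const-correctness. A hypothetical elementwise kernel (illustration only, not code from this repo) shows the benefit: taking the input through a const pointer lets the compiler reject accidental writes to it.
#include <cstdint>

template <typename scalar_t>
void scale_kernel(scalar_t* out, const scalar_t* in, scalar_t alpha, int64_t n) {
  for (int64_t i = 0; i < n; ++i) {
    out[i] = alpha * in[i];
    // in[i] = 0;  // would not compile: `in` points to const data
  }
}

int main() {
  float src[4] = {1, 2, 3, 4};
  float dst[4] = {};
  // With tensors: out.mutable_data_ptr<float>() and in.const_data_ptr<float>().
  scale_kernel(dst, src, 2.0f, int64_t{4});
  return 0;
}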
2. Avoid using a char* data ptr for buffers
The existing code has the following pattern.
For example, the code snippet in torch-xpu-ops/...Reduce.h:
Tensor buffer;
Tensor semaphores;
buffer = at::empty(
config.global_memory_size(),
at::TensorOptions().dtype(kChar).device(kXPU));
semaphores = at::empty(
config.semaphore_size(), at::TensorOptions().dtype(kChar).device(kXPU));
at::detail::Array<char*, 1> data;
data[0] = (char*)semaphores.data_ptr();
buf_ptr = buffer.defined() ? (void*)buffer.data_ptr() : nullptr,
The template tensor.data_ptr<T>() does not support the char type, so we should avoid this usage. Instead, we can align with PyTorch's practice in either of the following ways.
2.1 Use int* instead of char*
For example, PyTorch has the following code in Normalization.cuh:
at::Tensor semaphores;
semaphores = at::zeros({grid.x}, input.options().dtype(at::kInt));
int* semaphores_ptr = grid.y > 1 ? semaphores.mutable_data_ptr<int>() : nullptr;
However, the above is still not best practice: it constructs a whole tensor just to use the data inside it. We could use an allocator directly instead; see Section 2.2.
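As a sketch (hypothetical, not committed code), the kChar semaphore buffer from the torch-xpu-ops snippet above could be rewritten in this style; the size is a placeholder for config.semaphore_size() / sizeof(int), and an XPU device is assumed to be available at runtime:
#include <ATen/ATen.h>

int main() {
  // Placeholder for config.semaphore_size() / sizeof(int).
  const int64_t num_semaphores = 128;
  // kInt instead of kChar, so the checked template accessor applies.
  at::Tensor semaphores = at::zeros(
      {num_semaphores}, at::TensorOptions().dtype(at::kInt).device(at::kXPU));
  int* semaphores_ptr = semaphores.mutable_data_ptr<int>();
  (void)semaphores_ptr;
  return 0;
}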
2.2 Use the allocator directly
For example, see the code snippet in PyTorch's Reduce.cuh:
at::DataPtr buffer;
at::DataPtr semaphores;
if (config.should_global_reduce()) {
auto& allocator = *c10::cuda::CUDACachingAllocator::get();
buffer = allocator.allocate(config.global_memory_size());
semaphores = allocator.allocate(config.semaphore_size());
//...
}
auto reduce_kernel = ReduceOp<scalar_t, ops_t, uint32_t, out_scalar_t, vt0>(
buffer.get(),
(int*)semaphores.get(),
// ... (remaining arguments elided)
);
This directly uses the allocator rather than constructing a tensor. I believe that this usage could have some performance gain, since it avoids constructing a tensor just to own a scratch buffer.
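A corresponding sketch for XPU (my assumption: c10::xpu::XPUCachingAllocator exposes a get() entry point analogous to the CUDA one; the sizes are placeholders for config.global_memory_size() and config.semaphore_size()):
#include <c10/core/Allocator.h>
#include <c10/xpu/XPUCachingAllocator.h>

int main() {
  // Assumed analogue of c10::cuda::CUDACachingAllocator::get().
  auto& allocator = *c10::xpu::XPUCachingAllocator::get();
  // Placeholder sizes standing in for config.global_memory_size()
  // and config.semaphore_size().
  c10::DataPtr buffer = allocator.allocate(4096);
  c10::DataPtr semaphores = allocator.allocate(256);
  void* buf_ptr = buffer.get();
  int* semaphores_ptr = static_cast<int*>(semaphores.get());
  (void)buf_ptr;
  (void)semaphores_ptr;
  // c10::DataPtr is RAII: both blocks return to the caching allocator
  // when they go out of scope, so no tensor is ever constructed.
  return 0;
}
This mirrors the Reduce.cuh pattern: the kernel receives raw pointers, and no TensorImpl bookkeeping is paid for a scratch buffer.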