
Proposal: Switch to safer data_ptr API #654

@Stonepia


🐛 Describe the bug

TL;DR

  1. We should use the safer data_ptr accessor APIs. The newer template APIs perform additional checks. Where possible, use tensor.mutable_data_ptr<T>() and tensor.const_data_ptr<T>(), and stop using (scalar_t*)tensor.data_ptr().
  2. Stop using char* for buffer access. Either use int* or use the allocator directly.

1. Use template data_ptr API for tensor

1.1 Existing code usage and its problem

Existing code uses raw pointers like this:

// Non-template usage. Avoid this.
kernel_call(
    (scalar_t*)tensor.data_ptr());

The above usage does not check whether the tensor's storage is initialized. When it is not (for example, with a FakeTensor), data_ptr() returns null, and the kernel may then write to uninitialized memory, causing a page fault.

1.2 template data_ptr

The template data_ptr<T>() performs additional checks, such as has_storage() and storage_initialized(). Please refer to PyTorch's TensorImpl.h for details.

In view of that, migrate old call sites to the safer usage:

// Template usage. Safer, though the APIs in section 1.3 are preferred.
input_.data_ptr<scalar_t>();

1.3 New API: mutable_data_ptr() and const_data_ptr()

PyTorch has introduced new APIs to replace tensor.data_ptr(). Their usage is the same as the template data_ptr<T>(), and their meaning is just as the names suggest: const_data_ptr<T>() for read-only access, mutable_data_ptr<T>() for writes.

// Use this
input_.const_data_ptr<scalar_t>();
input_.mutable_data_ptr<scalar_t>();

2. Avoid using char * data ptr for buffer

The existing code has the following pattern, for example in torch-xpu-ops/...Reduce.h:

Tensor buffer;
Tensor semaphores;

buffer = at::empty(
    config.global_memory_size(),
    at::TensorOptions().dtype(kChar).device(kXPU));
semaphores = at::empty(
    config.semaphore_size(),
    at::TensorOptions().dtype(kChar).device(kXPU));

at::detail::Array<char*, 1> data;
data[0] = (char*)semaphores.data_ptr();
buf_ptr = buffer.defined() ? (void*)buffer.data_ptr() : nullptr;

The template tensor.data_ptr<T>() does not support the char type, so we should avoid this pattern. We can align with PyTorch's own usage in either of the following ways:

2.1 Use int* instead of char*

For example, PyTorch has the following code in Normalization.cuh

at::Tensor semaphores;
semaphores = at::zeros({grid.x}, input.options().dtype(at::kInt));

int* semaphores_ptr = grid.y > 1 ? semaphores.mutable_data_ptr<int>() : nullptr;

However, the above is still not best practice: it constructs a whole tensor only to use its storage as a scratch buffer. We could use an allocator directly instead. See section 2.2.

2.2 Use the allocator directly.

For example, consider the code snippet in Reduce.cuh:

at::DataPtr buffer;
at::DataPtr semaphores;

if (config.should_global_reduce()) {
    auto& allocator = *c10::cuda::CUDACachingAllocator::get();
    buffer = allocator.allocate(config.global_memory_size());
    semaphores = allocator.allocate(config.semaphore_size());
   //...
}

auto reduce_kernel = ReduceOp<scalar_t, ops_t, uint32_t, out_scalar_t, vt0>(
    buffer.get(),
    (int*)semaphores.get());

It uses the allocator directly rather than constructing a tensor, which avoids building TensorImpl metadata (sizes, strides, dispatch keys) just to hold a scratch buffer. I believe this usage could yield some performance gain.
