[Enhancement] Leverage the PagedCPUGPUMemoryAllocator for the P2P Backend to receive on device (#166)

**Is your feature request related to a problem? Please describe.**
- Currently, the LMCache P2PBackend uses the CPU allocator as a staging buffer for P2P transfers, which can incur an extra copy step on both hosts. Perhaps we can make the allocator configurable and use the GPU allocator so that the on-device buffer serves as the staging area, removing the extra copy on the remote host at the very least.
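
To make the copy accounting concrete, here is a minimal sketch of a simplified model that counts the staging copies for each choice of staging location. The names `StagingLocation`, `P2PStagingConfig`, and `extra_copies` are hypothetical and are not existing LMCache options or APIs; the model assumes exactly one staging copy per host whenever that host stages in CPU memory.

```python
from dataclasses import dataclass
from enum import Enum


class StagingLocation(Enum):
    CPU = "cpu"        # stage in host memory (current behavior described above)
    DEVICE = "device"  # stage in an on-device buffer (proposed)


@dataclass(frozen=True)
class P2PStagingConfig:
    # Hypothetical configuration knob, for illustration only.
    staging_location: StagingLocation = StagingLocation.CPU


def extra_copies(sender: P2PStagingConfig, receiver: P2PStagingConfig) -> int:
    """Count staging copies beyond the network transfer itself.

    Simplified model: a host that stages in CPU memory pays one
    device<->host copy; a host that stages on device pays none.
    """
    copies = 0
    if sender.staging_location is StagingLocation.CPU:
        copies += 1  # sender: device -> host staging buffer
    if receiver.staging_location is StagingLocation.CPU:
        copies += 1  # receiver: host staging buffer -> device
    return copies


cpu = P2PStagingConfig(StagingLocation.CPU)
dev = P2PStagingConfig(StagingLocation.DEVICE)

print(extra_copies(cpu, cpu))  # both hosts stage on CPU
print(extra_copies(cpu, dev))  # only the receiver stages on device
print(extra_copies(dev, dev))  # full D2D path
```

Under this model, switching only the receiving side to device staging already removes one of the two extra copies, which matches the "at the very least" case above.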

**Describe the solution you'd like**
- For Ascend, we should first test whether the P2PBackend, when using the PagedCPUGPUMemoryAllocator, allows the HCCL transfer channel to execute the D2D transport directly and successfully.
- For the LMCache modification, we should create an appropriate PR that accommodates this feature.

**Describe alternatives you've considered**
- N/A

**Additional context**
- Using an on-device buffer consumes HBM, so we should give advice on how much buffer is desirable.
- This is only the first step toward direct D2D RDMA transfer.
- During development, we should also check whether an independent parallelism strategy can be supported.
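
As a starting point for the buffer-size advice, the following is a rough back-of-the-envelope sketch, not a measured recommendation. It assumes a standard attention KV layout (one K and one V tensor per layer) and that the staging buffer must hold a fixed number of KV chunks in flight. The function names, the model shape, the chunk length, and the in-flight count are all illustrative assumptions.

```python
def kv_bytes_per_token(num_layers: int, num_kv_heads: int,
                       head_dim: int, dtype_bytes: int) -> int:
    """Bytes of KV cache for one token: K and V for every layer."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def staging_buffer_bytes(bytes_per_token: int, chunk_tokens: int,
                         chunks_in_flight: int) -> int:
    """HBM needed to stage `chunks_in_flight` chunks at once."""
    return bytes_per_token * chunk_tokens * chunks_in_flight


# Illustrative model shape (assumed): 32 layers, 8 KV heads,
# head_dim 128, 2-byte elements (fp16/bf16).
per_token = kv_bytes_per_token(num_layers=32, num_kv_heads=8,
                               head_dim=128, dtype_bytes=2)

# Assumed chunk length of 256 tokens and 8 chunks in flight.
total = staging_buffer_bytes(per_token, chunk_tokens=256, chunks_in_flight=8)

print(per_token // 1024, "KiB per token")        # 128 KiB per token
print(total // (1024 * 1024), "MiB of staging")  # 256 MiB of staging
```

With tensor parallelism, each rank would typically hold only its share of the KV heads, so the per-device figure would shrink accordingly; that is another assumption to verify against the actual backend.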
