[Enhancement] Leverage the PagedCPUGPUMemoryAllocator for the P2P Backend to recv on dev #166

@matthewygf

Is your feature request related to a problem? Please describe.

  • Currently, the LMCache P2PBackend uses the CPU allocator as a staging buffer for P2P transfers, which can incur an extra copy step on both hosts. We could make the allocator configurable and use the GPU allocator so that an on-device buffer serves as staging, removing the extra copy on the remote host at the very least.
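To make the proposal concrete, here is a minimal sketch of what a configurable staging option could look like and how it changes the receive path. The names `StagingDevice`, `P2PStagingConfig`, and `receiver_copy_steps` are hypothetical and do not exist in LMCache; the hop lists are a simplified model of the copy steps described above, not a trace of the actual backend.

```python
from dataclasses import dataclass
from enum import Enum


class StagingDevice(Enum):
    """Where the P2P staging buffer lives (hypothetical option)."""
    CPU = "cpu"
    DEVICE = "device"


@dataclass(frozen=True)
class P2PStagingConfig:
    # Hypothetical setting; defaults to the CPU staging described above.
    staging_device: StagingDevice = StagingDevice.CPU


def receiver_copy_steps(config: P2PStagingConfig) -> list[str]:
    """Return the modeled hops a received KV chunk takes on the remote host."""
    if config.staging_device is StagingDevice.DEVICE:
        # Data lands directly in an on-device staging buffer.
        return ["network -> device staging buffer"]
    # Data lands in host memory first, then needs a host-to-device copy.
    return [
        "network -> host staging buffer",
        "host staging buffer -> device",
    ]


if __name__ == "__main__":
    for device in StagingDevice:
        steps = receiver_copy_steps(P2PStagingConfig(staging_device=device))
        print(device.value, len(steps), steps)
```

In this model the device option drops the host-to-device hop on the receiver; whether the real transfer channel supports that is exactly what the issue proposes to test.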

Describe the solution you'd like

  • For Ascend, we should first test whether using the PagedCPUGPUMemoryAllocator in the P2PBackend allows the HCCL transfer channel to execute the D2D transport directly and successfully.
  • For LMCache, we should open an appropriate PR that accommodates this feature.

Describe alternatives you've considered

  • N/A

Additional context

  • Using an on-device buffer consumes HBM, so we should give guidance on how large the buffer should be.
  • This is only the first step toward direct D2D RDMA transfer.
  • During development, we should also check whether an independent parallelism strategy can be supported.
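As a starting point for the buffer-size guidance mentioned in the first bullet, the sketch below estimates the HBM cost of an on-device staging buffer from the KV cache shape. All parameters (layer count, KV heads, head dimension, dtype width, tokens per chunk, chunks in flight) are illustrative assumptions, not measured or recommended values, and the formula assumes a standard attention layout with separate key and value tensors.

```python
def kv_bytes_per_token(num_layers: int, num_kv_heads: int,
                       head_dim: int, dtype_bytes: int) -> int:
    """Bytes of KV cache produced per token across all layers."""
    # Factor 2 accounts for the key tensor and the value tensor.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def staging_buffer_bytes(tokens_per_chunk: int, chunks_in_flight: int,
                         bytes_per_token: int) -> int:
    """HBM needed to hold the given number of in-flight chunks."""
    return tokens_per_chunk * chunks_in_flight * bytes_per_token


if __name__ == "__main__":
    # Assumed example model: 32 layers, 8 KV heads, head_dim 128, 16-bit KV.
    per_token = kv_bytes_per_token(32, 8, 128, 2)
    # Assumed transfer shape: 256-token chunks, 8 chunks buffered at once.
    total = staging_buffer_bytes(256, 8, per_token)
    print(f"{per_token / 1024:.0f} KiB per token")       # 128 KiB per token
    print(f"{total / 1024 ** 2:.0f} MiB staging buffer")  # 256 MiB staging buffer
```

Under these assumptions the buffer scales linearly with the number of chunks kept in flight, so the guidance largely reduces to choosing that number for a given model and link.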

Metadata

Labels

enhancement (New feature or request)
