Move the prefetched info to preallocated buffers #5251
Open
chouxi wants to merge 1 commit into pytorch:main from
Summary:

X-link: facebookresearch/FBGEMM#2219

This change improves the performance of tracking deltas in TBE, mainly by replacing the DtoH copy {F1984231816} with a DtoD copy followed by an async DtoH copy under stream_callback {F1984231839}.

To achieve this, the following is added:

- The pre-registered UVA buffer, accessible from both GPU and CPU, is reused every iteration.
  - Giving the tensors the same lifetime as the TBE makes the async copy safe.
  - Reusing the same buffer avoids repeated allocation.
- The CPU thread is triggered to copy asynchronously in raw_embedding_streamer.stream().
  - GPU ops don't wait on the D2H copy.
- To keep the D2D copy from overlapping with the D2H copy:
  - A GPU event tracks completion of the D2D copy, and the CPU thread waits on that event before it starts copying.
  - join_stream_tensor_copy_thread triggers a blocking wait for the copy in the next iteration, in case the CPU copy takes too long, before the pre-registered buffer is overwritten.

Differential Revision: D86888586
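The synchronization pattern described in the summary can be illustrated with a minimal CPU-only sketch. This is not the FBGEMM implementation. `StagingStreamer`, `staging`, `sink`, and `join_copy_thread` are hypothetical names, a Python list stands in for the pre-registered UVA buffer, a `threading.Event` stands in for the GPU event, and a plain thread stands in for the copy thread that `raw_embedding_streamer.stream()` triggers.

```python
import threading


class StagingStreamer:
    """Hypothetical CPU-only stand-in for the buffer-reuse pattern."""

    def __init__(self, capacity: int) -> None:
        # Allocated once and reused every iteration
        # (stand-in for the pre-registered UVA buffer).
        self.staging = [0] * capacity
        self.sink = []  # stand-in for the host-side consumer
        self._thread = None

    def join_copy_thread(self) -> None:
        # Blocking wait: the previous host copy must finish before
        # the staging buffer is overwritten.
        if self._thread is not None:
            self._thread.join()
            self._thread = None

    def stream(self, values) -> None:
        n = len(values)
        if n > len(self.staging):
            raise ValueError("values do not fit in the staging buffer")

        # Guard against a slow copy left over from the previous iteration.
        self.join_copy_thread()

        d2d_done = threading.Event()  # stand-in for the GPU event

        def host_copy() -> None:
            # Do not read the buffer until the staging write has finished.
            d2d_done.wait()
            self.sink.append(list(self.staging[:n]))

        self._thread = threading.Thread(target=host_copy)
        self._thread.start()  # the caller never waits on the host copy here

        # Stand-in for the D2D copy into the preallocated buffer.
        self.staging[:n] = values
        d2d_done.set()


streamer = StagingStreamer(capacity=4)
streamer.stream([1, 2, 3])
streamer.stream([7, 8])      # joins the first copy before overwriting
streamer.join_copy_thread()  # drain the last copy before reading the sink
```

The producer only blocks at the start of the next `stream()` call, so a host copy that finishes in time adds no wait on the critical path. The sketch also shows why the join is needed: without it, the second call could overwrite `staging` while the first copy is still reading it.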