Initial nccl implementation -- experimental#1

Open
aniabrown-nvidia wants to merge 8 commits intoalejandrogallo:cudafrom
aniabrown-nvidia:nccl
Conversation

@aniabrown-nvidia (Collaborator)

Initial NCCL implementation using only the default stream. It contains some temporary fixes to get the code to compile and some temporary experimentation with MPI performance -- this will require cleanup later. I suggest merging into a separate nccl branch for now.

Requires linking additional libraries: -lnvToolsExt -lnccl

Some extra context for commits:
01a6df2, e77d120 -- very temporary fixes to compile-time and run-time errors outside of the dgemms -- needs looking into
5c4ce5b -- bug fix: source memory was not being allocated. At the time of writing the commit I had assumed memory for sources was allocated every iteration. This is not the case, but we may still want to allocate a pool of memory for atrip early in program execution, particularly to address the following commit.
5c4ce5b -- this alloc was taking a similar amount of time as the dgemm for No=50. Need to check whether this takes non-negligible time for larger sizes.
46c56b9 -- removes a host-device transfer that is not needed in the GPU source version
e95ca45 -- this 'warm up' was for experimentation only -- used to test point-to-point handle creation at application start
7878a14 -- switches point-to-point comms to use NCCL
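The point-to-point switch in 7878a14 presumably replaces paired MPI sends/receives with NCCL's `ncclSend`/`ncclRecv`. A minimal sketch of such an exchange on the default stream, which is what this PR uses (the function name, buffer names, and setup are illustrative assumptions, not taken from the patch; `comm` is assumed to come from the usual `ncclCommInitRank` setup with the unique ID broadcast over MPI):

```cpp
#include <cuda_runtime.h>
#include <nccl.h>

// Hypothetical sketch: exchange device buffers between this rank and `peer`
// using NCCL point-to-point calls. sendbuf/recvbuf are device pointers.
void exchange_slices(double* sendbuf, double* recvbuf, size_t count,
                     int peer, ncclComm_t comm) {
  cudaStream_t stream = 0;  // default stream, as in this PR
  // Group the send and recv so NCCL can pair them without deadlocking
  // when both ranks issue their calls in the same order.
  ncclGroupStart();
  ncclSend(sendbuf, count, ncclDouble, peer, comm, stream);
  ncclRecv(recvbuf, count, ncclDouble, peer, comm, stream);
  ncclGroupEnd();
  cudaStreamSynchronize(stream);  // transfer is complete after this returns
}
```

Moving beyond the default stream would let these transfers overlap with the dgemms, which is presumably the follow-up work hinted at in the description.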
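The "allocate a pool early" idea from the 5c4ce5b note could look like a simple bump allocator: grab one slab at startup and hand out aligned sub-buffers each iteration, keeping `cudaMalloc` off the critical path next to the dgemms. The sketch below is a hypothetical illustration, not code from this PR; it uses `std::malloc` in place of `cudaMalloc` so it runs anywhere, but the structure is the same for a device pool.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical bump-allocator pool: one upfront allocation, cheap sub-buffer
// handout per iteration, wholesale reuse via reset(). For device memory,
// replace std::malloc/std::free with cudaMalloc/cudaFree.
struct DevicePool {
  char*  base     = nullptr;
  size_t capacity = 0;
  size_t offset   = 0;

  void init(size_t bytes) {
    base = static_cast<char*>(std::malloc(bytes));
    capacity = bytes;
  }

  // Bump-allocate a 256-byte-aligned sub-buffer; nullptr when exhausted.
  void* get(size_t bytes) {
    size_t aligned = (bytes + 255) & ~size_t(255);
    if (offset + aligned > capacity) return nullptr;
    void* p = base + offset;
    offset += aligned;
    return p;
  }

  void reset()   { offset = 0; }      // reuse the slab next iteration
  void destroy() { std::free(base); base = nullptr; }
};
```

A pool like this would also cover the source buffers from the first 5c4ce5b fix, since they could be carved out of the same slab once at startup.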

