coll/ucc: fix ucc resource leak on MPI_Finalize#13767
Open
janjust wants to merge 1 commit intoopen-mpi:mainfrom
Open
coll/ucc: fix ucc resource leak on MPI_Finalize#13767janjust wants to merge 1 commit intoopen-mpi:mainfrom
janjust wants to merge 1 commit intoopen-mpi:mainfrom
Conversation
e928049 to
3bf61b2
Compare
All UCC team teardown and context/library finalization was driven by a communicator attribute delete callback on MPI_COMM_WORLD. OMPI commit b79004e (v5: 6a581ad) intentionally skips user-defined attribute callbacks on MPI_COMM_WORLD during MPI_Finalize to fix a PETSc deadlock (sec. open-mpi#12035), so ucc_context_destroy / ucc_finalize were never called. Remove the communicator attribute mechanism entirely: - Call ucc_team_destroy() directly from mca_coll_ucc_module_destruct(), which fires for every communicator including MPI_COMM_WORLD regardless of OMPI version. - Add mca_coll_ucc_finalize_ctx() and call it from the MPI_COMM_WORLD module destructor. ompi_comm_destruct() releases coll modules (firing destructors) before releasing c_local_group or calling PML del_comm, so MPI_COMM_WORLD is still fully functional for ucc_context_destroy's OOB allgather at that point. mca_coll_ucc_close() retains the call as an idempotent safety net. Signed-off-by: Tomislav Janjusic <[email protected]>
3bf61b2 to
bbc0be5
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
All UCC team teardown and context/library finalization was driven by a
communicator attribute delete callback on MPI_COMM_WORLD. OMPI commit
b79004e (v5: 6a581ad) intentionally skips user-defined attribute
callbacks on MPI_COMM_WORLD during MPI_Finalize to fix a PETSc deadlock
(sec. #12035), so ucc_context_destroy / ucc_finalize were never called.
Remove the communicator attribute mechanism entirely:
Call ucc_team_destroy() directly from mca_coll_ucc_module_destruct(),
which fires for every communicator including MPI_COMM_WORLD regardless
of OMPI version.
Add mca_coll_ucc_finalize_ctx() and call it from the MPI_COMM_WORLD
module destructor. ompi_comm_destruct() releases coll modules (firing
destructors) before releasing c_local_group or calling PML del_comm,
so MPI_COMM_WORLD is still fully functional for ucc_context_destroy's
OOB allgather at that point. mca_coll_ucc_close() retains the call as
an idempotent safety net.