Skip to content

Ifpack2 btds gemm#2829

Merged
lucbv merged 7 commits intokokkos:developfrom
lucbv:ifpack2_btds_gemm
Oct 30, 2025
Merged

Ifpack2 btds gemm#2829
lucbv merged 7 commits intokokkos:developfrom
lucbv:ifpack2_btds_gemm

Conversation

@lucbv
Copy link
Copy Markdown
Contributor

@lucbv lucbv commented Oct 16, 2025

Pushing some preliminary changes based of the previous work by @yasahi-hpc
The goal is to isolate the failure that ensue in Trilinos' BTDS solver.
The current changes are sufficient to make the BTDS fail which might indicate that the underlying issue is a bad usage of Kokkos View?

@lucbv lucbv self-assigned this Oct 16, 2025
@lucbv lucbv added the Cleanup Code maintenance that isn't a bugfix or new feature label Oct 16, 2025
Copy link
Copy Markdown
Contributor

@yasahi-hpc yasahi-hpc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lucbv

Thanks a lot for your help.
I think we can update the extent computation as attempted in #2667
I am not sure the correct (or expected) behavior for the stride computation

v.stride(r) for r >= View::rank()

Comment on lines +761 to +770
if (r == 0) {
int V_extent_0 = V_rank == 0 ? 0 : v.extent_int(0);
return V_extent_0;
} else if (r == 1) {
int V_extent_1 = V_rank == 0 ? 0 : V_rank == 1 ? 1 : v.extent_int(1);
return V_extent_1;
} else {
return -1;
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (r == 0) {
int V_extent_0 = V_rank == 0 ? 0 : v.extent_int(0);
return V_extent_0;
} else if (r == 1) {
int V_extent_1 = V_rank == 0 ? 0 : V_rank == 1 ? 1 : v.extent_int(1);
return V_extent_1;
} else {
return -1;
}
}
if (r == 0) {
int V_extent_0 = V_rank == 0 ? 1 : v.extent_int(0);
return V_extent_0;
} else if (r == 1) {
int V_extent_1 = V_rank <= 1 ? 1 : v.extent_int(1);
return V_extent_1;
} else {
return 1;
}

I think extent would be computed like this.

@ndellingwood
Copy link
Copy Markdown
Contributor

ndellingwood commented Oct 21, 2025

I tested this PR with/without Yuichi's suggestion, gcc/11.3, OpenMPI/4.1.5 Serial build with the new View enabled, I'm still seeing the test failures:

[ndellin@blake01 build]$ ctest -R Ifpack2_BlockTriDiContainer_
Test project /home/ndellin/trilinos/Trilinos-1/Build/Blake-serial-gcc-1130-ompi-415/build
    Start 700: Ifpack2_BlockTriDiContainer_BTD_UnitTests_MPI_4
1/2 Test #700: Ifpack2_BlockTriDiContainer_BTD_UnitTests_MPI_4 ......***Failed  Required regular expression not found. Regex=[End Result: TEST PASSED
]  0.94 sec
    Start 701: Ifpack2_BlockTriDiContainer_Jacobi_UnitTests_MPI_4
2/2 Test #701: Ifpack2_BlockTriDiContainer_Jacobi_UnitTests_MPI_4 ...***Failed  Required regular expression not found. Regex=[End Result: TEST PASSED
]  1.05 sec

0% tests passed, 2 tests failed out of 2

Subproject Time Summary:
Ifpack2    =   7.99 sec*proc (2 tests)

Total Test time (real) =   2.57 sec

The following tests FAILED:
	700 - Ifpack2_BlockTriDiContainer_BTD_UnitTests_MPI_4 (Failed)
	701 - Ifpack2_BlockTriDiContainer_Jacobi_UnitTests_MPI_4 (Failed)
Errors while running CTest

Tested with:
kokkos/kokkos@4747e23
kernels with PR #2829 +/- Yuichi's suggestion
trilinos/Trilinos@69d24c6

@yasahi-hpc
Copy link
Copy Markdown
Contributor

@ndellingwood Thank you for the information
Then, it would be related to the stride computation
Concerning the correct implementation in #2593

The stride computation may be like this?

template <typename ViewType>
KOKKOS_INLINE_FUNCTION std::size_t get_stride(const ViewType &v, const int r) {
  static_assert(Kokkos::is_view_v<ViewType>, "KokkosBatched: ViewType is not a Kokkos::View.");
  constexpr std::size_t V_rank = ViewType::rank();
  static_assert(V_rank <= 2, "KokkosBatched: ViewType must have rank 0, 1 or 2.");

  if (r == 0) {
    std::size_t V_stride_0 = V_rank == 0 ? 1 : v.stride(0);
    return V_stride_0;
  } else if (r == 1) {
    std::size_t V_stride_1 = V_rank <= 1 ? 1 : v.stride(1);
    return V_stride_1;
  } else {
    return 1;
  }
}

@ndellingwood
Copy link
Copy Markdown
Contributor

Cross-referencing issue #2622

@ndellingwood
Copy link
Copy Markdown
Contributor

ndellingwood commented Oct 28, 2025

I retested as a sanity check

  • Serial build with SKX enabled had the failures, as posted above
  • Serial build with SKX disabled passed!
  • Cuda build passed!

All jobs above enabled the new View impl and complex types

@ndellingwood
Copy link
Copy Markdown
Contributor

I rebuilt with @yasahi-hpc suggestion (thank you!) , and the Serial job with SKX still exhibits the failures

@ndellingwood
Copy link
Copy Markdown
Contributor

I retested Trilinos@develop (containing the 4.7.01 releases as packages) in the Serial build with SKX and the legacy View, and the Ifpack2_BlockTriDiContainer* tests failed in that case as well, looks like we encountered a combo of issues

To move forward, it looks like we need

  1. Resolve merge conflicts
  2. DCO signoff
  3. Whether to include the extent and stride modifications from the comments
  4. CI

lucbv added 3 commits October 28, 2025 17:51
This reverts commit 386663c.

Signed-off-by: Luc Berger-Vergiat <[email protected]>
We have been liberally querrying the extent and stride of Views
without checking if the rank of the view is high enough to return
a valid value. The fixes lead to failures in BTDS which might point
to a bug that relied on the old UB behavior of Kokkos::View x(

Signed-off-by: Luc Berger-Vergiat <[email protected]>
@lucbv lucbv force-pushed the ifpack2_btds_gemm branch from 337b1f5 to c301882 Compare October 28, 2025 23:52
Signed-off-by: Luc Berger-Vergiat <[email protected]>
@yasahi-hpc
Copy link
Copy Markdown
Contributor

@ndellingwood @lucbv

Thanks a lot for your help.
Glad to hear that tests pass without SKX option.
Let us keep in touch

Making sure we return 1 as the minimum stride and extent
even if a view has a rank lower than the stride/extent querried.

Signed-off-by: Luc Berger-Vergiat <[email protected]>
@lucbv lucbv marked this pull request as ready for review October 29, 2025 14:31
@lucbv
Copy link
Copy Markdown
Contributor Author

lucbv commented Oct 29, 2025

@yasahi-hpc fixed the stride and extent formula, should be good to go with all the changes made yesterday and today

Copy link
Copy Markdown
Contributor

@yasahi-hpc yasahi-hpc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for changes

I do not expect these changes make any difference, but it is safer to use the value that is confirmed to work correctly

Comment thread batched/KokkosBatched_Util.hpp Outdated
Comment thread batched/KokkosBatched_Util.hpp Outdated
lucbv and others added 2 commits October 29, 2025 11:22
Co-authored-by: yasahi-hpc <[email protected]>
Signed-off-by: Luc Berger <[email protected]>
Co-authored-by: yasahi-hpc <[email protected]>
Signed-off-by: Luc Berger <[email protected]>
@lucbv
Copy link
Copy Markdown
Contributor Author

lucbv commented Oct 29, 2025

Yes, agreed, applying the changes

Copy link
Copy Markdown
Contributor

@yasahi-hpc yasahi-hpc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Thanks for the fix

@lucbv lucbv merged commit 2d76deb into kokkos:develop Oct 30, 2025
24 of 25 checks passed
@lucbv lucbv deleted the ifpack2_btds_gemm branch October 30, 2025 15:11
ndellingwood pushed a commit to ndellingwood/kokkos-kernels that referenced this pull request Oct 30, 2025
* Revert "batched/dense/impl/KokkosBatched_Gemm_Team_Impl.hpp (kokkos#2626)"

This reverts commit 82605a9.

Signed-off-by: Luc Berger-Vergiat <[email protected]>

* Revert "ConjTrans support for batched team gemm (kokkos#2580)"

This reverts commit 386663c.

Signed-off-by: Luc Berger-Vergiat <[email protected]>

* Batch - Dense: Re-applying GEMM fixes for extent/stride

We have been liberally querrying the extent and stride of Views
without checking if the rank of the view is high enough to return
a valid value. The fixes lead to failures in BTDS which might point
to a bug that relied on the old UB behavior of Kokkos::View x(

Signed-off-by: Luc Berger-Vergiat <[email protected]>

* Applying clang-format

Signed-off-by: Luc Berger-Vergiat <[email protected]>

* Batched - Utils: fixing extent and stride calculation

Making sure we return 1 as the minimum stride and extent
even if a view has a rank lower than the stride/extent querried.

Signed-off-by: Luc Berger-Vergiat <[email protected]>

* Update batched/KokkosBatched_Util.hpp

Co-authored-by: yasahi-hpc <[email protected]>
Signed-off-by: Luc Berger <[email protected]>

* Update batched/KokkosBatched_Util.hpp

Co-authored-by: yasahi-hpc <[email protected]>
Signed-off-by: Luc Berger <[email protected]>

---------

Signed-off-by: Luc Berger-Vergiat <[email protected]>
Signed-off-by: Luc Berger <[email protected]>
Co-authored-by: yasahi-hpc <[email protected]>
ndellingwood added a commit that referenced this pull request Oct 30, 2025
* Revert "batched/dense/impl/KokkosBatched_Gemm_Team_Impl.hpp (#2626)"

This reverts commit 82605a9.



* Revert "ConjTrans support for batched team gemm (#2580)"

This reverts commit 386663c.



* Batch - Dense: Re-applying GEMM fixes for extent/stride

We have been liberally querrying the extent and stride of Views
without checking if the rank of the view is high enough to return
a valid value. The fixes lead to failures in BTDS which might point
to a bug that relied on the old UB behavior of Kokkos::View x(



* Applying clang-format



* Batched - Utils: fixing extent and stride calculation

Making sure we return 1 as the minimum stride and extent
even if a view has a rank lower than the stride/extent querried.



* Update batched/KokkosBatched_Util.hpp




* Update batched/KokkosBatched_Util.hpp




---------

Signed-off-by: Luc Berger-Vergiat <[email protected]>
Signed-off-by: Luc Berger <[email protected]>
Co-authored-by: Luc Berger <[email protected]>
Co-authored-by: yasahi-hpc <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Cleanup Code maintenance that isn't a bugfix or new feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants