Multiple test failures on ppc64le with custom semirings #1018

@strophy

Describe the bug
Multiple tests fail only on the ppc64le architecture under Alpine Linux. An example failure looks like this:

31/75 Test #54: LAGraphX_PageRankGX ..................***Failed    1.43 sec
Test test_ranker...                             karate:   err: 4.012138e-05 (standard), sum(r): 1.000000e+00 iters: 19
[ FAILED ]
  test_PageRankGX.c:48: Check _Generic ((diff), GrB_Matrix : _Generic ((GrB_MINUS_FP32), GrB_Semiring : GrB_Matrix_eWiseAdd_Semiring , GrB_Monoid : GrB_Matrix_eWiseAdd_Monoid , GrB_BinaryOp : GrB_Matrix_eWiseAdd_BinaryOp ), GrB_Vector : _Generic ((GrB_MINUS_FP32), GrB_Semiring : GrB_Vector_eWiseAdd_Semiring , GrB_Monoid : GrB_Vector_eWiseAdd_Monoid , GrB_BinaryOp : GrB_Vector_eWiseAdd_BinaryOp)) (diff, ((void*)0), ((void*)0), GrB_MINUS_FP32, cmatlab, centrality, ((void*)0)) == 0... failed
karate:   err: -inf (Graphalytics), sum(r): 9.999999e-01 iters: 100
status: -1005 msg: LAGraph failure (file /builds/strophy/aports/community/suitesparse/src/SuiteSparse-7.12.1/LAGraph/src/algorithm/LAGr_PageRank.c, line 145): pagerank failed to converge in 2 iterations
west0067: err: 2.117455e-05 (standard), sum(r): 9.999999e-01 iters: 11
west0067: err: 2.076663e-05 (standard), sum(r): 9.999999e-01 iters: 100

=========== ldbc-directed-example, with sink nodes 3 and 9:
Graph: kind: directed, nodes: 10 entries: 17 type: double
  structural symmetry: unknown
  adjacency matrix: GrB_FP64 matrix: 10-by-10 entries: 17
    (0, 2)   0.5
    (0, 4)   0.3
    (1, 3)   0.1
    (1, 4)   0.3
    (1, 9)   0.12
    (2, 0)   0.53
    (2, 4)   0.62
    (2, 7)   0.21
    (2, 9)   0.52
    (4, 2)   0.69
    (4, 3)   0.53
    (4, 7)   0.1
    (5, 2)   0.23
    (5, 3)   0.39
    (6, 3)   0.83
    (7, 0)   0.39
    (8, 3)   0.69
  adjacency matrix transposed: GrB_FP64 matrix: 10-by-10 entries: 17
    (0, 2)   0.53
    (0, 7)   0.39
    (2, 0)   0.5
    (2, 4)   0.69
    (2, 5)   0.23
    (3, 1)   0.1
    (3, 4)   0.53
    (3, 5)   0.39
    (3, 6)   0.83
    (3, 8)   0.69
    (4, 0)   0.3
    (4, 1)   0.3
    (4, 2)   0.62
    (7, 2)   0.21
    (7, 4)   0.1
    (9, 1)   0.12
    (9, 2)   0.52
  out degree: GrB_INT64 vector: n: 10 entries: 8
    (0)   2
    (1)   3
    (2)   4
    (4)   3
    (5)   2
    (6)   1
    (7)   1
    (8)   1

with sinks handled properly:
ldbc-directed: err: 3.109872e-05 (standard), sum(r): 1.000000e+00, niters 12
This is the correct pagerank, with sinks handled properly:
GrB_FP32 vector: n: 10 entries: 10
    (0)   0.16977
    (1)   0.0361532
    (2)   0.167321
    (3)   0.166878
    (4)   0.154096
    (5)   0.0361532
    (6)   0.0361532
    (7)   0.11537
    (8)   0.0361532
    (9)   0.0819525

with sinks handled properly:
ldbc-directed: err: 3.521144e-05 (standard), sum(r): 9.999999e-01, niters 100
This is the correct pagerank, with sinks handled properly:
GrB_FP64 vector: n: 10 entries: 10
    (0)   0.169772
    (1)   0.0361501
    (2)   0.16733
    (3)   0.166874
    (4)   0.154103
    (5)   0.0361501
    (6)   0.0361501
    (7)   0.11537
    (8)   0.0361501
    (9)   0.0819501
FAILED: 1 of 1 unit tests has failed.

This appears to be caused by a problem with custom JIT semirings on ppc64le: applying a patch like the following, which switches to the pre-compiled semiring, makes the test pass:

diff --git a/LAGraph/experimental/algorithm/LAGr_PageRankGX.c b/LAGraph/experimental/algorithm/LAGr_PageRankGX.c
index 87e744965..8924c8cbf 100644
--- a/LAGraph/experimental/algorithm/LAGr_PageRankGX.c
+++ b/LAGraph/experimental/algorithm/LAGr_PageRankGX.c
@@ -156,7 +156,7 @@ int LAGr_PageRankGX
         GRB_TRY (GrB_assign (r, NULL, NULL, teleport + sink_value, GrB_ALL,
             n, NULL)) ;
         // r += A'*w
-        GRB_TRY (GrB_mxv (r, NULL, GrB_PLUS_FP64, LAGraph_plus_second_fp64,
+        GRB_TRY (GrB_mxv (r, NULL, GrB_PLUS_FP64, GxB_PLUS_SECOND_FP64,
             AT, w, NULL)) ;
     }
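
For anyone trying to narrow this down, here is a minimal plain-C sketch of what I understand the `r += A'*w` step with a plus_second semiring to compute: for each row i of AT, the values w(j) are summed over the stored entries AT(i,j), and the matrix values themselves are ignored. The helper name `plus_second_mxv_ref` and the CSR arrays are hypothetical and not part of LAGraph or GraphBLAS; it is only meant as a reference to compare against the output of the custom and pre-compiled semirings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference helper (not a LAGraph or GraphBLAS function).
 * Computes r(i) += sum of w(j) over the stored entries AT(i,j), which is
 * the plus_second semiring with a PLUS accumulator.
 * AT is given only by its CSR pattern (rowptr, colidx), because the
 * SECOND multiply operator discards the matrix values. */
static void plus_second_mxv_ref(size_t n, const size_t *rowptr,
                                const size_t *colidx, const double *w,
                                double *r)
{
    for (size_t i = 0; i < n; i++) {
        double sum = 0.0;
        for (size_t p = rowptr[i]; p < rowptr[i + 1]; p++) {
            assert(colidx[p] < n);   /* column index must be in range */
            sum += w[colidx[p]];     /* second(AT(i,j), w(j)) = w(j) */
        }
        r[i] += sum;                 /* PLUS accumulator into r */
    }
}

/* Small illustrative case (made-up data, not taken from the failing test):
 * a 3-by-3 pattern where row 0 has entries in columns 1 and 2, row 1 is
 * empty, and row 2 has one entry in column 0. */
static void plus_second_demo(double r[3])
{
    const size_t rowptr[4] = {0, 2, 2, 3};
    const size_t colidx[3] = {1, 2, 0};
    const double w[3] = {1.0, 2.0, 4.0};
    r[0] = r[1] = r[2] = 0.5;        /* stands in for teleport + sink_value */
    plus_second_mxv_ref(3, rowptr, colidx, w, r);
}
```

Calling `plus_second_demo` on a 3-element array leaves an empty row at its initial value, so a mismatch between this kind of reference and the `GrB_mxv` result on ppc64le would point at the semiring rather than at the surrounding PageRank logic.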

To Reproduce
Build and run tests on ppc64le

Expected behavior
Tests should pass

Desktop (please complete the following information):

  • OS: Alpine 3.22.2
  • Compiler: gcc 15.2.0
  • BLAS and LAPACK: openblas 0.3.30
  • Version: 7.12.1

Additional context
I'm just a package maintainer and don't really know how to debug this further, but from reading other issues here, reporting the problem seems better than patching the tests to use pre-compiled semirings. Help diagnosing this would be appreciated. If necessary, please inspect the Alpine merge request, which shows logs before and after applying a partial patch.
