Port sparse inequality constraint Jacobian to GPU by pelesh · Pull Request #40 · ORNL/ExaGO

pelesh · 2026-04-16T21:03:38Z

Merge request type

New feature
Resolves bug
Documentation
Other

Relates to

This MR updates

Summary

Replace PETSc-based inequality Jacobian with GPU RAJA kernels

Move the inequality constraint Jacobian computation for the HiOp sparse
GPU solver entirely to the device, eliminating the per-iteration host
round trip (copy to host, PETSc compute, MatGetRow extraction, copy of the
values back to device). This removes PETSc from this part of the code.

Three RAJA kernels now compute directly into device memory:

- Generator set-point constraints (AGC)
- Voltage-reactive-power bounds (FIXED_WITHIN_QBOUNDS)
- Line flow limits (derivatives of Sf^2 and St^2, plus slack variables)
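As an illustration of the per-branch arithmetic a line-flow kernel evaluates, here is a minimal host-side sketch. It assumes the polar voltage formulation and a standard π-model branch; `BranchAdmittance`, `FlowPQ`, `from_flow`, `sf2_gradient` and the admittance names `Gff`, `Bff`, `Gft`, `Bft` are hypothetical and not ExaGO's identifiers. It returns the gradient of Sf^2 = Pf^2 + Qf^2 with respect to (θf, Vmf, θt, Vmt). The St^2 row would be analogous, using the to-side admittance terms with the two buses swapped, and the slack-variable column is omitted. In the PR this work runs inside a RAJA kernel over lines; here it is a plain function so it can be checked against finite differences.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical from-side branch admittance terms (pi-model, assumed):
// If = (Gff + jBff) Vf + (Gft + jBft) Vt.
struct BranchAdmittance {
  double Gff, Bff;  // self admittance seen from the from bus
  double Gft, Bft;  // transfer admittance from -> to
};

struct FlowPQ {
  double P, Q;
};

// From-side real and reactive power flow in polar coordinates.
inline FlowPQ from_flow(const BranchAdmittance& y, double thf, double Vmf,
                        double tht, double Vmt) {
  const double c = std::cos(thf - tht);
  const double s = std::sin(thf - tht);
  return {y.Gff * Vmf * Vmf + Vmf * Vmt * (y.Gft * c + y.Bft * s),
          -y.Bff * Vmf * Vmf + Vmf * Vmt * (y.Gft * s - y.Bft * c)};
}

// Gradient of Sf^2 = Pf^2 + Qf^2 w.r.t. (theta_f, Vm_f, theta_t, Vm_t),
// i.e. the four voltage entries of one line-flow Jacobian row.
inline std::array<double, 4> sf2_gradient(const BranchAdmittance& y,
                                          double thf, double Vmf,
                                          double tht, double Vmt) {
  assert(Vmf > 0.0 && Vmt > 0.0);
  const double c = std::cos(thf - tht);
  const double s = std::sin(thf - tht);
  const FlowPQ f = from_flow(y, thf, Vmf, tht, Vmt);

  // Partial derivatives of Pf and Qf.
  const double dP_dthf = Vmf * Vmt * (-y.Gft * s + y.Bft * c);
  const double dP_dVmf = 2.0 * y.Gff * Vmf + Vmt * (y.Gft * c + y.Bft * s);
  const double dP_dVmt = Vmf * (y.Gft * c + y.Bft * s);
  const double dQ_dthf = Vmf * Vmt * (y.Gft * c + y.Bft * s);
  const double dQ_dVmf = -2.0 * y.Bff * Vmf + Vmt * (y.Gft * s - y.Bft * c);
  const double dQ_dVmt = Vmf * (y.Gft * s - y.Bft * c);

  // Chain rule: d(Sf^2)/dx = 2 Pf dPf/dx + 2 Qf dQf/dx.
  // Angles appear only as theta_f - theta_t, so d/dtheta_t = -d/dtheta_f.
  const double g_thf = 2.0 * (f.P * dP_dthf + f.Q * dQ_dthf);
  return {g_thf,
          2.0 * (f.P * dP_dVmf + f.Q * dQ_dVmf),
          -g_thf,
          2.0 * (f.P * dP_dVmt + f.Q * dQ_dVmt)};
}
```

Because the angles enter only through θf − θt, the θt entry is the negation of the θf entry, so each row needs only three independent derivative evaluations.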

Supporting changes:

- Analytical NNZ counting replaces PETSc MatGetInfo at solver setup
- New device-side parameter fields (`apf`, `vs`, `xpdevidx`, `xslackidx`,
  bus-to-gen mapping) added to the `*ParamsRajaHiop` structs
- Sparse position indices assigned at model setup for all three
  contribution types
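The first and last supporting changes fit together: if each contribution type has a known number of rows and structural nonzeros per row, the total NNZ follows analytically, and an exclusive prefix sum over the blocks gives every entry a fixed write position in a triplet-style value array. Each kernel can then write into its own slice without searching the sparsity pattern or using atomics. This is a minimal sketch under that assumption; `ContributionBlock`, `IneqJacLayout`, `assign_offsets` and `value_position` are hypothetical names, and the fixed nonzeros-per-row model is an illustrative simplification, not ExaGO's actual counting.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One block of inequality rows sharing the same per-row entry count
// (assumed fixed per contribution type for this sketch).
struct ContributionBlock {
  std::size_t nrows;        // constraint rows of this type
  std::size_t nnz_per_row;  // structural nonzeros per row
};

// Hypothetical setup result: start offset of each block in the value
// array, plus the total nonzero count reported to the solver.
struct IneqJacLayout {
  std::vector<std::size_t> offset;
  std::size_t nnz = 0;
};

// Exclusive prefix sum over block sizes, computed once at setup.
inline IneqJacLayout assign_offsets(const std::vector<ContributionBlock>& blocks) {
  IneqJacLayout layout;
  layout.offset.reserve(blocks.size());
  for (const auto& b : blocks) {
    layout.offset.push_back(layout.nnz);
    layout.nnz += b.nrows * b.nnz_per_row;
  }
  return layout;
}

// Position of entry k in row `row` of block `b`; a kernel thread handling
// that row writes vals[value_position(...)] directly.
inline std::size_t value_position(const IneqJacLayout& layout,
                                  const std::vector<ContributionBlock>& blocks,
                                  std::size_t b, std::size_t row, std::size_t k) {
  assert(b < blocks.size());
  assert(row < blocks[b].nrows && k < blocks[b].nnz_per_row);
  return layout.offset[b] + row * blocks[b].nnz_per_row + k;
}
```

For example, blocks of 3×2, 2×3 and 4×5 entries give offsets 0, 6 and 12 and a total of 32 nonzeros; the last entry of the last block lands at position 31.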

Includes a validation test (`test_ineqjac_gpu`) that solves with IPOPT,
then compares the PETSc and GPU Jacobian values at the converged solution.
An optional `-benchmark` flag runs a performance comparison.
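The comparison step in such a test can be sketched as an entrywise check of two value arrays that share the same sparsity ordering, using a mixed absolute/relative tolerance. `JacComparison`, `compare_jacobian_values` and the default tolerances are assumptions for illustration, not the check used in `test_ineqjac_gpu`, and the sketch assumes the GPU values have already been copied back to the host.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct JacComparison {
  std::size_t mismatches = 0;   // entries outside tolerance
  double max_abs_diff = 0.0;    // largest |ref - test| seen
  std::size_t worst_index = 0;  // index of that largest difference
};

// Entry i passes if |ref[i] - test[i]| <= atol + rtol * |ref[i]|.
// Tolerance values here are illustrative defaults.
inline JacComparison compare_jacobian_values(const std::vector<double>& ref,
                                             const std::vector<double>& test,
                                             double rtol = 1e-10,
                                             double atol = 1e-12) {
  assert(ref.size() == test.size());
  JacComparison r;
  for (std::size_t i = 0; i < ref.size(); ++i) {
    const double d = std::abs(ref[i] - test[i]);
    // Written as !(d <= tol) so that a NaN difference counts as a mismatch.
    if (!(d <= atol + rtol * std::abs(ref[i]))) ++r.mismatches;
    if (d > r.max_abs_diff) {
      r.max_abs_diff = d;
      r.worst_index = i;
    }
  }
  return r;
}
```

Reporting the worst index alongside the count makes it easier to trace a failure back to a specific constraint row and contribution type.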

Made-with: Cursor

nkoukpaizan

Just a couple more minor suggestions. Looks good otherwise!

Benchmark results are promising. It would be good to see how the GPU kernel compares to a purely CPU evaluation (i.e., what portion of the PETSc path was compute versus data movement).

case_ACTIVSg200.m

```
=== Performance Benchmark (1000 iterations) ===
  PETSc path (compute + MatGetRow + copy): 85.9155 us/iter
  GPU path (RAJA kernels, no copies):      14.2352 us/iter
=== End Benchmark ===
```

case_ACTIVSg2000.m

```
=== Performance Benchmark (1000 iterations) ===
  PETSc path (compute + MatGetRow + copy): 670.548 us/iter
  GPU path (RAJA kernels, no copies):      15.0518 us/iter
=== End Benchmark ===
```

case_ACTIVSg10k.m

```
=== Performance Benchmark (1000 iterations) ===
  PETSc path (compute + MatGetRow + copy): 1975.1 us/iter
  GPU path (RAJA kernels, no copies):      15.803 us/iter
=== End Benchmark ===
```


nkoukpaizan

Looks good to me! Tests are passing on the HIP and CUDA backends.

pelesh requested review from PhilipFackler and nkoukpaizan April 16, 2026 21:03

pelesh assigned kswirydo and pelesh Apr 16, 2026

pelesh changed the base branch from develop to olcf-hackathon-2026-dev April 16, 2026 21:06

pelesh mentioned this pull request Apr 16, 2026

GPU ineqality constraints (RAJA) to OLCF Hackathon branch #39

Closed

nkoukpaizan reviewed Apr 16, 2026

Comment thread on tests/unit/test_ineqjac_gpu.cpp (outdated)

nkoukpaizan reviewed Apr 20, 2026

Comment thread on tests/unit/CMakeLists.txt (outdated)

Comment thread on tests/unit/test_ineqjac_gpu.cpp (outdated)

kswirydo and others added 5 commits April 20, 2026 16:05

- Apply pre-commmit fixes (a34a169)
- Remove redundant message from CMake. (89c12ad)
- Remove/guard HIP specific code in test_ineqjac_gpu.cpp (888bd87)
- Fix names in inequality Jacobian test. (580192c)

pelesh force-pushed the kasia/inequality-jacobian branch from 62a86d9 to 580192c April 20, 2026 20:11

nkoukpaizan and others added 2 commits April 20, 2026 16:18

- Add benchmark timing for CPU-only evaluation. (a5ad9ad)
- format fix (6bb54fc)

pelesh mentioned this pull request Apr 20, 2026

Add benchmark timing for CPU-only evaluation. #44

Closed


[skip ci] Remove redundant include. (60328e8)

nkoukpaizan approved these changes Apr 20, 2026


pelesh merged commit ccb46cf into olcf-hackathon-2026-dev Apr 20, 2026

pelesh merged 8 commits into olcf-hackathon-2026-dev from kasia/inequality-jacobian

3 participants
