Port sparse inequality constraint Jacobian to GPU #40

Merged
pelesh merged 8 commits into olcf-hackathon-2026-dev from kasia/inequality-jacobian
Apr 20, 2026

@pelesh pelesh commented Apr 16, 2026

Merge request type

  • New feature
  • Resolves bug
  • Documentation
  • Other

Relates to

  • OPFLOW
  • SOPFLOW
  • SCOPFLOW
  • TCOPFLOW
  • CMake build system
  • Spack configuration
  • Manual
  • Web docs
  • Other

This MR updates

  • Header files
  • Source code
  • CMake build system
  • Spack configuration
  • Web docs
  • Manual
  • Other

Summary

Replace PETSc-based inequality Jacobian with GPU RAJA kernels

Move the inequality constraint Jacobian computation for the HiOp sparse
GPU solver entirely to the device, eliminating the per-iteration host
round trip (copy to host, PETSc compute, MatGetRow extraction, copy of
values back to device). Eliminate PETSc use from this part of the code.

Three RAJA kernels now compute directly into device memory:

  • Generator set-point constraints (AGC)
  • Voltage-reactive-power bounds (FIXED_WITHIN_QBOUNDS)
  • Line flow limits (Sf^2/St^2 derivatives + slack variables)

Supporting changes:

  • Analytical NNZ counting replaces PETSc MatGetInfo at solver setup
  • New device-side parameter fields (apf, vs, xpdevidx, xslackidx,
    bus-to-gen mapping) added to *ParamsRajaHiop structs
  • Sparse position indices assigned at model setup for all three
    contribution types
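
One way to picture how analytical NNZ counting and sparse position indices fit together is an exclusive prefix sum over per-contribution nonzero counts, so that each kernel writes into its own disjoint range of the values array. The sketch below is a host-side illustration only: `JacobianLayout`, `layoutFromCounts`, and the counts passed in are hypothetical names and inputs, not the fields or formulas used in this MR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout record: where each contribution type starts in the
// sparse values array, and how many nonzeros there are in total.
struct JacobianLayout {
  std::vector<std::size_t> offsets; // first slot owned by each contribution type
  std::size_t nnz = 0;              // total number of nonzeros
};

// Exclusive prefix sum over per-contribution nonzero counts.
// The counts are inputs here; how they are derived from the network
// (generators, buses, lines) is not shown and is an assumption.
JacobianLayout layoutFromCounts(const std::vector<std::size_t>& counts) {
  assert(!counts.empty());
  JacobianLayout layout;
  layout.offsets.reserve(counts.size());
  std::size_t running = 0;
  for (std::size_t c : counts) {
    layout.offsets.push_back(running); // this type starts where the previous ended
    running += c;
  }
  layout.nnz = running;
  return layout;
}
```

With offsets fixed once at setup, no sparsity query is needed at solve time; each kernel only needs its own starting offset plus a per-element index.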

Includes a validation test (test_ineqjac_gpu) that solves with IPOPT,
then compares PETSc and GPU Jacobian values at the converged solution.
An optional -benchmark flag enables a performance comparison.
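
The comparison step of such a validation test could look roughly like the sketch below. It assumes both sets of Jacobian values have already been copied into host vectors in the same nonzero ordering; the function names and the tolerance are illustrative and are not taken from test_ineqjac_gpu.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest entry-wise difference, scaled by max(1, |reference|) so that
// small and large entries are judged on a comparable footing.
double maxRelativeDifference(const std::vector<double>& reference,
                             const std::vector<double>& candidate) {
  assert(reference.size() == candidate.size());
  double worst = 0.0;
  for (std::size_t i = 0; i < reference.size(); ++i) {
    const double scale = std::max(1.0, std::fabs(reference[i]));
    worst = std::max(worst, std::fabs(reference[i] - candidate[i]) / scale);
  }
  return worst;
}

// Hypothetical pass/fail check; the default tolerance is an assumption.
bool jacobiansAgree(const std::vector<double>& reference,
                    const std::vector<double>& candidate,
                    double tol = 1e-10) {
  return maxRelativeDifference(reference, candidate) <= tol;
}
```

Scaling by max(1, |reference|) avoids dividing by near-zero entries while still catching relative errors in large ones.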

Made-with: Cursor

@pelesh pelesh changed the base branch from develop to olcf-hackathon-2026-dev April 16, 2026 21:06
Comment thread tests/unit/test_ineqjac_gpu.cpp Outdated

@nkoukpaizan nkoukpaizan left a comment

Just a couple more minor suggestions. Looks good otherwise!

Benchmark results are promising. It would be good to see how the GPU kernel compares to a purely CPU evaluation (i.e., what portion of the PETSc path was compute versus data movement).

case_ACTIVSg200.m

=== Performance Benchmark (1000 iterations) ===
  PETSc path (compute + MatGetRow + copy): 85.9155 us/iter
  GPU path (RAJA kernels, no copies):      14.2352 us/iter
=== End Benchmark ===

case_ACTIVSg2000.m

=== Performance Benchmark (1000 iterations) ===
  PETSc path (compute + MatGetRow + copy): 670.548 us/iter
  GPU path (RAJA kernels, no copies):      15.0518 us/iter
=== End Benchmark ===

case_ACTIVSg10k.m

=== Performance Benchmark (1000 iterations) ===
  PETSc path (compute + MatGetRow + copy): 1975.1 us/iter
  GPU path (RAJA kernels, no copies):      15.803 us/iter
=== End Benchmark ===
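
For reference, the ratios implied by these timings work out to roughly 6x, 45x, and 125x for the three cases. A small sketch that reproduces the arithmetic from the numbers above (the struct and function names are illustrative only):

```cpp
#include <cassert>
#include <cmath>

// One benchmark row: per-iteration times in microseconds, as reported above.
struct BenchmarkCase {
  const char* name;
  double petscUsPerIter;
  double gpuUsPerIter;
};

const BenchmarkCase kCases[] = {
    {"case_ACTIVSg200.m", 85.9155, 14.2352},
    {"case_ACTIVSg2000.m", 670.548, 15.0518},
    {"case_ACTIVSg10k.m", 1975.1, 15.803},
};

// Speedup of the GPU path over the PETSc path for one case.
double speedup(const BenchmarkCase& c) {
  assert(c.gpuUsPerIter > 0.0);
  return c.petscUsPerIter / c.gpuUsPerIter;
}
```

The GPU time stays near 15 us/iter across all three cases while the PETSc path grows with problem size, which is why the ratio widens.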

Comment thread tests/unit/CMakeLists.txt Outdated
Comment thread tests/unit/test_ineqjac_gpu.cpp Outdated
kswirydo and others added 5 commits April 20, 2026 16:05
@pelesh pelesh force-pushed the kasia/inequality-jacobian branch from 62a86d9 to 580192c Compare April 20, 2026 20:11

@nkoukpaizan nkoukpaizan left a comment

Looks good to me! Tests are passing on the HIP and CUDA backends.

@pelesh pelesh merged commit ccb46cf into olcf-hackathon-2026-dev Apr 20, 2026