Merge branch 'development' into change_species_threshold_default

zingale · web-flow · commit 81d47c25d428 · 2026-04-12T08:37:32.000-04:00
diff --git a/.github/workflows/draft-pdf.yml b/.github/workflows/draft-pdf.yml
@@ -24,5 +24,6 @@ jobs:
           # PDF. Note, this should be the same directory as the input
           # paper.md
           path: paper/paper.pdf
+          archive: false
 
 # see https://github.com/marketplace/actions/open-journals-pdf-generator
diff --git a/CHANGES.md b/CHANGES.md
@@ -1,5 +1,26 @@
 # Changelog
 
+## 26.04
+
+  * add documentation on recovering from burn failures (#1971)
+
+  * fix the step-rejection logic for increase in X over a step in VODE
+    for SDC (#1968)
+
+  * make `species_failure_tolerance` a runtime parameter (#1969)
+
+  * update the JOSS paper (#1967)
+
+  * hybrid Powell solver updates: fix a NaN loop check (#1959),
+    template on the Jacobian type (#1958), fix comments (#1957), fix
+    the logic for refreshing the spectral radius (#1955)
+
+  * add an assert on the NSE table index (#1951)
+
+  * fix RKC compilation with NSE (#1956)
+
+  * CI action updates (#1942)
+
 ## 26.03
 
    * allow screening to output log(screening) (#1939)
diff --git a/Docs/source/burn_cell_sdc.rst b/Docs/source/burn_cell_sdc.rst
@@ -103,6 +103,8 @@ can be set via the usual Microphysics runtime parameters, e.g.
 ``integrator.atol_spec``.
 
 
+.. _sec:redo_burn_fail:
+
 Rerunning a burn fail
 ---------------------
 
diff --git a/Docs/source/burn_failures.rst b/Docs/source/burn_failures.rst
@@ -0,0 +1,162 @@
+**************************
+Dealing with Burn Failures
+**************************
+
+Sometimes the ODE integration of a reaction network will fail.  Here
+we summarize some wisdom on how to avoid integrator failures.
+
+Error codes
+===========
+
+The error code that is output when the integrator fails can be
+interpreted from the ``enum`` in ``integrator_data.H``.  See
+:ref:`sec:error_codes` for a list of the possible codes.
+
+The most common errors are ``-2`` (timestep underflow) and ``-4`` (too
+many steps required).  A timestep underflow usually means that
+something has gone really wrong in the integration and the integrator
+keeps trying to cut the timestep to be able to advance.  Too many
+steps means that the integrator hit the cap imposed by
+``integrator.ode_max_steps``.  This should never be made too
+large.  Usually, if you need more than a few thousand steps, then
+some of the solutions discussed below can help.
+
+
+
+Why does the integrator struggle?
+=================================
+
+There are a few common reasons why the integrator might encounter trouble:
+
+* The integrators don't know that the mass fractions should stay in
+  $[0, 1]$.
+
+  Note: they should ensure that the mass fractions sum to 1 as long as
+  $\sum_k \dot{\omega}_k = 0$ in the righthand side function.
+
+  This is a place where adjusting ``integrator.atol_spec`` can help.
+
+* The state wants to enter nuclear statistical equilibrium.  As we
+  approach equilibrium, the integrator will see large, oppositely
+  signed flows from one step to the next, which should cancel, but
+  numerically, the cancellation is not perfect.
+
+  For some networks, the solution here is to use the NSE solver.
+
+* The Jacobian is not good enough to allow the nonlinear solver to
+  converge.
+
+  The Jacobians used in our networks are approximate (even the
+  analytic one).
+  For example, the analytic Jacobian neglects the composition
+  dependence in the screening functions.
+
+
+Making the integration robust
+=============================
+
+.. index:: integrator.atol_spec, integrator.species_failure_tolerance, integrator.use_jacobian_caching, integrator.do_corrector_validation, integrator.use_burn_retry, integrator.X_reject_buffer
+
+Some tips for helping the integrator:
+
+* Use a tight absolute tolerance for the species
+  (``integrator.atol_spec``).  This ensures that even small species
+  are tracked well, and helps prevent generating negative mass
+  fractions.
+
+  In general, ``atol_spec`` should be picked to be the magnitude of
+  the smallest mass fraction you care about.  You should never set
+  it to be larger than ``rtol_spec``.  See :ref:`sec:tolerances`
+  for some discussion.
+
+* Adjust the step-rejection logic in the integrator.  This is a
+  check on the state after the step that rejects the advance
+  if we find unphysical species.  Not every integrator supports
+  all of these options.
+
+  * Reduce ``integrator.species_failure_tolerance``.  This is used to
+    determine whether a step should be rejected.  If any of the mass
+    fractions are more than ``integrator.species_failure_tolerance``
+    outside of $[0, 1]$, then the integrator will reject the step and
+    try again (this is implemented in ``VODE`` and ``BackwardEuler``).
+
+    If the value is too large (like ``0.01``), then this could allow the
+    mass fractions to grow too much outside of $[0, 1]$, and then a
+    single step rejection is not enough to recover.  Setting this value to
+    be close to the typical scale of a mass fraction that we care about
+    (closer to ``integrator.atol_spec``) can help.
+
+  * Increase ``integrator.X_reject_buffer``.  This is used in the
+    check on how much the species changed from one step to the next
+    (inside of the integrator).  We limit the change to a factor of 4
+    per step.  We only check the species that are larger than
+    ``X_reject_buffer * atol_spec``, so if trace species are causing
+    problems, ``X_reject_buffer`` can be used to ignore them.
+
+* Try the numerical Jacobian (``integrator.jacobian=2``).  The
+  analytic Jacobian is faster to evaluate, but it approximates some
+  terms.  The numerical Jacobian sometimes works better.
+
+* Use the burn retry logic.  By setting
+  ``integrator.use_burn_retry=1``, the burn will immediately be
+  retried if it fails, with a slightly different configuration.
+
+  By default the type of Jacobian is swapped (if we were analytic,
+  then do numerical, and vice versa).  This is controlled by
+  ``integrator.retry_swap_jacobian``.  The tolerances can also be
+  changed on retry.
+
+  See :ref:`sec:retry` for the full list of options.
+
+* Use the right integrator.  In general, VODE is the best choice.
+  But if the network is only mildly stiff, then RKC can work well
+  (typically, it works when the temperatures are below $10^9~\mathrm{K}$).
+
+* If you are near NSE, then use the NSE solver.  This is described
+  in :ref:`self_consistent_nse`.
+
+  Note: not every network is compatible with the self-consistent
+  NSE solver.
+
+* Use Jacobian caching.  If you build on GPUs, this is disabled by
+  default.  You can re-enable it by building with
+  ``USE_JACOBIAN_CACHING=TRUE``.  Also make sure that
+  ``integrator.use_jacobian_caching=1`` (this is the default).
+
+  By reducing the number of times the Jacobian is evaluated, we also
+  reduce the possibility of trying to evaluate it with a bad state.
+
+* Use the corrector validation (``integrator.do_corrector_validation``).
+
+  This checks to make sure the state is valid inside of the corrector
+  loop, and if not, it bails out of the corrector, forcing the
+  integrator to retry the entire step.
+
+  This can sometimes make the integration harder, especially if used with
+  ``integrator.species_failure_tolerance``.
+
+Things we no longer recommend:
+
+.. index:: integrator.do_species_clip, integrator.renormalize_abundances
+
+* Clipping the species (``integrator.do_species_clip``) can lead to
+  instabilities.  This changes the integration state directly in the
+  righthand side function, outside of the control of the integrator.
+  While it sometimes may work, it often leads to problems.  A better
+  way to deal with keeping the species in $[0, 1]$ is through the
+  absolute tolerance.
+
+  The `SUNDIALS CVODE documentation <https://sundials.readthedocs.io/en/latest/cvode/Usage/index.html#advice-on-controlling-unphysical-negative-values>`_ has
+  a good discussion on this numerical instability.
+
+* Renormalizing the species during the integration / righthand side call
+  (``integrator.renormalize_abundances``).  Like clipping, this can
+  cause numerical instabilities.
+
+Debugging a burn failure
+========================
+
+When a burn fails, the entire ``burn_t`` state will be output to
+stdout (CPU runs only).  This state can then be used with
+``burn_cell_sdc`` to reproduce the burn failure outside of a
+simulation.  For SDC, see the discussion at :ref:`sec:redo_burn_fail`.
diff --git a/Docs/source/index.rst b/Docs/source/index.rst
@@ -79,6 +79,7 @@ system.
    ode_integrators
    nse
    sdc
+   burn_failures
 
 .. toctree::
    :maxdepth: 1
diff --git a/Docs/source/ode_integrators.rst b/Docs/source/ode_integrators.rst
@@ -130,6 +130,8 @@ of equations.  Pivoting can be disabled by setting ``integrator.linalg_do_pivoti
 
 
 
+.. _sec:error_codes:
+
 Integration errors
 ==================
 
@@ -164,6 +166,8 @@ used to interpret the failure.  The current codes are:
 | -100  | entered NSE                                              |
 +-------+----------------------------------------------------------+
 
+.. _sec:tolerances:
+
 Tolerances
 ==========
 
@@ -173,6 +177,12 @@ is, the more accurate the results will be.  However, if the tolerance
 is too small, the code may run for too long, the ODE solver will
 never converge, or it might require at timestep that underflows.
 
+.. tip::
+
+   The `SUNDIALS CVODE documentation on tolerances
+   <https://sundials.readthedocs.io/en/latest/cvode/Usage/index.html#general-advice-on-choice-of-tolerances>`_
+   provides a good discussion on tolerances that applies here.
+
 .. index:: integrator.rtol_spec, integrator.rtol_enuc, integrator.atol_spec, integrator.atol_enuc
 
 There are separate tolerances for the mass fractions and the energy,
@@ -288,6 +298,7 @@ constraint on the intermediate states during the integration.
   ``integrator.do_species_clip`` is disabled.  Note: this is not
   implemented for every integrator.
 
+.. _sec:retry:
 
 Retry Mechanism
 ===============
diff --git a/integration/BackwardEuler/be_integrator.H b/integration/BackwardEuler/be_integrator.H
@@ -171,17 +171,14 @@ int single_step (BurnT& state, BeT& be, const amrex::Real dt)
     if (! converged) {
 
         if (ierr == IERR_SUCCESS) {
-
             // if we didn't set another error, then we probably ran
             // out of iterations, so set nonconvergence
-
             ierr = IERR_CORRECTOR_CONVERGENCE;
+        }
 
-            // reset the solution to the original
-            for (int n = 1; n <= int_neqs; n++) {
-                be.y(n) = y_old(n);
-            }
-
+        // reset the solution to the original on any failure
+        for (int n = 1; n <= int_neqs; n++) {
+            be.y(n) = y_old(n);
         }
 
     }
@@ -206,7 +203,9 @@ int be_integrator (BurnT& state, BeT& be)
         // Do a single step to the final time
         const amrex::Real dt_single_step = be.tout - be.t;
         ierr = single_step(state, be, dt_single_step);
-        be.t = be.tout;
+        if (ierr == IERR_SUCCESS) {
+            be.t = be.tout;
+        }
         ++be.n_step;
         return ierr;
     }
@@ -240,50 +239,55 @@ int be_integrator (BurnT& state, BeT& be)
         // our strategy is to take 2 steps at dt/2 and one at dt and
         // to compute the error from those
 
-
-        // try to take a step dt
-
         // first do 2 (fine) dt/2 steps
 
         amrex::Array1D<amrex::Real, 1, int_neqs> y_fine;
 
+        // keep track of whether we have a valid fine solution,
+        // i.e., no errors from either half step
+        bool have_fine_solution = false;
+
         ierr = single_step(state, be, dt_sub/2);
         if (ierr == IERR_SUCCESS) {
             ierr = single_step(state, be, dt_sub/2);
 
-            // store the fine dt solution
-
-            for (int n = 1; n <= int_neqs; ++n) {
-                y_fine(n) = be.y(n);
-            }
+            if (ierr == IERR_SUCCESS) {
+                // store the fine dt solution
+                for (int n = 1; n <= int_neqs; ++n) {
+                    y_fine(n) = be.y(n);
+                }
+                have_fine_solution = true;
 
-            // now that single (coarse) dt step
-            // first reset the solution
-            for (int n = 1; n <= int_neqs; ++n) {
-                be.y(n) = y_old(n);
+                // now do a single (coarse) dt step
+                // first reset the solution
+                for (int n = 1; n <= int_neqs; ++n) {
+                    be.y(n) = y_old(n);
+                }
+                ierr = single_step(state, be, dt_sub);
             }
-            ierr = single_step(state, be, dt_sub);
         }
 
-        // define a weight for each variable to use in checking the error
+        amrex::Real rel_error = 0.0_rt;
+        bool step_success = false;
+        if (ierr == IERR_SUCCESS && have_fine_solution) {
+            // define a weight for each variable to use in checking the error
 
-        amrex::Array1D<amrex::Real, 1, int_neqs> w;
-        for (int n = 1; n <= NumSpec; n++) {
-            w(n) = 1.0_rt / (be.rtol_spec * std::abs(y_fine(n)) + be.atol_spec);
-        }
-        w(net_ienuc) = 1.0_rt / (be.rtol_enuc * std::abs(y_fine(net_ienuc)) + be.atol_enuc);
+            amrex::Array1D<amrex::Real, 1, int_neqs> w;
+            for (int n = 1; n <= NumSpec; n++) {
+                w(n) = 1.0_rt / (be.rtol_spec * std::abs(y_fine(n)) + be.atol_spec);
+            }
+            w(net_ienuc) = 1.0_rt / (be.rtol_enuc * std::abs(y_fine(net_ienuc)) + be.atol_enuc);
 
-        // now look for w |y_fine - y_coarse| < 1
+            // now look for w |y_fine - y_coarse| < 1
 
-        amrex::Real rel_error = 0.0_rt;
-        for (int n = 1; n <= NumSpec; n++) {
-            rel_error = amrex::max(rel_error, w(n) * std::abs(y_fine(n) - be.y(n)));
-        }
-        rel_error = amrex::max(rel_error, w(net_ienuc) * std::abs(y_fine(net_ienuc) - be.y(net_ienuc)));
+            for (int n = 1; n <= NumSpec; n++) {
+                rel_error = amrex::max(rel_error, w(n) * std::abs(y_fine(n) - be.y(n)));
+            }
+            rel_error = amrex::max(rel_error, w(net_ienuc) * std::abs(y_fine(net_ienuc) - be.y(net_ienuc)));
 
-        bool step_success = false;
-        if (rel_error < 1.0_rt) {
-            step_success = true;
+            if (rel_error < 1.0_rt) {
+                step_success = true;
+            }
         }
 
         if (ierr == IERR_SUCCESS && step_success) {
diff --git a/paper/paper.bib b/paper/paper.bib
@@ -1098,4 +1098,16 @@ @article{dnn_astro_2025
                   effectively mitigate stiffness constraints, offering
                   a scalable approach for high-fidelity modeling of
                   astrophysical nuclear reacting flows.}
-}
+}
+
+@article{singularity,
+ doi = {10.21105/joss.06805}, 
+ url = {https://doi.org/10.21105/joss.06805},
+ year = {2024}, publisher = {The Open Journal},
+ volume = {9},
+ number = {103},
+ pages = {6805},
+ author = {Miller, Jonah M. and Holladay, Daniel A. and Peterson, Jeffrey H. and Mauney, Christopher M. and Berger, Richard and Graham, Anna Pietarila and Tsai, Karen C. and Barker, Brandon and Holas, Alexander and Mattsson, Ann E. and Gogilashvili, Mariam and Dolence, Joshua C. and Meyer, Chad D. and Swaminarayan, Sriram and Junghans, Christoph},
+ title = {Singularity-EOS: Performance Portable Equations of State and Mixed Cell Closures},
+ journal = {Journal of Open Source Software} } 
+
diff --git a/paper/paper.md b/paper/paper.md