Skip to content

Commit 81d47c2

Browse files
authored
Merge branch 'development' into change_species_threshold_default
2 parents 381a5cd + df323de commit 81d47c2

File tree

9 files changed

+273
-47
lines changed

9 files changed

+273
-47
lines changed

.github/workflows/draft-pdf.yml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -24,5 +24,6 @@ jobs:
2424
# PDF. Note, this should be the same directory as the input
2525
# paper.md
2626
path: paper/paper.pdf
27+
archive: false
2728

2829
# see https://github.com/marketplace/actions/open-journals-pdf-generator

CHANGES.md

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,26 @@
11
# Changelog
22

3+
## 26.04
4+
5+
* add documentation on recovering from burn failures (#1971)
6+
7+
* fix the step-rejection logic for increase in X over a step in VODE
8+
for SDC (#1968)
9+
10+
* make `species_failure_tolerance` a runtime parameter (#1969)
11+
12+
* update the JOSS paper (#1967)
13+
14+
* hybrid Powell solver updates: fix a NaN loop check (#1959),
15+
template on the Jacobian type (#1958), fix comments (#1957), fix
16+
the logic for refreshing the spectral radius (#1955)
17+
18+
* add an assert on the NSE table index (#1951)
19+
20+
* fix RKC compilation with NSE (#1956)
21+
22+
* CI action updates (#1942)
23+
324
## 26.03
425

526
* allow screening to output log(screening) (#1939)

Docs/source/burn_cell_sdc.rst

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -103,6 +103,8 @@ can be set via the usual Microphysics runtime parameters, e.g.
103103
``integrator.atol_spec``.
104104

105105

106+
.. _sec:redo_burn_fail:
107+
106108
Rerunning a burn fail
107109
---------------------
108110

Docs/source/burn_failures.rst

Lines changed: 162 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,162 @@
1+
**************************
2+
Dealing with Burn Failures
3+
**************************
4+
5+
Sometimes the ODE integration of a reaction network will fail. Here
6+
we summarize some wisdom on how to avoid integrator failures.
7+
8+
Error codes
9+
===========
10+
11+
The error code that is output when the integrator fails can be
12+
interpreted from the ``enum`` in ``integrator_data.H``. See
13+
:ref:`sec:error_codes` for a list of the possible codes.
14+
15+
The most common errors are ``-2`` (timestep underflow) and ``-4`` (too
16+
many steps required). A timestep underflow usually means that
17+
something has gone really wrong in the integration and the integrator
18+
keeps trying to cut the timestep to be able to advance. Too many
19+
steps means that the integrator hit the cap imposed by
20+
``integrator.ode_max_steps``. This should never be made too
21+
large---usually if you need more than a few 1000 steps, then the some
22+
of the solutions discussed below can help.
23+
24+
25+
26+
Why does the integrator struggle?
27+
=================================
28+
29+
There are a few common reasons why the integrator might encounter trouble:
30+
31+
* The integrators don't know that the mass fractions should stay in
32+
$[0, 1]$.
33+
34+
Note: they should ensure that the mass fractions sum to 1 as long as
35+
$\sum_i \dot{\omega}_k = 0$ in the righthand side function.
36+
37+
This is a place where adjusting ``integrator.atol_spec`` can help.
38+
39+
* The state wants to enter nuclear statistical equilibrium. As we
40+
approach equilibrium, the integrator will see large, oppositely
41+
signed flows from one step to the next, which should cancel, but
42+
numerically, the cancellation is not perfect.
43+
44+
For some networks, the solution here is to use the NSE solver.
45+
46+
* The Jacobian is not good enough to allow the nonlinear solver to
47+
converge.
48+
49+
The Jacobians used in our networks are approximate (even the
50+
analytic one). For example, the analytic Jacobian neglects the
51+
composition dependence in the screening functions. In the analytic we
52+
neglect the composition influence in screening.
53+
54+
55+
Making the integration robust
56+
=============================
57+
58+
.. index:: integrator.atol_spec, integrator.species_failure_tolerance, integrator.use_jacobian_caching, integrator.do_corrector_validation, integrator.use_burn_retry, integrator.X_reject_buffer
59+
60+
Some tips for helping the integrator:
61+
62+
* Use a tight absolute tolerance for the species
63+
(``integrator.atol_spec``). This ensures that even small species
64+
are tracked well, and helps prevent generating negative mass
65+
fractions.
66+
67+
In general, ``atol_spec`` should be picked to be the magnitude of
68+
the smallest mass fraction you care about. You should never set
69+
it to be larger that ``rtol_spec``. See :ref:`sec:tolerances`
70+
for some discussion.
71+
72+
* Adjust the step-rejection logic in the integrator. This is a
73+
check on the state after the step that rejects the advance
74+
if we find unphysical species. Not every integrator supports
75+
all of these options.
76+
77+
* Reduce ``integrator.species_failure_tolerance``. This is used to
78+
determine whether a step should be rejected. If any of the mass
79+
fractions are more than ``integrator.species_failure_tolerance``
80+
outside of $[0, 1]$, then the integrator will reject the step and
81+
try again (this is implemented in ``VODE`` and ``BackwardEuler``).
82+
83+
If the value is too large (like ``0.01``), then this could allow the
84+
mass fractions to grow too much outside of $[0, 1]$, and then a
85+
single step rejection is not enough to recover. Setting this value to
86+
be close to the typical scale of a mass fraction that we care about
87+
(closer to ``integrator.atol_spec``) can help.
88+
89+
* Increase ``integrator.X_reject_buffer``. This is used in the
90+
check on how much the species changed from one step to the next
91+
(inside of the integrator). We limit the change to a factor of 4
92+
per step. We only check the species that are larger than
93+
``X_reject_buffer * atol_spec``, so if trace species are causing
94+
problems, ``X_reject_buffer`` can be used to ignore them.
95+
96+
* Try the numerical Jacobian (``integrator.jacobian=2``). The
97+
analytic Jacobian is faster to evaluate, but it approximates some
98+
terms. The numerical Jacobian sometimes works better.
99+
100+
* Use the burn retry logic. By setting
101+
``integrator.use_burn_retry=1``, the burn will immediately be
102+
retried if it fails, with a slightly different configuration.
103+
104+
By default the type of Jacobian is swapped (if we were analytic,
105+
then do numerical, and vice vera). This is controlled by
106+
``integrator.retry_swap_jacobian``. The tolerances can also be
107+
changed on retry.
108+
109+
See :ref:`sec:retry` for the full list of options.
110+
111+
* Use the right integrator. In general, VODE is the best choice.
112+
But if the network is only mildly stiff, then RKC can work well
113+
(typically, it works when the temperatures are below $10^9~\mathrm{K}$.
114+
115+
* If you are near NSE, then use the NSE solver. This is described
116+
in :ref:`self_consistent_nse`.
117+
118+
Note: not every network is compatible with the self-consistent
119+
NSE solver.
120+
121+
* Use Jacobian-caching. If you build on GPUs, this is disabled by
122+
default. You can re-enable it by building with
123+
``USE_JACOBIAN_CACHING=TRUE``. Also make sure that
124+
``integrator.use_jacobian_caching=1`` (this is the default).
125+
126+
By reducing the number of times the Jacobian is evaluated, we also
127+
reduce the possibility of trying to evaluate it with a bad state.
128+
129+
* Use the corrector validation (``integrator.do_corrector_validation``).
130+
131+
This checks to make sure the state is valid inside of the corrector
132+
loop, and if not, it bails out of the corrector, forcing the
133+
integrator to retry the entire step.
134+
135+
This can sometimes make the integration harder, especially if used with
136+
``integrator.species_failure_tolerance``.
137+
138+
Things we no longer recommend:
139+
140+
.. index:: integrator.do_species_clip, integrator.renormalize_abundances
141+
142+
* Clipping the species (``integrator.do_species_clip``) can lead to
143+
instabilities. This changes the integration state directly in the
144+
righthand side function, outside of the control of the integrator.
145+
While it sometimes may work, it often leads to problems. A better
146+
way to deal with keeping the species in $[0, 1]$ is through the
147+
absolute tolerance.
148+
149+
The `SUNDIALS CVODE documentation <https://sundials.readthedocs.io/en/latest/cvode/Usage/index.html#advice-on-controlling-unphysical-negative-values>`_ has
150+
a good discussion on this numerical instability.
151+
152+
* Renormalizing the species during the integration / righthand side call
153+
(``integrator.renormalize_abundances``). Like clipping, this can
154+
cause numerical instabilities.
155+
156+
Debugging a burn failure
157+
========================
158+
159+
When a burn fails, the entire ``burn_t`` state will be output to
160+
stdout (CPU runs only). This state can then be used with
161+
``burn_cell_sdc`` to reproduce the burn failure outside of a
162+
simulation. For SDC, see the discussion at :ref:`sec:redo_burn_fail`.

Docs/source/index.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -79,6 +79,7 @@ system.
7979
ode_integrators
8080
nse
8181
sdc
82+
burn_failures
8283

8384
.. toctree::
8485
:maxdepth: 1

Docs/source/ode_integrators.rst

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -130,6 +130,8 @@ of equations. Pivoting can be disabled by setting ``integrator.linalg_do_pivoti
130130

131131

132132

133+
.. _sec:error_codes:
134+
133135
Integration errors
134136
==================
135137

@@ -164,6 +166,8 @@ used to interpret the failure. The current codes are:
164166
| -100 | entered NSE |
165167
+-------+----------------------------------------------------------+
166168

169+
.. _sec:tolerances:
170+
167171
Tolerances
168172
==========
169173

@@ -173,6 +177,12 @@ is, the more accurate the results will be. However, if the tolerance
173177
is too small, the code may run for too long, the ODE solver will
174178
never converge, or it might require at timestep that underflows.
175179

180+
.. tip::
181+
182+
The `SUNDIALS CVODE documentation on tolerances
183+
<https://sundials.readthedocs.io/en/latest/cvode/Usage/index.html#general-advice-on-choice-of-tolerances>`_
184+
provides a good discussion on tolerances that apply here.
185+
176186
.. index:: integrator.rtol_spec, integrator.rtol_enuc, integrator.atol_spec, integrator.atol_enuc
177187

178188
There are separate tolerances for the mass fractions and the energy,
@@ -288,6 +298,7 @@ constraint on the intermediate states during the integration.
288298
``integrator.do_species_clip`` is disabled. Note: this is not
289299
implemented for every integrator.
290300

301+
.. _sec:retry:
291302

292303
Retry Mechanism
293304
===============

integration/BackwardEuler/be_integrator.H

Lines changed: 40 additions & 36 deletions
Original file line numberDiff line numberDiff line change
@@ -171,17 +171,14 @@ int single_step (BurnT& state, BeT& be, const amrex::Real dt)
171171
if (! converged) {
172172

173173
if (ierr == IERR_SUCCESS) {
174-
175174
// if we didn't set another error, then we probably ran
176175
// out of iterations, so set nonconvergence
177-
178176
ierr = IERR_CORRECTOR_CONVERGENCE;
177+
}
179178

180-
// reset the solution to the original
181-
for (int n = 1; n <= int_neqs; n++) {
182-
be.y(n) = y_old(n);
183-
}
184-
179+
// reset the solution to the original on any failure
180+
for (int n = 1; n <= int_neqs; n++) {
181+
be.y(n) = y_old(n);
185182
}
186183

187184
}
@@ -206,7 +203,9 @@ int be_integrator (BurnT& state, BeT& be)
206203
// Do a single step to the final time
207204
const amrex::Real dt_single_step = be.tout - be.t;
208205
ierr = single_step(state, be, dt_single_step);
209-
be.t = be.tout;
206+
if (ierr == IERR_SUCCESS) {
207+
be.t = be.tout;
208+
}
210209
++be.n_step;
211210
return ierr;
212211
}
@@ -240,50 +239,55 @@ int be_integrator (BurnT& state, BeT& be)
240239
// our strategy is to take 2 steps at dt/2 and one at dt and
241240
// to compute the error from those
242241

243-
244-
// try to take a step dt
245-
246242
// first do 2 (fine) dt/2 steps
247243

248244
amrex::Array1D<amrex::Real, 1, int_neqs> y_fine;
249245

246+
// keep track of whether we have a valid fine solution
247+
// this means no errors from either half step
248+
bool have_fine_solution = false;
249+
250250
ierr = single_step(state, be, dt_sub/2);
251251
if (ierr == IERR_SUCCESS) {
252252
ierr = single_step(state, be, dt_sub/2);
253253

254-
// store the fine dt solution
255-
256-
for (int n = 1; n <= int_neqs; ++n) {
257-
y_fine(n) = be.y(n);
258-
}
254+
if (ierr == IERR_SUCCESS) {
255+
// store the fine dt solution
256+
for (int n = 1; n <= int_neqs; ++n) {
257+
y_fine(n) = be.y(n);
258+
}
259+
have_fine_solution = true;
259260

260-
// now that single (coarse) dt step
261-
// first reset the solution
262-
for (int n = 1; n <= int_neqs; ++n) {
263-
be.y(n) = y_old(n);
261+
// now do a single (coarse) dt step
262+
// first reset the solution
263+
for (int n = 1; n <= int_neqs; ++n) {
264+
be.y(n) = y_old(n);
265+
}
266+
ierr = single_step(state, be, dt_sub);
264267
}
265-
ierr = single_step(state, be, dt_sub);
266268
}
267269

268-
// define a weight for each variable to use in checking the error
270+
amrex::Real rel_error = 0.0_rt;
271+
bool step_success = false;
272+
if (ierr == IERR_SUCCESS && have_fine_solution) {
273+
// define a weight for each variable to use in checking the error
269274

270-
amrex::Array1D<amrex::Real, 1, int_neqs> w;
271-
for (int n = 1; n <= NumSpec; n++) {
272-
w(n) = 1.0_rt / (be.rtol_spec * std::abs(y_fine(n)) + be.atol_spec);
273-
}
274-
w(net_ienuc) = 1.0_rt / (be.rtol_enuc * std::abs(y_fine(net_ienuc)) + be.atol_enuc);
275+
amrex::Array1D<amrex::Real, 1, int_neqs> w;
276+
for (int n = 1; n <= NumSpec; n++) {
277+
w(n) = 1.0_rt / (be.rtol_spec * std::abs(y_fine(n)) + be.atol_spec);
278+
}
279+
w(net_ienuc) = 1.0_rt / (be.rtol_enuc * std::abs(y_fine(net_ienuc)) + be.atol_enuc);
275280

276-
// now look for w |y_fine - y_coarse| < 1
281+
// now look for w |y_fine - y_coarse| < 1
277282

278-
amrex::Real rel_error = 0.0_rt;
279-
for (int n = 1; n <= NumSpec; n++) {
280-
rel_error = amrex::max(rel_error, w(n) * std::abs(y_fine(n) - be.y(n)));
281-
}
282-
rel_error = amrex::max(rel_error, w(net_ienuc) * std::abs(y_fine(net_ienuc) - be.y(net_ienuc)));
283+
for (int n = 1; n <= NumSpec; n++) {
284+
rel_error = amrex::max(rel_error, w(n) * std::abs(y_fine(n) - be.y(n)));
285+
}
286+
rel_error = amrex::max(rel_error, w(net_ienuc) * std::abs(y_fine(net_ienuc) - be.y(net_ienuc)));
283287

284-
bool step_success = false;
285-
if (rel_error < 1.0_rt) {
286-
step_success = true;
288+
if (rel_error < 1.0_rt) {
289+
step_success = true;
290+
}
287291
}
288292

289293
if (ierr == IERR_SUCCESS && step_success) {

paper/paper.bib

Lines changed: 13 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1098,4 +1098,16 @@ @article{dnn_astro_2025
10981098
effectively mitigate stiffness constraints, offering
10991099
a scalable approach for high-fidelity modeling of
11001100
astrophysical nuclear reacting flows.}
1101-
}
1101+
}
1102+
1103+
@article{singularity,
1104+
doi = {10.21105/joss.06805},
1105+
url = {https://doi.org/10.21105/joss.06805},
1106+
year = {2024}, publisher = {The Open Journal},
1107+
volume = {9},
1108+
number = {103},
1109+
pages = {6805},
1110+
author = {Miller, Jonah M. and Holladay, Daniel A. and Peterson, Jeffrey H. and Mauney, Christopher M. and Berger, Richard and Graham, Anna Pietarila and Tsai, Karen C. and Barker, Brandon and Holas, Alexander and Mattsson, Ann E. and Gogilashvili, Mariam and Dolence, Joshua C. and Meyer, Chad D. and Swaminarayan, Sriram and Junghans, Christoph},
1111+
title = {Singularity-EOS: Performance Portable Equations of State and Mixed Cell Closures},
1112+
journal = {Journal of Open Source Software} }
1113+

0 commit comments

Comments
 (0)