1 change: 1 addition & 0 deletions docs/make.jl
@@ -23,6 +23,7 @@ open(joinpath(@__DIR__, "src", "index.md"), "w") do io
for line in eachline(joinpath(dirname(@__DIR__), "README.md"))
println(io, line)
end

for line in eachline(joinpath(@__DIR__, "src", "reproducibility.md"))
println(io, line)
end
12 changes: 11 additions & 1 deletion docs/pages.jl
@@ -1,3 +1,13 @@
# Put in a separate page so it can be used by SciMLDocs.jl

pages = ["Home" => "index.md", "tutorials.md", "api.md"]
pages = [
"Home" => "index.md",
"Tutorials" => "tutorials.md",
"Derivatives" => "derivatives.md",
"Gradients" => "gradients.md",
"Jacobians" => "jacobians.md",
"Hessians" => "hessians.md",
"Jacobian-Vector Products" => "jvp.md",
"Step Size Selection" => "epsilons.md",
"Internal Utilities" => "utilities.md"
]
58 changes: 0 additions & 58 deletions docs/src/api.md

This file was deleted.

26 changes: 26 additions & 0 deletions docs/src/derivatives.md
@@ -0,0 +1,26 @@
# Derivatives

Functions for computing derivatives of scalar-valued functions.

## Overview

Derivatives are computed for scalar→scalar maps `f(x)` where `x` can be a single point or a collection of points. The derivative functions support:

- **Forward differences**: 2 function evaluations per point (1 if `f(x)` is already known), `O(h)` accuracy
- **Central differences**: 2 function evaluations per point, `O(h²)` accuracy
- **Complex step**: 1 function evaluation per point, machine precision accuracy

For optimal performance with repeated computations, use the cached versions with `DerivativeCache`.
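
For example (a minimal out-of-place sketch; `Val(:central)` is the default, and `Val(:forward)`/`Val(:complex)` select the other methods):

```julia
using FiniteDiff

# Derivative of sin at x = 1.0, central differences by default
d_central = FiniteDiff.finite_difference_derivative(sin, 1.0)

# Forward-difference and complex-step variants
d_forward = FiniteDiff.finite_difference_derivative(sin, 1.0, Val(:forward))
d_complex = FiniteDiff.finite_difference_derivative(sin, 1.0, Val(:complex))

d_central ≈ cos(1.0)   # true up to O(h²) error
```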

## Functions

```@docs
FiniteDiff.finite_difference_derivative
FiniteDiff.finite_difference_derivative!
```

## Cache

```@docs
FiniteDiff.DerivativeCache
```
71 changes: 71 additions & 0 deletions docs/src/epsilons.md
@@ -0,0 +1,71 @@
# Step Size Selection (Epsilons)

Functions and theory for computing optimal step sizes in finite difference approximations.

## Theory

The choice of step size (epsilon) in finite difference methods is critical for accuracy. Too large a step leads to truncation error, while too small a step leads to round-off error. The optimal step size balances these two sources of error.

### Error Analysis

For a function `f` with bounded derivatives, the total error in finite difference approximations consists of:

1. **Truncation Error**: Comes from the finite difference approximation itself
- Forward differences: `O(h)` where `h` is the step size
- Central differences: `O(h²)`
- Hessian central differences: `O(h²)` for second derivatives

2. **Round-off Error**: Comes from floating-point arithmetic
- Forward differences: `O(eps/h)` where `eps` is machine epsilon
- Central differences: `O(eps/h)`

### Optimal Step Sizes

Minimizing the total error `truncation + round-off` gives optimal step sizes (concrete `Float64` values are sketched after this list):

- **Forward differences**: `h* = sqrt(eps)` - balances `O(h)` truncation with `O(eps/h)` round-off
- **Central differences**: `h* = eps^(1/3)` - balances `O(h²)` truncation with `O(eps/h)` round-off
- **Hessian central**: `h* = eps^(1/4)` - balances `O(h²)` truncation for mixed derivatives
- **Complex step**: `h* = eps` - no subtractive cancellation, only limited by machine precision
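
Plugging in machine epsilon for `Float64` gives concrete values (plain Julia arithmetic, not FiniteDiff internals):

```julia
eps64 = eps(Float64)         # ≈ 2.22e-16

h_forward = sqrt(eps64)      # ≈ 1.49e-8   (forward differences)
h_central = cbrt(eps64)      # ≈ 6.06e-6   (central differences)
h_hessian = eps64^(1 / 4)    # ≈ 1.22e-4   (Hessian central differences)
h_complex = eps64            # ≈ 2.22e-16  (complex step)
```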

## Adaptive Step Sizing

The step size computation uses both relative and absolute components:

```julia
epsilon = max(relstep * abs(x), absstep) * dir
```

This ensures:
- **Large values**: Use relative step `relstep * |x|` for scale-invariant accuracy
- **Small values**: Use absolute step `absstep` to avoid underflow
- **Direction**: Multiply by `dir` (±1) for forward differences

## Implementation

The step size computation is handled by internal functions:

- **`compute_epsilon(fdtype, x, relstep, absstep, dir)`**: Computes the actual step size for a given finite difference method and input value
- **`default_relstep(fdtype, T)`**: Returns the optimal relative step size for a given method and numeric type

These functions are called automatically by all finite difference routines, but understanding their behavior can help with custom implementations or debugging numerical issues.
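
A hedged sketch of how they fit together (these helpers are internal and unexported, so their exact signatures may change between versions):

```julia
using FiniteDiff

# Optimal relative step for a central difference on Float64 (≈ cbrt(eps(Float64)))
relstep = FiniteDiff.default_relstep(Val(:central), Float64)

# Step actually taken at x = 100.0 for a forward difference in the +1 direction:
# epsilon = max(relstep * abs(x), absstep) * dir
relstep_fwd = FiniteDiff.default_relstep(Val(:forward), Float64)
epsilon = FiniteDiff.compute_epsilon(Val(:forward), 100.0, relstep_fwd, relstep_fwd, 1)
```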

## Special Cases

### Complex Step Differentiation

For complex step differentiation, the step size is simply machine epsilon, since this method avoids subtractive cancellation entirely.

⚠️ **Important**: The function `f` must be complex analytic when the input is complex!

### Sparse Jacobians

When computing sparse Jacobians with graph coloring, the step size is computed based on the norm of the perturbation vector to ensure balanced accuracy across all columns in the same color group.

## Practical Considerations

- **Default step sizes** are optimal for most smooth functions
- **Custom step sizes** may be needed for functions with unusual scaling or near-discontinuities
- **Relative steps** should scale with the magnitude of the input
- **Absolute steps** provide a fallback for inputs near zero
- **Direction parameter** allows for one-sided differences when needed (e.g., at domain boundaries)
38 changes: 38 additions & 0 deletions docs/src/gradients.md
@@ -0,0 +1,38 @@
# Gradients

Functions for computing gradients of scalar-valued functions with respect to vector inputs.

## Function Types

Gradients support two types of function mappings:

- **Vector→scalar**: `f(x)` where `x` is a vector and `f` returns a scalar (the usual gradient)
- **Scalar→vector**: `x` is a scalar and `f` returns a vector; use `f(fx, x)` for in-place evaluation or `fx = f(x)` for out-of-place

## Performance Notes

- **Forward differences**: `n + 1` function evaluations (`n` if `fx` is already cached), `O(h)` accuracy
- **Central differences**: `2n` function evaluations, `O(h²)` accuracy
- **Complex step**: `n` function evaluations, machine precision accuracy

## Cache Management

When using `GradientCache` with pre-computed function values (a usage sketch follows this list):

- If you provide `fx`, then `fx` will be used in forward differencing to skip a function call
- You must update `cache.fx` before each call to `finite_difference_gradient!`
- For immutable types (scalars, `StaticArray`), use `@set` from [Setfield.jl](https://github.com/jw3126/Setfield.jl)
- Consider aliasing existing arrays into the cache for memory efficiency
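
A short sketch of both calling styles, following the constructors documented below (the quadratic objective is only illustrative):

```julia
using FiniteDiff

f(x) = sum(abs2, x)                # vector → scalar
x  = rand(5)
df = zeros(5)

# Out-of-place, central differences by default
g = FiniteDiff.finite_difference_gradient(f, x)

# Cached, in-place version for repeated evaluations
cache = FiniteDiff.GradientCache(df, x)
FiniteDiff.finite_difference_gradient!(df, f, x, cache)

df ≈ 2 .* x                        # gradient of sum(abs2, x) is 2x
```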

## Functions

```@docs
FiniteDiff.finite_difference_gradient
FiniteDiff.finite_difference_gradient!
```

## Cache

```@docs
FiniteDiff.GradientCache
```
45 changes: 45 additions & 0 deletions docs/src/hessians.md
@@ -0,0 +1,45 @@
# Hessians

Functions for computing Hessian matrices of scalar-valued functions.

## Function Requirements

Hessian functions are designed for scalar-valued functions `f(x)` where:

- `x` is a vector of parameters
- `f(x)` returns a scalar value
- The Hessian `H[i,j] = ∂²f/(∂x[i]∂x[j])` is automatically symmetrized

## Mathematical Background

For a scalar function `f: ℝⁿ → ℝ`, the Hessian central difference approximation is:

```
H[i,j] ≈ (f(x + eᵢhᵢ + eⱼhⱼ) - f(x + eᵢhᵢ - eⱼhⱼ) - f(x - eᵢhᵢ + eⱼhⱼ) + f(x - eᵢhᵢ - eⱼhⱼ)) / (4hᵢhⱼ)
```

where `eᵢ` is the i-th unit vector and `hᵢ` is the step size in dimension i.
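
For intuition, the stencil can be checked directly in plain Julia against a function whose mixed derivative is known (this is just the formula above, not the library's internal implementation):

```julia
# f(x) = x₁² x₂ has ∂²f/∂x₁∂x₂ = 2x₁
f(x) = x[1]^2 * x[2]
x = [1.5, -0.7]
h = eps(Float64)^(1 / 4)           # optimal step for second derivatives
e1, e2 = [1.0, 0.0], [0.0, 1.0]

H12 = (f(x + h * e1 + h * e2) - f(x + h * e1 - h * e2) -
       f(x - h * e1 + h * e2) + f(x - h * e1 - h * e2)) / (4h^2)

H12 ≈ 2 * x[1]                     # ≈ 3.0, accurate to O(h²)
```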

## Performance Considerations

- **Complexity**: Requires `O(n²)` function evaluations for an n-dimensional input
- **Accuracy**: Central differences provide `O(h²)` accuracy for second derivatives
- **Memory**: The result is returned as a `Symmetric` matrix view
- **Alternative**: For large problems, consider finite-differencing an analytical or AD gradient instead (the Hessian is the Jacobian of the gradient), which avoids the `O(n²)` evaluations of `f`

## StaticArrays Support

The cache constructor automatically detects `StaticArray` types and adjusts the `inplace` parameter accordingly for optimal performance.
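
A minimal usage sketch (the API is documented below; the quadratic objective is illustrative):

```julia
using FiniteDiff

q(x) = x[1]^2 + 3x[1] * x[2] + 2x[2]^2
x = [1.0, 2.0]

# Out-of-place: returns a Symmetric matrix
H = FiniteDiff.finite_difference_hessian(q, x)

# Cached, in-place version
cache = FiniteDiff.HessianCache(x)
Hbuf = zeros(2, 2)
FiniteDiff.finite_difference_hessian!(Hbuf, q, x, cache)

H ≈ [2.0 3.0; 3.0 4.0]             # analytical Hessian
```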

## Functions

```@docs
FiniteDiff.finite_difference_hessian
FiniteDiff.finite_difference_hessian!
```

## Cache

```@docs
FiniteDiff.HessianCache
```
41 changes: 41 additions & 0 deletions docs/src/jacobians.md
@@ -0,0 +1,41 @@
# Jacobians

Functions for computing Jacobian matrices of vector-valued functions.

## Function Types

Jacobians support the following function signatures:

- **Out-of-place**: `fx = f(x)` where both `x` and `fx` are vectors
- **In-place**: `f!(fx, x)` where `f!` modifies `fx` in-place

## Sparse Jacobians

FiniteDiff.jl provides efficient sparse Jacobian computation using graph coloring (a hedged usage sketch follows this list):

- Pass a `colorvec` of matrix colors to enable column compression
- Provide `sparsity` as a sparse or structured matrix (`Tridiagonal`, `Banded`, etc.)
- Structured and sparse matrix types are supported through the ArrayInterfaceCore.jl interface, which supplies their structural nonzero patterns
- Results are automatically decompressed unless `sparsity=nothing`
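
A hedged sketch of the coloring interface, assuming the `colorvec`/`sparsity` keywords of `JacobianCache` documented below (the tridiagonal map and the hand-written 3-coloring are only illustrative):

```julia
using FiniteDiff, LinearAlgebra, SparseArrays

# In-place map with a tridiagonal Jacobian pattern
function f!(fx, x)
    n = length(x)
    fx[1] = -2x[1] + x[2]
    for i in 2:n-1
        fx[i] = x[i-1] - 2x[i] + x[i+1]
    end
    fx[n] = x[n-1] - 2x[n]
    return nothing
end

n = 10
x = rand(n)
sparsity = sparse(Tridiagonal(ones(n - 1), -2ones(n), ones(n - 1)))
colorvec = [mod1(i, 3) for i in 1:n]    # valid 3-coloring of a tridiagonal pattern

cache = FiniteDiff.JacobianCache(x, colorvec = colorvec, sparsity = sparsity)
J = copy(sparsity)
FiniteDiff.finite_difference_jacobian!(J, f!, x, cache)
```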

## Performance Notes

- **Forward differences**: `n + 1` function evaluations (`n` if `fx` is cached), `O(h)` accuracy
- **Central differences**: `2n` function evaluations, `O(h²)` accuracy
- **Complex step**: `n` function evaluations, machine precision accuracy
- **Sparse Jacobians**: Use graph coloring to reduce function evaluations significantly

For non-square Jacobians, specify the output vector `fx` when creating the cache to ensure proper sizing.
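
A minimal dense sketch of the two calling styles (the residual functions here are only illustrative):

```julia
using FiniteDiff

# In-place residual r!(fx, x)
function r!(fx, x)
    fx[1] = x[1]^2 + x[2]
    fx[2] = sin(x[2]) - x[1]
    return nothing
end

x  = [1.0, 2.0]
fx = zeros(2)

# Cached, in-place Jacobian (forward differences by default)
cache = FiniteDiff.JacobianCache(x, fx)
J = zeros(2, 2)
FiniteDiff.finite_difference_jacobian!(J, r!, x, cache)

# Out-of-place variant for fx = f(x)
g(x) = [x[1]^2 + x[2], sin(x[2]) - x[1]]
Jg = FiniteDiff.finite_difference_jacobian(g, x)
```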

## Functions

```@docs
FiniteDiff.finite_difference_jacobian
FiniteDiff.finite_difference_jacobian!
```

## Cache

```@docs
FiniteDiff.JacobianCache
```
48 changes: 48 additions & 0 deletions docs/src/jvp.md
@@ -0,0 +1,48 @@
# Jacobian-Vector Products (JVP)

Functions for computing Jacobian-vector products efficiently without forming the full Jacobian matrix.

## Mathematical Background

The JVP computes `J(x) * v` where `J(x)` is the Jacobian of function `f` at point `x` and `v` is a direction vector. This is computed using finite difference approximations:

- **Forward**: `J(x) * v ≈ (f(x + h*v) - f(x)) / h`
- **Central**: `J(x) * v ≈ (f(x + h*v) - f(x - h*v)) / (2h)`

where `h` is the step size and `v` is the direction vector.
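
A hedged out-of-place sketch, assuming the `(f, x, v)` argument order of the docstrings listed below (the function is illustrative):

```julia
using FiniteDiff

f(x) = [x[1]^2 + x[2], x[1] * x[2]]
x = [1.0, 2.0]
v = [1.0, 0.0]                      # direction vector

# J(x) * v without forming J; forward differences by default
jvp = FiniteDiff.finite_difference_jvp(f, x, v)

# Analytical check: J = [2x₁ 1; x₂ x₁], so J*v = [2x₁, x₂] = [2.0, 2.0]
isapprox(jvp, [2.0, 2.0]; atol = 1e-6)   # true up to O(h) forward-difference error
```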

## Performance Benefits

JVP functions are particularly efficient when you only need directional derivatives:

- **Function evaluations**: Only 2 function evaluations (vs `O(n)` for full Jacobian)
- **Forward differences**: 2 function evaluations, `O(h)` accuracy
- **Central differences**: 2 function evaluations, `O(h²)` accuracy
- **Memory efficient**: No need to store the full Jacobian matrix

## Use Cases

JVP is particularly useful for:

- **Optimization**: Computing directional derivatives along search directions
- **Sparse directions**: When `v` has few non-zero entries
- **Memory constraints**: Avoiding storage of large Jacobian matrices
- **Newton–Krylov methods**: Solving the Newton system `J * s = -F(x)` with iterative linear solvers that only require `J * v` products

## Limitations

- **Complex step**: JVP does not currently support complex step differentiation (`Val(:complex)`)
- **In-place functions**: For in-place function evaluation, ensure proper cache sizing

## Functions

```@docs
FiniteDiff.finite_difference_jvp
FiniteDiff.finite_difference_jvp!
```

## Cache

```@docs
FiniteDiff.JVPCache
```