smarter (Contributor) commented on Nov 11, 2025

This includes using fixtures for ground truth generation and test configuration,
so that we can just do:

uv run pytest -sv tests/ekfac_tests

and ground truth will be auto-generated.
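
Roughly, the fixture setup looks like this (the option name and helper logic below are an illustrative sketch, not the actual code in tests/ekfac_tests):

```python
# conftest.py sketch: a session-scoped fixture generates the ground truth once
# per test run, so `uv run pytest -sv tests/ekfac_tests` needs no manual setup.
from pathlib import Path

import pytest


def pytest_addoption(parser):
    # Hypothetical option letting a precomputed directory be reused.
    parser.addoption("--ground-truth-dir", default=None)


@pytest.fixture(scope="session")
def ground_truth_dir(request, tmp_path_factory) -> Path:
    opt = request.config.getoption("--ground-truth-dir")
    out_dir = Path(opt) if opt else tmp_path_factory.mktemp("ekfac_ground_truth")
    marker = out_dir / ".complete"
    if not marker.exists():
        # This is where the exact covariances / FIM for the toy model would be
        # computed and written to out_dir before the tests read them back.
        marker.touch()
    return out_dir
```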

Additionally, this required adding a pass/fail threshold to tests/ekfac_tests/test_eigenvalue_correction.py.

I haven't tested test_apply_ekfac yet.

The keyword parameter `dtype` for AutoModelForCausalLM.from_pretrained does not
exist in version 4.54.1, which is the transformers version pinned in uv.lock
(the parameter used to be called `torch_dtype`, which is now a deprecated alias).
Also rework default handling to avoid specifying default values in multiple places.
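
For reference, a sketch of the affected call; the model name is just an example:

```python
import torch
from transformers import AutoModelForCausalLM

# With transformers 4.54.1 (the version pinned in uv.lock) the argument is still
# called `torch_dtype`; newer releases accept `dtype` and keep `torch_dtype` as
# a deprecated alias.
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neo-125m",  # illustrative model name
    torch_dtype=torch.float32,
)
```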
…ection)

This way, when we start using pytest, test failures will be properly reported.

test_eigenvalue_correction had no explicit success criterion, so I made one up.
This includes using fixtures for ground truth generation and test configuration,
so that we can just do:

uv run pytest -sv tests/ekfac_tests

and ground truth will be auto-generated.
Ran "uv pre-commit run --all-files" which reads from .pre-commit-config.yaml

Unfortunately pre-commit does not respect tool settings in pyproject.toml, so
right now there's conflicting informations in pyproject.toml and
.pre-commit-config.yaml and so different settings and tool versions used
depending on how we run tools.
test_eigenvalue_corrections had to be disabled due to precision errors:

  h.6.attn.attention.out_proj: max_rel_diff=2.285%
  h.6.mlp.c_proj: max_rel_diff=3.599%
  h.7.attn.attention.out_proj: max_rel_diff=4.041%
  h.7.mlp.c_proj: max_rel_diff=2.204%
It seems the working-directory parameter in the CI config is ignored if
pyproject.toml configures pyright, so tweak that instead.
Overwriting is allowed using the --overwrite flag.
smarter force-pushed the ekfac-pytests branch 5 times, most recently from 2f2ff46 to 08b9e5e on December 10, 2025
Use loss.sum().backward() to avoid scaling the gradients by 1/B (and the
covariance matrix by 1/B^2).

Without this change, G2/G1 is empirically ~0.2 with the default set of
parameters.
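
A minimal illustration of the scaling issue (toy shapes, not the project's actual code):

```python
import torch
import torch.nn.functional as F

B, T, V = 8, 16, 100
logits = torch.randn(B, T, V, requires_grad=True)
labels = torch.randint(0, V, (B, T))

# Per-sequence losses, shape (B,).
per_seq = F.cross_entropy(logits.transpose(1, 2), labels, reduction="none").sum(dim=-1)

# per_seq.mean().backward() would scale every gradient by 1/B, and therefore the
# gradient covariance G (built from outer products of gradients) by 1/B^2.
# Summing keeps gradients at their natural per-sequence scale.
per_seq.sum().backward()
```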
This compares the KFAC approximation against the exact FIM computed on a toy
model. We intentionally restrict the test conditions to avoid exercising issues
with padding and the last-token gradient, which are fixed in the next commit.
When batching sequences of different lengths, we pad shorter sequences. These
padding positions aren't real data and shouldn't contribute to the FIM.
Similarly, the last position of each sequence has no next token to predict.

Invalid positions affected both covariances differently. The activation
covariance A was contaminated with out-of-distribution activations for padding.
The gradient covariance G was underestimated: gradients are zero for invalid
positions, but total_processed included them in the denominator. When
sample=True, there was a third issue: sampled labels didn't preserve -100 for
padding, so G was corrupted with non-zero gradients.

The fix computes valid_masks in pad_and_tensor() and uses it to filter
activations and restrict loss computation to valid positions.
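
A hedged sketch of the masking logic; pad_and_tensor and the valid mask come from the commit, but the implementation details below are illustrative:

```python
import torch

def pad_and_tensor(seqs: list[list[int]], pad_id: int = 0):
    """Pad variable-length token sequences and mark which positions are valid."""
    B, T = len(seqs), max(len(s) for s in seqs)
    input_ids = torch.full((B, T), pad_id, dtype=torch.long)
    valid_mask = torch.zeros(B, T, dtype=torch.bool)
    for i, s in enumerate(seqs):
        input_ids[i, : len(s)] = torch.tensor(s)
        valid_mask[i, : len(s)] = True
    # The last real position of each sequence has no next token to predict,
    # so it is treated as invalid too.
    last = valid_mask.sum(dim=1) - 1
    valid_mask[torch.arange(B), last] = False
    return input_ids, valid_mask

# Downstream (sketch): drop activations at invalid positions before updating A,
# and set labels at invalid positions to -100 so the loss ignores them, both
# for dataset labels and for sampled labels when sample=True.
```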
CovarianceCollector was called without the target_modules parameter, causing it
to hook into all MLP layers instead of just the specified target modules.
LambdaCollector and the ground truth collectors already had this parameter set
correctly.
LouisYRYJ merged commit 9f839a3 into EleutherAI:ekfac on Jan 12, 2026
1 of 4 checks passed
smarter added a commit to smarter/bergson that referenced this pull request on Jan 16, 2026

This is still missing FSDP support and test_apply_ekfac.py from
EleutherAI#68

Co-Authored-By: LouisYRYJ <[email protected]>
LouisYRYJ added a commit that referenced this pull request Jan 16, 2026
* Fix mask bug and add batch size invariance test with toy model

The backward_hook was using g.reshape(-1, O), which includes padding positions
in the covariance computation and causes incorrect results when batches have
different sequence lengths (a sketch of the masked accumulation follows this
commit message).

Before this commit, the added test failed with:
> FAILED tests/ekfac_tests/test_batch_size_invariance.py::test_trace_batch_invariant[seq_lengths1-20] - AssertionError: Scalars are not close!
>
> Expected 1.231401894309304 but got 0.8983965093439276.
> Absolute difference: 0.33300538496537635 (up to 1e-4 allowed)
> Relative difference: 0.27042786478102654 (up to 0.01 allowed)

* Fix use_dataset_labels condition and add FIM accuracy test

The condition `if not hessian_cfg.use_dataset_labels:` was inverted,
causing the empirical Fisher (with dataset labels) to use sampled
labels and vice versa.

Add test_fim_accuracy.py which verifies that KFAC approximates the
Fisher Information Matrix within tolerance for both empirical FIM
(dataset labels) and true FIM (sampled labels).

* Add ground truth ekfac tests

This is still missing FSDP support and test_apply_ekfac.py from
#68

Co-Authored-By: LouisYRYJ <[email protected]>
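
A sketch of the masked accumulation referenced in the first bullet above (names and shapes are illustrative, not the actual hook code):

```python
import torch

def accumulate_grad_covariance(G: torch.Tensor, g: torch.Tensor,
                               valid_mask: torch.Tensor) -> int:
    """Update the gradient covariance G (out_dim x out_dim) from per-position
    output gradients g of shape (B, T, out_dim), using only valid positions.

    The buggy version did g.reshape(-1, O), which also folds padding positions
    into the outer products (and into the count used later for normalization).
    """
    g_valid = g[valid_mask]           # (n_valid, out_dim)
    G += g_valid.T @ g_valid          # accumulate the sum of outer products
    return int(valid_mask.sum())      # positions actually processed
```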
LouisYRYJ added a commit that referenced this pull request Jan 26, 2026
* ekfac implementation done (untested)

* remove unnecessary squeeze

* add tkfac

* fix claude issues

* shampoo

* minor fix

* Add EKFAC tests and fix a couple of bugs (#125)

* Fix mask bug and add batch size invariance test with toy model

The backward_hook was using g.reshape(-1, O) which includes padding
positions in the covariance computation. This causes incorrect results
when batches have different sequence lengths.

Before this commit, the added test failed with:
> FAILED tests/ekfac_tests/test_batch_size_invariance.py::test_trace_batch_invariant[seq_lengths1-20] - AssertionError: Scalars are not close!
>
> Expected 1.231401894309304 but got 0.8983965093439276.
> Absolute difference: 0.33300538496537635 (up to 1e-4 allowed)
> Relative difference: 0.27042786478102654 (up to 0.01 allowed)

* Fix use_dataset_labels condition and add FIM accuracy test

The condition `if not hessian_cfg.use_dataset_labels:` was inverted,
causing the empirical Fisher (with dataset labels) to use sampled
labels and vice versa.

Add test_fim_accuracy.py which verifies that KFAC approximates the
Fisher Information Matrix within tolerance for both empirical FIM
(dataset labels) and true FIM (sampled labels).

* Add ground truth ekfac tests

This is still missing FSDP support and test_apply_ekfac.py from
#68

Co-Authored-By: LouisYRYJ <[email protected]>

* ekfac_tests/test_batch_size_invariance.py: Fix error thresholds when running on CPU

* Cleanup EKFAC tests

- Replace set_all_seeds with the existing setup_reproducibility
- Reuse approximate_hessians instead of doing something
  equivalent manually.

* Add --token_batch_size option to EKFAC tests

* Add --n_samples option to EKFAC tests

Allow configuring the number of samples from pile-10k dataset via
pytest command line option instead of hardcoding 100. The dataset
directory is now named dynamically (e.g., pile_100_examples).

* hessians: Fix distributed support and test it

Restore the calls to dist.barrier that existed in
#13; without them the process would
hang when running with world_size > 1.

For testing, we add _allocate_batches_world to compute the batches for the
ground truth. The tests don't pass due to numerical errors; this is handled in
the next commit by changing our comparison logic.

* ekfac_tests: Use appropriate metrics for each comparison

- Eigenvectors: Check |cosine_similarity| ≈ 1 per column, which naturally
  handles sign ambiguity (eigenvectors are only defined up to sign)
- Covariances: Check relative Frobenius norm since values should match exactly
- Eigenvalue corrections: Align signs based on eigenvector orientation, then
  check relative error (λ[i,j] transforms as sign_G[i] * sign_A[j]); see the
  sketch at the end of this commit message.
  - Also re-enable the CPU tests, which pass after this change.

* ekfac_tests: Relax thresholds for distributed runs

With world_size > 1, floating-point reduction order differs between ground
truth (single process) and distributed run, causing larger numerical
differences in some layers.

For eigenvectors, use the average |cos_sim| instead of the minimum; this
tolerates occasional outlier eigenvectors while maintaining a stricter
threshold (1e-3, versus the 0.1 that would be needed for the min).

For eigenvalue corrections, use atol=0.2 when world_size > 1.

* adjust test + normalize shampoo and tkfac

* minor fixes, correct tensor handling in shampoo and tkfac, introduce apply_hessian (WIP)

---------

Co-authored-by: Guillaume Martres <[email protected]>
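
For reference, a sketch of the comparison metrics described in the "Use appropriate metrics" bullet above (an illustrative helper, not the test code itself):

```python
import torch

def eigvec_cos_sims(Q_ref: torch.Tensor, Q_test: torch.Tensor) -> torch.Tensor:
    """Per-column |cosine similarity| between two eigenvector bases.
    Eigenvectors are only defined up to sign, so the absolute value is used;
    values near 1 mean the columns agree."""
    cos = (Q_ref * Q_test).sum(dim=0) / (Q_ref.norm(dim=0) * Q_test.norm(dim=0))
    return cos.abs()

def align_eigenvalue_correction(lam: torch.Tensor, sign_G: torch.Tensor,
                                sign_A: torch.Tensor) -> torch.Tensor:
    """Flip lambda to match the reference eigenvector orientation before taking
    the relative error: lambda[i, j] transforms as sign_G[i] * sign_A[j]."""
    return lam * sign_G[:, None] * sign_A[None, :]
```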