smarter (Contributor) commented on Nov 11, 2025

This includes using fixtures for ground truth generation and test configuration,
so that we can just do:

uv run pytest -sv tests/ekfac_tests

and ground truth will be auto-generated.
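
Roughly, the fixture setup looks like this (the option name and helper logic below are an illustrative sketch, not the actual code in tests/ekfac_tests):

```python
# conftest.py sketch: a session-scoped fixture generates the ground truth once
# per test run, so `uv run pytest -sv tests/ekfac_tests` needs no manual setup.
from pathlib import Path

import pytest


def pytest_addoption(parser):
    # Hypothetical option letting a precomputed directory be reused.
    parser.addoption("--ground-truth-dir", default=None)


@pytest.fixture(scope="session")
def ground_truth_dir(request, tmp_path_factory) -> Path:
    opt = request.config.getoption("--ground-truth-dir")
    out_dir = Path(opt) if opt else tmp_path_factory.mktemp("ekfac_ground_truth")
    marker = out_dir / ".complete"
    if not marker.exists():
        # This is where the exact covariances / FIM for the toy model would be
        # computed and written to out_dir before the tests read them back.
        marker.touch()
    return out_dir
```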

Additionally, this required adding a pass/fail threshold to tests/ekfac_tests/test_eigenvalue_correction.py.

I haven't tested test_apply_ekfac yet.

The keyword parameter `dtype` for AutoModelForCausalLM.from_pretrained does not
exist in version 4.54.1, which is the transformers version pinned in uv.lock
(the parameter used to be called `torch_dtype`, which is now a deprecated alias).
Also rework default handling to avoid specifying default values in multiple places.
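
For reference, a sketch of the affected call; the model name is just an example:

```python
import torch
from transformers import AutoModelForCausalLM

# With transformers 4.54.1 (the version pinned in uv.lock) the argument is still
# called `torch_dtype`; newer releases accept `dtype` and keep `torch_dtype` as
# a deprecated alias.
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neo-125m",  # illustrative model name
    torch_dtype=torch.float32,
)
```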
…ection)

This way, when we start using pytest, test failures will be properly reported.

test_eigenvalue_correction had no explicit success criterion, so I made one up.
This includes using fixtures for ground truth generation and test configuration,
so that we can just do:

uv run pytest -sv tests/ekfac_tests

and ground truth will be auto-generated.
Ran "uv pre-commit run --all-files" which reads from .pre-commit-config.yaml

Unfortunately pre-commit does not respect tool settings in pyproject.toml, so
right now there's conflicting informations in pyproject.toml and
.pre-commit-config.yaml and so different settings and tool versions used
depending on how we run tools.
test_eigenvalue_corrections had to be disabled due to precision errors:

  h.6.attn.attention.out_proj: max_rel_diff=2.285%
  h.6.mlp.c_proj: max_rel_diff=3.599%
  h.7.attn.attention.out_proj: max_rel_diff=4.041%
  h.7.mlp.c_proj: max_rel_diff=2.204%
It seems the working-directory parameter in the CI config is ignored if
pyproject.toml configures pyright, so tweak that instead.
Overwriting is allowed using the --overwrite flag.
smarter force-pushed the ekfac-pytests branch 5 times, most recently from 2f2ff46 to 08b9e5e on December 10, 2025
Use loss.sum().backward() to avoid scaling the gradients by 1/B (and the
covariance matrix by 1/B^2).

Without this change, G2/G1 is empirically ~0.2 with the default set of
parameters.
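
A minimal illustration of the scaling issue (toy shapes, not the project's actual code):

```python
import torch
import torch.nn.functional as F

B, T, V = 8, 16, 100
logits = torch.randn(B, T, V, requires_grad=True)
labels = torch.randint(0, V, (B, T))

# Per-sequence losses, shape (B,).
per_seq = F.cross_entropy(logits.transpose(1, 2), labels, reduction="none").sum(dim=-1)

# per_seq.mean().backward() would scale every gradient by 1/B, and therefore the
# gradient covariance G (built from outer products of gradients) by 1/B^2.
# Summing keeps gradients at their natural per-sequence scale.
per_seq.sum().backward()
```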
This compares the KFAC approximation against the exact FIM computed on a toy
model. We intentionally restrict the test conditions to avoid exercising issues
with padding and the last-token gradient, which are fixed in the next commit.
When batching sequences of different lengths, we pad shorter sequences. These
padding positions aren't real data and shouldn't contribute to the FIM.
Similarly, the last position of each sequence has no next token to predict.

Invalid positions affected both covariances differently. The activation
covariance A was contaminated with out-of-distribution activations for padding.
The gradient covariance G was underestimated: gradients are zero for invalid
positions, but total_processed included them in the denominator. When
sample=True, there was a third issue: sampled labels didn't preserve -100 for
padding, so G was corrupted with non-zero gradients.

The fix computes valid_masks in pad_and_tensor() and uses it to filter
activations and restrict loss computation to valid positions.
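
A hedged sketch of the masking logic; pad_and_tensor and the valid mask come from the commit, but the implementation details below are illustrative:

```python
import torch

def pad_and_tensor(seqs: list[list[int]], pad_id: int = 0):
    """Pad variable-length token sequences and mark which positions are valid."""
    B, T = len(seqs), max(len(s) for s in seqs)
    input_ids = torch.full((B, T), pad_id, dtype=torch.long)
    valid_mask = torch.zeros(B, T, dtype=torch.bool)
    for i, s in enumerate(seqs):
        input_ids[i, : len(s)] = torch.tensor(s)
        valid_mask[i, : len(s)] = True
    # The last real position of each sequence has no next token to predict,
    # so it is treated as invalid too.
    last = valid_mask.sum(dim=1) - 1
    valid_mask[torch.arange(B), last] = False
    return input_ids, valid_mask

# Downstream (sketch): drop activations at invalid positions before updating A,
# and set labels at invalid positions to -100 so the loss ignores them, both
# for dataset labels and for sampled labels when sample=True.
```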
CovarianceCollector was called without the target_modules parameter, causing it
to hook into all MLP layers instead of just the specified target modules.
LambdaCollector and the ground truth collectors already had this parameter set
correctly.
LouisYRYJ merged commit 9f839a3 into EleutherAI:ekfac on Jan 12, 2026
1 of 4 checks passed
smarter added a commit to smarter/bergson that referenced this pull request on Jan 16, 2026

This is still missing FSDP support and test_apply_ekfac.py from
EleutherAI#68

Co-Authored-By: LouisYRYJ <[email protected]>
LouisYRYJ added a commit that referenced this pull request Jan 16, 2026
* Fix mask bug and add batch size invariance test with toy model

The backward_hook was using g.reshape(-1, O), which includes padding positions
in the covariance computation and causes incorrect results when batches have
different sequence lengths (a sketch of the masked accumulation follows this
commit message).

Before this commit, the added test failed with:
> FAILED tests/ekfac_tests/test_batch_size_invariance.py::test_trace_batch_invariant[seq_lengths1-20] - AssertionError: Scalars are not close!
>
> Expected 1.231401894309304 but got 0.8983965093439276.
> Absolute difference: 0.33300538496537635 (up to 1e-4 allowed)
> Relative difference: 0.27042786478102654 (up to 0.01 allowed)

* Fix use_dataset_labels condition and add FIM accuracy test

The condition `if not hessian_cfg.use_dataset_labels:` was inverted,
causing the empirical Fisher (with dataset labels) to use sampled
labels and vice versa.

Add test_fim_accuracy.py which verifies that KFAC approximates the
Fisher Information Matrix within tolerance for both empirical FIM
(dataset labels) and true FIM (sampled labels).

* Add ground truth ekfac tests

This is still missing FSDP support and test_apply_ekfac.py from
#68

Co-Authored-By: LouisYRYJ <[email protected]>
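
A sketch of the masked accumulation referenced in the first bullet above (names and shapes are illustrative, not the actual hook code):

```python
import torch

def accumulate_grad_covariance(G: torch.Tensor, g: torch.Tensor,
                               valid_mask: torch.Tensor) -> int:
    """Update the gradient covariance G (out_dim x out_dim) from per-position
    output gradients g of shape (B, T, out_dim), using only valid positions.

    The buggy version did g.reshape(-1, O), which also folds padding positions
    into the outer products (and into the count used later for normalization).
    """
    g_valid = g[valid_mask]           # (n_valid, out_dim)
    G += g_valid.T @ g_valid          # accumulate the sum of outer products
    return int(valid_mask.sum())      # positions actually processed
```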
LouisYRYJ added a commit that referenced this pull request Jan 26, 2026
* ekfac implementation done (untested)

* remove unnecessary squeeze

* add tkfac

* fix claude issues

* shampoo

* minor fix

* Add EKFAC tests and fix a couple of bugs (#125)

* Fix mask bug and add batch size invariance test with toy model

The backward_hook was using g.reshape(-1, O) which includes padding
positions in the covariance computation. This causes incorrect results
when batches have different sequence lengths.

Before this commit, the added test failed with:
> FAILED tests/ekfac_tests/test_batch_size_invariance.py::test_trace_batch_invariant[seq_lengths1-20] - AssertionError: Scalars are not close!
>
> Expected 1.231401894309304 but got 0.8983965093439276.
> Absolute difference: 0.33300538496537635 (up to 1e-4 allowed)
> Relative difference: 0.27042786478102654 (up to 0.01 allowed)

* Fix use_dataset_labels condition and add FIM accuracy test

The condition `if not hessian_cfg.use_dataset_labels:` was inverted,
causing the empirical Fisher (with dataset labels) to use sampled
labels and vice versa.

Add test_fim_accuracy.py which verifies that KFAC approximates the
Fisher Information Matrix within tolerance for both empirical FIM
(dataset labels) and true FIM (sampled labels).

* Add ground truth ekfac tests

This is still missing FSDP support and test_apply_ekfac.py from
#68

Co-Authored-By: LouisYRYJ <[email protected]>

* ekfac_tests/test_batch_size_invariance.py: Fix error thresholds when running on CPU

* Cleanup EKFAC tests

- Replace set_all_seeds with the existing setup_reproducibility
- Reuse approximate_hessians instead of doing something
  equivalent manually.

* Add --token_batch_size option to EKFAC tests

* Add --n_samples option to EKFAC tests

Allow configuring the number of samples from pile-10k dataset via
pytest command line option instead of hardcoding 100. The dataset
directory is now named dynamically (e.g., pile_100_examples).

* hessians: Fix distributed support and test it

Restore the calls to dist.barrier that existed in
#13; without them the process would
hang when running with world_size > 1.

For testing, we add _allocate_batches_world to compute the batches for the
ground truth. The tests don't pass due to numerical errors; this is handled in
the next commit by changing our comparison logic.

* ekfac_tests: Use appropriate metrics for each comparison

- Eigenvectors: Check |cosine_similarity| ≈ 1 per column, which naturally
  handles sign ambiguity (eigenvectors are only defined up to sign)
- Covariances: Check relative Frobenius norm since values should match exactly
- Eigenvalue corrections: Align signs based on eigenvector orientation, then
  check relative error (λ[i,j] transforms as sign_G[i] * sign_A[j]); see the
  sketch at the end of this commit message.
  - Also re-enable the CPU tests, which pass after this change.

* ekfac_tests: Relax thresholds for distributed runs

With world_size > 1, floating-point reduction order differs between ground
truth (single process) and distributed run, causing larger numerical
differences in some layers.

For eigenvectors, use the average |cos_sim| instead of the minimum; this
tolerates occasional outlier eigenvectors while maintaining a stricter
threshold (1e-3, versus the 0.1 that would be needed for the min).

For eigenvalue corrections, use atol=0.2 when world_size > 1.

* adjust test + normalize shampoo and tkfac

* minor fixes, correct tensor handling in shampoo and tkfac, introduce apply_hessian (WIP)

---------

Co-authored-by: Guillaume Martres <[email protected]>
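
For reference, a sketch of the comparison metrics described in the "Use appropriate metrics" bullet above (an illustrative helper, not the test code itself):

```python
import torch

def eigvec_cos_sims(Q_ref: torch.Tensor, Q_test: torch.Tensor) -> torch.Tensor:
    """Per-column |cosine similarity| between two eigenvector bases.
    Eigenvectors are only defined up to sign, so the absolute value is used;
    values near 1 mean the columns agree."""
    cos = (Q_ref * Q_test).sum(dim=0) / (Q_ref.norm(dim=0) * Q_test.norm(dim=0))
    return cos.abs()

def align_eigenvalue_correction(lam: torch.Tensor, sign_G: torch.Tensor,
                                sign_A: torch.Tensor) -> torch.Tensor:
    """Flip lambda to match the reference eigenvector orientation before taking
    the relative error: lambda[i, j] transforms as sign_G[i] * sign_A[j]."""
    return lam * sign_G[:, None] * sign_A[None, :]
```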