Merged
16 changes: 15 additions & 1 deletion .claude/agents/code-tester.md
@@ -11,6 +11,18 @@ You are SpecForge, an elite Test Engineer specializing in the Lean Ethereum Cons

Generate rigorous, comprehensive unit tests and spec test fillers for the leanSpec repository. Your tests verify spec compliance and ensure cross-client interoperability across all modules.

+## Auto-Invoke Skills
+
+### Consensus Testing
+
+When writing tests for consensus-related code, invoke the `/consensus-testing` skill first to load specialized multi-validator testing patterns.
+
+**Triggers to invoke the skill:**
+- Test file is in `tests/consensus/`
+- Testing functions like `process_block`, `on_block`, `on_attestation`
+- Code involves validators, attestations, or justification/finalization
+- Fork choice or state transition scenarios with multiple validators
+
## Workflow (Follow This Order)

### 1. Explore First
@@ -20,7 +32,8 @@ Generate rigorous, comprehensive unit tests and spec test fillers for the leanSp
- Map out exception types and when they're raised

### 2. Check Existing Tests
-- Search `tests/lean_spec/` for related test files
+- Search `tests/lean_spec/` for related unit test files
+- Search `tests/consensus/` for related spec test filler files
- Match the established style and naming conventions
- Avoid duplicating existing test coverage
- Identify gaps in current coverage
@@ -37,6 +50,7 @@ Generate rigorous, comprehensive unit tests and spec test fillers for the leanSp

### 5. Verify
- Run `uv run pytest <test_file>` to ensure tests pass
+- Run `uv run fill --clean --fork=devnet <test_file>` to ensure test fillers pass
- Run `uv run ruff check <test_file>` for linting
- Run `uv run ruff format <test_file>` for formatting
- Fix any issues before presenting results
152 changes: 152 additions & 0 deletions .claude/skills/consensus-testing.md
@@ -0,0 +1,152 @@
---
name: consensus-testing
description: "Specialized patterns for testing consensus and fork choice code with multiple validators. Use when writing tests in tests/consensus/, or when testing functions involving validators, attestations, justification, or finalization."
---

# Consensus & Fork Choice Testing Patterns

Testing consensus logic requires understanding how validators interact. Single-validator tests miss critical dynamics.

## Multi-Validator Test Design

**Minimum validator counts by scenario:**
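- Basic consensus: 4 validators (tolerates 1 byzantine validator while more than 2/3 remain honest)
- Justification threshold: 8+ validators (enough room to test just above and just below 2/3; use a multiple of 3, such as 9, when you need exactly 2/3)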
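The 4-validator minimum follows the classic BFT bound n >= 3f + 1. A minimal sketch for sizing a test's validator set; the helper names are hypothetical and not part of leanSpec:

```python
def max_byzantine(num_validators: int) -> int:
    """Hypothetical helper: largest f with num_validators >= 3f + 1 (classic BFT bound)."""
    if num_validators < 1:
        raise ValueError("need at least one validator")
    return (num_validators - 1) // 3


def worst_case_honest_fraction(num_validators: int) -> float:
    """Honest share of the set when the maximum tolerated f validators are byzantine."""
    f = max_byzantine(num_validators)
    return (num_validators - f) / num_validators


# 4 validators tolerate 1 byzantine; the 3 honest ones are 75%, above 2/3.
assert max_byzantine(4) == 1
assert worst_case_honest_fraction(4) == 0.75
```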

**Always vary the validator set composition:**
- All validators honest and online
- Supermajority honest (one validator more than the 2/3 threshold)
- At the justification threshold (exactly 2/3; requires a validator count divisible by 3)
- Below the threshold (one validator short of 2/3; should fail to justify)
- Mixed online/offline validators
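The threshold compositions can be computed rather than hand-counted. A sketch assuming the justification rule is "at least 2/3 attest" (consistent with this document's statement that exactly 2/3 justifies), written in integer arithmetic to avoid float rounding at the boundary; both helpers are hypothetical:

```python
def justifies(attesting: int, total: int) -> bool:
    """Assumed rule: at least 2/3 of validators attest, in integer math (no floats)."""
    return 3 * attesting >= 2 * total


def threshold_counts(total: int) -> dict[str, int]:
    """Attesting-validator counts just below, at, and above the 2/3 line."""
    minimum = (2 * total + 2) // 3  # ceil(2 * total / 3): smallest justifying count
    return {
        "below": minimum - 1,              # should fail to justify
        "threshold": minimum,              # exactly 2/3 only when total % 3 == 0
        "above": min(minimum + 1, total),  # capped: cannot exceed the set size
    }


for total in (4, 8, 9):
    counts = threshold_counts(total)
    assert not justifies(counts["below"], total)
    assert justifies(counts["threshold"], total)
```

With 8 validators the smallest justifying count is 6, which matches the 6-of-8 state transition example later in this skill.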

## Validator Relationship Scenarios

Test how validators interact, not just individual behavior:

**Attestation patterns:**
- All validators attest to same head (happy path)
- Validators split between two competing heads
- Staggered attestations across slots
- Late attestations arriving after new blocks
- Missing attestations from subset of validators

**Proposer/attester dynamics:**
- Proposer includes own attestation
- Proposer excludes valid attestations (censorship)
- Attestations reference proposer's parent (not proposer's block)
- Multiple blocks proposed for same slot (equivocation)

**Committee behavior:**
- Full committee participation
- Partial committee (threshold edge cases)
- Empty committee attestations
- Cross-committee attestation conflicts

## Fork Choice Scenarios

Fork choice tests must exercise competing chain heads:

**Branch competition:**
```
               +-- B2a <- B3a (3 attestations)
genesis <- B1 -+
               +-- B2b <- B3b (4 attestations) <- winner
```
- Test that head follows attestation weight
- Verify re-org when new attestations shift weight
- Check tie-breaking rules when weights equal
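To derive expected heads for these scenarios, the branch competition can be modeled as a greedy heaviest-subtree walk (LMD-GHOST style). This is a simplified sketch, not leanSpec's fork choice: votes are plain attestation counts per block, and the tie-break (lexicographically larger block id wins) is an assumption to check against the spec's actual rule.

```python
from typing import Optional

# Hypothetical simplified model, not leanSpec's fork choice implementation.


def subtree_weights(parents: dict[str, Optional[str]], votes: dict[str, int]) -> dict[str, int]:
    """Total votes on each block plus all of its descendants."""
    weights = {block: 0 for block in parents}
    for block, count in votes.items():
        # Credit the voted block and every ancestor up to the root.
        node: Optional[str] = block
        while node is not None:
            weights[node] += count
            node = parents[node]
    return weights


def heaviest_head(parents: dict[str, Optional[str]], votes: dict[str, int], root: str) -> str:
    """Walk from `root`, always stepping into the heaviest child subtree."""
    weights = subtree_weights(parents, votes)
    children: dict[str, list[str]] = {block: [] for block in parents}
    for block, parent in parents.items():
        if parent is not None:
            children[parent].append(block)
    head = root
    while children[head]:
        # Assumed tie-break: equal weights go to the lexicographically larger id.
        head = max(children[head], key=lambda child: (weights[child], child))
    return head


# The branch diagram above, with attestation counts on the two tips.
parents = {
    "genesis": None,
    "B1": "genesis",
    "B2a": "B1",
    "B3a": "B2a",
    "B2b": "B1",
    "B3b": "B2b",
}
assert heaviest_head(parents, {"B3a": 3, "B3b": 4}, "genesis") == "B3b"
# Two more attestations for branch a shift the weight and re-org the head.
assert heaviest_head(parents, {"B3a": 5, "B3b": 4}, "genesis") == "B3a"
```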

**Critical scenarios to cover:**
1. **Weight transitions**: Head changes as attestations arrive
2. **Deep re-orgs**: New branch overtakes after multiple slots
3. **Equivocation handling**: Same validator attests to conflicting heads
4. **Checkpoint boundaries**: Behavior at epoch transitions
5. **Finalization effects**: Finalized blocks cannot be re-orged
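Weight transitions hinge on counting only each validator's latest attestation. A minimal sketch of that latest-message bookkeeping between two competing tips; the names are hypothetical, and dropping same-slot conflicts (first seen wins) is a simplification, since a real spec may detect and handle equivocation differently:

```python
# validator index -> (slot, head block id) of that validator's latest attestation
LatestVotes = dict[int, tuple[int, str]]


def on_attestation(latest: LatestVotes, validator: int, slot: int, head: str) -> None:
    """Record an attestation only if it is newer than the validator's previous one."""
    previous = latest.get(validator)
    if previous is None or slot > previous[0]:
        latest[validator] = (slot, head)
    # Same-slot or older messages are dropped (first seen wins); this is a
    # simplification of equivocation handling, not the spec's rule.


def head_between(latest: LatestVotes, tip_a: str, tip_b: str) -> str:
    """Pick the tip with more latest votes; ties go to the larger id (assumed)."""
    count_a = sum(1 for _, head in latest.values() if head == tip_a)
    count_b = sum(1 for _, head in latest.values() if head == tip_b)
    return max((count_a, tip_a), (count_b, tip_b))[1]


latest: LatestVotes = {}
for validator in range(3):
    on_attestation(latest, validator, slot=1, head="A")
on_attestation(latest, 3, slot=1, head="B")
assert head_between(latest, "A", "B") == "A"  # 3 votes vs 1

# Two validators move to B in a later slot: weight shifts and the head re-orgs.
on_attestation(latest, 0, slot=2, head="B")
on_attestation(latest, 1, slot=2, head="B")
assert head_between(latest, "A", "B") == "B"  # 1 vote vs 3

# A conflicting vote from validator 1 in the same slot is not counted again.
on_attestation(latest, 1, slot=2, head="A")
assert head_between(latest, "A", "B") == "B"
```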

## Justification & Finalization

The 2/3 supermajority threshold is critical:

**Justification tests:**
- Exactly 2/3 participation -> should justify
- One validator short of 2/3 -> should NOT justify
- Validators with different effective balances (weighted voting)
- Justification with gaps (skip epochs)

**Finalization tests:**
- Two consecutive justified epochs -> finalization
- Justified but not finalized (gap in justification)
- Finalization with varying participation rates
- Cannot finalize without prior justification
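The finalization cases can be driven from a tiny model of the rule as stated here: a justified epoch directly followed by a justified epoch finalizes the first. This is a simplified Casper FFG-style sketch, not necessarily the exact rule leanSpec implements, and treating genesis as justified and finalized is an assumption:

```python
from collections.abc import Iterable


def finalized_epochs(links: Iterable[tuple[int, int]]) -> set[int]:
    """Apply supermajority (source, target) links in order; return finalized epochs."""
    justified = {0}  # assumption: genesis starts justified
    finalized = {0}  # assumption: genesis starts finalized
    for source, target in links:
        if source not in justified:
            continue  # a link from an unjustified source justifies nothing
        justified.add(target)
        if target == source + 1:
            # Justified source directly followed by a justified target.
            finalized.add(source)
    return finalized


assert finalized_epochs([(0, 1), (1, 2)]) == {0, 1}  # consecutive: finalize 0, then 1
assert finalized_epochs([(0, 2)]) == {0}             # gap: 2 justified, nothing new final
assert finalized_epochs([(1, 2)]) == {0}             # 1 was never justified
```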

## Timing & Ordering

Consensus is sensitive to when events occur:

**Test event orderings:**
- Attestation before vs after block arrival
- Multiple attestations in same slot vs spread across slots
- Block arrives late (after attestation deadline)
- Out-of-order block delivery (child before parent)
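Out-of-order delivery is easiest to cover by replaying every permutation of a small block set. A sketch with a hypothetical `deliver` helper that buffers blocks whose parent is unknown and releases them once the parent arrives; a real client or spec store will differ in detail:

```python
from itertools import permutations

Block = tuple[str, str]  # (block id, parent id)


def deliver(blocks: list[Block], genesis: str = "genesis") -> set[str]:
    """Import blocks in arrival order, buffering any whose parent is still unknown."""
    known = {genesis}
    pending: dict[str, list[str]] = {}  # missing parent id -> children waiting on it
    for block_id, parent in blocks:
        if parent not in known:
            pending.setdefault(parent, []).append(block_id)
            continue
        # Import the block, then release any descendants that were waiting on it.
        queue = [block_id]
        while queue:
            current = queue.pop()
            known.add(current)
            queue.extend(pending.pop(current, []))
    return known


chain = [("B1", "genesis"), ("B2", "B1"), ("B3", "B2"), ("B2x", "B1")]
expected = {"genesis", "B1", "B2", "B3", "B2x"}
for order in permutations(chain):
    # Every arrival order, including child-before-parent, ends in the same store.
    assert deliver(list(order)) == expected
```

Asserting order independence over all permutations guards against the "fixed ordering" pitfall listed at the end of this skill.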

**Slot boundary behavior:**
- Actions at slot start vs slot end
- Crossing epoch boundaries
- Genesis slot special cases
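Boundary slots are easier to enumerate with a couple of helpers. `SLOTS_PER_EPOCH = 32` below is a hypothetical placeholder; substitute the spec's constant if it defines epochs:

```python
SLOTS_PER_EPOCH = 32  # hypothetical placeholder, not taken from leanSpec


def epoch_of(slot: int) -> int:
    """Epoch containing `slot`."""
    return slot // SLOTS_PER_EPOCH


def boundary_slots(epoch: int) -> list[int]:
    """Last slot of the previous epoch, then the first two slots of `epoch`."""
    first = epoch * SLOTS_PER_EPOCH
    return [s for s in (first - 1, first, first + 1) if s >= 0]  # genesis has no slot -1


assert boundary_slots(0) == [0, 1]
assert boundary_slots(1) == [31, 32, 33]
assert [epoch_of(s) for s in boundary_slots(1)] == [0, 1, 1]
```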

## Spec Filler Patterns for Fork Choice

```python
def test_competing_branches(fork_choice_test: ForkChoiceTestFiller) -> None:
    """Fork choice selects branch with higher attestation weight."""
    # genesis_state, genesis_block, block_2a, block_2b and the att_* objects
    # are assumed to be built earlier in the test (not shown here).
    fork_choice_test(
        anchor_state=genesis_state,
        anchor_block=genesis_block,
        steps=[
            # Build competing branches
            OnBlock(block=block_2a),
            OnBlock(block=block_2b),
            # Add attestations favoring branch b (2 votes vs 1)
            OnAttestation(attestation=att_for_2b_validator_0),
            OnAttestation(attestation=att_for_2b_validator_1),
            OnAttestation(attestation=att_for_2a_validator_2),
            # Verify head follows weight
            Checks(head=block_2b.hash_tree_root()),
        ],
    )
```

## State Transition with Multiple Validators

```python
def test_justification_threshold(state_transition_test: StateTransitionTestFiller) -> None:
    """State justifies checkpoint when at least 2/3 of validators attest."""
    # create_state_with_validators, create_block_with_attestations and
    # expected_checkpoint are assumed helpers/fixtures (not shown here).
    # Create state with 8 validators
    state = create_state_with_validators(count=8)

    # Block with attestations from exactly 6/8 validators (75% > 2/3)
    block = create_block_with_attestations(
        state=state,
        attesting_validators=[0, 1, 2, 3, 4, 5],  # 6 of 8
    )

    state_transition_test(
        pre=state,
        blocks=[block],
        post=StateExpectation(
            current_justified_checkpoint=expected_checkpoint,
        ),
    )
```

## Common Pitfalls

Avoid these testing mistakes:

1. **Single validator tests** - Miss consensus dynamics entirely
2. **Always-honest scenarios** - Never test byzantine behavior
3. **Ignoring weights** - Validators may have different balances
4. **Fixed ordering** - Real networks have non-deterministic message arrival
5. **Skipping threshold edges** - The 2/3 boundary is where bugs hide
6. **Testing implementation** - Test spec behavior, not internal state
19 changes: 10 additions & 9 deletions tests/consensus/devnet/fc/test_fork_choice_reorgs.py
@@ -226,7 +226,6 @@ def test_three_block_deep_reorg(
Reorg Details:
- **Depth**: 3 blocks (deepest in this test suite)
- **Trigger**: Alternative fork becomes longer
-- **Weight advantage**: 4 proposer attestations vs 3

Why This Matters
----------------
@@ -245,6 +244,7 @@ def test_three_block_deep_reorg(
about chain history, ensuring safety and liveness even in adversarial scenarios.
"""
fork_choice_test(
+anchor_state=generate_pre_state(num_validators=6),
steps=[
# Common base
BlockStep(
@@ -656,13 +656,13 @@ def test_back_and_forth_reorg_oscillation(
tests fork choice correctness under extreme conditions.

Oscillation Pattern:
-Slot 2: Fork A leads (1 block) ← head
-Slot 3: Fork B catches up (1 block each) → tie
-Slot 4: Fork B extends (2 vs 1) ← head switches to B
-Slot 5: Fork A extends (2 vs 2) → tie
-Slot 6: Fork A extends (3 vs 2) ← head switches to A
-Slot 7: Fork B extends (3 vs 3) → tie
-Slot 8: Fork B extends (4 vs 3) ← head switches to B
+Slot 2: Fork A leads (1 vs 0) ← head
+Slot 2: Fork B created (1 vs 1) → tie, A maintains
+Slot 3: Fork B extends (2 vs 1) ← head switches to B (REORG #1)
+Slot 3: Fork A extends (2 vs 2) → tie, B maintains
+Slot 4: Fork A extends (3 vs 2) ← head switches to A (REORG #2)
+Slot 4: Fork B extends (3 vs 3) → tie, A maintains
+Slot 5: Fork B extends (4 vs 3) ← head switches to B (REORG #3)

Expected Behavior
-----------------
@@ -671,7 +671,7 @@ def test_back_and_forth_reorg_oscillation(
3. All reorgs are 1-2 blocks deep
4. Fork choice remains consistent and correct throughout

-Reorg Count: 3 reorgs in 6 slots (very high rate)
+Reorg Count: 3 reorgs in 4 slots (very high rate)

Why This Matters
----------------
@@ -694,6 +694,7 @@ def test_back_and_forth_reorg_oscillation(
convergence.
"""
fork_choice_test(
+anchor_state=generate_pre_state(num_validators=6),
steps=[
# Common base
BlockStep(