crl-release-25.2: db: fix TestCompactionCorruption timeout #5817

Open
xxmplus wants to merge 2 commits into cockroachdb:crl-release-25.2 from xxmplus:fix-timeout-5815
xxmplus commented Feb 26, 2026

Backports two fixes from master to resolve TestCompactionCorruption timeouts.

  • Backport f2a76e8 ("db: deflake TestCompactionCorruption")
  • Backport 4fa1e6a ("db: deflake TestCompactionCorruption")

Two prior fixes on this branch (2278eff, fb6a541) addressed problem
span expiration by advancing fake time in the wait loops, but the test
continued to fail because the root cause is workload starvation of
virtual rewrite compactions.

The workload goroutine continuously creates L0 files without pausing.
With L0CompactionThreshold=1, this triggers constant score-based L0
compactions that consume all available concurrency slots. Virtual
rewrite compactions, which materialize external files, have the
lowest priority (picked in pickAutoNonScore) and never get scheduled,
so the wait-for-no-external-files step times out.
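
To make the starvation argument concrete, here is a toy model in Go. It is not Pebble's compaction picker: the function name, the per-round scheduling, and the slot counts are all assumptions for illustration. It only shows that a lowest-priority task never runs while higher-priority work fills every slot.

```go
package main

import "fmt"

// firstRewriteRound is a hypothetical model of a scheduler with `slots`
// concurrency slots per round. Score-based L0 compactions are always picked
// first; a single pending virtual rewrite compaction runs only if a slot is
// left over. It returns the first round in which the rewrite runs, or -1 if
// it never runs within `rounds` rounds.
func firstRewriteRound(rounds, slots, newL0PerRound int) int {
	pendingL0 := 0
	for r := 0; r < rounds; r++ {
		pendingL0 += newL0PerRound // the workload keeps flushing new L0 files
		used := pendingL0
		if used > slots {
			used = slots
		}
		pendingL0 -= used
		if used < slots {
			return r // a free slot remains for the lowest-priority compaction
		}
	}
	return -1
}

func main() {
	fmt.Println(firstRewriteRound(1000, 2, 2)) // workload saturates every slot: -1
	fmt.Println(firstRewriteRound(1000, 2, 0)) // workload paused: 0
}
```

In this model the rewrite is scheduled as soon as the workload stops adding work, which is the intuition behind pausing the workload in commit 2.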

Commit 1: simplify test structure (backport of f2a76e8)

  • Reduce from 3 external files to 1 (only the missing file matters
    for the test scenario).
  • Scope the workload per-command via workload=(d,w) argument
    instead of global start-workload/stop-workload, narrowing the key
    range to reduce interference with external file compactions.
  • Remove start-workload and stop-workload case handlers.

Commit 2: pause workload during compactions (backport of 4fa1e6a)

  • Pause the workload when compactions are in progress
    (Metrics().Compact.NumInProgress > 0), so flushes don't outpace
    compactions and virtual rewrite compactions get scheduled.
  • Reduce value sizes from 1024-11264 bytes to smaller,
    ExpFloat64-based sizes, since L0 compactions are triggered by
    file count (L0CompactionFileThreshold=5), not size.

Fixes #5815

xxmplus and others added 2 commits February 26, 2026 11:23

Cherry-pick of f2a76e8 from master.

Simplify the test structure:
- Reduce from 3 external files to 1 (only the missing one matters)
- Scope the workload per-command via workload=(d,w) argument instead
  of global start-workload/stop-workload, allowing narrower key ranges
  that don't interfere with external file ranges.

Informs cockroachdb#5815

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Cherry-pick of 4fa1e6a from master.

The workload goroutine continuously creates L0 files without pausing.
With L0CompactionThreshold=1, this triggers constant score-based L0
compactions that consume all concurrency slots. Virtual rewrite
compactions (which materialize external files) are lowest priority
and never get scheduled, causing wait-for-no-external-files to time
out.

Fix by pausing the workload when compactions are in progress, so
flushes don't outpace compactions and virtual rewrite compactions get
a chance to run. Also reduce value sizes since L0 compactions are
triggered by file count (L0CompactionFileThreshold=5), not size.

Fixes cockroachdb#5815

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@xxmplus xxmplus requested a review from a team as a code owner February 26, 2026 19:31
@xxmplus xxmplus requested a review from RaduBerinde February 26, 2026 19:31
@cockroach-teamcity
This change is Reviewable
