
Retry failing tests up to N times using the #[Retry] attribute #6742

Draft
sebastianbergmann wants to merge 12 commits into issue-5718/repeat from feature/retry


@sebastianbergmann

Owner

Summary

This pull request introduces a #[Retry(int $maxAttempts)] attribute. A test method annotated with it is run up to $maxAttempts times; the first attempt whose status is neither failure nor error decides the test's result. A test that passes on a retry does not fail the test run, but it is always listed in the test result summary so that its flakiness remains visible.

This builds on #6591, which has not been merged yet

This branch is based on the issue-5718/repeat branch of #6591 (repeated test execution using --repeat and #[Repeat]), which has not been merged yet. This pull request therefore targets the issue-5718/repeat branch, not main, so that its diff shows only the changes belonging to this pull request.

Once #6591 is merged, this branch will be rebased onto main and this pull request will be retargeted to main. It should not be merged before that has happened.

The dependency is not incidental: the retry feature consumes infrastructure that #6591 introduced.

The two features are intentionally separate pull requests because they have opposite semantics. #[Repeat] runs a test N times and stops at the first failure: failure is the signal, and the feature exists to detect flakiness. #[Retry] runs a test up to N times and stops at the first success: failure is noise, and the feature exists to tolerate flakiness where it cannot reasonably be fixed. #6591 explicitly scopes retry-style semantics out; this pull request is that follow-up.

Use Cases

Tolerating known-flaky tests

Some tests interact with inherently unreliable systems: network services, hardware timing, processes whose scheduling the test cannot control. Such a test failing once is not information; it failing consistently is. #[Retry] lets the test express, in code where it can be reviewed, that a bounded number of additional attempts is acceptable before the failure counts.

Replacing CI-level retry loops

Re-running entire CI jobs to get past one flaky test is expensive and hides which test was flaky. Per-test retries inside a single PHPUnit invocation are cheaper and produce a per-test record of every tolerated failure.

Prior work

Community demand

Retry functionality was requested in the discussion of #6591, referencing the bshaffer/phpunit-retry-annotations package, which provided @retryAttempts annotations for PHPUnit 9 and earlier by overriding test execution in a base test case class.

Why this needs to be in PHPUnit itself

That extension approach is no longer possible: since PHPUnit 10, TestCase::run() and TestCase::runBare() are final, and the event system is deliberately read-only: subscribers can observe test execution but cannot re-run tests. If retry functionality is to exist for PHPUnit 10+, it can only live in PHPUnit itself.

Keeping flakiness visible

The standard objection to built-in retries is that they institutionalize flaky tests: a race condition that fails 30% of the time passes CI forever and is never fixed. This implementation answers that objection with unconditional visibility: every test that passed only after retrying is listed in the test result summary of every run, with the number of failed attempts. This cannot be disabled.

There was 1 retried test:

1) ExampleTest::testOne
2 failed attempts

OK (1 test, 1 assertion)

A test that exhausts all attempts is reported as a regular failure, with the failure of the final attempt identifying itself, for example ExampleTest::testOne (attempt 3 of 3).

#[Retry] semantics

  • #[Retry(3)]: run up to 3 times, the first attempt whose status is neither failure nor error decides the result.
  • Only failure and error trigger a retry. A skipped or incomplete attempt ends the loop and decides the result.
  • The eligibility rules match #[Repeat]: only test methods with an explicit void return type declaration and without #[Depends] dependencies are retried. Using #[Retry] on an ineligible method triggers a test runner warning, as does a $maxAttempts argument that is not a positive integer.
  • #[Retry] combined with #[Repeat] on the same method triggers a test runner warning and #[Retry] is ignored. #[Retry] takes precedence over --repeat.
  • When the test method uses a data provider, each data set is retried independently.
  • There is deliberately no --retry CLI option: retrying is a per-test decision that belongs in the code where it can be reviewed, not a global switch that tolerates flakiness across an entire suite.
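The stopping rule in the list above can be modeled in a few lines of plain PHP. This is only a sketch of the semantics, not the code in this pull request: the Status enum and runWithRetry() are hypothetical names, PHP 8.1 or later is assumed, and the sketch throws on an invalid $maxAttempts where the pull request emits a test runner warning.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for PHPUnit's test statuses; not the real API.
enum Status: string
{
    case Success    = 'success';
    case Failure    = 'failure';
    case Error      = 'error';
    case Skipped    = 'skipped';
    case Incomplete = 'incomplete';
}

/**
 * Runs $attempt up to $maxAttempts times.
 *
 * @param callable(int): Status $attempt receives the 1-based attempt number
 *
 * @return array{status: Status, attempts: int}
 */
function runWithRetry(callable $attempt, int $maxAttempts): array
{
    // Simplification: the pull request warns and ignores the attribute instead.
    if ($maxAttempts < 1) {
        throw new InvalidArgumentException('$maxAttempts must be a positive integer');
    }

    for ($n = 1; ; $n++) {
        $status = $attempt($n);

        $retryable = $status === Status::Failure || $status === Status::Error;

        // Stop on the first non-failing attempt, or when attempts are exhausted.
        if (!$retryable || $n === $maxAttempts) {
            return ['status' => $status, 'attempts' => $n];
        }
    }
}

// Fails twice, then passes: the third attempt decides the result.
$flaky = runWithRetry(
    static fn (int $n): Status => $n < 3 ? Status::Failure : Status::Success,
    3,
);

// A skipped attempt is not retried; it ends the loop immediately.
$skipped = runWithRetry(static fn (int $n): Status => Status::Skipped, 3);
```

In this model $flaky ends as a success after three attempts, while $skipped is decided by its first attempt.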

Architecture

Test suite structure

A retried test method is wrapped in a RetryTestSuite, a TestSuite subclass modeled after RepeatTestSuite and DataProviderTestSuite. It holds a single eagerly created test case instance plus a factory for creating fresh instances for additional attempts. Every attempt runs on a clean instance, preserving PHPUnit's one-instance-per-test guarantee. Because the suite contains exactly one test, planned test counts, progress output, filters, and sorting are naturally correct; the test suite sorter treats a RetryTestSuite as an atomic unit.

Execution flow and the lifecycle invariant

RetryTestSuite::run() runs each attempt with event collection active (see below). The outcome determines what happens to the collected events:

  • A failed or errored attempt with attempts remaining is tolerated: its collected events are discarded and a single new event, Test\AttemptFailed or Test\AttemptErrored, carrying the test and the throwable, is emitted in their place.
  • The deciding attempt (first non-failing attempt, or the last attempt) has its events forwarded unchanged, with their original telemetry.

The design invariant is that a tolerated attempt emits no test lifecycle events: no Test Preparation Started, Test Prepared, or Test Finished. Every public lifecycle therefore carries a definitive standard outcome, which is what existing event consumers assume. As a result, a retried test counts as exactly one test in planned and actual totals, and the JUnit XML, TeamCity, and TestDox output report only the deciding attempt, without any of these loggers needing to know that retries exist. Consumers that want to observe the discarded attempts subscribe to the new events.
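To make the invariant concrete, the following sketch computes the event stream that subscribers would observe for a retried test. It is an illustration under simplifying assumptions: events are plain strings, publicEventStream() is a hypothetical helper, and only Test\AttemptFailed is modeled (Test\AttemptErrored would be handled the same way).

```php
<?php

declare(strict_types=1);

/**
 * @param list<array{failed: bool, events: list<string>}> $attempts one entry per attempt, in order
 *
 * @return list<string> the events that subscribers would observe
 */
function publicEventStream(array $attempts): array
{
    $public = [];
    $last   = count($attempts) - 1;

    foreach ($attempts as $i => $attempt) {
        if ($attempt['failed'] && $i < $last) {
            // Tolerated attempt: drop its lifecycle events, emit one summary event.
            $public[] = 'Test\AttemptFailed';

            continue;
        }

        // Deciding attempt: forward the collected events unchanged.
        return array_merge($public, $attempt['events']);
    }

    return $public;
}

// Builds the lifecycle events of one attempt with the given outcome event.
$lifecycle = static fn (string $outcome): array => [
    'Test\PreparationStarted', 'Test\Prepared', $outcome, 'Test\Finished',
];

$stream = publicEventStream([
    ['failed' => true,  'events' => $lifecycle('Test\Failed')],
    ['failed' => true,  'events' => $lifecycle('Test\Failed')],
    ['failed' => false, 'events' => $lifecycle('Test\Passed')],
]);
```

Here $stream contains two Test\AttemptFailed entries followed by the single lifecycle of the passing attempt, so a consumer that counts Test\Finished events sees exactly one test.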

Scoped event collection

The mechanism behind this is a new internal capability of the event system: DeferringDispatcher can temporarily collect dispatched events into an EventCollection instead of forwarding them to its subscribers, controlled through the event facade. Collected events are either replayed through the existing Facade::forward() mechanism, the same mechanism that replays events from child processes, preserving original telemetry, or discarded. Because both the emitter and Facade::forward() dispatch through the DeferringDispatcher, the events a child process sends back during an attempt are collected as well, which is what makes retrying work transparently for tests run in process isolation: the attempt state is passed to the child process like the repetition state of repeated tests, and the test status transferred back from the child process decides whether to retry.
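The collect-then-replay-or-discard idea can be sketched independently of PHPUnit. ScopedDispatcherSketch and its method names are hypothetical and greatly simplified; the actual DeferringDispatcher and facade methods in this pull request may look different, and telemetry is not modeled here.

```php
<?php

declare(strict_types=1);

// Hypothetical, simplified stand-in for the dispatcher described above.
final class ScopedDispatcherSketch
{
    /** @var list<callable(string): void> */
    private array $subscribers = [];

    /** @var null|list<string> null means "not collecting" */
    private ?array $collected = null;

    public function subscribe(callable $subscriber): void
    {
        $this->subscribers[] = $subscriber;
    }

    public function startCollecting(): void
    {
        $this->collected = [];
    }

    /** @return list<string> events collected since startCollecting() */
    public function stopCollecting(): array
    {
        $events          = $this->collected ?? [];
        $this->collected = null;

        return $events;
    }

    public function dispatch(string $event): void
    {
        // While a collection is active, events are held back from subscribers.
        if ($this->collected !== null) {
            $this->collected[] = $event;

            return;
        }

        foreach ($this->subscribers as $subscriber) {
            $subscriber($event);
        }
    }
}

$seen       = [];
$dispatcher = new ScopedDispatcherSketch;
$dispatcher->subscribe(static function (string $event) use (&$seen): void {
    $seen[] = $event;
});

// First attempt fails: the caller discards what was collected.
$dispatcher->startCollecting();
$dispatcher->dispatch('Test\Prepared');
$dispatcher->dispatch('Test\Failed');
$discarded = $dispatcher->stopCollecting();

// Second attempt passes: the caller replays what was collected.
$dispatcher->startCollecting();
$dispatcher->dispatch('Test\Prepared');
$dispatcher->dispatch('Test\Passed');

foreach ($dispatcher->stopCollecting() as $event) {
    $dispatcher->dispatch($event); // collection is off, so subscribers see it
}
```

The subscriber only ever observes the events of the second attempt; the decision to replay or discard stays with the caller, not with the dispatcher.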

Event system

The TestMethod value object carries attempt and maxAttempts properties, mirroring repetition and totalRepetitions. id() and name() append an (attempt N of M) suffix, but only for the second and subsequent attempts. The first attempt of a retried test is indistinguishable from a normal test in all output, so a retried test that passes immediately, which should be the common case, produces no retry-related noise anywhere. A new TestSuiteForRetriedTestMethod event value object represents the suite, with isForDataSet() and maxAttempts() accessors.
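The naming rule amounts to the following. nameWithAttempt() is a hypothetical helper used for illustration only; in the pull request the logic lives in id() and name() of the TestMethod value object.

```php
<?php

declare(strict_types=1);

// Hypothetical helper; not part of PHPUnit.
function nameWithAttempt(string $name, int $attempt, int $maxAttempts): string
{
    // The first attempt keeps the plain name, so it looks like a normal test.
    if ($attempt < 2) {
        return $name;
    }

    return sprintf('%s (attempt %d of %d)', $name, $attempt, $maxAttempts);
}

$first = nameWithAttempt('ExampleTest::testOne', 1, 3);
$third = nameWithAttempt('ExampleTest::testOne', 3, 3);
```

$first is the unchanged name, and $third carries the suffix shown in the failure example earlier in this description.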

Interaction with dependencies

A test depending on a retried test via #[Depends] runs when any attempt of its dependency passed: the passing attempt registers the method as passed itself, and for data sets the enclosing data provider test suite makes the decision once all data sets have finished, exactly as for repeated tests.

Out of scope

Deliberately not part of this pull request:

  • Delays and back-off between attempts (delaySeconds, multipliers, caps): sleeping inside the single-threaded test runner burns CI time, and second-granularity is wrong for most back-off needs. If demand materializes, a single delay parameter can be added to the attribute without breaking anything.
  • A --retry CLI option and XML configuration: see semantics above.
  • A --fail-on-retried option to make tolerated retries fail the run in strict setups; this could be a follow-up once the feature has settled.

sebastianbergmann added the type/enhancement (A new idea that should be implemented) and feature/test-runner (CLI test runner) labels on Jun 12, 2026
sebastianbergmann self-assigned this on Jun 12, 2026
@github-actions

github-actions Bot commented Jun 12, 2026


API Surface Changes

If any of the additions below are not intended as public API, mark them with @internal in the docblock.

New API Surface

Classes

Interfaces

Methods

@codecov

codecov Bot commented Jun 12, 2026


Codecov Report

❌ Patch coverage is 99.51338% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 97.65%. Comparing base (b6252a0) to head (87265f6).
✅ All tests successful. No failed tests found.

Files with missing lines            Patch %    Lines
src/Framework/RetryTestSuite.php    97.64%     2 Missing ⚠️
Additional details and impacted files
@@                   Coverage Diff                   @@
##             issue-5718/repeat    #6742      +/-   ##
=======================================================
+ Coverage                97.63%   97.65%   +0.02%     
- Complexity                8845     8957     +112     
=======================================================
  Files                      873      881       +8     
  Lines                    27110    27514     +404     
=======================================================
+ Hits                     26468    26870     +402     
- Misses                     642      644       +2     


…g them

DeferringDispatcher can now temporarily collect dispatched events into an EventCollection instead of forwarding them to its subscribers. The collection is started and stopped through new internal methods on the event facade. The caller decides what happens to the collected events: they can be replayed through the existing Facade::forward() mechanism, which preserves their original telemetry, or discarded.

This is the same pattern that is already used for tests run in process isolation, where the child process collects its events with a CollectingDispatcher and the parent process replays them, except that here the collection happens in the parent process itself. Because both the emitter and Facade::forward() dispatch through the DeferringDispatcher, events that a child process sends back while a collection is active are collected as well.

This is groundwork for retrying flaky tests: a failed attempt's events can be discarded and replaced with a summary event, while the deciding attempt's events are forwarded unchanged.

The new Test\AttemptFailed and Test\AttemptErrored events represent an attempt of a retried test that failed or errored but will not fail the test run because another attempt follows. They mirror the existing Test\Failed and Test\Errored events: AttemptFailed carries the test, the throwable, and an optional comparison failure; AttemptErrored carries the test and the throwable.

A tolerated attempt emits exactly one such event and no test lifecycle events (no Test Preparation Started, Test Prepared, or Test Finished). Event consumers that do not know these events therefore see retried tests as a single test whose lifecycle is that of the deciding attempt, and consumers that do know them can observe the discarded attempts.

The #[Retry(int $maxAttempts)] attribute declares that a test method may be run up to $maxAttempts times, with the first attempt whose status is neither failure nor error deciding the test's result. The attribute is parsed into a Retry metadata object.

The attribute parser validates the argument the same way the #[Repeat] attribute is validated: when $maxAttempts is not a positive integer, a test runner warning is emitted and the attribute is ignored. Validating at parse time preserves the positive-int invariant of Metadata\Retry for all consumers of the metadata API.

A TestCase now knows which attempt of a retried test it represents and how many attempts are allowed at most, set via setAttempt(). The TestMethod event value object carries both values, populated by TestMethodBuilder, and offers an isRetried() convenience method.

id() and name() append an "(attempt N of M)" suffix, but only for the second and subsequent attempts. The first attempt of a retried test is indistinguishable from a normal test in output and logs, so a retried test that passes on its first attempt, which should be the common case, produces no retry-related noise. Later attempts have distinct ids, so result collections keyed by test id keep the attempts apart.

Like the repetition state of a repeated test, the attempt state is passed to the child process and replayed on the reconstructed TestCase when a test is run in process isolation, so events emitted in the child process carry the correct attempt identity.

A test method annotated with #[Retry] is wrapped in a RetryTestSuite, a TestSuite subclass that holds a single eagerly created test case instance and a factory for creating fresh instances for additional attempts. Each attempt therefore runs on a clean instance, like any other test. When a test method uses a data provider, each data set gets its own RetryTestSuite, mirroring how repeated tests are modeled.

RetryTestSuite::run() runs each attempt with event collection active. When the attempt's status is failure or error and attempts remain, the collected events are discarded and a single Test\AttemptFailed or Test\AttemptErrored event is emitted in their place. Any other outcome, the last attempt, and a requested stop end the loop: the deciding attempt's events are forwarded unchanged, with their original telemetry. Because a tolerated attempt has no public test lifecycle, the test counts as exactly one test in planned and actual totals, and loggers such as JUnit XML and TeamCity report only the deciding attempt without needing to know about retries. Attempts of a test that uses process isolation work the same way: the events the child process sends back are collected, and the status transferred from the child process is used to decide whether to retry.

The eligibility rules and warnings match those of #[Repeat]: only test methods with a void return type declaration and without dependencies are retried. Combining #[Retry] with #[Repeat] triggers a test runner warning and ignores #[Retry]; #[Retry] takes precedence over --repeat. Only failure and error trigger a retry; a skipped or incomplete attempt ends the loop. A test suite for a retried test method is represented by the new TestSuiteForRetriedTestMethod event value object, the test suite sorter treats a RetryTestSuite as an atomic unit, and the result collector leaves PassedTests registration to the passing attempt itself or, for data sets, to the enclosing data provider test suite.

Tolerating a flaky test must not hide it: a test that only passes after retrying should be visible in every test run, so that its flakiness can be tracked and eventually fixed instead of being institutionalized.

The result collector subscribes to the Test\AttemptFailed and Test\AttemptErrored events and counts the failed attempts per test, keyed by class name, method name, and data set. When the test ultimately fails, its entry is removed again, as the failure itself is already reported together with the number of its final attempt. The remaining entries, the tests whose failed attempts were tolerated, are carried on the TestResult value object, and the default result printer always lists them:

    There was 1 retried test:

    1) ExampleTest::testOne
    2 failed attempts

The run's result is not affected: a test suite whose retried tests all ultimately passed remains "OK".
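The bookkeeping described in this commit can be sketched as follows. RetriedTestsSketch, its method names, and the key format are hypothetical; the real result collector subscribes to events rather than being called directly.

```php
<?php

declare(strict_types=1);

// Hypothetical, simplified collector; names are illustrative only.
final class RetriedTestsSketch
{
    /** @var array<string, int> failed attempts keyed by test identity */
    private array $failedAttempts = [];

    public function attemptFailed(string $class, string $method, string $dataSet = ''): void
    {
        $key = $this->key($class, $method, $dataSet);

        $this->failedAttempts[$key] = ($this->failedAttempts[$key] ?? 0) + 1;
    }

    public function testFailed(string $class, string $method, string $dataSet = ''): void
    {
        // An ultimately failing test is reported as a regular failure instead.
        unset($this->failedAttempts[$this->key($class, $method, $dataSet)]);
    }

    /** @return array<string, int> tests whose failed attempts were tolerated */
    public function retriedTests(): array
    {
        return $this->failedAttempts;
    }

    private function key(string $class, string $method, string $dataSet): string
    {
        return $class . '::' . $method . ($dataSet !== '' ? ' with data set ' . $dataSet : '');
    }
}

$collector = new RetriedTestsSketch;

// testOne fails twice and then passes on its third attempt.
$collector->attemptFailed('ExampleTest', 'testOne');
$collector->attemptFailed('ExampleTest', 'testOne');

// testTwo fails once, then exhausts its attempts and fails for good.
$collector->attemptFailed('ExampleTest', 'testTwo');
$collector->testFailed('ExampleTest', 'testTwo');

$retried = $collector->retriedTests();
```

Only ExampleTest::testOne remains in $retried, with a count of 2, which corresponds to the "2 failed attempts" line in the printed summary.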