Retry failing tests up to N times using the #[Retry] attribute #6742
Draft
sebastianbergmann wants to merge 12 commits into
Codecov Report
Coverage diff, issue-5718/repeat compared with #6742:
| Metric | issue-5718/repeat | #6742 | +/- |
|---|---|---|---|
| Coverage | 97.63% | 97.65% | +0.02% |
| Complexity | 8845 | 8957 | +112 |
| Files | 873 | 881 | +8 |
| Lines | 27110 | 27514 | +404 |
| Hits | 26468 | 26870 | +402 |
| Misses | 642 | 644 | +2 |
…g them
DeferringDispatcher can now temporarily collect dispatched events into an EventCollection instead of forwarding them to its subscribers. The collection is started and stopped through new internal methods on the event facade. The caller decides what happens to the collected events: they can be replayed through the existing Facade::forward() mechanism, which preserves their original telemetry, or discarded.
This is the same pattern that is already used for tests run in process isolation, where the child process collects its events with a CollectingDispatcher and the parent process replays them, except that the collection happens in the parent process itself. Because both the emitter and Facade::forward() dispatch through the DeferringDispatcher, events that a child process sends back while a collection is active are collected as well.
This is groundwork for retrying flaky tests: a failed attempt's events can be discarded and replaced with a summary event, while the deciding attempt's events are forwarded unchanged.
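The collect-then-replay-or-discard pattern described in this commit message can be sketched in a few lines. This is only an illustration under stated assumptions: `CollectingSketchDispatcher`, `startCollecting()`, and `stopCollecting()` are hypothetical names, events are plain strings, and none of this reproduces PHPUnit's `DeferringDispatcher` or its facade methods.

```php
<?php declare(strict_types=1);

// Hypothetical stand-in for illustration; not PHPUnit's DeferringDispatcher.
final class CollectingSketchDispatcher
{
    /** @var list<callable(string): void> */
    private array $subscribers = [];

    /** @var null|list<string> null means that no collection is active */
    private ?array $collected = null;

    public function subscribe(callable $subscriber): void
    {
        $this->subscribers[] = $subscriber;
    }

    public function dispatch(string $event): void
    {
        // While a collection is active, events are buffered instead of forwarded.
        if ($this->collected !== null) {
            $this->collected[] = $event;

            return;
        }

        foreach ($this->subscribers as $subscriber) {
            $subscriber($event);
        }
    }

    public function startCollecting(): void
    {
        $this->collected = [];
    }

    /** @return list<string> the buffered events; the caller replays or discards them */
    public function stopCollecting(): array
    {
        $events          = $this->collected ?? [];
        $this->collected = null;

        return $events;
    }
}

$seen       = [];
$dispatcher = new CollectingSketchDispatcher;
$dispatcher->subscribe(static function (string $event) use (&$seen): void {
    $seen[] = $event;
});

// First attempt fails: its events are collected and then discarded.
$dispatcher->startCollecting();
$dispatcher->dispatch('Test Prepared');
$dispatcher->dispatch('Test Failed');
$discarded = $dispatcher->stopCollecting();

// Second attempt passes: its events are collected and then replayed.
$dispatcher->startCollecting();
$dispatcher->dispatch('Test Prepared');
$dispatcher->dispatch('Test Passed');

foreach ($dispatcher->stopCollecting() as $event) {
    $dispatcher->dispatch($event);
}
```

In this sketch the subscriber only ever sees the events of the second attempt, which is the effect the commit message describes for the deciding attempt.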
The new Test\AttemptFailed and Test\AttemptErrored events represent an attempt of a retried test that failed or errored but will not fail the test run because another attempt follows. They mirror the existing Test\Failed and Test\Errored events: AttemptFailed carries the test, the throwable, and an optional comparison failure, AttemptErrored carries the test and the throwable. A tolerated attempt emits exactly one such event and no test lifecycle events (no Test Preparation Started, Test Prepared, or Test Finished). Event consumers that do not know these events therefore see retried tests as a single test whose lifecycle is that of the deciding attempt, and consumers that do know them can observe the discarded attempts.
The #[Retry(int $maxAttempts)] attribute declares that a test method may be run up to $maxAttempts times, with the first attempt whose status is neither failure nor error deciding the test's result. The attribute is parsed into a Retry metadata object. The attribute parser validates the argument the same way the #[Repeat] attribute is validated: when $maxAttempts is not a positive integer, a test runner warning is emitted and the attribute is ignored. Validating at parse time preserves the positive-int invariant of Metadata\Retry for all consumers of the metadata API.
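The parse-time validation described above might look roughly like the following. The sketch assumes a stand-in `Retry` attribute class declared in the global namespace so that it runs without PHPUnit; the real attribute's namespace, the parser's class name, and the warning text are not given in this pull request and are invented here. `maxAttemptsFor()` is a hypothetical helper.

```php
<?php declare(strict_types=1);

// Stand-in attribute so the sketch runs without PHPUnit (assumption).
#[Attribute(Attribute::TARGET_METHOD)]
final class Retry
{
    public function __construct(public readonly int $maxAttempts)
    {
    }
}

final class ExampleTest
{
    #[Retry(3)]
    public function testOne(): void
    {
    }

    #[Retry(0)]
    public function testTwo(): void
    {
    }
}

/**
 * Returns the validated maximum number of attempts, or null when the
 * attribute is absent or invalid. An invalid argument records a warning
 * and the attribute is ignored.
 *
 * @param list<string> $warnings
 */
function maxAttemptsFor(string $className, string $methodName, array &$warnings): ?int
{
    $attributes = (new ReflectionMethod($className, $methodName))->getAttributes(Retry::class);

    if ($attributes === []) {
        return null;
    }

    // Read the raw argument without instantiating the attribute,
    // accepting both positional and named usage.
    $arguments   = $attributes[0]->getArguments();
    $maxAttempts = $arguments[0] ?? $arguments['maxAttempts'] ?? null;

    if (!is_int($maxAttempts) || $maxAttempts < 1) {
        $warnings[] = sprintf('%s::%s: #[Retry] requires a positive integer', $className, $methodName);

        return null;
    }

    return $maxAttempts;
}

$warnings = [];
$one      = maxAttemptsFor('ExampleTest', 'testOne', $warnings);
$two      = maxAttemptsFor('ExampleTest', 'testTwo', $warnings);
```

Because invalid values are rejected before any metadata object is created, code that consumes the metadata can rely on the value being a positive integer.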
A TestCase now knows which attempt of a retried test it represents and how many attempts are allowed at most, set via setAttempt(). The TestMethod event value object carries both values, populated by TestMethodBuilder, and offers an isRetried() convenience method.
id() and name() append an "(attempt N of M)" suffix, but only for the second and subsequent attempts. The first attempt of a retried test is indistinguishable from a normal test in output and logs, so a retried test that passes on its first attempt, which should be the common case, produces no retry-related noise. Later attempts have distinct ids, so result collections keyed by test id keep the attempts apart.
Like the repetition state of a repeated test, the attempt state is passed to the child process and replayed on the reconstructed TestCase when a test is run in process isolation, so events emitted in the child process carry the correct attempt identity.
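The suffix rule for id() and name() is small enough to show directly. `nameWithAttempt()` is a hypothetical helper written for this illustration, not a method of the TestMethod value object; the suffix format follows the example given in this pull request.

```php
<?php declare(strict_types=1);

// Hypothetical helper; the TestMethod value object itself is not reproduced here.
function nameWithAttempt(string $name, int $attempt, int $maxAttempts): string
{
    // The first attempt looks exactly like a test that is not retried.
    if ($attempt < 2) {
        return $name;
    }

    return sprintf('%s (attempt %d of %d)', $name, $attempt, $maxAttempts);
}

$first = nameWithAttempt('ExampleTest::testOne', 1, 3);
$third = nameWithAttempt('ExampleTest::testOne', 3, 3);
```

Here `$first` is the unchanged name and `$third` carries the suffix, so the two attempts do not collide when used as keys.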
A test method annotated with #[Retry] is wrapped in a RetryTestSuite, a TestSuite subclass that holds a single eagerly created test case instance and a factory for creating fresh instances for additional attempts. Each attempt therefore runs on a clean instance, like any other test. When a test method uses a data provider, each data set gets its own RetryTestSuite, mirroring how repeated tests are modeled.
RetryTestSuite::run() runs each attempt with event collection active. When the attempt's status is failure or error and attempts remain, the collected events are discarded and a single Test\AttemptFailed or Test\AttemptErrored event is emitted in their place. Any other outcome, the last attempt, and a requested stop end the loop: the deciding attempt's events are forwarded unchanged, with their original telemetry.
Because a tolerated attempt has no public test lifecycle, the test counts as exactly one test in planned and actual totals, and loggers such as JUnit XML and TeamCity report only the deciding attempt without needing to know about retries. Attempts of a test that uses process isolation work the same way: the events the child process sends back are collected, and the status transferred from the child process is used to decide whether to retry.
The eligibility rules and warnings match those of #[Repeat]: only test methods with a void return type declaration and without dependencies are retried. Combining #[Retry] with #[Repeat] triggers a test runner warning and ignores #[Retry]; #[Retry] takes precedence over --repeat. Only failure and error trigger a retry; a skipped or incomplete attempt ends the loop.
A test suite for a retried test method is represented by the new TestSuiteForRetriedTestMethod event value object, the test suite sorter treats a RetryTestSuite as an atomic unit, and the result collector leaves PassedTests registration to the passing attempt itself or, for data sets, to the enclosing data provider test suite.
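The decision logic of the retry loop can be modeled without any PHPUnit classes. The following is a simplified sketch, not the implementation of RetryTestSuite::run(): `runWithRetry()`, the status strings, and the string events are all hypothetical, and the "requested stop" case is left out.

```php
<?php declare(strict_types=1);

/**
 * Simplified model of the retry loop; all names here are hypothetical.
 *
 * @param callable(int): array{status: string, events: list<string>} $runAttempt
 *
 * @return list<string> the events that would reach subscribers
 */
function runWithRetry(callable $runAttempt, int $maxAttempts): array
{
    $published = [];

    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        // In this model, the attempt's events are returned instead of dispatched,
        // which stands in for running the attempt with event collection active.
        $result    = $runAttempt($attempt);
        $tolerable = in_array($result['status'], ['failure', 'error'], true);

        if ($tolerable && $attempt < $maxAttempts) {
            // Discard the attempt's events and emit a single summary event instead.
            $published[] = $result['status'] === 'failure' ? 'Attempt Failed' : 'Attempt Errored';

            continue;
        }

        // Deciding attempt: forward its events unchanged and end the loop.
        $published = array_merge($published, $result['events']);

        break;
    }

    return $published;
}

// A flaky test that fails on its first attempt and passes on its second.
$flaky = static function (int $attempt): array {
    $passed = $attempt >= 2;

    return [
        'status' => $passed ? 'success' : 'failure',
        'events' => ['Test Prepared', $passed ? 'Test Passed' : 'Test Failed', 'Test Finished'],
    ];
};

$events = runWithRetry($flaky, 3);
```

In this model the tolerated first attempt contributes exactly one event and no lifecycle events, while the second attempt's lifecycle is published as-is and the third attempt never runs.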
Tolerating a flaky test must not hide it: a test that only passes after retrying should be visible in every test run, so that its flakiness can be tracked and eventually fixed instead of being institutionalized.
The result collector subscribes to the Test\AttemptFailed and Test\AttemptErrored events and counts the failed attempts per test, keyed by class name, method name, and data set. When the test ultimately fails, its entry is removed again, as the failure itself is already reported together with the number of its final attempt. The remaining entries, the tests whose failed attempts were tolerated, are carried on the TestResult value object, and the default result printer always lists them:
There was 1 retried test:
1) ExampleTest::testOne
2 failed attempts
The run's result is not affected: a test suite whose retried tests all ultimately passed remains "OK".
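The bookkeeping behind this listing can be sketched as a counter keyed per test. `RetriedTestsSketch` and its method names are hypothetical, and the key format used for data sets is an assumption made for this illustration; the actual result collector is not reproduced here.

```php
<?php declare(strict_types=1);

// Hypothetical bookkeeping; not PHPUnit's actual result collector.
final class RetriedTestsSketch
{
    /** @var array<string, int> number of tolerated failed attempts per test */
    private array $failedAttempts = [];

    // Called for every tolerated attempt (failed or errored).
    public function attemptFailed(string $class, string $method, string $dataSet = ''): void
    {
        $key = $this->key($class, $method, $dataSet);

        $this->failedAttempts[$key] = ($this->failedAttempts[$key] ?? 0) + 1;
    }

    // Called when the final attempt failed: the failure is reported on its own,
    // so the test is not listed as a retried test as well.
    public function testUltimatelyFailed(string $class, string $method, string $dataSet = ''): void
    {
        unset($this->failedAttempts[$this->key($class, $method, $dataSet)]);
    }

    /** @return array<string, int> tests that passed only after retrying */
    public function retried(): array
    {
        return $this->failedAttempts;
    }

    private function key(string $class, string $method, string $dataSet): string
    {
        // The '#' separator for data sets is an assumption of this sketch.
        return $dataSet === ''
            ? sprintf('%s::%s', $class, $method)
            : sprintf('%s::%s#%s', $class, $method, $dataSet);
    }
}

$collector = new RetriedTestsSketch;

// testOne fails twice, then passes.
$collector->attemptFailed('ExampleTest', 'testOne');
$collector->attemptFailed('ExampleTest', 'testOne');

// testTwo fails once and then fails its final attempt as well.
$collector->attemptFailed('ExampleTest', 'testTwo');
$collector->testUltimatelyFailed('ExampleTest', 'testTwo');

$retried = $collector->retried();
```

Only `ExampleTest::testOne` remains, with a count of 2, which corresponds to the "2 failed attempts" line in the listing above.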
Summary
This pull request introduces a `#[Retry(int $maxAttempts)]` attribute. A test method annotated with it is run up to `$maxAttempts` times; the first attempt whose status is neither failure nor error decides the test's result. A test that passes on a retry does not fail the test run, but it is always listed in the test result summary so that its flakiness remains visible.
This builds on #6591, which has not been merged yet
This branch is based on the `issue-5718/repeat` branch of #6591 (repeated test execution using `--repeat` and `#[Repeat]`), which has not been merged yet. This pull request therefore targets the `issue-5718/repeat` branch, not `main`, so that its diff shows only the changes belonging to this pull request.
Once #6591 is merged, this branch will be rebased onto `main` and this pull request will be retargeted to `main`. It should not be merged before that has happened.
The dependency is not incidental as the retry feature consumes infrastructure that #6591 introduced:
- The attempt identity on `TestCase` and the `TestMethod` event value object mirrors the repetition identity, including its transfer into child processes for process isolation.
- The transfer of the `TestStatus` back to the parent-side `TestCase` instance, added in #6591, is what lets `RetryTestSuite` decide whether an isolated attempt failed.
- `RetryTestSuite` follows the suite modeling that #6591 established with `RepeatTestSuite` (a `TestSuite` subclass per test method, nested per data set, treated as an atomic unit by the test suite sorter), and the result collector's handling of per-data-set suites reuses the same deferral rules.
The two features are intentionally separate pull requests because they have opposite semantics:
- `#[Repeat]` runs a test N times and stops at failure: failure is the signal, the feature exists to detect flakiness.
- `#[Retry]` runs a test up to N times and stops at success: failure is noise, the feature exists to tolerate flakiness where it cannot reasonably be fixed.
#6591 explicitly scopes retry-style semantics out; this pull request is that follow-up.
Use Cases
Tolerating known-flaky tests
Some tests interact with inherently unreliable systems: network services, hardware timing, processes whose scheduling the test cannot control. Such a test failing once is not information; it failing consistently is. `#[Retry]` lets the test express, in code where it can be reviewed, that a bounded number of additional attempts is acceptable before the failure counts.
Replacing CI-level retry loops
Re-running entire CI jobs to get past one flaky test is expensive and hides which test was flaky. Per-test retries inside a single PHPUnit invocation are cheaper and produce a per-test record of every tolerated failure.
Prior work
Community demand
Retry functionality was requested in the discussion of #6591, referencing the `bshaffer/phpunit-retry-annotations` package, which provided `@retryAttempts` annotations for PHPUnit 9 and earlier by overriding test execution in a base test case class.
Why this needs to be in PHPUnit itself
That extension approach is no longer possible. Since PHPUnit 10, `TestCase::run()` and `TestCase::runBare()` are final, and the event system is deliberately read-only: subscribers can observe test execution but cannot re-run tests. If retry functionality is to exist for PHPUnit 10+, it can only live in PHPUnit itself.
Keeping flakiness visible
The standard objection to built-in retries is that they institutionalize flaky tests: a race condition that fails 30% of the time passes CI forever and is never fixed. This implementation answers that objection with unconditional visibility: every test that passed only after retrying is listed in the test result summary of every run, with the number of failed attempts. This cannot be disabled.
A test that exhausts all attempts is reported as a regular failure, with the failure of the final attempt identifying itself, for example `ExampleTest::testOne (attempt 3 of 3)`.
`#[Retry]` semantics
- `#[Retry(3)]`: run up to 3 times; the first attempt whose status is neither failure nor error decides the result.
- The eligibility rules match those of `#[Repeat]`: only test methods with an explicit `void` return type declaration and without `#[Depends]` dependencies are retried. Using `#[Retry]` on an ineligible method triggers a test runner warning, as does a `$maxAttempts` argument that is not a positive integer.
- `#[Retry]` combined with `#[Repeat]` on the same method triggers a test runner warning and `#[Retry]` is ignored.
- `#[Retry]` takes precedence over `--repeat`.
- There is no `--retry` CLI option: retrying is a per-test decision that belongs in the code where it can be reviewed, not a global switch that tolerates flakiness across an entire suite.
Architecture
Test suite structure
A retried test method is wrapped in a `RetryTestSuite`, a `TestSuite` subclass modeled after `RepeatTestSuite` and `DataProviderTestSuite`. It holds a single eagerly created test case instance plus a factory for creating fresh instances for additional attempts. Every attempt runs on a clean instance, preserving PHPUnit's one-instance-per-test guarantee. Because the suite contains exactly one test, planned test counts, progress output, filters, and sorting are naturally correct; the test suite sorter treats a `RetryTestSuite` as an atomic unit.
Execution flow and the lifecycle invariant
`RetryTestSuite::run()` runs each attempt with event collection active (see below). The outcome determines what happens to the collected events:
- Failure or error with attempts remaining: the collected events are discarded and a single `Test\AttemptFailed` or `Test\AttemptErrored`, carrying the test and the throwable, is emitted in their place.
- Any other outcome, the last attempt, or a requested stop: the loop ends and the deciding attempt's events are forwarded unchanged, with their original telemetry.
The design invariant is that a tolerated attempt emits no test lifecycle events: no Test Preparation Started, Test Prepared, or Test Finished. Every public lifecycle therefore carries a definitive standard outcome, which is what existing event consumers assume. As a result, a retried test counts as exactly one test in planned and actual totals, and the JUnit XML, TeamCity, and TestDox output report only the deciding attempt, without any of these loggers needing to know that retries exist. Consumers that want to observe the discarded attempts subscribe to the new events.
Scoped event collection
The mechanism behind this is a new internal capability of the event system: `DeferringDispatcher` can temporarily collect dispatched events into an `EventCollection` instead of forwarding them to its subscribers, controlled through the event facade. Collected events are either discarded or replayed through the existing `Facade::forward()` mechanism, the same mechanism that replays events from child processes, preserving original telemetry. Because both the emitter and `Facade::forward()` dispatch through the `DeferringDispatcher`, the events a child process sends back during an attempt are collected as well. This is what makes retrying work transparently for tests run in process isolation: the attempt state is passed to the child process like the repetition state of repeated tests, and the test status transferred back from the child process decides whether to retry.
Event system
The `TestMethod` value object carries `attempt` and `maxAttempts` properties, mirroring `repetition` and `totalRepetitions`. `id()` and `name()` append an `(attempt N of M)` suffix, but only for the second and subsequent attempts. The first attempt of a retried test is indistinguishable from a normal test in all output, so a retried test that passes immediately, which should be the common case, produces no retry-related noise anywhere. A new `TestSuiteForRetriedTestMethod` event value object represents the suite, with `isForDataSet()` and `maxAttempts()` accessors.
Interaction with dependencies
A test depending on a retried test via `#[Depends]` runs when any attempt of its dependency passed: the passing attempt registers the method as passed itself, and for data sets the enclosing data provider test suite makes the decision once all data sets have finished, exactly as for repeated tests.
Out of scope
Deliberately not part of this pull request:
- Back-off delays between attempts (`delaySeconds`, multipliers, caps): sleeping inside the single-threaded test runner burns CI time, and second-granularity is wrong for most back-off needs. If demand materializes, a single delay parameter can be added to the attribute without breaking anything.
- A `--retry` CLI option and XML configuration: see semantics above.
- A `--fail-on-retried` option to make tolerated retries fail the run in strict setups; this could be a follow-up once the feature has settled.