Repeated test execution using --repeat CLI option and #[Repeat] attribute #6591
sebastianbergmann wants to merge 22 commits into
@nikophil Would be great to get your feedback on this.
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #6591 +/- ##
============================================
+ Coverage 97.59% 97.63% +0.03%
- Complexity 8755 8845 +90
============================================
Files 869 873 +4
Lines 26751 27110 +359
============================================
+ Hits 26108 26468 +360
+ Misses 643 642 -1
Hi @sebastianbergmann Here's an example attribute class that briefly describes its purpose:
Thank you for your suggestion. At this time, I am not able to consider suggestions like this. If and when this work is merged, then and only then will I be able to think about further additions. I mean no disrespect, but right now such suggestions are a distraction for me.
@sebastianbergmann I think using |
Implemented now. |
API Surface Changes
If any of the additions below are not intended as public API, mark them with
New API Surface
Classes
Methods
Modified API Surface
Methods
I started to work on this in #6742. |
Previously, RepeatTestSuite was declared as a leaf Test even though it structurally represents a group of test cases (the N repetitions of a test method). This produced two inconsistencies:
* TestSuite::addTest() registered the suite under the first repetition's id (Class::method (repetition 1 of N)) rather than the group id, so the other repetitions did not appear as members of the group.
* Event\TestSuite\TestSuiteBuilder::process() treated the suite as a leaf and surfaced only tests[0]->valueObjectForEvents() into the parent's TestCollection. The collection therefore had one entry per repetition group while count() reported N: the two disagreed.
Both issues are removed by treating RepeatTestSuite the same way DataProviderTestSuite is treated:
* RepeatTestSuite now extends Framework\TestSuite and is constructed via RepeatTestSuite::fromTests($name, $tests, $failureThreshold). It overrides run() to retain the failure-threshold/abort semantics and delegates provides(), requires(), sortId(), and setDependencies() to its first child / all children.
* A dedicated event-level value object TestSuiteForRepeatedTestMethod is introduced alongside TestSuiteForTestMethodWithDataProvider. It exposes className(), methodName(), file(), line(), and an isForRepeatedTestMethod() predicate on the base Event\TestSuite\TestSuite.
* Event\TestSuite\TestSuiteBuilder::from() detects RepeatTestSuite and returns the new value object; process() recurses through it like any other framework TestSuite, so each repetition's TestMethod event value object now appears in the parent's TestCollection.
* The special-case branches in Framework\TestSuite::addTest() and Runner\Filter\NameFilterIterator::accept() are removed. The inherited instanceof self branch handles registration; the existing TestSuite branch in the filter recurses into children, which then match individually.
* Runner\TestResult\Collector::testSuiteFinished() learns about the new value object and, when no repetition of the method failed, records the method as passed via PassedTests::testMethodPassed(), mirroring the data-provider handler.
As a consequence of RepeatTestSuite being a real TestSuite, Test Suite Started / Test Suite Finished events are now emitted around each repetition group. The JUnit XML logger correspondingly produces a nested <testsuite> element per repeated method, matching how it already renders data-provider suites.
…oup data providers and repeated tests
…nts that are not positive integers
…ders
When a test method uses both a data provider and repetition, each data set gets its own RepeatTestSuite. Collector::testSuiteFinished() registered the test method in PassedTests as soon as the first data set's RepeatTestSuite finished without failures, before the remaining data sets had run. Since PassedTests has no retraction mechanism, a failure in a later data set could not undo the registration, and a test depending on the method via #[Depends] ran even though its dependency had failed.
The event value object TestSuiteForRepeatedTestMethod now knows whether it represents the repetitions of a single data set (isForDataSet()), derived in Event\TestSuite\TestSuiteBuilder from the "#" separator it already parses. For such a suite, the Collector no longer registers the test method as passed and instead leaves the decision to the enclosing data provider test suite's finished event, which fires only after all data sets have run and which already performed this registration before repetition support was introduced. Repeated test methods without a data provider are still registered when their RepeatTestSuite finishes, as that is the point at which all repetitions have run.
The registration logic, previously duplicated between the data provider and repeated test method branches, is extracted into a shared helper method.
…od in another class
When deciding whether a test method that uses a data provider (or, since the introduction of repetition support, a repeated test method) should be registered as passed, Collector compared failure events by method name only. A failure of a same-named method in an unrelated class therefore prevented the registration, and tests depending on the method that had actually passed were wrongly skipped.
Failure events are now matched on class name and method name, aligning the check with the Class::method granularity that PassedTests has always used for registration. This cannot cause a failed method to be registered as passed: a method's own failure still matches both comparisons.
…s run in process isolation
The child process reconstructs the TestCase from the class and method name and replays per-instance state such as provided data and dependency input, but not the repetition state set by TestBuilder via setRepetition(). The TestMethod value objects for all events emitted in the child process are built from that reconstructed instance, so they reported repetition 1 of 1 regardless of the actual repetition.
As a consequence, all repetitions of a test using process isolation had the identical test id and name: the "(repetition N of M)" suffix was missing from debug output and from the JUnit XML and TeamCity loggers, and result collections keyed by test id merged the issues of all repetitions into a single entry.
The repetition and the total number of repetitions are now passed to the child process template and replayed alongside the other test case state. For tests that are not repeated this sets the default values and changes nothing.
…works for repeated tests run in process isolation
ChildProcessResultProcessor applied the child's test result and assertion count to the parent's TestCase instance, but not the child's TestStatus. The parent-side status therefore remained "unknown" for tests run in process isolation.
RepeatTestSuite::run() reads that status to count failures and errors against the failure threshold. Because the status was never populated, the threshold was never reached for repeated tests run in process isolation: all repetitions were executed even after a failure, instead of the remaining repetitions being skipped.
The child process now includes its TestStatus in the serialized result, and ChildProcessResultProcessor applies it to the parent instance via a new internal TestCase::setStatus() method. The processor's error paths (output on stderr, a tampered result file, an unparseable result) now also set an error status, so a crashing child process counts toward the failure threshold as well.
RepeatTestSuite::run() iterated over its tests in place and kept all TestCase instances referenced until the entire test run had finished. For the stress-testing use case that repetition is intended for, this meant memory usage grew linearly with the number of repetitions: with --repeat 200 and a test retaining a one-megabyte payload per instance, peak memory was 218 MB. run() now follows the same pattern as TestSuite::run(): the tests are collected from the iterator, the suite's own references are released, and each instance is dropped as soon as its repetition has finished. The same scenario now peaks at 20 MB. The guard against running a suite twice, previously lost in the override, is also restored.
The test suite sorter recursed into RepeatTestSuite like into any other test suite and reordered the repetitions it holds. With --order-by random the repetitions ran shuffled; with --order-by reverse they ran in descending order. Both made the message emitted when the remaining repetitions are skipped after a failure refer to a repetition number that ran before lower-numbered repetitions that were then skipped. RepeatTestSuite is now treated as an atomic unit by the sorter: it is still reordered among its siblings, but its repetitions always run in ascending order. Ordering defects first is unaffected because the sort id of a RepeatTestSuite is the sort id of its repetitions, which the enclosing test suite registers itself.
The TestDox name prettifier produced the same string for every repetition of a repeated test, so the TestDox output listed N identical lines per repeated test method and it was impossible to tell which repetition had failed. The prettified name now carries a "(repetition N of M)" suffix, mirroring how data sets are appended for test methods that use data providers. The memoization key includes the repetition number, as the cache would otherwise return the first repetition's string for all of them. Unlike the data set suffix, the repetition suffix is also appended to names customized with #[TestDox], because a custom text has no placeholder for the repetition and the lines would otherwise remain indistinguishable.
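The memoization point above can be illustrated with a small sketch. This is not the code of PHPUnit's TestDox name prettifier: the class NamePrettifierSketch, its prettify() signature, and the cache key format are all assumptions made for illustration. It only shows why the repetition number has to be part of the cache key.

```php
<?php declare(strict_types=1);

// Hypothetical stand-in for a name prettifier; not PHPUnit's implementation.
final class NamePrettifierSketch
{
    /** @var array<string, string> */
    private array $cache = [];

    public function prettify(string $method, int $repetition = 1, int $totalRepetitions = 1): string
    {
        // The repetition is part of the key. Keyed by method name alone,
        // the cache would return the first repetition's string for all of them.
        $key = $method . '|' . $repetition . '|' . $totalRepetitions;

        if (isset($this->cache[$key])) {
            return $this->cache[$key];
        }

        // "testUserCanLogIn" becomes "User can log in"
        $name   = (string) preg_replace('/^test/', '', $method);
        $words  = (string) preg_replace('/(?<!^)[A-Z]/', ' $0', $name);
        $pretty = ucfirst(strtolower($words));

        if ($totalRepetitions > 1) {
            $pretty .= sprintf(' (repetition %d of %d)', $repetition, $totalRepetitions);
        }

        return $this->cache[$key] = $pretty;
    }
}

$prettifier = new NamePrettifierSketch;

print $prettifier->prettify('testUserCanLogIn', 1, 3) . PHP_EOL;
print $prettifier->prettify('testUserCanLogIn', 2, 3) . PHP_EOL;
```

With the repetition in the key, the two calls produce distinct lines instead of the second call returning the cached string of the first.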
The testSuiteStarted message for a repeated test method's suite carried no locationHint, unlike the suites for test classes and for test methods that use data providers, so IDEs could not navigate from the suite node to the test method. The location hint is built from the class and method name rather than from the suite name, because the suite name of a repetition group that represents a single data set carries a "#dataSetName" suffix that does not belong in a php_qn:// URL.
Use Cases
Stress-testing concurrent or stateful code
Code that manages connections, caches, file handles, or other resources may behave correctly once but leak or corrupt state over repeated invocations. --repeat provides a lightweight way to exercise these paths without writing dedicated stress tests.
Detecting flaky tests
Tests that pass in isolation but fail intermittently under repeated execution are a common source of CI instability. Causes include shared mutable state, timing-dependent logic, non-deterministic ordering, and resource leaks. Running each test multiple times in a single PHPUnit invocation surfaces these failures without requiring external scripting or CI-level retry loops.
Bounding the cost of repeating failing tests
Repeating a test many times is only useful while it is still producing new information. The #[Repeat] attribute's failureThreshold parameter controls how many failures are allowed to accumulate before the remaining repetitions are skipped: with #[Repeat(100)], a test that fails immediately does not burn 99 more repetitions; with #[Repeat(100, 5)], repetition continues until five failures have been observed, which is useful when the failure pattern itself (how often, on which repetitions) is the information being gathered.
Note that failureThreshold does not make failures acceptable: every failed repetition is reported as a failure and fails the test run. Retry-style semantics ("pass if enough repetitions pass") are intentionally out of scope for this pull request.
Prior work
--repeat before PHPUnit 10
PHPUnit had a --repeat CLI option from early versions through PHPUnit 9. Its semantics were fundamentally different from the implementation proposed here: it re-ran the entire test suite N times rather than repeating each test individually.
The old implementation worked by adding the suite to itself multiple times in TestRunner.
This produced an execution order of A, B, A, B (interleaved) rather than A, A, B, B (grouped). There was no per-test failure isolation. If test A failed on the second run, all remaining tests in that suite iteration still executed.
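The difference between the two execution orders can be shown with a short, self-contained sketch. It is not the code that was removed from TestRunner; the functions wholeSuiteOrder() and perTestOrder() are made up for illustration and only model the order in which tests would run.

```php
<?php declare(strict_types=1);

/**
 * Whole-suite repetition: the complete list of tests is run $times times.
 *
 * @param list<string> $tests
 *
 * @return list<string>
 */
function wholeSuiteOrder(array $tests, int $times): array
{
    $order = [];

    for ($i = 0; $i < $times; $i++) {
        foreach ($tests as $test) {
            $order[] = $test;
        }
    }

    return $order;
}

/**
 * Per-test repetition: each test is run $times times before the next one starts.
 *
 * @param list<string> $tests
 *
 * @return list<string>
 */
function perTestOrder(array $tests, int $times): array
{
    $order = [];

    foreach ($tests as $test) {
        for ($i = 0; $i < $times; $i++) {
            $order[] = $test;
        }
    }

    return $order;
}

print implode(', ', wholeSuiteOrder(['A', 'B'], 2)) . PHP_EOL; // A, B, A, B
print implode(', ', perTestOrder(['A', 'B'], 2)) . PHP_EOL;    // A, A, B, B
```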
The --repeat option was removed in 442b9ab as part of the work on PHPUnit 10. I always considered the whole-suite repetition model to be only a benchmarking feature, one that did not fit the direction of PHPUnit 10's redesigned architecture and event system.
Community discussion
Shortly after PHPUnit 10's release, issue #5718 was opened requesting that --repeat be brought back. It received over 50 thumbs-up reactions, reflecting strong community demand for built-in repetition support. Commenters described use cases including flaky test detection, CI stability, and stress-testing stateful code.
At the Code Sprint in Munich in October 2024, a new consensus emerged: --repeat should return with per-test repetition semantics rather than the old whole-suite model. Each test would run up to N times, stopping at the first failure. This per-test granularity provides more useful failure isolation and matches the expectations of developers using repetition for flaky test detection.
PR #6397
PR #6397 by @nikophil was the first implementation attempt following the new semantics. It introduced the RepeatTestSuite concept: a dedicated wrapper class that holds N TestCase instances for a single test method and controls their execution. This design decision informed the architecture suggested in this pull request.
Inspiration from JUnit 5
During the discussion on PR #6397, @marcphilipp pointed to JUnit 5's @RepeatedTest annotation as a reference design.
JUnit 5 supports @RepeatedTest(value = 100, failureThreshold = 1), where each repetition is reported as a child of a container node in the test tree. The failureThreshold parameter causes the remaining repetitions to be skipped automatically once the configured number of failures has been reached; failed repetitions are still reported as failures.
This directly inspired the #[Repeat(int $times, int $failureThreshold)] attribute suggested in this pull request. It provides the same per-method granularity and the same threshold semantics.
--repeat CLI option and #[Repeat] attribute
--repeat <N>
#[Repeat(times, failureThreshold)]
When --repeat is used, the semantics of the #[Repeat] attribute take precedence over the general --repeat semantics for test methods that carry the attribute.
Both mechanisms validate their input: a --repeat value that is not a positive integer is ignored with a test runner warning, and a #[Repeat] attribute whose times or failureThreshold argument is not a positive integer is ignored with a test runner warning (the test then runs without attribute-based repetition).
Architecture
Test suite structure
Repeated tests are wrapped in a RepeatTestSuite, a subclass of TestSuite that is modeled after DataProviderTestSuite: a test suite that groups the test cases derived from a single test method. A RepeatTestSuite holds N independent TestCase instances for the same test method. Each instance has its own repetition (1-based index) and totalRepetitions values set via TestCase::setRepetition().
Because RepeatTestSuite is a real TestSuite, the event system, test suite sorting, and the filter iterators treat it like any other test suite: Test Suite Started / Test Suite Finished events are emitted around each repetition group, and the JUnit XML logger produces a nested <testsuite> element per repeated method, matching how it already renders data provider suites. A dedicated event-level value object, TestSuiteForRepeatedTestMethod, is emitted for these suites, alongside the existing TestSuiteForTestMethodWithDataProvider. Its isForDataSet() method tells event consumers whether the repetition group represents a single data set of a test method that uses a data provider.
The repetitions of a repeated test always run in ascending order. The test suite sorter treats a RepeatTestSuite as an atomic unit: --order-by random|reverse|size|duration reorders repeated test methods among their siblings, but never the repetitions within a group.
Execution flow
RepeatTestSuite::run() iterates over its tests sequentially. Each TestCase::run() goes through the normal TestRunner path, emitting the full lifecycle of events (TestPreparationStarted, TestPrepared, TestPassed/TestFailed, TestFinished). Like TestSuite::run(), it releases each TestCase instance as soon as its repetition has finished, so memory usage does not grow with the number of repetitions.
When a test fails or errors, the failure count is incremented. Once the failure count reaches the configured threshold (default 1), all remaining repetitions are skipped via TestCase::markSkippedForRepeatAbort(), which emits a TestSkipped event with a message identifying which repetition caused the abort.
Not every test can be repeated
TestBuilder checks two conditions before wrapping a test in RepeatTestSuite:
* Explicit void return type declaration: Tests that return values are used by #[Depends] to pass data between tests. Repeating such a test would produce N potentially different return values, creating ambiguity. Only test methods that are explicitly declared to return void are repeated. A test method without any return type declaration, or with any other return type declaration (including union types such as int|string), is not repeated, even though it may effectively return nothing. This is a deliberate decision: requiring the explicit declaration makes eligibility checkable without running the test.
* No dependencies: Tests attributed with #[Depends] are not repeated. They run once, after all repetitions of their dependency have passed.
Tests that fail these checks run exactly once. With --repeat, this happens silently: on a codebase whose test methods do not declare void return types, --repeat will repeat nothing, by design. Adding the missing void declarations is the way to opt such tests into repetition. When the #[Repeat] attribute is used on a test method that fails these checks, a test runner warning is emitted, because the attribute expresses the explicit expectation that this particular method be repeated.
Interaction with data providers
When a test uses #[DataProvider] and is eligible for repetition, each data set gets its own RepeatTestSuite.
A failure in one data set's repetitions does not affect other data sets. This provides per-data-set granularity: if data set 0 fails on repetition 2, its remaining repetitions are skipped, but data set 1 still runs all its repetitions independently.
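The per-data-set behaviour described above can be modelled with a small simulation. This is a sketch under assumptions, not PHPUnit code: runRepeatedWithDataSets() is a hypothetical function, a test is modelled as a callable returning true (passed) or false (failed), and the failure threshold is fixed at its default of 1.

```php
<?php declare(strict_types=1);

/**
 * Runs $times repetitions for every data set, one independent group per data set.
 *
 * @param array<int|string, mixed>   $dataSets
 * @param callable(mixed, int): bool $test receives the data and the 1-based repetition
 *
 * @return array<int|string, list<string>> outcomes per data set
 */
function runRepeatedWithDataSets(array $dataSets, callable $test, int $times): array
{
    $results = [];

    foreach ($dataSets as $name => $data) {
        // Each data set has its own abort state, like its own repetition group
        $aborted        = false;
        $results[$name] = [];

        for ($repetition = 1; $repetition <= $times; $repetition++) {
            if ($aborted) {
                $results[$name][] = 'skipped';

                continue;
            }

            if ($test($data, $repetition)) {
                $results[$name][] = 'passed';
            } else {
                $results[$name][] = 'failed';
                $aborted          = true;
            }
        }
    }

    return $results;
}

// Data set 0 fails on repetition 2, data set 1 always passes
$test = static fn (int $value, int $repetition): bool => !($value === 10 && $repetition === 2);

foreach (runRepeatedWithDataSets([0 => 10, 1 => 20], $test, 3) as $name => $outcomes) {
    print 'data set ' . $name . ': ' . implode(', ', $outcomes) . PHP_EOL;
}
// data set 0: passed, failed, skipped
// data set 1: passed, passed, passed
```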
Interaction with dependencies
When test B depends on test A (via #[Depends]):
* … RepeatTestSuite (if eligible) and runs all N repetitions first.
When the dependency uses a data provider, the method is only considered passed once all of its data sets have finished: the result collector defers the decision from the per-data-set RepeatTestSuite to the enclosing DataProviderTestSuite, so a failure in a later data set correctly causes dependent tests to be skipped.
As a related fix that also benefits tests that are not repeated, the result collector now matches failures by class name and method name when deciding whether a test method with a data provider passed. Previously a failure of a same-named method in an unrelated class wrongly blocked dependent tests of a method that had actually passed.
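The related fix can be illustrated with a minimal comparison of the two matching strategies. This is only a sketch: the array shape used for failures and the functions failedByMethodName() and failedByClassAndMethodName() are assumptions for illustration and do not reflect the result collector's actual data structures.

```php
<?php declare(strict_types=1);

/**
 * Old behaviour (as described above): compare the method name only.
 *
 * @param list<array{class: string, method: string}> $failures
 */
function failedByMethodName(array $failures, string $method): bool
{
    foreach ($failures as $failure) {
        if ($failure['method'] === $method) {
            return true;
        }
    }

    return false;
}

/**
 * New behaviour: compare class name and method name.
 *
 * @param list<array{class: string, method: string}> $failures
 */
function failedByClassAndMethodName(array $failures, string $class, string $method): bool
{
    foreach ($failures as $failure) {
        if ($failure['class'] === $class && $failure['method'] === $method) {
            return true;
        }
    }

    return false;
}

// Only OtherTest::testOne failed; FooTest::testOne passed
$failures = [['class' => 'OtherTest', 'method' => 'testOne']];

var_dump(failedByMethodName($failures, 'testOne'));                    // bool(true): FooTest::testOne is wrongly treated as failed
var_dump(failedByClassAndMethodName($failures, 'FooTest', 'testOne')); // bool(false)
var_dump(failedByClassAndMethodName($failures, 'OtherTest', 'testOne')); // bool(true): a method's own failure still matches
```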
Interaction with process isolation
Repetition works with #[RunInSeparateProcess] and #[RunTestsInSeparateProcesses]. Two pieces of state cross the process boundary:
* …(repetition, totalRepetitions) is passed to the child process and replayed on the reconstructed TestCase, alongside provided data and dependency input. Events emitted in the child process therefore carry the correct repetition identity.
* TestStatus is included in the serialized process result and applied to the parent-side TestCase instance. RepeatTestSuite reads this status to count failures, so the failure threshold takes effect for tests run in process isolation, and the remaining repetitions are skipped after the threshold is reached. A child process that crashes (or whose result cannot be read) is recorded as an error on the parent instance and counts toward the threshold as well.
Event System
The TestMethod value object carries repetition and totalRepetitions properties, populated from the TestCase by TestMethodBuilder::fromTestCase(). These properties affect two methods:
* id() appends (repetition N of M) when totalRepetitions > 1. This ensures each repetition has a distinct identity in debug output, logging, and result collection.
* name() appends the same suffix. This appears in failure messages, JUnit XML, and Open Test Reporting output.
Both default to 1, so non-repeated tests are completely unaffected.
The isRepeated() convenience method returns true when totalRepetitions > 1.
The TestDox output reports each repetition with the same (repetition N of M) suffix, and the TeamCity logger emits a locationHint for the test suite of a repeated test method, so IDEs can navigate from the suite node to the test method.
Failure threshold
The failureThreshold parameter (available only via #[Repeat], defaults to 1) controls how many failures may accumulate before the remaining repetitions are skipped:
* #[Repeat(10)]: Run up to 10 times, skip the remaining repetitions after the first failure (threshold = 1)
* #[Repeat(10, 3)]: Run up to 10 times, skip the remaining repetitions after 3 failures
* #[Repeat(10, 3)] with fewer than 3 failures: all 10 repetitions are run
The threshold only controls when repetition stops. It does not change how failures are reported: every failed repetition is reported as a failure, and a single failed repetition is enough for the test run to fail. A repetition group in which no repetition failed is recorded as a passed test for the purposes of #[Depends].
#[Depends].This matches the semantics of JUnit 5's
@RepeatedTestannotation, whosefailureThresholdparameter likewise causes remaining repetitions to be skipped without changing how failed repetitions are reported.
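The threshold semantics can be summarized in a short simulation. This is a sketch and not the implementation of RepeatTestSuite::run(): runRepetitions() and groupPassed() are hypothetical functions, and a repetition is modelled as a callable that returns true (passed) or false (failed).

```php
<?php declare(strict_types=1);

/**
 * Simulates one repetition group.
 *
 * @param callable(int): bool $repetition receives the 1-based repetition number
 *
 * @return list<string> 'passed', 'failed', or 'skipped' for each repetition
 */
function runRepetitions(callable $repetition, int $times, int $failureThreshold = 1): array
{
    $outcomes = [];
    $failures = 0;

    for ($n = 1; $n <= $times; $n++) {
        // Once the threshold is reached, the remaining repetitions are skipped
        if ($failures >= $failureThreshold) {
            $outcomes[] = 'skipped';

            continue;
        }

        if ($repetition($n)) {
            $outcomes[] = 'passed';
        } else {
            $outcomes[] = 'failed';
            $failures++;
        }
    }

    return $outcomes;
}

/**
 * The threshold does not make failures acceptable: a single failed
 * repetition is enough for the group to count as not passed.
 *
 * @param list<string> $outcomes
 */
function groupPassed(array $outcomes): bool
{
    return !in_array('failed', $outcomes, true);
}

// Fails on every even repetition
$flaky = static fn (int $n): bool => $n % 2 === 1;

print implode(' ', runRepetitions($flaky, 6)) . PHP_EOL;
// passed failed skipped skipped skipped skipped

print implode(' ', runRepetitions($flaky, 6, 2)) . PHP_EOL;
// passed failed passed failed skipped skipped
```

In both runs groupPassed() returns false, which mirrors the statement that the threshold only controls when repetition stops.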