Repeated test execution using --repeat CLI option and #[Repeat] attribute #6591

Open
sebastianbergmann wants to merge 22 commits into main from issue-5718/repeat

@sebastianbergmann sebastianbergmann commented Apr 16, 2026

Owner

Use Cases

Stress-testing concurrent or stateful code

Code that manages connections, caches, file handles, or other resources may behave correctly once but leak or corrupt state over repeated invocations. --repeat provides a lightweight way to exercise these paths without writing dedicated stress tests.

Detecting flaky tests

Tests that pass in isolation but fail intermittently under repeated execution are a common source of CI instability. Causes include shared mutable state, timing-dependent logic, non-deterministic ordering, and resource leaks. Running each test multiple times in a single PHPUnit invocation surfaces these failures without requiring external scripting or CI-level retry loops.

Bounding the cost of repeating failing tests

Repeating a test many times is only useful while it is still producing new information. The #[Repeat] attribute's failureThreshold parameter controls how many failures are allowed to accumulate before the remaining repetitions are skipped: with #[Repeat(100)], a test that fails immediately does not burn 99 more repetitions; with #[Repeat(100, 5)], repetition continues until five failures have been observed, which is useful when the failure pattern itself (how often, on which repetitions) is the information being gathered.

Note that failureThreshold does not make failures acceptable: every failed repetition is reported as a failure and fails the test run. Retry-style semantics ("pass if enough repetitions pass") are intentionally out of scope for this pull request.

Prior work

--repeat before PHPUnit 10

PHPUnit had a --repeat CLI option from early versions through PHPUnit 9. Its semantics were fundamentally different from the implementation proposed here: it re-ran the entire test suite N times rather than repeating each test individually.

The old implementation worked by adding the suite to itself multiple times in TestRunner:

foreach (range(1, $configuration->repeat()) as $step) {
    $_suite->addTest($suite);
}

This produced an execution order of A, B, A, B (interleaved) rather than A, A, B, B (grouped). There was no per-test failure isolation. If test A failed on the second run, all remaining tests in that suite iteration still executed.
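The difference between the two execution orders can be illustrated with a small standalone sketch. The functions wholeSuiteOrder() and perTestOrder() are hypothetical names used only for this illustration; they model the ordering and are not part of PHPUnit.

```php
<?php declare(strict_types=1);

/**
 * Old model: the whole suite is run $times times in a row.
 *
 * @param list<string> $tests
 *
 * @return list<string>
 */
function wholeSuiteOrder(array $tests, int $times): array
{
    $order = [];

    for ($run = 1; $run <= $times; $run++) {
        foreach ($tests as $test) {
            $order[] = $test;
        }
    }

    return $order;
}

/**
 * New model: each test is repeated $times times before the next test starts.
 *
 * @param list<string> $tests
 *
 * @return list<string>
 */
function perTestOrder(array $tests, int $times): array
{
    $order = [];

    foreach ($tests as $test) {
        for ($repetition = 1; $repetition <= $times; $repetition++) {
            $order[] = $test;
        }
    }

    return $order;
}

echo implode(', ', wholeSuiteOrder(['A', 'B'], 2)), PHP_EOL; // A, B, A, B
echo implode(', ', perTestOrder(['A', 'B'], 2)), PHP_EOL;    // A, A, B, B
```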

The --repeat option was removed in 442b9ab as part of the work on PHPUnit 10. I have only ever considered the whole-suite repetition model a benchmarking feature, one that did not fit the direction of PHPUnit 10's redesigned architecture and event system.

Community discussion

Shortly after PHPUnit 10's release, issue #5718 was opened requesting that --repeat be brought back. It received over 50 thumbs-up reactions, reflecting strong community demand for built-in repetition support. Commenters described use cases including flaky test detection, CI stability, and stress-testing stateful code.

At the Code Sprint in Munich in October 2024, a new consensus emerged: --repeat should return with per-test repetition semantics rather than the old whole-suite model. Each test would run up to N times, stopping at the first failure. This per-test granularity provides more useful failure isolation and matches the expectations of developers using repetition for flaky test detection.

PR #6397

PR #6397 by @nikophil was the first implementation attempt following the new semantics. It introduced the RepeatTestSuite concept: a dedicated wrapper class that holds N TestCase instances for a single test method and controls their execution. This design decision informed the architecture suggested in this pull request.

Inspiration from JUnit 5

During the discussion on PR #6397, @marcphilipp pointed to JUnit 5's @RepeatedTest annotation as a reference design.

JUnit 5 supports @RepeatedTest(value = 100, failureThreshold = 1), where each repetition is reported as a child of a container node in the test tree. The failureThreshold parameter causes the remaining repetitions to be skipped automatically once the configured number of failures has been reached; failed repetitions are still reported as failures.

This directly inspired the #[Repeat(int $times, int $failureThreshold)] attribute suggested in this pull request. It provides the same per-method granularity and the same threshold semantics.

--repeat CLI option and #[Repeat] attribute

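| Mechanism                          | Scope              | Granularity                |
|------------------------------------|--------------------|----------------------------|
| --repeat <N>                       | All eligible tests | Global, from CLI           |
| #[Repeat(times, failureThreshold)] | Single test method | Per-method, in source code |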

When --repeat is used, a #[Repeat] attribute on a test method takes precedence over the general --repeat semantics for that method.

Both mechanisms validate their input: a --repeat value that is not a positive integer is ignored with a test runner warning, and a #[Repeat] attribute whose times or failureThreshold argument is not a positive integer is ignored with a test runner warning (the test then runs without attribute-based repetition).
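The attribute-side validation described above could look roughly like the following sketch. The Repeat class declared here is a stand-in written for this example (the real attribute class and its namespace are defined by the pull request), and repeatSettings() as well as the warning text are hypothetical.

```php
<?php declare(strict_types=1);

// Stand-in for the proposed attribute, declared locally so the sketch runs on its own.
#[Attribute(Attribute::TARGET_METHOD)]
final class Repeat
{
    public function __construct(
        public readonly int $times,
        public readonly int $failureThreshold = 1,
    ) {
    }
}

/**
 * Returns [times, failureThreshold] for a test method, or null when the method
 * has no attribute or the attribute is invalid (a warning is collected then).
 *
 * @param list<string> $warnings
 *
 * @return array{int, int}|null
 */
function repeatSettings(string $className, string $methodName, array &$warnings): ?array
{
    $attributes = (new ReflectionMethod($className, $methodName))->getAttributes(Repeat::class);

    if ($attributes === []) {
        return null;
    }

    $repeat = $attributes[0]->newInstance();

    // Both arguments must be positive integers, otherwise the attribute is ignored.
    if ($repeat->times < 1 || $repeat->failureThreshold < 1) {
        $warnings[] = sprintf(
            '#[Repeat] on %s::%s() is ignored: arguments must be positive integers',
            $className,
            $methodName,
        );

        return null;
    }

    return [$repeat->times, $repeat->failureThreshold];
}

final class ExampleTest
{
    #[Repeat(10, 3)]
    public function testValid(): void
    {
    }

    #[Repeat(0)]
    public function testInvalid(): void
    {
    }
}

$warnings = [];

echo json_encode(repeatSettings(ExampleTest::class, 'testValid', $warnings)), PHP_EOL;   // [10,3]
echo json_encode(repeatSettings(ExampleTest::class, 'testInvalid', $warnings)), PHP_EOL; // null
echo count($warnings), PHP_EOL;                                                          // 1
```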

Architecture

Test suite structure

Repeated tests are wrapped in a RepeatTestSuite, a subclass of TestSuite that is modeled after DataProviderTestSuite: a test suite that groups the test cases derived from a single test method. A RepeatTestSuite holds N independent TestCase instances for the same test method. Each instance has its own repetition (1-based index) and totalRepetitions values set via TestCase::setRepetition().

TestSuite (class level)
  RepeatTestSuite [testOne, 3 repetitions]
    TestCase (testOne, repetition 1 of 3)
    TestCase (testOne, repetition 2 of 3)
    TestCase (testOne, repetition 3 of 3)
  RepeatTestSuite [testTwo, 3 repetitions]
    TestCase (testTwo, repetition 1 of 3)
    ...

Because RepeatTestSuite is a real TestSuite, the event system, test suite sorting, and the filter iterators treat it like any other test suite: Test Suite Started / Test Suite Finished events are emitted around each repetition group, and the JUnit XML logger produces a nested <testsuite> element per repeated method, matching how it already renders data provider suites. A dedicated event-level value object, TestSuiteForRepeatedTestMethod, is emitted for these suites, alongside the existing TestSuiteForTestMethodWithDataProvider. Its isForDataSet() method tells event consumers whether the repetition group represents a single data set of a test method that uses a data provider.

The repetitions of a repeated test always run in ascending order. The test suite sorter treats a RepeatTestSuite as an atomic unit: --order-by random|reverse|size|duration reorder repeated test methods among their siblings, but never the repetitions within a group.
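As a rough model of this rule, the following sketch reverses a list of repetition groups in two ways. Both function names are hypothetical and the arrays only stand in for test suites; the point is that the group order changes while the repetitions inside each group keep their ascending order.

```php
<?php declare(strict_types=1);

/**
 * Treats each group as an atomic unit: only the groups are reordered.
 *
 * @param array<string, list<int>> $groups method name => repetition numbers
 *
 * @return array<string, list<int>>
 */
function reverseGroupsOnly(array $groups): array
{
    return array_reverse($groups, true);
}

/**
 * Recurses into the groups as well, which also reverses the repetitions.
 * This is the behaviour the sorter is meant to avoid.
 *
 * @param array<string, list<int>> $groups
 *
 * @return array<string, list<int>>
 */
function reverseEverything(array $groups): array
{
    return array_map('array_reverse', array_reverse($groups, true));
}

$groups = [
    'testOne' => [1, 2, 3],
    'testTwo' => [1, 2, 3],
];

foreach (reverseGroupsOnly($groups) as $method => $repetitions) {
    echo $method, ': ', implode(', ', $repetitions), PHP_EOL;
}
// testTwo: 1, 2, 3
// testOne: 1, 2, 3
```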

Execution flow

RepeatTestSuite::run() iterates its tests sequentially. Each TestCase::run() goes through the normal TestRunner path, emitting the full lifecycle of events (TestPreparationStarted, TestPrepared, TestPassed/TestFailed, TestFinished). Like TestSuite::run(), it releases each TestCase instance as soon as its repetition has finished, so memory usage does not grow with the number of repetitions.
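The release pattern mentioned above can be sketched independently of PHPUnit. runAndRelease() is a hypothetical helper, and plain stdClass objects stand in for TestCase instances; a WeakReference is used only to observe when the first instance is freed.

```php
<?php declare(strict_types=1);

/**
 * Runs each object once and drops every reference to it right afterwards.
 *
 * @param list<object>           $tests  emptied by this function
 * @param callable(object): void $runOne
 */
function runAndRelease(array &$tests, callable $runOne): void
{
    // Take over the list so the caller no longer holds the instances.
    $pending = $tests;
    $tests   = [];

    foreach (array_keys($pending) as $key) {
        $test = $pending[$key];
        unset($pending[$key]);

        $runOne($test);

        // Last reference gone: the instance can be freed before the next one runs.
        unset($test);
    }
}

$tests          = [new stdClass(), new stdClass(), new stdClass()];
$first          = WeakReference::create($tests[0]);
$aliveDuringRun = [];

runAndRelease($tests, function (object $test) use ($first, &$aliveDuringRun): void {
    // Record whether the first instance still exists while this one runs.
    $aliveDuringRun[] = $first->get() !== null;
});

echo json_encode($aliveDuringRun), PHP_EOL; // [true,false,false]
```

The first instance is alive only during its own run; by the time the second repetition starts it has already been released.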

When a test fails or errors, the failure count is incremented. Once the failure count reaches the configured threshold (default 1), all remaining repetitions are skipped via TestCase::markSkippedForRepeatAbort(), which emits a TestSkipped event with a message identifying which repetition caused the abort.
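A minimal sketch of the threshold logic, assuming each repetition can be modelled as a callable that returns true on success. runRepetitions() is a hypothetical function and does not reproduce the actual RepeatTestSuite::run() implementation or its events.

```php
<?php declare(strict_types=1);

/**
 * Runs the repetitions in order and skips the rest once $failureThreshold
 * failures have been observed.
 *
 * @param list<callable(): bool> $repetitions each returns true on success
 *
 * @return list<string> "passed", "failed" or "skipped" per repetition
 */
function runRepetitions(array $repetitions, int $failureThreshold = 1): array
{
    $failures = 0;
    $outcomes = [];

    foreach ($repetitions as $repetition) {
        if ($failures >= $failureThreshold) {
            // Threshold reached: the repetition is not executed at all.
            $outcomes[] = 'skipped';

            continue;
        }

        if ($repetition()) {
            $outcomes[] = 'passed';
        } else {
            $outcomes[] = 'failed';
            $failures++;
        }
    }

    return $outcomes;
}

$pass = static fn (): bool => true;
$fail = static fn (): bool => false;

echo implode(', ', runRepetitions([$pass, $fail, $pass, $pass])), PHP_EOL;
// passed, failed, skipped, skipped

echo implode(', ', runRepetitions([$fail, $pass, $fail, $pass], 2)), PHP_EOL;
// failed, passed, failed, skipped
```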

Not every test can be repeated

TestBuilder checks two conditions before wrapping a test in RepeatTestSuite:

  1. Explicit void return type declaration: Tests that return values are used by #[Depends] to pass data between tests. Repeating such a test would produce N potentially different return values, creating ambiguity. Only test methods that are explicitly declared to return void are repeated. A test method without any return type declaration, or with any other return type declaration (including union types such as int|string), is not repeated — even though it may effectively return nothing. This is a deliberate decision: requiring the explicit declaration makes eligibility checkable without running the test.

  2. No dependencies: Tests attributed with #[Depends] are not repeated. They run once, after all repetitions of their dependency have passed.

Tests that fail these checks run exactly once. With --repeat, this happens silently: on a codebase whose test methods do not declare void return types, --repeat will repeat nothing, by design. Adding the missing void declarations is the way to opt such tests into repetition. When the #[Repeat] attribute is used on a test method that fails these checks, a test runner warning is emitted, because the attribute expresses the explicit expectation that this particular method be repeated.
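The two eligibility conditions can be checked with reflection alone, without running the test. The sketch below is an assumption about how such a check might look, not the code in TestBuilder: isEligibleForRepetition() is a hypothetical function, and the Depends class is a local stand-in for PHPUnit's attribute so that the example is self-contained.

```php
<?php declare(strict_types=1);

// Local stand-in for PHPUnit's #[Depends] attribute.
#[Attribute(Attribute::TARGET_METHOD | Attribute::IS_REPEATABLE)]
final class Depends
{
    public function __construct(public readonly string $methodName)
    {
    }
}

function isEligibleForRepetition(string $className, string $methodName): bool
{
    $method     = new ReflectionMethod($className, $methodName);
    $returnType = $method->getReturnType();

    // Condition 1: an explicit "void" declaration. A missing declaration yields
    // null and a union type yields ReflectionUnionType; both are rejected here.
    if (!$returnType instanceof ReflectionNamedType || $returnType->getName() !== 'void') {
        return false;
    }

    // Condition 2: the method must not declare a dependency.
    return $method->getAttributes(Depends::class) === [];
}

final class ExampleTest
{
    public function testExplicitVoid(): void
    {
    }

    public function testNoReturnType()
    {
    }

    public function testUnion(): int|string
    {
        return 1;
    }

    #[Depends('testExplicitVoid')]
    public function testDependent(): void
    {
    }
}

foreach (['testExplicitVoid', 'testNoReturnType', 'testUnion', 'testDependent'] as $name) {
    printf(
        "%s: %s\n",
        $name,
        isEligibleForRepetition(ExampleTest::class, $name) ? 'repeated' : 'runs once',
    );
}
// testExplicitVoid: repeated
// testNoReturnType: runs once
// testUnion: runs once
// testDependent: runs once
```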

Interaction with data providers

When a test uses #[DataProvider] and is eligible for repetition, each data set gets its own RepeatTestSuite:

DataProviderTestSuite (testFoo)
  RepeatTestSuite [data set #0, 3 repetitions]
    TestCase (testFoo, data set #0, repetition 1 of 3)
    TestCase (testFoo, data set #0, repetition 2 of 3)
    TestCase (testFoo, data set #0, repetition 3 of 3)
  RepeatTestSuite [data set #1, 3 repetitions]
    TestCase (testFoo, data set #1, repetition 1 of 3)
    ...

A failure in one data set's repetitions does not affect other data sets. This provides per-data-set granularity: if data set 0 fails on repetition 2, its remaining repetitions are skipped, but data set 1 still runs all its repetitions independently.
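The scenario just described can be modelled with a failure counter that is local to each data set. executedRepetitions() is a hypothetical function, and the boolean lists are made-up results (true = the repetition would pass), chosen to mirror the example above.

```php
<?php declare(strict_types=1);

/**
 * Counts how many repetitions are actually executed per data set.
 *
 * @param array<int, list<bool>> $resultsPerDataSet data set => result of each repetition
 *
 * @return array<int, int> data set => number of executed repetitions
 */
function executedRepetitions(array $resultsPerDataSet, int $failureThreshold = 1): array
{
    $executed = [];

    foreach ($resultsPerDataSet as $dataSet => $results) {
        // The counter starts at zero for every data set, so a failure in one
        // data set never causes repetitions of another data set to be skipped.
        $failures           = 0;
        $executed[$dataSet] = 0;

        foreach ($results as $passed) {
            if ($failures >= $failureThreshold) {
                break;
            }

            $executed[$dataSet]++;

            if (!$passed) {
                $failures++;
            }
        }
    }

    return $executed;
}

$executed = executedRepetitions([
    0 => [true, false, true], // fails on repetition 2
    1 => [true, true, true],
]);

foreach ($executed as $dataSet => $count) {
    printf("data set #%d: %d of 3 repetitions executed\n", $dataSet, $count);
}
// data set #0: 2 of 3 repetitions executed
// data set #1: 3 of 3 repetitions executed
```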

Interaction with dependencies

When test B depends on test A (via #[Depends]):

  • Test A is wrapped in a RepeatTestSuite (if eligible) and runs all N repetitions first.
  • Only if all repetitions of A pass does test B start.
  • Test B itself is not repeated (it has a dependency).
  • If A fails any repetition, B is skipped due to the unsatisfied dependency.

When the dependency uses a data provider, the method is only considered passed once all of its data sets have finished: the result collector defers the decision from the per-data-set RepeatTestSuite to the enclosing DataProviderTestSuite, so a failure in a later data set correctly causes dependent tests to be skipped.

As a related fix that also benefits tests that are not repeated, the result collector now matches failures by class name and method name when deciding whether a test method with a data provider passed. Previously a failure of a same-named method in an unrelated class wrongly blocked dependent tests of a method that had actually passed.
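The effect of this fix can be shown with two small predicates. Both functions and the array shape used for failures are hypothetical; they only contrast matching on the method name with matching on class name and method name.

```php
<?php declare(strict_types=1);

/**
 * Previous behaviour: any failure of a method with the same name counts.
 *
 * @param list<array{class: string, method: string}> $failures
 */
function hasFailureByMethodName(array $failures, string $method): bool
{
    foreach ($failures as $failure) {
        if ($failure['method'] === $method) {
            return true;
        }
    }

    return false;
}

/**
 * Fixed behaviour: the failure must belong to the same class as well.
 *
 * @param list<array{class: string, method: string}> $failures
 */
function hasFailureByClassAndMethod(array $failures, string $class, string $method): bool
{
    foreach ($failures as $failure) {
        if ($failure['class'] === $class && $failure['method'] === $method) {
            return true;
        }
    }

    return false;
}

// Only OtherTest::testFoo failed; FooTest::testFoo passed.
$failures = [['class' => 'OtherTest', 'method' => 'testFoo']];

var_dump(hasFailureByMethodName($failures, 'testFoo'));                // bool(true)
var_dump(hasFailureByClassAndMethod($failures, 'FooTest', 'testFoo')); // bool(false)
```

With the first predicate, tests depending on FooTest::testFoo would be skipped even though it passed; with the second they run.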

Interaction with process isolation

Repetition works with #[RunInSeparateProcess] and #[RunTestsInSeparateProcesses]. Two pieces of state cross the process boundary:

  • The repetition state (repetition, totalRepetitions) is passed to the child process and replayed on the reconstructed TestCase, alongside provided data and dependency input. Events emitted in the child process therefore carry the correct repetition identity.
  • The child's TestStatus is included in the serialized process result and applied to the parent-side TestCase instance. RepeatTestSuite reads this status to count failures, so the failure threshold takes effect for tests run in process isolation, and the remaining repetitions are skipped after the threshold is reached. A child process that crashes (or whose result cannot be read) is recorded as an error on the parent instance and counts toward the threshold as well.

Event System

The TestMethod value object carries repetition and totalRepetitions properties, populated from the TestCase by TestMethodBuilder::fromTestCase(). These properties affect two methods:

  • id() appends (repetition N of M) when totalRepetitions > 1. This ensures each repetition has a distinct identity in debug output, logging, and result collection.
  • name() appends the same suffix. This appears in failure messages, JUnit XML, and Open Test Reporting output.

Both default to 1, so non-repeated tests are completely unaffected.

The isRepeated() convenience method returns true when totalRepetitions > 1.
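The suffix rule can be sketched with a simplified value object. RepeatedTestId is a hypothetical class written for this example; the real TestMethod value object carries more state (for instance data set information) that is left out here.

```php
<?php declare(strict_types=1);

final class RepeatedTestId
{
    public function __construct(
        private readonly string $className,
        private readonly string $methodName,
        private readonly int $repetition = 1,       // defaults model a non-repeated test
        private readonly int $totalRepetitions = 1,
    ) {
    }

    public function isRepeated(): bool
    {
        return $this->totalRepetitions > 1;
    }

    public function id(): string
    {
        $id = $this->className . '::' . $this->methodName;

        // The suffix is only added for repeated tests, so existing ids stay unchanged.
        if ($this->isRepeated()) {
            $id .= sprintf(' (repetition %d of %d)', $this->repetition, $this->totalRepetitions);
        }

        return $id;
    }
}

echo (new RepeatedTestId('ExampleTest', 'testOne'))->id(), PHP_EOL;
// ExampleTest::testOne

echo (new RepeatedTestId('ExampleTest', 'testOne', 2, 3))->id(), PHP_EOL;
// ExampleTest::testOne (repetition 2 of 3)
```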

The TestDox output reports each repetition with the same (repetition N of M) suffix, and the TeamCity logger emits a locationHint for the test suite of a repeated test method, so IDEs can navigate from the suite node to the test method.

Failure threshold

The failureThreshold parameter (available only via #[Repeat], defaults to 1) controls how many failures may accumulate before the remaining repetitions are skipped:

  • #[Repeat(10)]: Run up to 10 times, skip the remaining repetitions after the first failure (threshold = 1)
  • #[Repeat(10, 3)]: Run up to 10 times, skip the remaining repetitions after 3 failures
  • #[Repeat(10, 3)] with fewer than 3 failures: all 10 repetitions are run

The threshold only controls when repetition stops. It does not change how failures are reported: every failed repetition is reported as a failure, and a single failed repetition is enough for the test run to fail. A repetition group in which no repetition failed is recorded as a passed test for the purposes of #[Depends].

This matches the semantics of JUnit 5's @RepeatedTest annotation, whose failureThreshold parameter likewise causes remaining repetitions to be skipped without changing how failed repetitions are reported.

@sebastianbergmann added the type/enhancement and feature/test-runner labels Apr 16, 2026
@sebastianbergmann

Owner Author

@nikophil Would be great to get your feedback on this.

@codecov

codecov Bot commented Apr 16, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.63%. Comparing base (35251c8) to head (b6252a0).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #6591      +/-   ##
============================================
+ Coverage     97.59%   97.63%   +0.03%     
- Complexity     8755     8845      +90     
============================================
  Files           869      873       +4     
  Lines         26751    27110     +359     
============================================
+ Hits          26108    26468     +360     
+ Misses          643      642       -1     


@jakubostalec


Hi @sebastianbergmann
Could you add a feature like the one in the bshaffer/phpunit-retry-annotations package? Something like "retry until the test passes".

Here's an example attribute class that briefly describes its purpose:

#[\Attribute(\Attribute::TARGET_METHOD)]
final class Retry
{
    /**
     * @param int                           $attempts        Max number of attempts (including the first run)
     * @param int                           $delaySeconds    Fixed delay in seconds between retries
     * @param int                           $delayMultiplier When > 0, delay = attempt * multiplier (linear back-off)
     * @param int                           $maxDelaySeconds Cap for multiplier-based delay (0 = no cap)
     * @param class-string<\Throwable>|null $onlyOnException Retry only when this exception type is thrown
     */
    public function __construct(
        public readonly int $attempts = 3,
        public readonly int $delaySeconds = 0,
        public readonly int $delayMultiplier = 0,
        public readonly int $maxDelaySeconds = 0,
        public readonly ?string $onlyOnException = null,
    ) {
    }
}

@sebastianbergmann

Owner Author

> Could you add a feature like the one in the bshaffer/phpunit-retry-annotations package? Something like "retry until the test passes".

Thank you for your suggestion.

At this time, I am not able to consider suggestions like this. I think that --repeat and #[Repeat] should be implemented together at the same time, but this already is a lot. Maybe too much and we need to see how it goes.

If and when this work is merged, then and only then am I able to think about further additions.

I mean no disrespect, but right now such suggestions are a distraction for me.

@stof

stof commented Apr 17, 2026

Contributor

@sebastianbergmann I think using #[Repeat] on a non-eligible test should trigger a PHPUnit warning, so that developers are made aware that a test they intended to always be repeated is actually not repeated. The attribute makes it clear that repetition of this test is intended.

@sebastianbergmann

Owner Author

> I think using #[Repeat] on a non eligible test should trigger a PHPUnit warning

Implemented now.

@github-actions

github-actions Bot commented May 10, 2026


API Surface Changes

If any of the additions below are not intended as public API, mark them with @internal in the docblock.

New API Surface

Classes

Methods

Modified API Surface

Methods

  • PHPUnit\TextUI\Configuration\Configuration::__construct
    - public function __construct(array $cliArguments, ?string $testFilesFile, ?string $configurationFile, ?string $bootstrap, array $bootstrapForTestSuite, bool $cacheResult, ?string $cacheDirectory, ?string $coverageCacheDirectory, Source $source, string $testResultCacheFile, ?string $coverageClover, ?string $coverageCobertura, ?string $coverageCrap4j, int $coverageCrap4jThreshold, ?string $coverageHtml, int $coverageHtmlLowUpperBound, int $coverageHtmlHighLowerBound, string $coverageHtmlColorSuccessLow, string $coverageHtmlColorSuccessLowDark, string $coverageHtmlColorSuccessMedium, string $coverageHtmlColorSuccessMediumDark, string $coverageHtmlColorSuccessHigh, string $coverageHtmlColorSuccessHighDark, string $coverageHtmlColorSuccessBar, string $coverageHtmlColorSuccessBarDark, string $coverageHtmlColorWarning, string $coverageHtmlColorWarningDark, string $coverageHtmlColorWarningBar, string $coverageHtmlColorWarningBarDark, string $coverageHtmlColorDanger, string $coverageHtmlColorDangerDark, string $coverageHtmlColorDangerBar, string $coverageHtmlColorDangerBarDark, string $coverageHtmlColorBreadcrumbs, string $coverageHtmlColorBreadcrumbsDark, ?string $coverageHtmlCustomCssFile, ?string $coverageOpenClover, ?string $coveragePhp, ?string $coverageText, bool $coverageTextShowUncoveredFiles, bool $coverageTextShowOnlySummary, ?string $coverageXml, bool $coverageXmlIncludeSource, bool $pathCoverage, bool $branchCoverage, bool $ignoreDeprecatedCodeUnitsFromCodeCoverage, bool $disableCodeCoverageIgnore, bool $disableCoverageTargeting, bool $failOnAllIssues, bool $failOnDeprecation, bool $failOnPhpunitDeprecation, bool $failOnPhpunitNotice, bool $failOnPhpunitWarning, bool $failOnEmptyTestSuite, bool $failOnIncomplete, bool $failOnNotice, bool $failOnRisky, bool $failOnSkipped, bool $failOnWarning, bool $doNotFailOnDeprecation, bool $doNotFailOnPhpunitDeprecation, bool $doNotFailOnPhpunitNotice, bool $doNotFailOnPhpunitWarning, bool $doNotFailOnEmptyTestSuite, 
bool $doNotFailOnIncomplete, bool $doNotFailOnNotice, bool $doNotFailOnRisky, bool $doNotFailOnSkipped, bool $doNotFailOnWarning, int $stopOnDefect, int $stopOnDeprecation, ?string $specificDeprecationToStopOn, int $stopOnError, int $stopOnFailure, int $stopOnIncomplete, int $stopOnNotice, int $stopOnRisky, int $stopOnSkipped, int $stopOnWarning, bool $outputToStandardErrorStream, int $columns, bool $noExtensions, ?string $pharExtensionDirectory, array $extensionBootstrappers, bool $backupGlobals, bool $backupStaticProperties, bool $beStrictAboutChangesToGlobalState, bool $colors, bool $processIsolation, bool $enforceTimeLimit, int $defaultTimeLimit, int $diffContext, int $timeoutForSmallTests, int $timeoutForMediumTests, int $timeoutForLargeTests, bool $reportUselessTests, bool $strictCoverage, bool $requireCoverageContribution, bool $disallowTestOutput, bool $displayDetailsOnAllIssues, bool $displayDetailsOnIncompleteTests, bool $displayDetailsOnSkippedTests, bool $displayDetailsOnTestsThatTriggerDeprecations, bool $displayDetailsOnPhpunitDeprecations, bool $displayDetailsOnPhpunitNotices, bool $displayDetailsOnTestsThatTriggerErrors, bool $displayDetailsOnTestsThatTriggerNotices, bool $displayDetailsOnTestsThatTriggerWarnings, bool $reverseDefectList, bool $requireCoverageMetadata, bool $requireSealedMockObjects, bool $noProgress, bool $noResults, bool $noOutput, int $executionOrder, int $executionOrderDefects, bool $resolveDependencies, ?string $logfileTeamcity, ?string $logfileJunit, ?string $logfileOtr, bool $includeGitInformation, bool $includeGitInformationInOtrLogfile, ?string $logfileTestdoxHtml, ?string $logfileTestdoxText, ?string $logEventsText, ?string $logEventsVerboseText, bool $compactOutput, bool $teamCityOutput, bool $testDoxOutput, bool $testDoxOutputSummary, ?array $testsCovering, ?array $testsUsing, ?array $testsRequiringPhpExtension, ?string $filter, ?string $excludeFilter, ?string $testIdFilterFile, ?string $testIdFilter, array $groups, 
array $excludeGroups, int $randomOrderSeed, bool $includeUncoveredFiles, TestSuiteCollection $testSuite, string $includeTestSuite, string $excludeTestSuite, ?string $defaultTestSuite, bool $ignoreTestSelectionInXmlConfiguration, array $testSuffixes, Php $php, bool $controlGarbageCollector, int $numberOfTestsBeforeGarbageCollection, ?string $generateBaseline, bool $debug, bool $withTelemetry, int $shortenArraysForExportThreshold)
    + public function __construct(array $cliArguments, ?string $testFilesFile, ?string $configurationFile, ?string $bootstrap, array $bootstrapForTestSuite, bool $cacheResult, ?string $cacheDirectory, ?string $coverageCacheDirectory, Source $source, string $testResultCacheFile, ?string $coverageClover, ?string $coverageCobertura, ?string $coverageCrap4j, int $coverageCrap4jThreshold, ?string $coverageHtml, int $coverageHtmlLowUpperBound, int $coverageHtmlHighLowerBound, string $coverageHtmlColorSuccessLow, string $coverageHtmlColorSuccessLowDark, string $coverageHtmlColorSuccessMedium, string $coverageHtmlColorSuccessMediumDark, string $coverageHtmlColorSuccessHigh, string $coverageHtmlColorSuccessHighDark, string $coverageHtmlColorSuccessBar, string $coverageHtmlColorSuccessBarDark, string $coverageHtmlColorWarning, string $coverageHtmlColorWarningDark, string $coverageHtmlColorWarningBar, string $coverageHtmlColorWarningBarDark, string $coverageHtmlColorDanger, string $coverageHtmlColorDangerDark, string $coverageHtmlColorDangerBar, string $coverageHtmlColorDangerBarDark, string $coverageHtmlColorBreadcrumbs, string $coverageHtmlColorBreadcrumbsDark, ?string $coverageHtmlCustomCssFile, ?string $coverageOpenClover, ?string $coveragePhp, ?string $coverageText, bool $coverageTextShowUncoveredFiles, bool $coverageTextShowOnlySummary, ?string $coverageXml, bool $coverageXmlIncludeSource, bool $pathCoverage, bool $branchCoverage, bool $ignoreDeprecatedCodeUnitsFromCodeCoverage, bool $disableCodeCoverageIgnore, bool $disableCoverageTargeting, bool $failOnAllIssues, bool $failOnDeprecation, bool $failOnPhpunitDeprecation, bool $failOnPhpunitNotice, bool $failOnPhpunitWarning, bool $failOnEmptyTestSuite, bool $failOnIncomplete, bool $failOnNotice, bool $failOnRisky, bool $failOnSkipped, bool $failOnWarning, bool $doNotFailOnDeprecation, bool $doNotFailOnPhpunitDeprecation, bool $doNotFailOnPhpunitNotice, bool $doNotFailOnPhpunitWarning, bool $doNotFailOnEmptyTestSuite, 
bool $doNotFailOnIncomplete, bool $doNotFailOnNotice, bool $doNotFailOnRisky, bool $doNotFailOnSkipped, bool $doNotFailOnWarning, int $stopOnDefect, int $stopOnDeprecation, ?string $specificDeprecationToStopOn, int $stopOnError, int $stopOnFailure, int $stopOnIncomplete, int $stopOnNotice, int $stopOnRisky, int $stopOnSkipped, int $stopOnWarning, bool $outputToStandardErrorStream, int $columns, bool $noExtensions, ?string $pharExtensionDirectory, array $extensionBootstrappers, bool $backupGlobals, bool $backupStaticProperties, bool $beStrictAboutChangesToGlobalState, bool $colors, bool $processIsolation, bool $enforceTimeLimit, int $defaultTimeLimit, int $diffContext, int $timeoutForSmallTests, int $timeoutForMediumTests, int $timeoutForLargeTests, bool $reportUselessTests, bool $strictCoverage, bool $requireCoverageContribution, bool $disallowTestOutput, bool $displayDetailsOnAllIssues, bool $displayDetailsOnIncompleteTests, bool $displayDetailsOnSkippedTests, bool $displayDetailsOnTestsThatTriggerDeprecations, bool $displayDetailsOnPhpunitDeprecations, bool $displayDetailsOnPhpunitNotices, bool $displayDetailsOnTestsThatTriggerErrors, bool $displayDetailsOnTestsThatTriggerNotices, bool $displayDetailsOnTestsThatTriggerWarnings, bool $reverseDefectList, bool $requireCoverageMetadata, bool $requireSealedMockObjects, bool $noProgress, bool $noResults, bool $noOutput, int $executionOrder, int $executionOrderDefects, bool $resolveDependencies, ?string $logfileTeamcity, ?string $logfileJunit, ?string $logfileOtr, bool $includeGitInformation, bool $includeGitInformationInOtrLogfile, ?string $logfileTestdoxHtml, ?string $logfileTestdoxText, ?string $logEventsText, ?string $logEventsVerboseText, bool $compactOutput, bool $teamCityOutput, bool $testDoxOutput, bool $testDoxOutputSummary, ?array $testsCovering, ?array $testsUsing, ?array $testsRequiringPhpExtension, ?string $filter, ?string $excludeFilter, ?string $testIdFilterFile, ?string $testIdFilter, array $groups, 
array $excludeGroups, int $randomOrderSeed, int $repeat, bool $includeUncoveredFiles, TestSuiteCollection $testSuite, string $includeTestSuite, string $excludeTestSuite, ?string $defaultTestSuite, bool $ignoreTestSelectionInXmlConfiguration, array $testSuffixes, Php $php, bool $controlGarbageCollector, int $numberOfTestsBeforeGarbageCollection, ?string $generateBaseline, bool $debug, bool $withTelemetry, int $shortenArraysForExportThreshold)

@sebastianbergmann sebastianbergmann force-pushed the issue-5718/repeat branch 5 times, most recently from 199afc4 to 4bb8e05 Compare May 19, 2026 13:49
@sebastianbergmann sebastianbergmann force-pushed the issue-5718/repeat branch 2 times, most recently from 5b60cb9 to 8ab2d8f Compare June 5, 2026 06:13
@sebastianbergmann sebastianbergmann force-pushed the issue-5718/repeat branch 2 times, most recently from 3707f99 to d7e2aa3 Compare June 12, 2026 08:20
@sebastianbergmann sebastianbergmann marked this pull request as ready for review June 12, 2026 08:21
@sebastianbergmann

Owner Author

> Could you add a feature like the one in the bshaffer/phpunit-retry-annotations package? Something like "retry until the test passes".

I started to work on this in #6742.

Previously, RepeatTestSuite was declared as a leaf Test even though it structurally represents a group of test cases (the N repetitions of a test method). This produced two inconsistencies:

* TestSuite::addTest() registered the suite under the first repetition's id (Class::method (repetition 1 of N)) rather than the group id, so the other repetitions did not appear as members of the group.

* Event\TestSuite\TestSuiteBuilder::process() treated the suite as a leaf and surfaced only tests[0]->valueObjectForEvents() into the parent's TestCollection. The collection therefore had one entry per repetition group while count() reported N: the two disagreed.

Both issues are removed by treating RepeatTestSuite the same way DataProviderTestSuite is treated:

* RepeatTestSuite now extends Framework\TestSuite and is constructed via RepeatTestSuite::fromTests($name, $tests, $failureThreshold). It overrides run() to retain the failure-threshold/abort semantics and delegates provides(), requires(), sortId(), and setDependencies() to its first child / all children.

* A dedicated event-level value object TestSuiteForRepeatedTestMethod is introduced alongside TestSuiteForTestMethodWithDataProvider. It exposes className(), methodName(), file(), line(), and an isForRepeatedTestMethod() predicate on the base Event\TestSuite\TestSuite.

* Event\TestSuite\TestSuiteBuilder::from() detects RepeatTestSuite and returns the new value object; process() recurses through it like any other framework TestSuite, so each repetition's TestMethod event value object now appears in the parent's TestCollection.

* The special-case branches in Framework\TestSuite::addTest() and Runner\Filter\NameFilterIterator::accept() are removed. The inherited instanceof self branch handles registration; the existing TestSuite branch in the filter recurses into children, which then match individually.

* Runner\TestResult\Collector::testSuiteFinished() learns about the new value object and, when no repetition of the method failed, records the method as passed via PassedTests::testMethodPassed(), mirroring the data-provider handler.

As a consequence of RepeatTestSuite being a real TestSuite, Test Suite Started / Test Suite Finished events are now emitted around each repetition group. The JUnit XML logger correspondingly produces a nested <testsuite> element per repeated method, matching how it already renders data-provider suites.
…ders

When a test method uses both a data provider and repetition, each data set gets its own RepeatTestSuite. Collector::testSuiteFinished() registered the test method in PassedTests as soon as the first data set's RepeatTestSuite finished without failures, before the remaining data sets had run. Since PassedTests has no retraction mechanism, a failure in a later data set could not undo the registration, and a test depending on the method via #[Depends] ran even though its dependency had failed.

The event value object TestSuiteForRepeatedTestMethod now knows whether it represents the repetitions of a single data set (isForDataSet()), derived in Event\TestSuite\TestSuiteBuilder from the "#" separator it already parses. For such a suite, the Collector no longer registers the test method as passed and instead leaves the decision to the enclosing data provider test suite's finished event, which fires only after all data sets have run and which already performed this registration before repetition support was introduced. Repeated test methods without a data provider are still registered when their RepeatTestSuite finishes, as that is the point at which all repetitions have run.

The registration logic, previously duplicated between the data provider and repeated test method branches, is extracted into a shared helper method.
…od in another class

When deciding whether a test method that uses a data provider (or, since the introduction of repetition support, a repeated test method) should be registered as passed, Collector compared failure events by method name only. A failure of a same-named method in an unrelated class therefore prevented the registration, and tests depending on the method that had actually passed were wrongly skipped.

Failure events are now matched on class name and method name, aligning the check with the Class::method granularity that PassedTests has always used for registration. This cannot cause a failed method to be registered as passed: a method's own failure still matches both comparisons.
…s run in process isolation

The child process reconstructs the TestCase from the class and method name and replays per-instance state such as provided data and dependency input, but not the repetition state set by TestBuilder via setRepetition(). The TestMethod value objects for all events emitted in the child process are built from that reconstructed instance, so they reported repetition 1 of 1 regardless of the actual repetition.

As a consequence, all repetitions of a test using process isolation had the identical test id and name: the "(repetition N of M)" suffix was missing from debug output and from the JUnit XML and TeamCity loggers, and result collections keyed by test id merged the issues of all repetitions into a single entry.

The repetition and the total number of repetitions are now passed to the child process template and replayed alongside the other test case state. For tests that are not repeated this sets the default values and changes nothing.
…works for repeated tests run in process isolation

ChildProcessResultProcessor applied the child's test result and assertion count to the parent's TestCase instance, but not the child's TestStatus. The parent-side status therefore remained "unknown" for tests run in process isolation.

RepeatTestSuite::run() reads that status to count failures and errors against the failure threshold. Because the status was never populated, the threshold was never reached for repeated tests run in process isolation: all repetitions were executed even after a failure, instead of the remaining repetitions being skipped.
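Why an unpopulated status defeats the threshold can be shown with a small model of a repetition loop. This is a sketch under assumptions: the function `executedRepetitions()` and the plain status strings are hypothetical and do not reproduce the code of RepeatTestSuite::run().

```php
<?php declare(strict_types=1);

/**
 * Hypothetical model: $statuses holds the status the parent process
 * sees after each repetition.
 *
 * @param list<string> $statuses
 *
 * @return int the number of repetitions that were actually executed
 */
function executedRepetitions(array $statuses, int $failureThreshold): int
{
    $failures = 0;
    $executed = 0;

    foreach ($statuses as $status) {
        if ($failures >= $failureThreshold) {
            // Threshold reached: the remaining repetitions are skipped.
            break;
        }

        $executed++;

        if ($status === 'failure' || $status === 'error') {
            $failures++;
        }
    }

    return $executed;
}

// Status visible to the parent: execution stops after the first failure.
echo executedRepetitions(['failure', 'success', 'success'], 1), PHP_EOL; // 1

// Status stuck at "unknown": the threshold is never reached.
echo executedRepetitions(['unknown', 'unknown', 'unknown'], 1), PHP_EOL; // 3
```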

The child process now includes its TestStatus in the serialized result, and ChildProcessResultProcessor applies it to the parent instance via a new internal TestCase::setStatus() method. The processor's error paths (output on stderr, a tampered result file, an unparseable result) now also set an error status, so a crashing child process counts toward the failure threshold as well.

RepeatTestSuite::run() iterated over its tests in place and kept all TestCase instances referenced until the entire test run had finished. For the stress-testing use case that repetition is intended for, this meant memory usage grew linearly with the number of repetitions: with --repeat 200 and a test retaining a one-megabyte payload per instance, peak memory was 218 MB.

run() now follows the same pattern as TestSuite::run(): the tests are collected from the iterator, the suite's own references are released, and each instance is dropped as soon as its repetition has finished. The same scenario now peaks at 20 MB. The guard against running a suite twice, previously lost in the override, is also restored.
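The release pattern described above might look roughly like the following sketch. `FakeTest` and `FakeRepeatSuite` are hypothetical classes written for this illustration; they only model the reference handling and are not taken from PHPUnit's source.

```php
<?php declare(strict_types=1);

// Hypothetical test instance that retains a payload once it has run.
final class FakeTest
{
    public ?string $payload = null;

    public function run(): void
    {
        $this->payload = str_repeat('x', 1024 * 1024);
    }
}

// Hypothetical suite that holds one instance per repetition.
final class FakeRepeatSuite
{
    /** @var list<FakeTest> */
    private array $tests = [];

    public function __construct(int $repetitions)
    {
        for ($i = 0; $i < $repetitions; $i++) {
            $this->tests[] = new FakeTest;
        }
    }

    public function run(): int
    {
        // Collect the tests, then release the suite's own references.
        $tests       = $this->tests;
        $this->tests = [];

        // Reversing once lets array_pop() hand out the tests in their original order.
        $tests    = array_reverse($tests);
        $executed = 0;

        while (($test = array_pop($tests)) !== null) {
            $test->run();
            $executed++;

            // Drop the last reference so the instance can be freed right away.
            unset($test);
        }

        return $executed;
    }
}

$suite = new FakeRepeatSuite(5);

echo $suite->run(), PHP_EOL; // 5
echo $suite->run(), PHP_EOL; // 0, because the first run released all references
```

Because the suite no longer holds its tests after the first call, a second call has nothing left to execute in this model.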

The test suite sorter recursed into RepeatTestSuite as it does into any other test suite and reordered the repetitions it holds. With --order-by random the repetitions ran shuffled; with --order-by reverse they ran in descending order. In both cases, the message emitted when the remaining repetitions are skipped after a failure could refer to a repetition whose number is higher than those of the repetitions that were subsequently skipped.

RepeatTestSuite is now treated as an atomic unit by the sorter: it is still reordered among its siblings, but its repetitions always run in ascending order. Ordering defects first is unaffected because the sort id of a RepeatTestSuite is the sort id of its repetitions, which the enclosing test suite registers itself.
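Treating a node as atomic during a recursive reorder can be sketched as follows. The `Node` class, its `atomic` flag, and the functions `reverseOrder()` and `flatten()` are assumptions made for this example; the actual sorter and test suite classes are structured differently.

```php
<?php declare(strict_types=1);

// Hypothetical tree node standing in for a test or a test suite.
final class Node
{
    /** @param list<Node> $children */
    public function __construct(
        public string $name,
        public array $children = [],
        public bool $atomic = false,
    ) {
    }
}

// Reverses the children of every node, except inside atomic nodes.
function reverseOrder(Node $node): void
{
    if ($node->atomic) {
        // An atomic node is moved among its siblings, but its children keep their order.
        return;
    }

    $node->children = array_reverse($node->children);

    foreach ($node->children as $child) {
        reverseOrder($child);
    }
}

/** @return list<string> names of the leaf nodes in execution order */
function flatten(Node $node): array
{
    if ($node->children === []) {
        return [$node->name];
    }

    $names = [];

    foreach ($node->children as $child) {
        $names = array_merge($names, flatten($child));
    }

    return $names;
}

$suite = new Node('suite', [
    new Node('testA'),
    new Node(
        'testB',
        [new Node('testB (1 of 3)'), new Node('testB (2 of 3)'), new Node('testB (3 of 3)')],
        atomic: true,
    ),
]);

reverseOrder($suite);

echo implode(', ', flatten($suite)), PHP_EOL;
// testB (1 of 3), testB (2 of 3), testB (3 of 3), testA
```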

The TestDox name prettifier produced the same string for every repetition of a repeated test, so the TestDox output listed N identical lines per repeated test method and it was impossible to tell which repetition had failed.

The prettified name now carries a "(repetition N of M)" suffix, mirroring how data sets are appended for test methods that use data providers. The memoization key includes the repetition number, as the cache would otherwise return the first repetition's string for all of them. Unlike the data set suffix, the repetition suffix is also appended to names customized with #[TestDox], because a custom text has no placeholder for the repetition and the lines would otherwise remain indistinguishable.
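The role of the repetition number in the memoization key can be illustrated with a small sketch. `FakePrettifier`, its cache layout, and its simplified name conversion are hypothetical; omitting the suffix for tests that are not repeated is likewise an assumption of this example, not a statement about the real prettifier.

```php
<?php declare(strict_types=1);

// Hypothetical prettifier with a memoization cache.
final class FakePrettifier
{
    /** @var array<string, string> */
    private array $cache = [];

    public function prettify(string $class, string $method, int $repetition, int $total): string
    {
        // The repetition is part of the key. Without it, every repetition
        // would receive the string cached for the first one.
        $key = $class . '::' . $method . '#' . $repetition;

        if (!isset($this->cache[$key])) {
            // Simplified conversion: "testItWorks" becomes "It works".
            $words = preg_replace('/(?<!^)[A-Z]/', ' $0', substr($method, 4));
            $name  = ucfirst(strtolower($words));

            // Assumption: only repeated tests get the suffix.
            if ($total > 1) {
                $name .= sprintf(' (repetition %d of %d)', $repetition, $total);
            }

            $this->cache[$key] = $name;
        }

        return $this->cache[$key];
    }
}

$prettifier = new FakePrettifier;

echo $prettifier->prettify('ExampleTest', 'testItWorks', 1, 3), PHP_EOL; // It works (repetition 1 of 3)
echo $prettifier->prettify('ExampleTest', 'testItWorks', 2, 3), PHP_EOL; // It works (repetition 2 of 3)
```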

The testSuiteStarted message for a repeated test method's suite carried no locationHint, unlike the suites for test classes and for test methods that use data providers, so IDEs could not navigate from the suite node to the test method.

The location hint is built from the class and method name rather than from the suite name, because the suite name of a repetition group that represents a single data set carries a "#dataSetName" suffix that does not belong in a php_qn:// URL.
Labels

feature/test-runner CLI test runner type/enhancement A new idea that should be implemented
