Skip to content

Handle leading combinator selectors (>, +, ~) in SelectorConverter#119

Open
eyupcanakman wants to merge 1 commit intovoku:masterfrom
eyupcanakman:fix/leading-combinator-selectors
Open

Handle leading combinator selectors (>, +, ~) in SelectorConverter#119
eyupcanakman wants to merge 1 commit intovoku:masterfrom
eyupcanakman:fix/leading-combinator-selectors

Conversation

@eyupcanakman
Copy link

@eyupcanakman eyupcanakman commented Feb 14, 2026

Fixes #102

Problem

Selectors starting with a combinator like > span throw an error when used with element->find():

Expected selector, but " at 0> found.

This happens because Symfony's CssSelector doesn't support standalone combinators — they're only valid in a compound expression like div > span.

Solution

Intercept selectors starting with >, +, or ~ in SelectorConverter::toXPath() before they reach Symfony. The combinator is stripped, the rest is converted normally via Symfony, then the descendant-or-self:: axis prefix is replaced with the correct XPath axis:

Combinator XPath replacement Meaning
> /*/ Direct children only
+ /*/following-sibling::*[1]/self:: Immediately adjacent sibling
~ /*/following-sibling:: All following siblings

Tests

Added SelectorConverterChildCombinatorTest.php with 10 tests covering:

  • Unit tests for XPath output of each combinator type
  • Nested combinators (> p > span)
  • Combinator with class selector (> span.highlight)
  • Empty combinator error handling
  • Integration tests using element->find() matching the original issue's reproduction case
  • Verification that > span only matches direct children, not nested descendants

This change is Reviewable

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for CSS selectors with child (>), adjacent sibling (+), and general sibling (~) combinators in selector conversion to XPath.
  • Tests

    • Added comprehensive test suite covering child combinators, nested selectors, class-based selectors, adjacent and general sibling selectors, and integration scenarios.

Selectors like "> span" now work when called from element->find(),
fixing issue voku#102. The combinator is intercepted before reaching
Symfony's CssSelector and converted to the correct XPath axis.
@gemini-code-assist
Copy link

Summary of Changes

Hello @eyupcanakman, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves an issue where CSS selectors starting with combinators (e.g., "> span") failed when used with element->find() due to limitations in Symfony's CssSelector component. The solution involves intercepting these selectors, stripping the leading combinator, and then dynamically constructing the correct XPath by replacing the default descendant-or-self:: axis with the appropriate child or sibling axis, thereby extending the functionality and robustness of the SelectorConverter.

Highlights

  • Enhanced CSS Selector Parsing: The SelectorConverter now correctly interprets CSS selectors that begin with combinators like ">" (child), "+" (adjacent sibling), and "~" (general sibling), resolving previous errors encountered with element->find().
  • Custom XPath Conversion Logic: A pre-processing step was introduced to strip the leading combinator, convert the remainder of the selector to XPath, and then replace the default descendant-or-self:: axis with the specific XPath equivalent for direct children, adjacent siblings, or general siblings.
  • Robust Error Handling: A RuntimeException is now explicitly thrown if a selector string incorrectly ends with a combinator, preventing ambiguous or invalid parsing.
  • Comprehensive Test Coverage: A new test file, SelectorConverterChildCombinatorTest.php, was added, featuring 10 unit and integration tests to ensure the accurate conversion and functionality of all supported leading combinators, including nested scenarios and direct child matching.
Changelog
  • src/voku/helper/SelectorConverter.php
    • Implemented pre-processing logic for CSS selectors starting with ">" (child), "+" (adjacent sibling), or "~" (general sibling) combinators.
    • Added a RuntimeException for selectors ending with a combinator.
    • Modified XPath generation to correctly translate these leading combinators into their respective XPath axes.
  • tests/SelectorConverterChildCombinatorTest.php
    • Added a new test file SelectorConverterChildCombinatorTest.php.
    • Included unit tests for toXPath() covering child, adjacent, and general sibling combinators.
    • Added integration tests for element->find() to verify correct matching behavior, including direct children and the original issue reproduction.
Activity
  • Opened to fix issue usage of near child (> operator) not working #102, which reported errors when using CSS selectors with leading combinators like "> span".
  • Implemented a new parsing mechanism in SelectorConverter to correctly handle and convert these leading combinators into valid XPath expressions.
  • Introduced SelectorConverterChildCombinatorTest.php with 10 new tests, covering various scenarios for child and sibling combinators, including unit and integration tests.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@sonarqubecloud
Copy link

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The pull request aims to address the issue of leading combinators in CSS selectors by intercepting them before they reach the Symfony CssSelector component. However, the current implementation introduces a potential Denial of Service (DoS) vulnerability due to uncontrolled recursion and risks memory exhaustion from an unbounded static cache for compiled selectors. Additionally, the use of str_replace on the entire XPath string can lead to incorrect results for nested selectors and fails to handle special selectors like text or comment.

throw new \RuntimeException('Selector cannot end with a combinator: ' . $selector);
}

$restXPath = self::toXPath($restSelector, $ignoreCssSelectorErrors, $isForHtml);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The toXPath method recursively calls itself when a CSS selector starts with a combinator (>, +, or ~). An attacker can provide a maliciously crafted selector with a large number of leading combinators (e.g., >>>>...) to trigger a stack overflow or exceed the PHP recursion limit. This will cause the PHP process to crash or terminate abruptly, leading to a Denial of Service (DoS). To remediate this, consider using a loop to strip leading combinators instead of recursion, or implement a maximum recursion depth check.

Comment on lines +61 to +73
switch ($combinator) {
case '>':
$xPathQuery = \str_replace('descendant-or-self::', '/*/', $restXPath);
break;
case '+':
$xPathQuery = \str_replace('descendant-or-self::', '/*/following-sibling::*[1]/self::', $restXPath);
break;
case '~':
$xPathQuery = \str_replace('descendant-or-self::', '/*/following-sibling::', $restXPath);
break;
default:
throw new \RuntimeException('Unexpected combinator: ' . $combinator);
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The use of str_replace to modify the XPath axis is problematic for two reasons:

  1. Nested Selectors: It replaces all occurrences of descendant-or-self::. If the rest of the selector contains nested descendant relationships (e.g., > div span), Symfony might produce an XPath with multiple descendant-or-self:: (or similar) axes. Replacing all of them would incorrectly turn all descendants into direct children throughout the entire chain (e.g., turning > div span into /*/div/*/span).
  2. Special Selectors: It does not handle the // prefix returned by the special text and comment selectors (lines 36-43). For example, > text would result in //text() instead of /*/text(), causing it to find all descendant text nodes instead of just direct children.

It is safer to only replace the leading axis prefix of the restXPath string.

            switch ($combinator) {
                case '>':
                    $replacement = '/*/';
                    break;
                case '+':
                    $replacement = '/*/following-sibling::*[1]/self::';
                    break;
                case '~':
                    $replacement = '/*/following-sibling::';
                    break;
                default:
                    throw new \RuntimeException('Unexpected combinator: ' . $combinator);
            }

            if (\strpos($restXPath, 'descendant-or-self::') === 0) {
                $xPathQuery = $replacement . \substr($restXPath, 20);
            } elseif (\strpos($restXPath, '//') === 0) {
                $xPathQuery = $replacement . \substr($restXPath, 2);
            } else {
                $xPathQuery = $replacement . $restXPath;
            }

throw new \RuntimeException('Unexpected combinator: ' . $combinator);
}

self::$compiled[$selector] = $xPathQuery;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The results of XPath conversions are stored in a static array self::$compiled without any size limit or eviction policy. An attacker can send a large number of unique CSS selectors to the application, causing the cache to grow indefinitely until the server's memory is exhausted. This leads to a Denial of Service (DoS) as the PHP process will be terminated by the OOM killer or hit the memory_limit. This is particularly relevant in long-running PHP environments (e.g., Swoole, RoadRunner, or long-lived CLI workers). To remediate this, implement a maximum size for the cache with an eviction policy like LRU (Least Recently Used).

@coderabbitai
Copy link

coderabbitai bot commented Feb 14, 2026

Walkthrough

The changes add support for CSS selectors beginning with combinators (>, +, ~) to SelectorConverter.php, enabling direct child, adjacent sibling, and general sibling XPath conversions. A comprehensive test suite validates this functionality across various selector patterns and integration scenarios.

Changes

Cohort / File(s) Summary
Combinator Selector Support
src/voku/helper/SelectorConverter.php
Adds detection and handling of leading CSS combinators (>, +, ~) in selectors. Trims input, identifies the combinator type, converts the remaining selector to XPath recursively, applies combinator-specific XPath transformations, and caches results. Throws RuntimeException if content after combinator is empty.
Combinator Conversion Tests
tests/SelectorConverterChildCombinatorTest.php
New comprehensive test suite with 10 test methods covering child combinator (>), adjacent sibling (+), and general sibling (~) selectors. Includes unit tests for XPath generation, error handling for invalid selectors, and integration tests with HtmlDomParser to verify end-to-end functionality. Uses reflection to clear internal cache between tests.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Hoppy CSS combinators now hop and skip,
Direct children, siblings in tight grip,
Leading >, +, ~ no longer fail,
XPath translations spin the tale,
Tests ensure each selector will prevail! 🌿✨

🚥 Pre-merge checks | ✅ 4 | ❌ 2
❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Merge Conflict Detection ⚠️ Warning ⚠️ Unable to check for merge conflicts: Stream setup permanently failed: Sandbox setup timed out after 180000ms
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: handling leading combinator selectors (>, +, ~) in SelectorConverter, which is the primary focus of this PR.
Linked Issues check ✅ Passed The PR fully addresses issue #102 by implementing support for leading combinator selectors (>, +, ~) in SelectorConverter::toXPath(), enabling element->find('>span') and similar queries to work as expected.
Out of Scope Changes check ✅ Passed All changes are directly scoped to implementing leading combinator selector support: modifications to SelectorConverter.php and comprehensive test coverage in SelectorConverterChildCombinatorTest.php.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
⚔️ Resolve merge conflicts (beta)
  • Auto-commit resolved conflicts to branch fix/leading-combinator-selectors
  • Post resolved changes as copyable diffs in a comment

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.

Tip

Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ee96621bdf

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


switch ($combinator) {
case '>':
$xPathQuery = \str_replace('descendant-or-self::', '/*/', $restXPath);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Replace only leading scope axis for combinator rewrites

The combinator branch rewrites XPath with a global str_replace, which also changes descendant axes inside the remainder selector instead of only the implicit scope prefix. For selectors like > div span, Symfony emits an XPath containing an inner /descendant-or-self::*/ for the descendant combinator, and this code rewrites that too, effectively turning descendant matching into child-only traversal and missing valid results. The rewrite should be limited to the leading descendant-or-self:: segment only.

Useful? React with 👍 / 👎.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/voku/helper/SelectorConverter.php`:
- Around line 61-73: The current switch in SelectorConverter.php replaces all
occurrences of 'descendant-or-self::' in $restXPath using str_replace, which
breaks grouped selectors; change each case handling the $combinator ('>', '+',
'~') to replace only the first occurrence in $restXPath (e.g., use preg_replace
with a limit of 1 or a function that replaces the first match) when assigning
$xPathQuery so only the leading axis is transformed and grouped selectors
(pipes) remain correct; update the cases for '>', '+', and '~' that currently
operate on $restXPath/$xPathQuery and keep the default RuntimeException
unchanged.
🧹 Nitpick comments (2)
src/voku/helper/SelectorConverter.php (1)

50-51: Minor: $selector is already used as cache key on line 31, but ltrim is applied again on line 50.

If the caller passes a selector with leading whitespace (e.g., " > span"), it will bypass the early cache check for the trimmed equivalent "> span" if that was cached previously. This is consistent with pre-existing behavior (the // check on line 45 and text/comment checks also use the original string), but worth noting for awareness — not blocking.

tests/SelectorConverterChildCombinatorTest.php (1)

52-64: Missing integration tests for + and ~ combinators.

The adjacent (+) and general (~) sibling combinator tests only assert XPath string content but don't verify actual DOM querying behavior. Given the XPath correctness concern flagged in the source file, adding integration tests (similar to testIssue102Reproduction) for these combinators would help catch regressions.

Comment on lines +61 to +73
switch ($combinator) {
case '>':
$xPathQuery = \str_replace('descendant-or-self::', '/*/', $restXPath);
break;
case '+':
$xPathQuery = \str_replace('descendant-or-self::', '/*/following-sibling::*[1]/self::', $restXPath);
break;
case '~':
$xPathQuery = \str_replace('descendant-or-self::', '/*/following-sibling::', $restXPath);
break;
default:
throw new \RuntimeException('Unexpected combinator: ' . $combinator);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

str_replace replaces all occurrences of descendant-or-self:: — breaks grouped selectors.

If restSelector is a grouped selector (e.g., > .a, .b), Symfony produces XPath with multiple descendant-or-self:: axes (one per group, joined by |). str_replace would replace all of them, incorrectly applying the combinator axis to every group member.

Consider using preg_replace with a limit of 1 to replace only the leading axis:

Proposed fix
-            switch ($combinator) {
-                case '>':
-                    $xPathQuery = \str_replace('descendant-or-self::', '/*/', $restXPath);
-                    break;
-                case '+':
-                    $xPathQuery = \str_replace('descendant-or-self::', '/*/following-sibling::*[1]/self::', $restXPath);
-                    break;
-                case '~':
-                    $xPathQuery = \str_replace('descendant-or-self::', '/*/following-sibling::', $restXPath);
-                    break;
-                default:
-                    throw new \RuntimeException('Unexpected combinator: ' . $combinator);
-            }
+            switch ($combinator) {
+                case '>':
+                    $replacement = '/*/';
+                    break;
+                case '+':
+                    $replacement = '/*/following-sibling::*[1]/self::';
+                    break;
+                case '~':
+                    $replacement = '/*/following-sibling::';
+                    break;
+                default:
+                    throw new \RuntimeException('Unexpected combinator: ' . $combinator);
+            }
+            $xPathQuery = \preg_replace('/descendant-or-self::/', $replacement, $restXPath, 1);
🤖 Prompt for AI Agents
In `@src/voku/helper/SelectorConverter.php` around lines 61 - 73, The current
switch in SelectorConverter.php replaces all occurrences of
'descendant-or-self::' in $restXPath using str_replace, which breaks grouped
selectors; change each case handling the $combinator ('>', '+', '~') to replace
only the first occurrence in $restXPath (e.g., use preg_replace with a limit of
1 or a function that replaces the first match) when assigning $xPathQuery so
only the leading axis is transformed and grouped selectors (pipes) remain
correct; update the cases for '>', '+', and '~' that currently operate on
$restXPath/$xPathQuery and keep the default RuntimeException unchanged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

usage of near child (> operator) not working

1 participant

Comments