Skip to content

python: Handle paths through pathlib #541

Merged
mpkorstanje merged 4 commits intomainfrom
debt/py-paths
Feb 16, 2026
Merged

python: Handle paths through pathlib #541
mpkorstanje merged 4 commits intomainfrom
debt/py-paths

Conversation

@mpkorstanje
Copy link
Contributor

@mpkorstanje mpkorstanje commented Feb 10, 2026

🤔 What's changed?

The Python implementation now uses Path to represent paths to avoid ambiguity.

⚡️ What's your motivation?

An anonymous report noted that the TokenScanner included hidden file reading behavior. Relevant parts of the email are reproduced below.

Hidden File-Reading behavior in Parser.parse() leading to LFI
....
The vulnerability stems from the interaction between Parser.parse and TokenScanner's design:

  1. Misleading API in Parser: In Parser, the parse method's parameter is named 'token_scanner_or_str'. This naming suggests the method accepts either a TokenScanner object or a Gherkin source string. However, it implicitly supports file paths because it passes the string directly to the TokenScanner constructor:

    token_scanner = (
    TokenScanner(token_scanner_or_str)
    if isinstance(token_scanner_or_str, str)
    else token_scanner_or_str
    )

  2. Hidden Path Handling in TokenScanner: While TokenScanner is explicitly designed to handle both paths and strings (as seen in its 'path_or_str' parameter), this dual-purpose behavior is hidden when accessed via Parser.parse(). The TokenScanner performs an os.path.exists() check to determine the input type, leading to unexpected File I/O when the developer intends to parse a source string:

    def __init__(self, path_or_str: str) -> None:
    if os.path.exists(path_or_str):
    self.io = open(path_or_str, encoding="utf8")
    else:
    self.io = io.StringIO(path_or_str)
    self.line_number = 0

Impact: Sensitive Information Disclosure via CompositeParserException
The danger is that the Parser.parse() API masks its file-reading capability. A developer passing a string to this method would expect it to be parsed as a Gherkin source, but if the string matches a file path on the server, the library silently switches to reading that file.

Example Scenario & Proof of Concept:
To reproduce the issue:

  1. Create a file named ".env" in the working directory with some content (e.g., "SECRET_KEY=12345").
  2. Run the following Python code:
from gherkin import Parser
   parser = Parser()
   try:
       parser.parse(".env")
   except Exception as e:
       print(e)

Observation:
Instead of parsing the string ".env", the library opens and reads the actual ".env" file. Since the file content is not valid Gherkin, it raises a CompositeParserException. Crucially, the error messages within this exception contain the actual content of the lines from the .env file, allowing an attacker to read sensitive information.

....

Note: This vulnerability was discovered manually by me during development. I have used an AI assistant solely for drafting and formatting the technical report.

🏷️ What kind of change is this?

  • 💥 Breaking change (incompatible changes to the API)

📋 Checklist:

  • I agree to respect and uphold the Cucumber Community Code of Conduct
  • I've changed the behaviour of the code
    • I have added/updated tests to cover my changes.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly.
  • Users should know about my change
    • I have added an entry to the "Unreleased" section of the CHANGELOG, linking to this pull request.

@mpkorstanje mpkorstanje mentioned this pull request Feb 10, 2026
7 tasks
@mpkorstanje mpkorstanje requested a review from youtux February 10, 2026 18:23
Co-authored-by: Kieran Ryan <kierankilkenny@gmail.com>
@mpkorstanje mpkorstanje merged commit ce2a34c into main Feb 16, 2026
6 checks passed
@mpkorstanje mpkorstanje deleted the debt/py-paths branch February 16, 2026 15:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants