Skip to content

fix(client): validate pathspec format for Flow/Run/Step/Task/DataArti…#3085

Open
ragurajakrishnan15 wants to merge 7 commits intoNetflix:masterfrom
ragurajakrishnan15:fix/client-pathspec-validation-948
Open

fix(client): validate pathspec format for Flow/Run/Step/Task/DataArti…#3085
ragurajakrishnan15 wants to merge 7 commits intoNetflix:masterfrom
ragurajakrishnan15:fix/client-pathspec-validation-948

Conversation

@ragurajakrishnan15
Copy link
Copy Markdown

fix(client): validate pathspec format for Client API (#948)

Made-with: Cursor

PR Type

  • Bug fix
  • New feature
  • Core Runtime change (higher bar -- see CONTRIBUTING.md)
  • Docs / tooling
  • Refactoring

Summary

Client constructors (Flow, Run, Step, Task, DataArtifact) now reject malformed pathspecs up front: missing/empty pathspecs, empty segments (//, leading /), wrong number of /-separated parts, and misleading trailing slashes (e.g. a path that normalises to a Step-shaped string passed to Run). Users get MetaflowInvalidPathspec with a clear message instead of partially constructed objects or confusing follow-on errors.

Issue

Fixes #948

Related: #3047 (overlapping community PR; happy to consolidate or drop this if maintainers prefer a single implementation).

Reproduction

Runtime: Client API only (no flow execution; local Python).

Commands to run:

cd test/unit
PYTHONPATH=../.. python -m pytest test_pathspec_validation.py -v

On environments where import metaflow fails during test collection (e.g. some Windows setups without fcntl), run the same command on Linux or rely on CI.

Where evidence shows up: Python exception type and message in the REPL or test output.

Before (error / log snippet)
# Example from #948: too few components for Task — confusing errors or odd object state.
Task("MyFlow/1234/my_step")  # 3 parts; Task needs 4

# Trailing slash could change segment counts incorrectly for Run.
Run("HelloFlow/1/start/")
After (evidence that fix works)
from metaflow import Task, Run
from metaflow.exception import MetaflowInvalidPathspec

Task("MyFlow/1234/my_step")  # MetaflowInvalidPathspec
Run("HelloFlow/1/start/")    # MetaflowInvalidPathspec

Valid-shaped pathspecs still raise MetaflowNotFound when no metadata exists — unchanged; tests assert MetaflowInvalidPathspec is not raised for valid shapes.

Root Cause

Pathspec strings were not fully validated before metadata lookup. Wrong segment counts (and slash edge cases) could lead to inconsistent client objects or confusing errors instead of an immediate, explicit MetaflowInvalidPathspec.

Why This Fix Is Correct

Validation runs once, before _get_object(), using the same internal object kinds (flow, run, step, task, artifact) as the Client API. Trailing slashes are normalised with rstrip('/') so Name and Name/ behave the same; empty interior segments are rejected. Stored _pathspec matches the normalised form.

Failure Modes Considered

  1. Overly strict IDs: We do not validate run or task ID format (numeric vs string); only structure and non-empty segments, so backend-specific IDs stay valid.
  2. Internal construction: Objects built with _object=... and without a user pathspec skip pathspec validation; only the user-facing pathspec + _object is None path is validated. Pickle/__setstate__ paths still pass explicit pathspecs through the same constructor logic as before.

Tests

  • Unit tests added/updated
  • Reproduction script provided (required for Core Runtime)
  • CI passes
  • If tests are impractical: explain why below and provide manual evidence above

Added: test/unit/test_pathspec_validation.py

Non-Goals

  • No changes to metadata providers, runtime, or orchestrator plugins.
  • No change to what counts as a valid run/task ID beyond slash-separated structure.

AI Tool Usage

  • No AI tools were used in this contribution
  • AI tools were used (describe below)

Tool: Cursor (AI-assisted editing and review).

Use: Implementation outline, test cases, and PR description wording.

Review: I read the changed code in metaflow/client/core.py, reasoned through edge cases (trailing slash, //, wrong counts), and aligned tests with that behaviour. Confirm pytest on Linux or via CI where metaflow imports cleanly.


…fact (Netflix#948)

- Add _validate_pathspec() with empty-segment checks and part counts
- Normalise trailing slashes via rstrip('/') before split
- Raise MetaflowInvalidPathspec when pathspec is missing for user construction
- Add unit tests aligned with PR Netflix#3047 plus Run step-like trailing-slash case

Made-with: Cursor
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 7, 2026

Greptile Summary

This PR adds upfront pathspec validation (_validate_pathspec) to the five Client API constructors and a comprehensive test suite for the new behaviour. The intent is sound, but metaflow/client/core.py currently has a blocking defect that prevents the module from being imported at all — see the open inline thread on the file for the full diagnosis and a corrected function body. The test file and .gitattributes change are both clean.

Confidence Score: 1/5

Not safe to merge — the module fails to import due to a SyntaxError in _validate_pathspec.

A stray dangling expression and undeclared variables (fmt, expected_parts) in _validate_pathspec cause a SyntaxError at module load time, making import metaflow fail entirely. This is a P0 blocker that must be resolved before this PR can be merged. The test file and .gitattributes change are clean; the fix itself is conceptually correct and well-tested.

metaflow/client/core.py — the _validate_pathspec function body requires a complete rewrite to resolve the SyntaxError, undefined variable references, and duplicate dead code (see open inline thread).

Vulnerabilities

No security concerns identified.

Important Files Changed

Filename Overview
metaflow/client/core.py Adds _validate_pathspec and refactors __init__, but the function body contains a stray dangling % (...) fragment (lines 99–100) and uses undeclared fmt/expected_parts, causing a SyntaxError that makes the module unimportable.
test/unit/test_pathspec_validation.py Well-structured test suite covering None, empty, wrong-part-count, empty-segment, and trailing-slash cases for all five client types; helper functions are clean and tests are clearly annotated.
.gitattributes New file enforcing LF line endings for CI shell scripts; straightforward and correct.

Reviews (5): Last reviewed commit: "Merge branch 'fix/client-pathspec-valida..." | Re-trigger Greptile

Comment thread metaflow/client/core.py Outdated
Comment thread metaflow/client/core.py Outdated
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 7, 2026

Tip:

Greploops — Automatically fix all review issues by running /greploops in Claude Code. It iterates: fix, push, re-review, repeat until 5/5 confidence.

Use the Greptile plugin for Claude Code to query reviews, search comments, and manage custom context directly from your terminal.

ragurajakrishnan15 and others added 3 commits April 7, 2026 18:44
Windows checkouts with core.autocrlf could leave CRLF in .sh files and fail
local shellcheck in pre-commit. eol=lf keeps scripts Unix-safe.

Made-with: Cursor
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Comment thread metaflow/client/core.py
Comment on lines +94 to +106
if pathspec is None or pathspec == "" or pathspec.strip("/") == "":
raise MetaflowInvalidPathspec(
"pathspec for %s cannot be empty or None; expected %s(%s)"
% (class_name, class_name, fmt)
)
% (class_name, class_name, fmt)
)

if pathspec == "" or pathspec.strip("/") == "":
raise MetaflowInvalidPathspec(
"pathspec for %s cannot be empty; expected %s(%s)"
% (class_name, class_name, fmt)
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P0 SyntaxError + NameError — module is unimportable

Lines 99–100 are stray tokens left over from an incomplete edit. After the raise statement closes on line 98, the % (class_name, class_name, fmt) / ) pair on lines 99–100 is not a valid Python statement and causes an IndentationError/SyntaxError at import time — import metaflow fails entirely. Additionally, fmt and expected_parts are used throughout the function but are never assigned; they need to be unpacked from _PATHSPEC_FORMATS[obj_name]. The duplicate empty-check on lines 102–106 is also dead code once the first guard is present.

The corrected function body should be:

def _validate_pathspec(pathspec: Optional[str], obj_name: str) -> None:
    if obj_name not in _PATHSPEC_FORMATS:
        return

    expected_parts, fmt = _PATHSPEC_FORMATS[obj_name]
    class_name = _CLASS_DISPLAY_NAMES.get(obj_name, obj_name)

    if pathspec is None or pathspec == "" or pathspec.strip("/") == "":
        raise MetaflowInvalidPathspec(
            "pathspec for %s cannot be empty or None; expected %s(%s)"
            % (class_name, class_name, fmt)
        )

    normalized = pathspec.rstrip("/")
    parts = normalized.split("/")

    if any(p == "" for p in parts):
        raise MetaflowInvalidPathspec(
            "Invalid pathspec '%s' for %s: pathspec contains empty segments. "
            "Expected %s(%s)." % (pathspec, class_name, class_name, fmt)
        )

    if len(parts) != expected_parts:
        raise MetaflowInvalidPathspec(
            "Invalid pathspec '%s' for %s: expected %s(%s) (%d part%s), got %d part%s."
            % (
                pathspec,
                class_name,
                class_name,
                fmt,
                expected_parts,
                "" if expected_parts == 1 else "s",
                len(parts),
                "" if len(parts) == 1 else "s",
            )
        )

Addresses review feedback: None pathspec now always goes through
_validate_pathspec for typed objects (full error with expected format).
The explicit None check after validation only applies when _NAME is not
in _PATHSPEC_FORMATS (e.g. base MetaflowObject).

Made-with: Cursor
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Pathspec to create Flow/Run/Step/Task/DataArtifact is not validated

1 participant