fix(client): validate pathspec format for Flow/Run/Step/Task/DataArti…#3085
fix(client): validate pathspec format for Flow/Run/Step/Task/DataArti…#3085ragurajakrishnan15 wants to merge 7 commits intoNetflix:masterfrom
Conversation
…fact (Netflix#948) - Add _validate_pathspec() with empty-segment checks and part counts - Normalise trailing slashes via rstrip('/') before split - Raise MetaflowInvalidPathspec when pathspec is missing for user construction - Add unit tests aligned with PR Netflix#3047 plus Run step-like trailing-slash case Made-with: Cursor
Greptile SummaryThis PR adds upfront pathspec validation ( Confidence Score: 1/5Not safe to merge — the module fails to import due to a SyntaxError in A stray dangling expression and undeclared variables ( metaflow/client/core.py — the
|
| Filename | Overview |
|---|---|
| metaflow/client/core.py | Adds _validate_pathspec and refactors __init__, but the function body contains a stray dangling % (...) fragment (lines 99–100) and uses undeclared fmt/expected_parts, causing a SyntaxError that makes the module unimportable. |
| test/unit/test_pathspec_validation.py | Well-structured test suite covering None, empty, wrong-part-count, empty-segment, and trailing-slash cases for all five client types; helper functions are clean and tests are clearly annotated. |
| .gitattributes | New file enforcing LF line endings for CI shell scripts; straightforward and correct. |
Reviews (5): Last reviewed commit: "Merge branch 'fix/client-pathspec-valida..." | Re-trigger Greptile
Made-with: Cursor
|
Tip: Greploops — Automatically fix all review issues by running Use the Greptile plugin for Claude Code to query reviews, search comments, and manage custom context directly from your terminal. |
Windows checkouts with core.autocrlf could leave CRLF in .sh files and fail local shellcheck in pre-commit. eol=lf keeps scripts Unix-safe. Made-with: Cursor
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
| if pathspec is None or pathspec == "" or pathspec.strip("/") == "": | ||
| raise MetaflowInvalidPathspec( | ||
| "pathspec for %s cannot be empty or None; expected %s(%s)" | ||
| % (class_name, class_name, fmt) | ||
| ) | ||
| % (class_name, class_name, fmt) | ||
| ) | ||
|
|
||
| if pathspec == "" or pathspec.strip("/") == "": | ||
| raise MetaflowInvalidPathspec( | ||
| "pathspec for %s cannot be empty; expected %s(%s)" | ||
| % (class_name, class_name, fmt) | ||
| ) |
There was a problem hiding this comment.
SyntaxError + NameError — module is unimportable
Lines 99–100 are stray tokens left over from an incomplete edit. After the raise statement closes on line 98, the % (class_name, class_name, fmt) / ) pair on lines 99–100 is not a valid Python statement and causes an IndentationError/SyntaxError at import time — import metaflow fails entirely. Additionally, fmt and expected_parts are used throughout the function but are never assigned; they need to be unpacked from _PATHSPEC_FORMATS[obj_name]. The duplicate empty-check on lines 102–106 is also dead code once the first guard is present.
The corrected function body should be:
def _validate_pathspec(pathspec: Optional[str], obj_name: str) -> None:
if obj_name not in _PATHSPEC_FORMATS:
return
expected_parts, fmt = _PATHSPEC_FORMATS[obj_name]
class_name = _CLASS_DISPLAY_NAMES.get(obj_name, obj_name)
if pathspec is None or pathspec == "" or pathspec.strip("/") == "":
raise MetaflowInvalidPathspec(
"pathspec for %s cannot be empty or None; expected %s(%s)"
% (class_name, class_name, fmt)
)
normalized = pathspec.rstrip("/")
parts = normalized.split("/")
if any(p == "" for p in parts):
raise MetaflowInvalidPathspec(
"Invalid pathspec '%s' for %s: pathspec contains empty segments. "
"Expected %s(%s)." % (pathspec, class_name, class_name, fmt)
)
if len(parts) != expected_parts:
raise MetaflowInvalidPathspec(
"Invalid pathspec '%s' for %s: expected %s(%s) (%d part%s), got %d part%s."
% (
pathspec,
class_name,
class_name,
fmt,
expected_parts,
"" if expected_parts == 1 else "s",
len(parts),
"" if len(parts) == 1 else "s",
)
)Addresses review feedback: None pathspec now always goes through _validate_pathspec for typed objects (full error with expected format). The explicit None check after validation only applies when _NAME is not in _PATHSPEC_FORMATS (e.g. base MetaflowObject). Made-with: Cursor
…om/ragurajakrishnan15/metaflow into fix/client-pathspec-validation-948 Made-with: Cursor
fix(client): validate pathspec format for Client API (#948)
_validate_pathspec()with empty-segment checks and part countsrstrip('/')before splitMetaflowInvalidPathspecwhen pathspec is missing for user constructionRunstep-like trailing-slash caseMade-with: Cursor
PR Type
Summary
Client constructors (
Flow,Run,Step,Task,DataArtifact) now reject malformed pathspecs up front: missing/empty pathspecs, empty segments (//, leading/), wrong number of/-separated parts, and misleading trailing slashes (e.g. a path that normalises to a Step-shaped string passed toRun). Users getMetaflowInvalidPathspecwith a clear message instead of partially constructed objects or confusing follow-on errors.Issue
Fixes #948
Related: #3047 (overlapping community PR; happy to consolidate or drop this if maintainers prefer a single implementation).
Reproduction
Runtime: Client API only (no flow execution; local Python).
Commands to run:
cd test/unit PYTHONPATH=../.. python -m pytest test_pathspec_validation.py -vOn environments where
import metaflowfails during test collection (e.g. some Windows setups withoutfcntl), run the same command on Linux or rely on CI.Where evidence shows up: Python exception type and message in the REPL or test output.
Before (error / log snippet)
After (evidence that fix works)
Valid-shaped pathspecs still raise
MetaflowNotFoundwhen no metadata exists — unchanged; tests assertMetaflowInvalidPathspecis not raised for valid shapes.Root Cause
Pathspec strings were not fully validated before metadata lookup. Wrong segment counts (and slash edge cases) could lead to inconsistent client objects or confusing errors instead of an immediate, explicit
MetaflowInvalidPathspec.Why This Fix Is Correct
Validation runs once, before
_get_object(), using the same internal object kinds (flow,run,step,task,artifact) as the Client API. Trailing slashes are normalised withrstrip('/')soNameandName/behave the same; empty interior segments are rejected. Stored_pathspecmatches the normalised form.Failure Modes Considered
_object=...and without a user pathspec skip pathspec validation; only the user-facingpathspec+_object is Nonepath is validated. Pickle/__setstate__paths still pass explicit pathspecs through the same constructor logic as before.Tests
Added:
test/unit/test_pathspec_validation.pyNon-Goals
AI Tool Usage
Tool: Cursor (AI-assisted editing and review).
Use: Implementation outline, test cases, and PR description wording.
Review: I read the changed code in
metaflow/client/core.py, reasoned through edge cases (trailing slash,//, wrong counts), and aligned tests with that behaviour. Confirmpyteston Linux or via CI wheremetaflowimports cleanly.