Skip to content

fix(cargo): harden README fetch, clarify status handling, strengthen tests#1330

Merged
rdimitrov merged 1 commit into
mainfrom
followup/cargo-hardening
Jun 4, 2026
Merged

fix(cargo): harden README fetch, clarify status handling, strengthen tests#1330
rdimitrov merged 1 commit into
mainfrom
followup/cargo-hardening

Conversation

@rdimitrov
Copy link
Copy Markdown
Member

Follow-up to #1207, addressing items from the cargo validator review. Scoped to cargo + shared helpers — no behavior change for existing npm/pypi/nuget/oci/mcpb publishers. Safe to merge and promote.

Changes

SSRF hardening of the two-call README retrieval

  • Pin the step-2 README URL to an allowed host (static.crates.io in prod; the base host under test) before fetching, so a metadata response can't steer the validator at an internal/attacker host.
  • CheckRedirect policy pins every redirect hop to the same allowlist (the initial URL being pinned isn't enough if an upstream 3xx is followed).
  • Cap the README body with io.LimitReader (5 MiB).

Not publisher-exploitable today (the URL comes from crates.io), so this is defense-in-depth — but it makes the "host is pinned" guarantee hold in code rather than rely on crates.io's behavior.

Clearer status handling (previously any non-5xx/non-200 collapsed to "not found")

  • 429 → reported as transient/retryable, at both the metadata and README steps.
  • 403 → disambiguated via the crate-version endpoint: genuinely-missing stays "not found"; exists-but-no-README gets an actionable "add a README with mcp-name and republish" message (mirrors the NuGet validator). This continues the 5xx-vs-403 diagnostic split @P4ST4S started in the feat: add cargo (crates.io) as a package registry type #1207 review — thanks!

Shared containsMCPNameToken helper
Boundary-anchored ownership-token match (prevents prefix confusion, e.g. a README declaring io.github.acme/widget-pro satisfying a claim for io.github.acme/widget), used here by the cargo validator only. Adopting it in PyPI/NuGet is a separate follow-up because it's a behavior change for those already-live registries.

Tests & docs
Hermetic positive ServerNameFormats; combined fixture gains a version-existence endpoint + cases for 403-missing vs 403-no-README, 429, and prefix-confusion rejection; new foreign-README-host (SSRF) test; internal test for the helper. Docs: list crates.io in the registry-requirements list + the Package model doc.

Testing

go build, go vet, full ./internal/validators/... suite (incl. live cargo positive), make check-schema, validate-examples all pass. No new golangci-lint findings.

🤖 Generated with Claude Code

…tests

Follow-up to #1207, addressing items from review of the cargo validator.
Scoped to cargo + shared helpers only — no behavior change for existing
npm/pypi/nuget/oci/mcpb publishers.

SSRF hardening of the two-call README retrieval:
- Pin the step-2 README URL to an allowed host (static.crates.io in prod; the
  base host under test) before fetching, so a metadata response cannot steer the
  validator at an internal/attacker host.
- CheckRedirect policy pins every redirect hop to the same allowlist.
- Cap the README body with io.LimitReader (5 MiB).

Clearer status handling (previously any non-5xx/non-200 collapsed to "not found"):
- 429 is reported as transient/retryable rather than "not found", at both steps.
- A 403 from static.crates.io is disambiguated via the crate-version endpoint:
  genuinely-missing stays "not found"; exists-but-no-README gets an actionable
  "add a README with mcp-name and republish" message (mirrors NuGet). This
  continues the 5xx-vs-403 diagnostic split @P4ST4S started in the #1207 review.

Add a shared, boundary-anchored containsMCPNameToken helper (prevents prefix
confusion, e.g. a README declaring io.github.acme/widget-pro satisfying a claim
for io.github.acme/widget) and use it from the cargo validator. Adopting it in
the PyPI/NuGet validators is a follow-up (it is a behavior change for those
already-live registries).

Tests: hermetic positive ServerNameFormats; combined fixture gains a
version-existence endpoint + cases for 403-missing vs 403-no-readme, 429, and
prefix-confusion rejection; new foreign-README-host (SSRF) test; internal test
for the helper. Docs: list crates.io in registry requirements + the Package doc.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@rdimitrov rdimitrov force-pushed the followup/cargo-hardening branch from bc87b70 to d796057 Compare June 4, 2026 20:08
@rdimitrov rdimitrov merged commit 7a6d1bc into main Jun 4, 2026
6 checks passed
@rdimitrov rdimitrov deleted the followup/cargo-hardening branch June 4, 2026 21:34
rdimitrov added a commit that referenced this pull request Jun 5, 2026
…en 403

Three follow-up hardening fixes to the cargo validator (review of #1330):

- SSRF: the README host pin and the redirect policy keyed on Hostname() only, so
  any port/scheme on crates.io/static.crates.io was accepted. cargoURLAllowed now
  additionally requires https + the default port for the real crates.io base
  (test/mock bases still match on host only, so httptest fixtures keep working).
- Transient 403 disambiguation: when the README CDN 403s, the crate-version
  existence probe previously reported "not found" if the probe itself failed
  (429/5xx/network) — the same misclassification the 5xx handling fixed one layer
  up. probeCargoVersion now returns a four-state result and a transient probe
  yields a retryable message instead of "not found".
- A 403 with the crate present no longer flatly asserts "no rendered README"
  (a 403 isn't definitive proof — could be a CDN/WAF block); the message now says
  the README could not be retrieved and gives the actionable next step.

Tests: TestCargoURLAllowed (https/port/userinfo/foreign-host matrix) and a
combined-fixture case where the 403 existence-probe is rate-limited (429) and
must report transient, not "not found".

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
rdimitrov added a commit that referenced this pull request Jun 5, 2026
…ent-form safe) + cargo follow-ups (#1331)

Registry-wide hardening of the `mcp-name:` ownership-token match, plus
the cargo follow-up fixes from the #1330 review — consolidated here per
maintainer request. Rebased onto current `main` (the merged #1330 cargo
commit drops out), so the diff is just the net-new work.

### Changes
- **PyPI/NuGet: boundary-anchored token match.** Replace
`strings.Contains(readme, "mcp-name: "+name)` with the shared
`containsMCPNameToken`, so a README declaring a longer name
(`…/widget-pro`) no longer satisfies an ownership claim for a shorter
prefix (`…/widget`). (Cargo already got this in #1330; NPM is unaffected
— it compares an exact metadata field.)
- **Matcher: treat the HTML comment close as a boundary.** `<!--
mcp-name: NAME-->` / `<!--mcp-name: NAME-->` (any spacing) validate
again — the documented hidden-comment form for PyPI/NuGet — while a
genuine longer name (`…/widget--pro`) still does not.
- **Cargo hardening (review of #1330):** pin scheme+port (not just host)
on the README fetch and redirects; a rate-limited/failed crate-version
existence probe now reports *transient* instead of "not found"; a 403
with the crate present no longer flatly asserts "no README".
- **Docs:** note the token must be followed by a boundary.

### ⚠️ Behavior change for PyPI/NuGet — read before merging
The match is **strictly stricter** (verified by fuzz, 2.3M execs: it can
only flip pass→fail, never fail→pass). After the comment-close fix, the
**only** forms that newly fail are unusual *inline* ones: the token
ending a sentence (`…/my-mcp.`) or glued to a trailing `/`. The
documented forms (own line, in `<!-- … -->`) are unaffected.

**Correcting an earlier claim:** this re-validates on **edits too**, not
just new versions — `edit.go → UpdateServer → ValidateUpdateRequest →
validateRegistryOwnership` runs the token check on any edit of a live
server (only delete-transitions skip it). So an existing PyPI/NuGet
server whose README uses one of the breaking inline forms would fail on
its next publish *or* edit. Given the v0.1 API-freeze posture, this
should land as a deliberate, noted change.

### Testing
`go build`, `vet`, `gofmt`, `golangci-lint` (prod files),
`check-schema`, `validate-examples` all clean. Hermetic + **live**
PyPI/NuGet/cargo suite passes. New: `TestCargoURLAllowed`, a
combined-fixture probe-429→transient case, comment-form matcher cases,
and `FuzzContainsMCPNameToken` (the strictly-stricter property, 2.3M
execs).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant