fix: sanitize HTML error pages in getErrorMessage to avoid noisy safe_outputs logs#39655
Conversation
When GitHub returns a 504/5xx response with an HTML body (e.g. the "Unicorn" error page), getErrorMessage was returning the full raw HTML as the error message. This caused CI logs to be flooded with hundreds of lines of HTML markup. Add isHtmlContent() to detect DOCTYPE/html responses and modify getErrorMessage() to replace them with a concise human-readable message that includes the HTTP status code when available. Fixes: Review finalization failed: <!DOCTYPE html>... Now shows: Review finalization failed: GitHub returned an unexpected HTML response (HTTP 504) Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This pull request updates the JavaScript action error helpers to prevent HTML error pages (e.g., GitHub gateway “Unicorn” pages) from being emitted verbatim into safe-outputs logs, replacing them with a concise, readable message.
Changes:
- Add
isHtmlContent()to detect HTML-like responses (<!DOCTYPE .../<html ...>) and use it ingetErrorMessage()to sanitize HTML bodies. - Include the HTTP status in the sanitized message when available.
- Add vitest coverage for HTML detection and sanitization behavior.
Show a summary per file
| File | Description |
|---|---|
| actions/setup/js/error_helpers.cjs | Adds HTML detection + sanitization path in getErrorMessage, exports isHtmlContent. |
| actions/setup/js/error_helpers.test.cjs | Adds tests validating HTML detection and getErrorMessage sanitization output. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 2/2 changed files
- Comments generated: 2
| if (isHtmlContent(message)) { | ||
| const status = error && typeof error === "object" && "status" in error && typeof (/** @type {any} */ error.status) === "number" ? /** @type {any} */ error.status : null; | ||
| return status != null ? `GitHub returned an unexpected HTML response (HTTP ${status})` : "GitHub returned an unexpected HTML response"; | ||
| } |
There was a problem hiding this comment.
Fixed in 48d518b: switched to a single cast (const err = /** @type {any} */ (error)) and now read status from err in actions/setup/js/error_helpers.cjs.
|
✅ Test Quality Sentinel completed test quality analysis. |
|
✅ Design Decision Gate 🏗️ completed the design decision gate check. No ADR enforcement needed: PR #39655 does not have the 'implementation' label and has 0 new lines of code in business logic directories (≤100 threshold). No custom .design-gate.yml config present. |
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /tdd and /diagnose — requesting changes for one test defect and one coverage gap.
📋 Key Themes & Highlights
Issues Found
- No-op test (
/tdd): The"should return false for JSON-like content"test inisHtmlContentcallsexpect()without a matcher — Vitest never asserts anything, so this test will always be green even if the regex regresses. - Sanitization bypass (
/diagnose): Theelse { return String(error); }path skips the HTML check entirely. A raw HTML string throw (unlikely but possible) would still produce unfiltered markup in logs.
Positive Highlights
- ✅ Clean extraction of
isHtmlContentinto a well-documented, independently testable helper - ✅ Excellent edge-case coverage: leading whitespace, case-insensitive DOCTYPE, bare
<html>, with/without HTTP status - ✅ Fix is surgical — single extraction point means all handlers benefit without touching individual call sites
- ✅ PR description clearly describes the symptom, root cause, and the before/after output
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer
| }); | ||
|
|
||
| it("should return false for JSON-like content", () => { | ||
| expect(isHtmlContent('{"message":"Not Found","documentation_url":"..."}')); |
There was a problem hiding this comment.
[/tdd] Missing matcher: this expect() call has no assertion chained to it, so the test always passes regardless of what isHtmlContent returns — it is effectively a no-op.
💡 Fix
Add .toBe(false) to make the assertion real:
it("should return false for JSON-like content", () => {
expect(isHtmlContent('{"message":"Not Found","documentation_url":"..."}'))
.toBe(false);
});A bare expect(value) in Vitest creates an expectation object but asserts nothing. Without a matcher the test will always pass, hiding a future regression if the regex ever accidentally matches JSON-like input.
There was a problem hiding this comment.
Addressed in 48d518b: this test now asserts with .toBe(false) so it no longer passes as a no-op.
| } else if (error && typeof error === "object" && "message" in error && typeof error.message === "string") { | ||
| message = error.message; | ||
| } else { | ||
| return String(error); |
There was a problem hiding this comment.
[/diagnose] This early-return path bypasses the HTML sanitization added below. If a handler throws a raw HTML string (i.e. throw htmlString rather than throw new Error(htmlString)), the noisy HTML will still reach the logs unfiltered.
💡 Suggested fix
Apply isHtmlContent to the stringified fallback as well:
} else {
const str = String(error);
if (isHtmlContent(str)) {
return "GitHub returned an unexpected HTML response";
}
return str;
}Add a corresponding test:
it("should sanitize a raw HTML string throw", () => {
const html = "<!DOCTYPE html><html><body>Unicorn</body></html>";
expect(getErrorMessage(html)).toBe("GitHub returned an unexpected HTML response");
});Low-probability path today, but a quick one-liner guard closes the gap completely.
There was a problem hiding this comment.
Addressed in 48d518b: getErrorMessage now also sanitizes HTML from raw string throws (the fallback String(error) path), and a regression test was added.
|
@copilot run pr-finisher skill |
🧪 Test Quality Sentinel Report✅ Test Quality Score: 88/100 — Excellent
📊 Metrics & Test Classification (12 tests analyzed)
Score breakdown: behavioral 40/40 + edge-case 28/30 + duplication 20/20 + inflation 0/10 = 88/100
Go: 0 (
|
There was a problem hiding this comment.
✅ Test Quality Sentinel: 88/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). One non-blocking fix recommended: add missing .toBe(false) matcher to should return false for JSON-like content in error_helpers.test.cjs.
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Done. I ran the PR finisher flow, applied the actionable review fixes, validated locally (
Fixed in |
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
…tatus access Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Fixed. The CI failure was a TypeScript typecheck error, not the HTML transient test failure. The test Fixed in latest commit by using |
|
@copilot review all comments and address unresolved review feedback. Please fix the remaining review-requested test defect and rerun the failing JS checks.
|
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Fixed in |
|
@copilot review all comments and address the unresolved review feedback. Please rerun the JS checks after fixing the requested test issue.
|
|
@copilot fix https://github.com/github/gh-aw/actions/runs/27650502649/job/81780474902#step:6:1 and ensure prettier does not break it |
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Fixed in |
When GitHub returns a 504/5xx with an HTML body (e.g. the "Unicorn" page on gateway timeout),
getErrorMessagewas returning the raw HTML verbatim. This flooded CI logs and made the safe_outputs failure summary unreadable — 300+ lines of CSS/markup instead of the actual error.Changes
error_helpers.cjs— AddedisHtmlContent()to detect<!DOCTYPE/<html>responses.getErrorMessagenow replaces HTML bodies with a concise message, including the HTTP status when available:error_recovery.cjs— UpdatedisTransientErrorto recognize both raw HTML responses (checking for<!doctype html) and sanitized HTML responses (checking for"unexpected html response"), ensuring these server errors are properly retried.error_helpers.test.cjs— Added 11 tests covering HTML sanitization (with/without status, leading whitespace, case-insensitive DOCTYPE) and theisHtmlContenthelper directly.Since
getErrorMessageis the single extraction point used across all handlers, the fix covers all safe_outputs error paths without touching individual handlers. TheisTransientErrorenhancement ensures HTML errors remain retryable after sanitization.