Disable npm audit to prevent ETIMEDOUT build failures #66465
wtgodbe merged 9 commits into dotnet:main
Pull request overview
Disables npm audit during installs by setting audit=false in the repo’s root .npmrc, preventing intermittent CI failures caused by timeouts when npm 11.x tries to reach the AzDO feed’s security advisories endpoint.
Changes:
- Add audit=false to .npmrc so npm ci no longer performs audit requests during installs.
Investigation update: 57005 is Windows-only
We've confirmed that exit code 57005 occurs exclusively on Windows agents (
This PR now also adds
Additional diagnostics to try if the .report file doesn't yield answers
Add audit=false to .npmrc to prevent ETIMEDOUT failures when the AzDO feed's security advisories endpoint times out (exit code -1).

Add NODE_OPTIONS=--report-on-fatalerror to generate a Node.js diagnostic report (.report file) if the node process crashes during npm ci. This will help diagnose the Windows-only exit code 57005 (0xDEAD) failures where the process is killed mid-extraction with no error output.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add maxsockets=10 to .npmrc to reduce resource pressure during extraction
- After npm ci failure on Windows, check:
  - Windows Event Log for application errors in the last 5 minutes
  - WER crash dump folder for recent .dmp files
  - Node.js diagnostic report directory for crash reports

Investigating exit code 57005 (0xDEAD) crashes that only occur on Windows CI agents.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
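For illustration, the concurrency setting described in this commit would look roughly like the following in .npmrc. maxsockets is a standard npm config key; the comment text is an assumption and is not copied from the commit.

```ini
; Cap the number of concurrent connections npm opens per origin during install.
maxsockets=10
```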
Latest push: maxsockets + Windows crash diagnostics
Concurrency reduction
Added
New Windows crash diagnostics
After npm ci failure on Windows, we now automatically check:
Key finding from log analysis
Both 57005 failures produced identical log file sizes (518,908 bytes, 2802 lines) despite different timestamps and packages. The npm debug log is truncated mid-extraction at the same byte offset. This suggests the process is being killed at a deterministic resource threshold, not at a random point.
If maxsockets doesn't fix it, next steps
/azp run
Azure Pipelines successfully started running 3 pipeline(s).
The &quot; XML entities for double quotes inside the Command attribute were closing the MSBuild attribute early. Use string concatenation with single quotes instead to avoid nested double quotes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Node.js v24.15.0 (released Apr 16) causes intermittent native crashes in node.exe during npm ci tarball extraction on Windows CI agents. The crash produces exit code 57005 (0xDEAD) with a Windows Event Log 'Application Error' entry confirming a fault in node.exe.

The first 57005 occurrence (build 1381847) appeared ~4 hours after the v24.15.0 release. Zero crashes occurred on v24.14.1.

Pin to 24.14.1 until the upstream issue is identified and fixed. Also update cache keys to avoid stale node_modules from 24.x builds.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The post-job cache save fails when node_modules doesn't exist (e.g., after a build failure before npm ci runs). This shouldn't fail the overall job.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Root cause identified: Node.js v24.15.0 native crash on Windows. Fixed by pinning to 24.14.1.

Remove all temporary diagnostics:
- Verbose npm ci output
- Environment logging (node/npm version, disk, registry)
- npm debug log capture and artifact upload
- Windows Event Log / WER / Node.js report checks
- NODE_OPTIONS=--report-uncaught-exception

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Code check doesn't need Node.js. This also avoids the Cache node_modules post-job failure when node_modules doesn't exist.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Problem
npm ci in npm 11.x runs npm audit by default and treats audit network failures as hard errors. The AzDO feed's /npm/v1/security/advisories/bulk endpoint intermittently times out, causing builds to fail with exit code -1 and:
This was captured by the verbose diagnostics added in #66447 (build 1394430). The diagnostics confirmed:
What this fixes
This fixes the exit code -1 / ETIMEDOUT npm failures caused by audit timeouts. It does not fix the separate exit code 57005 (0xDEAD) failures, which have a different, still-unknown root cause that we haven't yet captured with diagnostics.
Fix
Add audit=false to .npmrc. The security audit is non-essential for CI builds.
Relates to #62807
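As a minimal sketch, assuming the repo's root .npmrc holds no other audit-related settings, the fix amounts to a single line. The comments are illustrative and not taken from the PR.

```ini
; .npmrc (repo root); other existing settings omitted
; Skip the audit request that npm ci would otherwise send to the feed.
audit=false
```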