Description
I've encountered a consistent deadlock when using `child_process.spawn()` in Node.js after initializing any ZeroMQ socket. The issue is 100% reproducible on OrbStack running x64 Linux on Apple Silicon (M1/M2/M3) via Rosetta 2 emulation.
This appears to be a classic fork-safety issue: ZeroMQ's background I/O threads hold mutex locks during `fork()`, so the child process inherits locked mutexes with no owner thread, resulting in deadlock.
Environment
- Host OS: macOS (Apple Silicon / ARM64)
- OrbStack Version: 2.0.5 (19905)
- Guest Linux: Debian/Ubuntu x64 (Running via Rosetta 2 emulation)
- Node.js Version: 22
- zeromq.js Version: latest
Minimal Reproduction
```js
const { spawn } = require("child_process");
const { Router } = require("zeromq");

// 1. Initialize ZeroMQ (starts background I/O threads)
const server = new Router();
console.log("ZeroMQ Router created");

// 2. Attempt to spawn a child process.
// This hangs indefinitely on OrbStack (x64 on ARM64).
const child = spawn("echo", ["hello"], {
  stdio: ["ignore", "pipe", "pipe"],
});

// This line is never reached
console.log("child.pid:", child.pid);
child.stdout.on("data", (data) => console.log(`stdout: ${data}`));
child.on("close", (code) => console.log(`exited with code ${code}`));
```

What I've Tried (All Failed)
1. Setting `ioThreads: 0`, setting `blocky: false` and `linger: 0`, and setting `threadPriority` and `threadSchedulingPolicy`:

```js
const { Context } = require("zeromq");

const context = new Context({
  ioThreads: 0,
  blocky: false,
  threadPriority: 0,
  threadSchedulingPolicy: 0,
});
// Still deadlocks
```

2. Using worker_threads to isolate ZeroMQ
`zmq_worker.js`:

```js
const { parentPort } = require("worker_threads");
const { Router } = require("zeromq");

const server = new Router();
console.log("Worker: ZeroMQ Router created");
parentPort.postMessage("ready");
```

`main.js`:

```js
const { spawn } = require("child_process");
const { Worker } = require("worker_threads");

const worker = new Worker("./zmq_worker.js");
worker.on("message", (msg) => {
  if (msg === "ready") {
    // Still deadlocks!
    const child = spawn("echo", ["hello"]);
  }
});
```

This also deadlocks because worker_threads share the same process address space, and `fork()` copies the entire process memory, including ZeroMQ's internal state from the worker thread.
Technical Analysis
The root cause is the well-known fork-after-pthread problem:
- ZeroMQ creates background threads (even with `ioThreads: 0`, there may be internal initialization)
- These threads may hold mutexes (malloc locks, internal ZMQ locks, etc.)
- When `fork()` is called, only the calling thread is copied into the child process
- The child process inherits locked mutexes, but the threads that held them don't exist
- Any operation requiring those locks (like `malloc`) will deadlock
Rosetta 2's x64→ARM64 translation layer appears to significantly widen the race condition window, making this issue 100% reproducible instead of sporadic.
strace Evidence
```
6886 brk(0x2 <unfinished ...>
6886 <... brk resumed>) = 0x8000001a1c30
6884 syscall_0x6aad140(0, 0, 0x6, 0, 0x1, 0x62) = 0x3fbfe
6884 syscall_0x6aad140(0, 0, 0x6, 0, 0x1, 0x62 <unfinished ...>
```

The child process hangs here indefinitely (likely a futex wait).
Potential Solutions (Discussion)
Option A: pthread_atfork() handlers in zeromq.js
Register fork handlers to lock/unlock known mutexes around fork:
```cpp
pthread_atfork(
  []() { /* prepare: acquire locks */ },
  []() { /* parent: release locks */ },
  []() { /* child: release/reinit locks */ }
);
```

Problem: the critical locks live in libzmq and glibc, not in zeromq.js itself.
Option B: Upstream fix in libzmq
Request libzmq to implement pthread_atfork handlers for their internal mutexes.
Thank you for your time!