spike: mic-latency investigation + native-Swift perf harness (re #1283) by rsanheim · Pull Request #1329 · cjpais/Handy

rsanheim · 2026-04-24T05:47:39Z

Draft / spike — not proposed for merge as-is. Opening this to have something linkable from discussion on #1283 and to share the investigation code & notes.

What's in here

test(audio): add AudioRecorder lifecycle coverage. Five functional tests that drive the real default input device to pin the current behaviour of AudioRecorder::open / start / stop / close, plus the level-callback path. They skip gracefully on hosts without a default input device, so headless CI is fine. Ship-worthy independent of the rest.
chore: pin bun 1.3.13 via .mise.toml. The repo had no pinned bun version, so fresh mise-managed shells failed with "No version is set for shim: bun". Low-impact repo hygiene. Happy to drop or adjust the version.
docs: add latency investigation writeup (docs/latency-investigation.md). Working document capturing what we measured, what the prior mental model got wrong, and an analysis of the v0.7.9 → v0.7.10 timing delta reported in #1283 ("Microphone initialization delay (~500ms) causes beginning of speech to be cut off"). Short version: the delta is an observability change from the async→sync open() restructure in #945 ("Handle microphone init failure without aborting"), not a runtime regression; start_microphone_stream is byte-identical between the two tags. Includes thread summary, v0.7.9/v0.7.10 code comparison, and implications.
tools: add macos-audio-perf native Swift harness. tools/macos-audio-perf/ is a SwiftPM executable that measures keypress → first-sample → tone-played on macOS independent of Handy's Rust/cpal/rodio stack, used to separate platform-floor costs from implementation costs. Cold and warm modes, --auto for unattended runs with summary stats, writes wavs + CSV. Not built or shipped with the Handy app.

Headline findings (full detail in the doc)

On a Mac Studio + Studio Display USB mic:

Handy cold path ≈ 640ms end-to-end. Native-Swift AVAudioEngine on the same machine pays ~490ms in engine.start() alone plus ~180ms before first sample — essentially identical. cpal is not the bottleneck; it's CoreAudio + USB-mic hardware.
Warm path (engine kept running) ≈ 107ms in the Swift harness. Handy's current lazy_stream_close=true warm path is estimated at ~180–290ms because on every press it still cycles the cpal stream, waits 100ms before the tone plays, and cold-opens the rodio output stream.
tap_host_ms ≈ −10ms in warm mode — the running engine has audio buffered ahead of the keypress, so the "first sample" was captured before we pressed. This is the shape of the always-on pattern @m13v describes.
The bulk of the remaining ~105ms of warm-path latency is installTap(), not hardware latency. A tap-always-installed / flag-gated design should drop this to the low tens of ms.
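
To make the tap-always-installed, flag-gated idea concrete, here is a minimal Rust sketch using only the standard library. It is not Handy's code and does not touch cpal or CoreAudio: `GatedTap`, `on_buffer`, `start`, `stop`, and the pre-roll capacity are hypothetical names chosen for illustration. The callback runs for every buffer; a flag decides whether samples go to the capture or to a small bounded pre-roll, which is how a "first sample" can predate the keypress.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// State shared between the (simulated) audio callback and the control thread.
struct Inner {
    recording: bool,
    pre_roll: VecDeque<f32>,
    pre_roll_cap: usize,
    captured: Vec<f32>,
}

/// Hypothetical always-installed tap: the callback never goes away,
/// only the `recording` flag changes per press.
struct GatedTap {
    inner: Mutex<Inner>,
}

impl GatedTap {
    fn new(pre_roll_cap: usize) -> Self {
        assert!(pre_roll_cap > 0, "pre-roll capacity must be non-zero");
        GatedTap {
            inner: Mutex::new(Inner {
                recording: false,
                pre_roll: VecDeque::with_capacity(pre_roll_cap),
                pre_roll_cap,
                captured: Vec::new(),
            }),
        }
    }

    /// Called for every input buffer, whether or not a recording is active.
    fn on_buffer(&self, samples: &[f32]) {
        let mut inner = self.inner.lock().unwrap();
        if inner.recording {
            inner.captured.extend_from_slice(samples);
            return;
        }
        // Idle: keep only the most recent `pre_roll_cap` samples.
        for &s in samples {
            if inner.pre_roll.len() == inner.pre_roll_cap {
                inner.pre_roll.pop_front();
            }
            inner.pre_roll.push_back(s);
        }
    }

    /// Keypress: seed the capture with the pre-roll, then flip the flag.
    fn start(&self) {
        let mut inner = self.inner.lock().unwrap();
        let pre_roll: Vec<f32> = inner.pre_roll.drain(..).collect();
        inner.captured = pre_roll;
        inner.recording = true;
    }

    /// Key release: clear the flag and hand back the owned samples.
    fn stop(&self) -> Vec<f32> {
        let mut inner = self.inner.lock().unwrap();
        inner.recording = false;
        std::mem::take(&mut inner.captured)
    }
}
```

A real audio callback should not take a mutex; a lock-free ring buffer would be the usual choice there. The mutex only keeps the sketch short.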

Not proposing

No change to Handy's runtime behaviour in this PR. If anything on the docs/ or tools/ side is useful to upstream separately, let me know.

The repo has no bun version pinned, so fresh mise-managed shells fall into "No version is set for shim: bun" and cannot run `bun install`. Pinning here gives everyone a known-good baseline without touching global mise config.

The recorder module previously only tested the error-message classifier helpers. The open/start/stop/close lifecycle, which the on-demand microphone path relies on, had zero coverage, leaving no regression safety net for upcoming latency work.

Adds five functional tests that drive the real default input device:

* open_default_device_then_close_is_clean: basic open/close round trip
* open_is_idempotent_on_already_open_recorder: pins the early-return contract a second open relies on
* start_then_stop_returns_captured_samples: stop returns owned samples
* close_allows_reopen: on-demand mode cycles through this repeatedly
* level_callback_fires_while_recording: exercises the visualizer path

Tests skip gracefully on hosts without a default input device so the suite still passes in headless CI.

Captures state of knowledge on Handy's shortcut→tone latency:

* what we measured in the installed app and where prior mental models got it wrong (the "7ms warm path" number excluded the tone itself)
* what a native-Swift AVAudioEngine harness measures for the same Mac and mic, cold and warm
* two implications for Handy: stay with cpal but adopt a tap-always-installed, flag-gated design; pre-warm the stream at app launch to move the one-time ~500ms cold-open off the first-press critical path

Local working doc for the spike; not a user-facing design.
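
The pre-warm implication can be sketched in a few lines of standard-library Rust. This is an illustration under stated assumptions, not Handy's implementation: `Prewarmed`, `start`, and `get` are hypothetical names, and the `open` closure stands in for whatever performs the slow cold-open. The idea is that the open begins at launch on a background thread, and the first press waits only for whatever part of it is still outstanding.

```rust
use std::thread::{self, JoinHandle};

/// Hypothetical holder for a resource whose slow construction
/// was kicked off in the background at app launch.
struct Prewarmed<T> {
    pending: Option<JoinHandle<T>>,
    ready: Option<T>,
}

impl<T: Send + 'static> Prewarmed<T> {
    /// Start the slow open immediately, without blocking the caller.
    fn start<F>(open: F) -> Self
    where
        F: FnOnce() -> T + Send + 'static,
    {
        Prewarmed {
            pending: Some(thread::spawn(open)),
            ready: None,
        }
    }

    /// Return the resource. The first call joins the background thread
    /// (a no-op wait if it already finished); later calls are immediate.
    fn get(&mut self) -> &T {
        if let Some(handle) = self.pending.take() {
            self.ready = Some(handle.join().expect("pre-warm thread panicked"));
        }
        self.ready
            .as_ref()
            .expect("set either by a previous get() or by the join above")
    }
}
```

If the first press arrives after the background open has finished, `get` returns almost immediately; if it arrives earlier, the press pays only the remainder of the cold-open.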

Standalone SwiftPM executable that measures the platform floor for keypress → first-sample → tone-played on macOS, independent of Handy's Rust/cpal/rodio stack. Used to decide whether Handy's observed latency is a Rust-layer problem or a platform floor.

Capabilities:

* Cold mode: rebuild AVAudioEngine per press (mirrors Handy's default behavior of cold-opening on every recording start)
* Warm mode: engine built and started once at init, kept running between presses; per-press work is just install / remove a tap on the input node
* Timing in raw mach ticks throughout, converted to ms only at display so there's no UInt64 underflow from mixing AVAudioTime.hostTime with DispatchTime.uptimeNanoseconds
* --auto mode drives N press/stop cycles without human input, prints a per-run summary (min/median/mean/max plus per-press table), and exits cleanly, so it is suitable for unattended runs
* Four global Carbon hotkeys (Cmd+Shift+H, Ctrl+Shift+H, Ctrl+Opt+Space, F19) plus stdin-Enter fallback when running interactively
* Writes each press to /tmp/handy-audio-perf/press-*.wav and appends a CSV row per press for historical comparison

Lives under tools/ so it is not compiled or shipped with the Handy app.
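
The mach-tick point is easy to get wrong, so here is a small Rust sketch of the same arithmetic. The harness itself is Swift; this is an illustration of the technique, not its code. `Timebase`, `signed_tick_delta`, and `ticks_to_ms` are hypothetical names. The numer/denom pair follows the shape of macOS's `mach_timebase_info`, where nanoseconds = ticks × numer / denom. The delta is taken as a signed value first, so a sample timestamp that predates the keypress yields a negative number instead of wrapping.

```rust
/// Same shape as macOS `mach_timebase_info`:
/// nanoseconds = ticks * numer / denom.
#[derive(Clone, Copy)]
struct Timebase {
    numer: u32,
    denom: u32,
}

/// Signed difference `later - earlier` in raw ticks.
/// Widening to i128 means no pair of u64 inputs can overflow or underflow.
fn signed_tick_delta(later: u64, earlier: u64) -> i128 {
    later as i128 - earlier as i128
}

/// Convert a signed tick delta to milliseconds, only at display time.
fn ticks_to_ms(delta_ticks: i128, tb: Timebase) -> f64 {
    let nanos = delta_ticks as f64 * tb.numer as f64 / tb.denom as f64;
    nanos / 1_000_000.0
}
```

With an example timebase of 125/3 (example values only), `ticks_to_ms(signed_tick_delta(1_000, 241_000), tb)` evaluates to -10.0, whereas the unsigned subtraction `1_000u64 - 241_000` would underflow.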

… change

Adds a section to latency-investigation.md that walks through the v0.7.9 → v0.7.10 step of domdomegg's bisection.

Core finding: the apparent 10-20x mic-init regression isn't a runtime regression, it's an observability correction from PR cjpais#945. Before cjpais#945, AudioRecorder::open() spawned the cpal worker thread and returned immediately; the "Microphone stream initialized in X.XXms" log was timing mpsc channel setup + thread::spawn + three field assignments while the real CoreAudio HAL cold-spin continued in the background. After cjpais#945, open() waits on an init handshake channel until the worker's stream.play() has returned, which is actual mic readiness. managers/audio.rs::start_microphone_stream is byte-identical between the two tags; all of the delta lives inside open()'s wait semantics.

Doc also captures: UX consequences of each model, why the sync handshake is a correctness fix (release-build panic=abort would terminate on cpal open failure), and a note to rename the log line so the next bisector isn't steered the same way.
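
For readers who want the two open() shapes side by side, here is a minimal standard-library Rust sketch. It is a simplified model of the behaviour described above, not the code from either tag: `open_async`, `open_sync`, and the `init` closure (standing in for building the stream and calling play) are hypothetical. It shows why a timer wrapped around the first shape measures almost nothing, while the second measures actual readiness and can report an init failure to the caller.

```rust
use std::sync::mpsc;
use std::thread::{self, JoinHandle};

/// Earlier shape (simplified): spawn the worker and return immediately.
/// A timer around this call sees only thread-spawn overhead.
fn open_async<F>(init: F) -> JoinHandle<Result<(), String>>
where
    F: FnOnce() -> Result<(), String> + Send + 'static,
{
    thread::spawn(init)
}

/// Later shape (simplified): block on a handshake channel until the
/// worker reports whether initialization succeeded.
fn open_sync<F>(init: F) -> Result<JoinHandle<()>, String>
where
    F: FnOnce() -> Result<(), String> + Send + 'static,
{
    let (ready_tx, ready_rx) = mpsc::channel::<Result<(), String>>();
    let worker = thread::spawn(move || {
        let result = init();
        // Ignore send errors: the caller may already have gone away.
        let _ = ready_tx.send(result);
        // A real worker would continue into its capture loop here on success.
    });
    match ready_rx.recv() {
        Ok(Ok(())) => Ok(worker),
        Ok(Err(e)) => Err(e),
        Err(_) => Err("worker exited before reporting init status".to_string()),
    }
}
```

Timing `open_async` with a slow `init` reports a near-zero duration even though the device is not ready yet, while timing `open_sync` includes the full initialization. That is the observability difference the section describes.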

rsanheim added 5 commits April 23, 2026 14:09

chore: pin bun 1.3.13 via .mise.toml

35a6e69


rsanheim mentioned this pull request Apr 24, 2026

Microphone initialization delay (~500ms) causes beginning of speech to be cut off #1283

Open

more docs

99fbb0d

rsanheim wants to merge 6 commits into cjpais:main from rsanheim:spike/latency-investigation
