spike: mic-latency investigation + native-Swift perf harness (re #1283) #1329
Draft
rsanheim wants to merge 6 commits into
The repo has no bun version pinned, so fresh mise-managed shells fall into "No version is set for shim: bun" and cannot run `bun install`. Pinning here gives everyone a known-good baseline without touching global mise config.
The recorder module previously only tested the error-message classifier helpers. The open/start/stop/close lifecycle, which the on-demand microphone path relies on, had zero coverage, leaving no regression safety net for upcoming latency work.
Adds five functional tests that drive the real default input device:
* open_default_device_then_close_is_clean: basic open/close round trip
* open_is_idempotent_on_already_open_recorder: pins the early-return contract a second open relies on
* start_then_stop_returns_captured_samples: stop returns owned samples
* close_allows_reopen: on-demand mode cycles through this repeatedly
* level_callback_fires_while_recording: exercises the visualizer path
Tests skip gracefully on hosts without a default input device so the suite still passes in headless CI.
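For readers who want the shape of those lifecycle contracts without a microphone attached, here is a minimal sketch against a mock. `MockRecorder`, `feed`, and `lifecycle_check` are hypothetical names for illustration only; this is not Handy's `AudioRecorder`, and the real tests drive an actual input device.

```rust
/// Hypothetical stand-in for a recorder; NOT Handy's AudioRecorder.
#[derive(Default)]
struct MockRecorder {
    is_open: bool,
    recording: bool,
    open_calls: u32, // how many times the "device" was really opened
    samples: Vec<f32>,
}

impl MockRecorder {
    /// Opening an already-open recorder returns early without reopening.
    fn open(&mut self, device_present: bool) -> Result<(), String> {
        if self.is_open {
            return Ok(());
        }
        if !device_present {
            return Err("no default input device".to_string());
        }
        self.is_open = true;
        self.open_calls += 1;
        Ok(())
    }

    fn start(&mut self) -> Result<(), String> {
        if !self.is_open {
            return Err("recorder is not open".to_string());
        }
        self.samples.clear();
        self.recording = true;
        Ok(())
    }

    /// Pretend the device delivered one buffer of audio.
    fn feed(&mut self, buf: &[f32]) {
        if self.recording {
            self.samples.extend_from_slice(buf);
        }
    }

    /// Stop hands ownership of the captured samples to the caller.
    fn stop(&mut self) -> Vec<f32> {
        self.recording = false;
        std::mem::take(&mut self.samples)
    }

    fn close(&mut self) {
        self.recording = false;
        self.is_open = false;
    }
}

/// Runs the open/start/stop/close cycle. Returns None when the check is
/// skipped because no input device is present (the headless-CI case),
/// otherwise (number of real opens, number of captured samples).
fn lifecycle_check(device_present: bool) -> Option<(u32, usize)> {
    let mut rec = MockRecorder::default();
    if rec.open(device_present).is_err() {
        return None; // skip instead of failing
    }
    rec.open(device_present).ok()?; // second open: early return
    rec.start().ok()?;
    rec.feed(&[0.1, 0.2, 0.3]);
    let samples = rec.stop();
    rec.close();
    rec.open(device_present).ok()?; // close allows reopen
    Some((rec.open_calls, samples.len()))
}
```

With a device present, the mock is really opened twice (the initial open and the reopen after close), while the redundant second open is a no-op.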
Captures state of knowledge on Handy's shortcut→tone latency:
* what we measured in the installed app and where prior
mental models got it wrong (the "7ms warm path" number
excluded the tone itself)
* what a native-Swift AVAudioEngine harness measures for the
same Mac and mic, cold and warm
* two implications for Handy: stay with cpal but adopt a
tap-always-installed, flag-gated design; pre-warm the stream
at app launch to move the one-time ~500ms cold-open off the
first-press critical path
Local working doc for the spike; not a user-facing design.
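As a rough illustration of the tap-always-installed, flag-gated idea, the sketch below keeps the per-buffer callback attached permanently and only flips a flag on press and release. `GatedTap` and its methods are hypothetical names, it uses only the standard library rather than cpal, and a real audio callback would likely prefer a lock-free ring buffer over a `Mutex`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

/// Hypothetical flag-gated tap: the callback is always installed and an
/// atomic flag decides whether incoming buffers are kept or dropped.
struct GatedTap {
    capturing: AtomicBool,
    captured: Mutex<Vec<f32>>,
}

impl GatedTap {
    fn new() -> Self {
        GatedTap {
            capturing: AtomicBool::new(false),
            captured: Mutex::new(Vec::new()),
        }
    }

    /// Would be called by the audio stream for every buffer, pressed or not.
    fn on_buffer(&self, buf: &[f32]) {
        if self.capturing.load(Ordering::Acquire) {
            self.captured.lock().unwrap().extend_from_slice(buf);
        }
    }

    /// Per-press work is a clear plus a flag flip; no stream or tap setup.
    fn press(&self) {
        self.captured.lock().unwrap().clear();
        self.capturing.store(true, Ordering::Release);
    }

    /// Release stops capture and hands the samples to the caller.
    fn release(&self) -> Vec<f32> {
        self.capturing.store(false, Ordering::Release);
        std::mem::take(&mut *self.captured.lock().unwrap())
    }
}

/// Simulates one press cycle with buffers arriving before, during, and
/// after the press.
fn simulate_press_cycle() -> Vec<f32> {
    let tap = GatedTap::new();
    tap.on_buffer(&[9.0]); // before press: dropped
    tap.press();
    tap.on_buffer(&[1.0, 2.0]);
    tap.on_buffer(&[3.0]);
    let samples = tap.release();
    tap.on_buffer(&[9.0]); // after release: dropped
    samples
}
```

The point of the design is that nothing on the press path touches the device, so the cost of a press is independent of how slow the stream is to open.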
Standalone SwiftPM executable that measures the platform floor
for keypress → first-sample → tone-played on macOS, independent
of Handy's Rust/cpal/rodio stack. Used to decide whether Handy's
observed latency is a Rust-layer problem or a platform floor.
Capabilities:
* Cold mode: rebuild AVAudioEngine per press (mirrors Handy's
default behavior of cold-opening on every recording start)
* Warm mode: engine built and started once at init, kept
running between presses; per-press work is just install /
remove a tap on the input node
* Timing in raw mach ticks throughout, converted to ms only at
display so there's no UInt64 underflow from mixing
AVAudioTime.hostTime with DispatchTime.uptimeNanoseconds
* --auto mode drives N press/stop cycles without human input,
prints a per-run summary (min/median/mean/max plus per-press
table), and exits cleanly — suitable for unattended runs
* Four global Carbon hotkeys (Cmd+Shift+H, Ctrl+Shift+H,
Ctrl+Opt+Space, F19) plus stdin-Enter fallback when running
interactively
* Writes each press to /tmp/handy-audio-perf/press-*.wav and
appends a CSV row per press for historical comparison
Lives under tools/ so it is not compiled or shipped with the
Handy app.
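The tick-handling and summary bullets above can be sketched as follows. This is an assumption-laden illustration in Rust rather than the harness's Swift: `tick_delta`, `ticks_to_ms`, `Summary`, and `summarize` are hypothetical names, and the `numer`/`denom` pair stands for whatever `mach_timebase_info` reports on the host (nanoseconds = ticks * numer / denom).

```rust
/// Signed difference between two unsigned tick stamps. Plain `later - earlier`
/// on u64 would underflow when the "later" stamp is actually earlier.
fn tick_delta(later: u64, earlier: u64) -> i64 {
    later.wrapping_sub(earlier) as i64
}

/// Converts a signed tick delta to milliseconds, only at display time.
fn ticks_to_ms(delta_ticks: i64, numer: u32, denom: u32) -> f64 {
    delta_ticks as f64 * numer as f64 / denom as f64 / 1_000_000.0
}

/// Per-run summary statistics, all in milliseconds.
struct Summary {
    min: f64,
    median: f64,
    mean: f64,
    max: f64,
}

/// Returns None for an empty run.
fn summarize(values_ms: &[f64]) -> Option<Summary> {
    if values_ms.is_empty() {
        return None;
    }
    let mut sorted = values_ms.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let n = sorted.len();
    let median = if n % 2 == 1 {
        sorted[n / 2]
    } else {
        (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0
    };
    let mean = sorted.iter().sum::<f64>() / n as f64;
    Some(Summary {
        min: sorted[0],
        median,
        mean,
        max: sorted[n - 1],
    })
}
```

Keeping the delta signed matters because a sample timestamp can legitimately precede the keypress timestamp, in which case the latency is negative rather than an enormous unsigned number.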
… change
Adds a section to latency-investigation.md that walks through the v0.7.9 → v0.7.10 step of domdomegg's bisection.
Core finding: the apparent 10-20x mic-init regression isn't a runtime regression, it's an observability correction from PR cjpais#945. Before cjpais#945, AudioRecorder::open() spawned the cpal worker thread and returned immediately; the "Microphone stream initialized in X.XXms" log was timing mpsc channel setup + thread::spawn + three field assignments while real CoreAudio HAL cold-spin continued in the background. After cjpais#945, open() waits on an init handshake channel until the worker's stream.play() has returned, which is actual mic readiness. managers/audio.rs::start_microphone_stream is byte-identical between the two tags; all of the delta lives inside open()'s wait semantics.
Doc also captures: UX consequences of each model, why the sync handshake is a correctness fix (release-build panic=abort would terminate on cpal open failure), and a note to rename the log line so the next bisector isn't steered the same way.
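To make the difference in wait semantics concrete, here is a minimal standard-library sketch of the two models. It is not the code from cjpais#945: `simulated_cold_open`, `open_fire_and_forget`, and `open_with_handshake` are hypothetical names, and a `thread::sleep` stands in for the device cold-open.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Stand-in for the slow device bring-up inside the worker thread.
fn simulated_cold_open(delay: Duration, fail: bool) -> Result<(), String> {
    thread::sleep(delay);
    if fail {
        Err("device open failed".to_string())
    } else {
        Ok(())
    }
}

/// Fire-and-forget model: spawn the worker and return immediately. The
/// measured time covers only the spawn, not device readiness.
fn open_fire_and_forget(
    delay: Duration,
) -> (Duration, thread::JoinHandle<Result<(), String>>) {
    let started = Instant::now();
    let worker = thread::spawn(move || simulated_cold_open(delay, false));
    (started.elapsed(), worker)
}

/// Handshake model: block until the worker reports its init result. The
/// measured time now includes the cold-open, and a failure comes back to
/// the caller as an Err instead of being lost in the worker.
fn open_with_handshake(delay: Duration, fail: bool) -> (Duration, Result<(), String>) {
    let started = Instant::now();
    let (init_tx, init_rx) = mpsc::channel::<Result<(), String>>();
    let worker = thread::spawn(move || {
        let result = simulated_cold_open(delay, fail);
        // Ignore a send error: it only means the opener stopped waiting.
        let _ = init_tx.send(result);
        // A real worker would keep servicing the stream from here.
    });
    let result = init_rx
        .recv()
        .unwrap_or_else(|_| Err("worker exited before reporting init".to_string()));
    let elapsed = started.elapsed();
    worker.join().ok();
    (elapsed, result)
}
```

Timing both functions with the same simulated delay shows the effect described above: the first reports a near-zero duration while the device is still coming up, and the second reports the full delay.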
Draft / spike — not proposed for merge as-is. Opening this to have something linkable from discussion on #1283 and to share the investigation code & notes.
What's in here
* `test(audio): add AudioRecorder lifecycle coverage`: five functional tests that drive the real default input device to pin current behaviour of `AudioRecorder::open/start/stop/close`, plus the level-callback path. Skip gracefully on hosts without a default input device so headless CI is fine. Ship-worthy independent of the rest.
* `chore: pin bun 1.3.13 via .mise.toml`: the repo had no pinned bun version, so fresh mise-managed shells landed in `No version is set for shim: bun`. Low-impact repo hygiene. Happy to drop or adjust the version.
* `docs: add latency investigation writeup`: `docs/latency-investigation.md`. Working document capturing what we measured, what the prior mental model got wrong, and an analysis of the v0.7.9 → v0.7.10 timing delta reported in "Microphone initialization delay (~500ms) causes beginning of speech to be cut off" (#1283). Short version: it's an observability change from the async→sync `open()` restructure in "Handle microphone init failure without aborting" (#945), not a runtime regression; `start_microphone_stream` is byte-identical between the two tags. Includes thread summary, v0.7.9/v0.7.10 code comparison, and implications.
* `tools: add macos-audio-perf native Swift harness`: `tools/macos-audio-perf/`, a SwiftPM executable that measures keypress → first-sample → tone-played on macOS independent of Handy's Rust/cpal/rodio stack, used to separate platform-floor costs from implementation costs. Cold and warm modes, `--auto` for unattended runs with summary stats, writes wavs + CSV. Not built or shipped with the Handy app.
Headline findings (full detail in the doc)
On a Mac Studio + Studio Display USB mic:
* … `engine.start()` alone plus ~180ms before first sample: essentially identical. cpal is not the bottleneck; it's CoreAudio + USB-mic hardware.
* `lazy_stream_close=true` warm path is estimated ~180–290ms because it still cycles the cpal stream, waits 100ms before the tone plays, and cold-opens the rodio output stream on every press.
* `tap_host_ms ≈ −10ms` in warm mode: the running engine has audio buffered ahead of the keypress, so the "first sample" was captured before we pressed. This is the shape of the always-on pattern @m13v describes.
* … `installTap()`, not hardware latency. A tap-always-installed / flag-gated design should drop this to the low tens of ms.
Not proposing
No change to Handy's runtime behaviour in this PR. If anything on the
docs/ or tools/ side is useful to upstream separately, let me know.