Skip to content

spike: mic-latency investigation + native-Swift perf harness (re #1283)#1329

Draft
rsanheim wants to merge 6 commits into
cjpais:mainfrom
rsanheim:spike/latency-investigation
Draft

spike: mic-latency investigation + native-Swift perf harness (re #1283)#1329
rsanheim wants to merge 6 commits into
cjpais:mainfrom
rsanheim:spike/latency-investigation

Conversation

@rsanheim
Copy link
Copy Markdown
Contributor

Draft / spike — not proposed for merge as-is. Opening this to have something linkable from discussion on #1283 and to share the investigation code & notes.

What's in here

  1. test(audio): add AudioRecorder lifecycle coverage — five functional tests that drive the real default input device to pin current behaviour of AudioRecorder::open / start / stop / close, plus the level-callback path. Skip gracefully on hosts without a default input device so headless CI is fine. Ship-worthy independent of the rest.

  2. chore: pin bun 1.3.13 via .mise.toml — the repo had no pinned bun version, so fresh mise-managed shells landed in No version is set for shim: bun. Low-impact repo hygiene. Happy to drop or adjust the version.

  3. docs: add latency investigation writeupdocs/latency-investigation.md. Working document capturing what we measured, what the prior mental model got wrong, and an analysis of the v0.7.9 → v0.7.10 timing delta reported in Microphone initialization delay (~500ms) causes beginning of speech to be cut off #1283 (short version: it's an observability change from Handle microphone init failure without aborting #945's async→sync open() restructure, not a runtime regression — start_microphone_stream is byte-identical between the two tags). Includes thread summary, v0.7.9/v0.7.10 code comparison, and implications.

  4. tools: add macos-audio-perf native Swift harnesstools/macos-audio-perf/, a SwiftPM executable that measures keypress → first-sample → tone-played on macOS independent of Handy's Rust/cpal/rodio stack, used to separate platform-floor costs from implementation costs. Cold and warm modes, --auto for unattended runs with summary stats, writes wavs + CSV. Not built or shipped with the Handy app.

Headline findings (full detail in the doc)

On a Mac Studio + Studio Display USB mic:

  • Handy cold path ≈ 640ms end-to-end. Native-Swift AVAudioEngine on the same machine pays ~490ms in engine.start() alone plus ~180ms before first sample — essentially identical. cpal is not the bottleneck; it's CoreAudio + USB-mic hardware.
  • Warm path (engine kept running) ≈ 107ms in the Swift harness. Handy's current lazy_stream_close=true warm path is estimated ~180–290ms because it still cycles the cpal stream plus waits 100ms before the tone plays plus cold-opens the rodio output stream on every press.
  • tap_host_ms ≈ −10ms in warm mode — the running engine has audio buffered ahead of the keypress, so the "first sample" was captured before we pressed. This is the shape of the always-on pattern @m13v describes.
  • The bulk of the remaining ~105ms warm-path is installTap(), not hardware latency. A tap-always-installed / flag-gated design should drop this to the low tens of ms.

Not proposing

No change to Handy's runtime behaviour in this PR. If anything on the docs/ or tools/ side is useful to upstream separately, let me know.

The repo has no bun version pinned, so fresh mise-managed shells fall
into "No version is set for shim: bun" and cannot run `bun install`.
Pinning here gives everyone a known-good baseline without touching
global mise config.
The recorder module previously only tested the error-message classifier
helpers. The open/start/stop/close lifecycle — which the on-demand
microphone path relies on — had zero coverage, leaving no regression
safety net for upcoming latency work.

Adds five functional tests that drive the real default input device:

* open_default_device_then_close_is_clean — basic open/close round trip
* open_is_idempotent_on_already_open_recorder — pins the early-return
  contract a second open relies on
* start_then_stop_returns_captured_samples — stop returns owned samples
* close_allows_reopen — on-demand mode cycles through this repeatedly
* level_callback_fires_while_recording — exercises the visualizer path

Tests skip gracefully on hosts without a default input device so the
suite still passes in headless CI.
Captures state of knowledge on Handy's shortcut→tone latency:
  * what we measured in the installed app and where prior
    mental models got it wrong (the "7ms warm path" number
    excluded the tone itself)
  * what a native-Swift AVAudioEngine harness measures for the
    same Mac and mic, cold and warm
  * two implications for Handy: stay with cpal but adopt a
    tap-always-installed, flag-gated design; pre-warm the stream
    at app launch to move the one-time ~500ms cold-open off the
    first-press critical path

Local working doc for the spike; not a user-facing design.
Standalone SwiftPM executable that measures the platform floor
for keypress → first-sample → tone-played on macOS, independent
of Handy's Rust/cpal/rodio stack. Used to decide whether Handy's
observed latency is a Rust-layer problem or a platform floor.

Capabilities:
  * Cold mode: rebuild AVAudioEngine per press (mirrors Handy's
    default behavior of cold-opening on every recording start)
  * Warm mode: engine built and started once at init, kept
    running between presses; per-press work is just install /
    remove a tap on the input node
  * Timing in raw mach ticks throughout, converted to ms only at
    display so there's no UInt64 underflow from mixing
    AVAudioTime.hostTime with DispatchTime.uptimeNanoseconds
  * --auto mode drives N press/stop cycles without human input,
    prints a per-run summary (min/median/mean/max plus per-press
    table), and exits cleanly — suitable for unattended runs
  * Four global Carbon hotkeys (Cmd+Shift+H, Ctrl+Shift+H,
    Ctrl+Opt+Space, F19) plus stdin-Enter fallback when running
    interactively
  * Writes each press to /tmp/handy-audio-perf/press-*.wav and
    appends a CSV row per press for historical comparison

Lives under tools/ so it is not compiled or shipped with the
Handy app.
… change

Adds a section to latency-investigation.md that walks through the
v0.7.9 → v0.7.10 step of domdomegg's bisection. Core finding: the
apparent 10-20x mic-init regression isn't a runtime regression, it's
an observability correction from PR cjpais#945.

Before cjpais#945, AudioRecorder::open() spawned the cpal worker thread and
returned immediately; the "Microphone stream initialized in X.XXms"
log was timing mpsc channel setup + thread::spawn + three field
assignments while real CoreAudio HAL cold-spin continued in the
background. After cjpais#945, open() waits on an init handshake channel
until the worker's stream.play() has returned, which is actual mic
readiness. managers/audio.rs::start_microphone_stream is byte-identical
between the two tags; all of the delta lives inside open()'s wait
semantics.

Doc also captures: UX consequences of each model, why the sync
handshake is a correctness fix (release-build panic=abort would
terminate on cpal open failure), and a note to rename the log line so
the next bisector isn't steered the same way.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant