perf(shm): add lock-free segment pool to PosixShmProviderBackendTalc by YuanYuYuan · Pull Request #2514 · eclipse-zenoh/zenoh

YuanYuYuan · 2026-03-23T12:07:31Z

Summary

Stacks on #2513 — review that first; this PR's diff against main includes those commits too.

Add a lock-free segment pool to PosixShmProviderBackendTalc so that SHM segments are reused across sends instead of being created and destroyed per-message. Previously every send called shm_open + mmap + ftruncate, which incurred filesystem quota accounting (__dquot_alloc_space at ~10% CPU in profiling). The pool reduces the hot path to a single ArrayQueue CAS.

Key Changes

commons/zenoh-shm/src/api/protocol_implementations/posix/posix_shm_provider_backend_talc.rs: check pool first in alloc; free is a no-op when pool is active (segment returned on drop instead)
commons/zenoh-shm/src/api/protocol_implementations/posix/posix_shm_segment.rs: expose size() accessor for pool keying
commons/zenoh-shm/src/posix_shm/pool.rs: ShmSegmentPool — per-size ArrayQueue of idle segments; hot path is read lock + single CAS
commons/zenoh-shm/src/posix_shm/pooled_segment.rs: PooledPosixShmSegment — Drop returns segment to pool instead of unmapping
commons/zenoh-shm/benches/shm_pool.rs: criterion benchmark measuring cold (pool miss) and warm (pool hit) allocation latency

Benchmark Results

path	latency
cold (shm_open + mmap)	314 µs
warm (pool hit, ArrayQueue CAS)	134 ns
speedup	~2340×

Flamegraph: __dquot_alloc_space at 9.56% before (filesystem quota accounting per new segment), absent after.

Reproducing

taskset -c 0,1 cargo bench -j4 -p zenoh-shm --bench shm_pool

Results land in target/criterion/shm_pool/.

Breaking Changes

None

🏷️ Label-Based Checklist

Based on the labels applied to this PR, please complete these additional requirements:

Labels: new feature

🆕 New Feature Requirements

Since this PR adds a new feature:

Feature scope documented - Clear description of what the feature does and why it's needed
Minimum necessary code - Implementation is as simple as possible, doesn't overcomplicate the system
New APIs well-designed - Public APIs are intuitive, consistent with existing APIs
Comprehensive tests - All functionality is tested (happy path + edge cases + error cases)
Examples provided - Usage examples in code comments or separate example files
Documentation added - New docs explaining the feature, its use cases, and API
Feature flag considered - Can the feature be enabled/disabled for gradual rollout?
Performance impact assessed - Memory, CPU, storage implications measured
Integration tested - Feature works with existing features

Consider: Can this feature be split into smaller, incremental PRs?

Instructions:

Check off items as you complete them (change - [ ] to - [x])
The PR checklist CI will verify these are completed

This checklist updates automatically when labels change, but preserves your checked boxes.

The Waiter::wait() hot path in the transport pipeline created and dropped an event_listener::EventListener on every batch send cycle. In perf profiles of ros-z-pingpong this showed as 12.18% CPU in drop_in_place<EventListener> on the tx-0 thread. Replace EventInner { EventLib + AtomicU8 } with { Mutex<u8> + Condvar }. This eliminates per-wakeup EventListener allocation and intrusive-list register/deregister overhead. Semantics are preserved: - Sticky notification: if notify() fires with no waiter, the next wait() returns immediately (flag stored in the mutex-protected state). - Error propagation: last Notifier drop sets ERR and wakes all waiters. - wait_async: delegated to spawn_blocking (not on the hot path). All existing zenoh-sync tests pass.

The original EventListener alloc+intrusive-list per wait() cost CPU on the transport hot path (12% in perf profiles). Previous fix attempt used spawn_blocking for wait_async(), which is not cancellation-safe: tokio::time::timeout() cancels the outer future but the blocking thread detaches and eventually consumes the next real notification, causing a hang at 64KB+ payloads (fragmented messages). New approach: - sync paths (wait/wait_deadline/wait_timeout): std Mutex<u8> + Condvar (zero allocation, no intrusive list) - async path (wait_async): tokio::sync::Notify with enable() pattern (cancellation-safe: dropped Notified futures return their permit) The enable()-before-state-check pattern ensures notifications are never missed between the state check and the .await.

…hroughput std::sync::Mutex uncontended lock/unlock costs ~90 ns; parking_lot::Mutex costs ~20 ns. The ev-fix design requires notify() to briefly acquire the condvar mutex on every call (lost-wakeup fence). With std that made the sticky (already-set) wait() path cost 179 ns vs event_listener's 31 ns. With parking_lot the sticky path measures 32 ns — matching event_listener — while the cross-thread wakeup path improves to 4.7 µs (from 5.1 µs with event_listener). No semantic change; parking_lot is already in the lockfile.

codecov · 2026-03-23T13:06:07Z

Codecov Report

❌ Patch coverage is 95.29412% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.53%. Comparing base (1fab5d0) to head (24aac22).
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
commons/zenoh-sync/src/event.rs	92.40%	6 Missing ⚠️
...entations/posix/posix_shm_provider_backend_talc.rs	95.12%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2514      +/-   ##
==========================================
+ Coverage   72.48%   72.53%   +0.04%     
==========================================
  Files         390      392       +2     
  Lines       63374    63455      +81     
==========================================
+ Hits        45937    46026      +89     
+ Misses      17437    17429       -8

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Measures the sticky (hot) path and cross-thread wakeup path. Used to compare event_listener vs the Mutex+Condvar+parking_lot replacement introduced in the preceding commits.

Sort [[bench]] keys alphabetically (harness before name) to satisfy taplo reorder_keys=true. Clear the default peer listen endpoint in pub_config of test_session_from_cloned_config to prevent the OS from assigning port 38446 to pub_session's random listener, which collided with sub_session's explicit bind on that port.

Introduce ShmSegmentPool: a per-size pool of idle POSIX SHM segments that eliminates per-message shm_open+mmap / munmap+shm_unlink cycles on the hot publish path. Design: - ShmSegmentPool stores segments in per-size ArrayQueue (crossbeam, bounded MPMC lock-free). Hot path (take/give after warmup) requires only a shared RwLock read + single CAS — no mutex waiting under concurrent stream access. - PooledPosixShmSegment: Drop impl returns the segment to the pool instead of unmapping, keeping TLB and page-cache entries hot. - New builder_with_pool(layout, ShmPoolConfig) constructor enables the pool path; existing builder() retains the traditional Talc sub- allocation path unchanged. - ShmPoolConfig::max_idle_per_size (default 8) caps resident memory at 8 MB per provider at 1 MB payload size.

Measures the two steady-state paths of ShmSegmentPool: alloc/cold/1MB ~314 µs shm_open+ftruncate+mmap+mlock per alloc alloc/warm/1MB ~134 ns ArrayQueue pop+push (two CAS, no syscalls) Ratio: ~2300x. Demonstrates what pool reuse eliminates on the hot publish path at 1 MB payload size.

YuanYuYuan added 3 commits March 23, 2026 20:13

YuanYuYuan force-pushed the feat/shm-pool branch from 2affdc2 to 41067eb Compare March 23, 2026 12:16

YuanYuYuan added the enhancement Existing things could work better label Mar 23, 2026

YuanYuYuan force-pushed the feat/shm-pool branch 3 times, most recently from e5cf1e8 to 1bf318f Compare March 23, 2026 12:42

bench(zenoh-sync): add notify_wait criterion benchmark

ca3152e

Measures the sticky (hot) path and cross-thread wakeup path. Used to compare event_listener vs the Mutex+Condvar+parking_lot replacement introduced in the preceding commits.

YuanYuYuan force-pushed the feat/shm-pool branch from 1bf318f to 54e6264 Compare March 23, 2026 13:40

YuanYuYuan added 3 commits March 23, 2026 23:18

YuanYuYuan force-pushed the feat/shm-pool branch from 54e6264 to 0ddfe24 Compare March 23, 2026 17:06

YuanYuYuan added 4 commits March 23, 2026 17:13

style(shm): fix import order for nightly rustfmt

ce898d0

fix(shm): fix clippy let-and-return in test

7c75960

fix(shm): fix broken rustdoc intra-doc link

b14d951

style(shm): fix taplo key order in bench section

0088a7e

YuanYuYuan added breaking-change Indicates that the issue implies a breaking change (be it at compile time or at runtime) new feature Something new is needed and removed breaking-change Indicates that the issue implies a breaking change (be it at compile time or at runtime) labels Mar 24, 2026

docs(shm): add z_posix_shm_provider_pooled example

24aac22

YuanYuYuan removed the enhancement Existing things could work better label Mar 24, 2026

YuanYuYuan requested a review from yellowhatter March 24, 2026 08:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(shm): add lock-free segment pool to PosixShmProviderBackendTalc#2514

perf(shm): add lock-free segment pool to PosixShmProviderBackendTalc#2514
YuanYuYuan wants to merge 12 commits into
eclipse-zenoh:mainfrom
ZettaScaleLabs:feat/shm-pool

YuanYuYuan commented Mar 23, 2026 •

edited by github-actions Bot

Loading

Uh oh!

codecov Bot commented Mar 23, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

YuanYuYuan commented Mar 23, 2026 • edited by github-actions Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Key Changes

Benchmark Results

Reproducing

Breaking Changes

🏷️ Label-Based Checklist

🆕 New Feature Requirements

Uh oh!

codecov Bot commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

YuanYuYuan commented Mar 23, 2026 •

edited by github-actions Bot

Loading

codecov Bot commented Mar 23, 2026 •

edited

Loading