Skip to content

DLT logging refactor#174

Open
jwinarske wants to merge 5 commits intov2.0from
jw/dlt-refactor
Open

DLT logging refactor#174
jwinarske wants to merge 5 commits intov2.0from
jw/dlt-refactor

Conversation

@jwinarske
Copy link
Copy Markdown
Contributor

No description provided.

Adds the feature-detection and compatibility shim layer for the DLT
minimal-impact refactor:

  - cmake/cxx_compat.cmake: probes std::format, std::format_to_n,
    std::jthread, std::flat_map, std::expected, std::print and sets
    IHS_HAS_* flags; ihs_apply_cxx_compat() propagates them to targets.
  - CMakeLists.txt: include(cxx_compat) after include(compiler).
  - shell/logging/dlt/compat.hpp: defines ihs:: aliases (flat_map,
    expected, format_to, println, jthread, stop_token) that map to
    native C++20/23 facilities when available, with C++17 polyfills
    (std::map, minimal expected, snprintf, fprintf, std::thread +
    atomic<bool> stop flag) otherwise. Adds IHS_FMT()/IHS_FMT_C17().

No existing sources are modified to consume the shim yet.

Signed-off-by: Joel Winarske <joel.winarske@gmail.com>
Lands the core of the DLT minimal-impact bridge as an additive shared
library (libihs_logging_dlt.so) gated on ENABLE_DLT. The legacy
dlt.cc/libdlt.cc path compiled into homescreen is untouched so existing
behavior is preserved.

New components in shell/logging/dlt/:
  - log_level.hpp           LogLevel enum matching DLT severities.
  - ring_slot.hpp           Cache-line-aligned fixed-capacity slot
                            (240-byte text, seq/level/ctx_index).
  - thread_ring.{hpp,cpp}   Wait-free SPSC ring (256 slots, pow-2 mask);
                            overflow increments a drop counter.
  - ring_registry.{hpp,cpp} Intrusive lock-free list of per-thread
                            rings; thread-local acquires on first use.
  - libdlt_loader.{hpp,cpp} dlopen("libdlt.so.2") loader with a
                            self-contained ABI type surface in
                            ihs::dlt::abi to avoid colliding with the
                            legacy libdlt.h. emit() drives the real
                            dlt_user_log_write_start/string/finish
                            sequence via an opaque scratch buffer.
  - context_cache.{hpp,cpp} ihs::flat_map<string,index> + bounded
                            (kMaxContexts=256) entry table, returning
                            ihs::expected<uint32_t, ContextError>.
  - worker.{hpp,cpp}        Drainer on ihs::jthread; wakes every 10 ms
                            or on flush()/stop(), iterates the registry
                            and dispatches slots to the loader.
  - bridge.{hpp,cpp}        DltBridge singleton: start/stop/flush,
                            acquire_context, log(), and a logf<>
                            template whose fmt param type flips
                            between std::format_string and const char*
                            depending on IHS_HAS_FORMAT_TO_N.
  - ffi_shim.{hpp,cpp}      Minimal extern "C" surface
                            (ihs_dlt_start/stop/flush/acquire_context/
                            log).

New shell/logging/logger.hpp mux header exposes IhsLogContext and the
IHS_LOGGING_START/STOP/FLUSH + IHS_LOG_{FATAL,ERROR,WARN,INFO,DEBUG,
TRACE} macros. With ENABLE_DLT off the macros expand to ((void)0); no
sources include it yet.

Build system:
  - shell/logging/CMakeLists.txt builds libihs_logging_dlt.so when
    ENABLE_DLT is ON, applies ihs_apply_cxx_compat(), links CMAKE_DL_LIBS,
    and sets hidden visibility.
  - Top-level CMakeLists.txt adds add_subdirectory(shell/logging) under
    the same guard.
  - cmake/cxx_compat.cmake: the probe helper now appends
    -std=c++${CMAKE_CXX_STANDARD} to CMAKE_REQUIRED_FLAGS so feature
    detection reflects the standard the real build will use
    (check_cxx_source_compiles does not honor CMAKE_CXX_STANDARD on
    its own).
  - shell/logging/dlt/compat.hpp: added ihs::unexpect<T,E>() helper
    that forwards to std::unexpected under C++23 and to the polyfill's
    make_error() under C++17/20, so callers can return errors uniformly.

Verified locally on GCC 15.2 with -std=c++17, -std=c++20, and
-std=c++23: all three configurations build libihs_logging_dlt.so with
zero errors. Feature probes report the expected progression
(C++17: none; C++20: format/format_to_n/jthread; C++23: all six).

Not included in this phase:
  - Unit tests for the jthread, expected, and format_to fallbacks
    (Phase 2 checklist items) — deferred; the compile matrix exercises
    the polyfill paths end-to-end but no runtime assertions yet.
  - main.cc / engine.cc / app.cc integration.

Signed-off-by: Joel Winarske <joel.winarske@gmail.com>
Integrates the DLT bridge shipped in Phase 2 into the homescreen
process. All changes are additive to the existing spdlog/legacy-Dlt
path so the old flow continues to work unchanged when ENABLE_DLT=OFF.

Process lifecycle (shell/main.cc):
  - include "logging/logger.hpp"
  - IHS_LOGGING_START("IHSC", "ivi-homescreen Flutter runtime") right
    after gLogger is constructed
  - IHS_LOGGING_FLUSH() + IHS_LOGGING_STOP() before gLogger.reset() on
    normal exit, and IHS_LOGGING_FLUSH() in the SIGINT handler so
    pending ring slots are drained before exit(0)

Engine log routing (shell/engine.cc):
  - Engine::onLogMessageCallback keeps its spdlog::info call and, under
    #if ENABLE_DLT, additionally formats the "<tag>: <message>" line
    into a 256-byte stack buffer with snprintf and pushes it through
    DltBridge::instance().log() against a cached IhsLogContext("FLTR",
    "Flutter engine"). Using the direct log() entry avoids the
    {}/-vs-% format-string portability question in the macro layer.

Flush watchdog (shell/app.{h,cc}, shell/logging/logger.hpp):
  - New IhsFlushWatchdog RAII helper in logger.hpp spins a std::thread
    that wakes every N ms (default 100) and calls DltBridge::flush().
    The worker already drains on its own schedule; this is a
    belt-and-braces guarantee of forward progress when a producer
    thread falls silent mid-burst.
  - App owns std::unique_ptr<IhsFlushWatchdog> (ENABLE_DLT-gated),
    constructed next to the existing BUILD_WATCHDOG systemd watchdog
    and torn down with the App instance.

CMake helper (cmake/logging_helpers.cmake):
  - New ihs_target_add_logging(<target>) function: adds
    shell/logging to the target's include path and, when ENABLE_DLT
    is ON, links the target against ihs_logging, defines ENABLE_DLT=1,
    and applies ihs_apply_cxx_compat(). Safe to call unconditionally;
    the ENABLE_DLT=OFF path is include-only so logger.hpp's stub macros
    compile into ((void)0).
  - Top-level CMakeLists.txt: include(logging_helpers) after cxx_compat.
  - shell/CMakeLists.txt: ihs_target_add_logging(${PROJECT_NAME}) so the
    homescreen binary picks up the mux header and, when enabled, the
    bridge shared library.

Verified libihs_logging_dlt.so still builds cleanly under
-std=c++17/20/23 on GCC 15.2 with the feature probes reporting the
expected progression.

Not in this phase (deferred to a follow-up):
  - ihs_target_add_logging() adoption across built-in plugin targets
    — plugins live in the separate ivi-homescreen-plugins tree pulled
    in via PLUGINS_DIR, so attaching the helper there is a cross-repo
    change.
  - Removal of the legacy shell/logging/logging.h spdlog→Dlt callback
    sink; it runs alongside the new bridge for now, to be retired once
    the bridge has been exercised end-to-end.

Signed-off-by: Joel Winarske <joel.winarske@gmail.com>
@jwinarske jwinarske changed the title Jw/dlt refactor DLT logging refactor Apr 6, 2026
@jwinarske jwinarske force-pushed the jw/dlt-refactor branch 3 times, most recently from 2066005 to 1472dd8 Compare April 6, 2026 20:45
Adds C++17/20/23 build-and-runtime matrix and the standalone harness it drives.
Two bug fixes that the matrix surfaced ride along.

CI workflow (.github/workflows/ihs-logging-compat-matrix.yml):
  - Runs on PRs and pushes that touch shell/logging/**, the new cmake
    helpers, or the workflow file itself.
  - Four matrix entries:
      Clang-12 / C++17 / ENABLE_DLT=ON  — exercises every polyfill in
                                          compat.hpp (jthread, expected,
                                          format_to snprintf, flat_map=map).
      Clang-14 / C++20 / ENABLE_DLT=ON  — std::format / std::jthread
                                          native; expected/flat_map still
                                          polyfilled.
      Clang-17 / C++23 / ENABLE_DLT=ON  — every IHS_HAS_* flag expected ON.
      Clang-17 / C++23 / ENABLE_DLT=OFF — spdlog-only sanity build,
                                          confirms the stub ((void)0)
                                          macros parse and link.
  - Installs the matching Clang from apt.llvm.org, configures with
    -DCMAKE_CXX_STANDARD set per entry, dumps the detected IHS_HAS_*
    cache values, builds, runs compat_smoke, and runs ctest. Uses the
    standalone harness rather than the full homescreen build so the job
    only needs a C++ toolchain — no Wayland / Flutter / third_party
    submodules.

Standalone harness (shell/logging/tests/compat-matrix/):
  - CMakeLists.txt: minimal project that locates the ivi-homescreen
    root three levels up, includes cmake/cxx_compat.cmake, and
    add_subdirectory's shell/logging. Defines a compat_smoke target
    that links against ihs_logging when ENABLE_DLT is ON, and registers
    it as a CTest case.
  - compat_smoke.cc: runtime smoke test that
      1. drives IHS_LOGGING_START/STOP/FLUSH;
      2. compiles IHS_LOG_INFO with {} placeholders behind
         IHS_HAS_FORMAT_TO_N and with %d/%s in the C++17 fallback —
         the explicit Phase 4 checklist item;
      3. pushes 128 messages through the calling thread's SPSC ring
         and asserts dropped()==0, exercising the per-thread ring,
         registry, and worker drain;
      4. constructs and tears down IhsFlushWatchdog to exercise the
         ihs::jthread polyfill on C++17 and the native std::jthread
         on C++20+;
      5. has an #if !defined(ENABLE_DLT) branch that exercises the
         stub ((void)0) macros for the spdlog-only matrix entry.

Bug fixes surfaced by the matrix:
  - shell/logging/dlt/compat.hpp: ihs::stop_source::request_stop() now
    null-checks flag_ before storing. Without this, the C++17 jthread
    polyfill segfaulted when a temporary ihs::jthread was move-assigned
    (thread_ = ihs::jthread(...) in Worker::start()): the moved-from
    temporary's destructor called request_stop() on a moved-from
    shared_ptr<atomic<bool>> and dereferenced null. C++20/23 paths use
    std::jthread and were unaffected.
  - shell/logging/CMakeLists.txt: dropped -fvisibility=hidden and
    CXX_VISIBILITY_PRESET=hidden. The C++ DltBridge / RingRegistry /
    ContextHandle API is consumed directly by the homescreen binary,
    so it needs default visibility; hiding the symbols broke linking
    from any in-process C++ consumer (the compat_smoke link failure
    is what caught it). The extern "C" ffi_shim surface is public
    regardless. Re-introducing hidden visibility with explicit
    __attribute__((visibility("default"))) annotations on the public
    C++ API is a follow-up polish item.

Verified locally on GCC 15.2 across -std=c++17/20/23 with ENABLE_DLT=ON
and additionally with ENABLE_DLT=OFF on -std=c++17. compat_smoke passes
on every configuration; the IHS_HAS_* probe set matches the documented
progression (none → format/jthread → all six).

Signed-off-by: Joel Winarske <joel.winarske@gmail.com>
Layered defenses against libdlt ABI/version skew and the stack-corruption
hazard from passing an opaque DltContextData scratch through dlsym'd
entry points.

Layer 1 — dlt_check_library_version gate (libdlt_loader.{hpp,cpp}):
  - New abi::kDltExpectedMajor / kDltExpectedMinor constants (2 / 18) as
    the single source of truth for the version contract.
  - load() resolves dlt_check_library_version (and dlt_get_version) before
    touching any other API. Three refusal modes, each producing a clean
    available_=false instead of risking a layout mismatch:
      * symbol missing  -> DltDisableReason::VersionCheckMissing
      * version mismatch -> DltDisableReason::VersionMismatch (with the
        loaded version string in the disable detail when get_version is
        available)
      * required dlt_user_log_write_* symbol still null after the version
        check passes -> DltDisableReason::RequiredSymbolMissing
  - On success the loaded version string is logged once to stderr so
    operators can correlate later disables with what was actually loaded.

Layer 2 — heap-backed scratch with sentinel guards (libdlt_loader.{hpp,cpp}):
  - OpaqueContextData (512 B, stack-local in emit()) replaced with
    abi::GuardedContextData: 4096 B payload bracketed by 64-bit
    head_guard / tail_guard set to abi::kScratchGuardPattern
    (0xDEADBEEFCAFEBABE).
  - The scratch is thread_local inside emit(), so it lives in TLS, not on
    the worker's stack frame. A libdlt overrun smashes a thread-local
    region (debuggable, ASan-visible) instead of the worker's return
    address.
  - Guards are checked after every emit; a mismatch triggers
    disable(ScratchOverrun, detail) with the observed sentinel values
    formatted into the detail line. Subsequent emits then short-circuit.
  - The per-emit memset that the first cut of this change carried has
    been deleted: TLS one-time zero-init still happens, the guards still
    bracket the buffer, and dlt_user_log_write_start is responsible for
    (re)initializing DltContextData between start and finish — so
    repeating that work on every emit is wasted memory bandwidth. Per-emit
    cost on the slow path is now ~3 ns (one TLS access plus two
    cache-hot 64-bit guard compares).

Layer 3 — single-shot fast path (libdlt_loader.{hpp,cpp}):
  - New optional fn_log_string_oneshot_, probed via
    dlsym("dlt_log_string") at load time.
  - When non-null, emit() calls it directly with (ctx, level, text) and
    skips the start/string/finish dance — and therefore the
    GuardedContextData scratch — entirely. Eliminates the entire
    layout-mismatch class of bug for this code path and is also a net
    speedup (3x fewer indirect libdlt calls per emit, no scratch).
  - Falls through to the guarded scratch path when the symbol is not
    exported.

Layer 4 — loader hardening (libdlt_loader.cpp::load()):
  - RTLD_LAZY -> RTLD_NOW. All symbol resolutions happen at dlopen time
    so a libdlt with missing entry points fails the load itself rather
    than blowing up later inside the worker thread. Costs single-digit
    milliseconds at startup; buys an entire failure mode worth of
    debuggability.
  - Honors IHS_DLT_LIBRARY=/path/to/libdlt.so.2 as a soname override.
    Empty/unset falls back to "libdlt.so.2". Lets ops swap libdlt during
    incident response without recompiling.

Layer 5 — degraded-mode observability (libdlt_loader.{hpp,cpp}):
  - New enum class DltDisableReason with one enumerator per failure path
    (Ok, LibraryNotFound, VersionCheckMissing, VersionMismatch,
    RequiredSymbolMissing, ScratchOverrun).
  - LibDltLoader::disabled_reason() exposes it; LibDltLoader::degraded_drops()
    returns an std::atomic<uint64_t> count of emit() calls that were
    dropped because the bridge had self-disabled.
  - New disable() helper writes a one-shot stderr line
    "[dlt] bridge disabled: <reason> — <detail>". This is the only place
    that flips available_ to false and the only place that prints the
    reason, so operators get exactly one line per disable event regardless
    of how many threads observe it.
  - available_ promoted from plain bool to std::atomic<bool> (acquire
    load on the hot path, release store on disable) since disable() can
    be called from emit() on any worker thread while another thread is
    still inside emit() checking the flag. Cost on x86 is one ordinary
    mov; on ARM64 one ldar. Negligible.

Hot-path performance summary:
  - Fast-path present: net speedup vs. the previous implementation.
  - Fast-path absent: net cost ~3 ns per emit (atomic load + two guard
    compares). Lost in the noise next to libdlt's own IPC.
  - Startup cost: low single-digit ms, dominated by RTLD_NOW symbol
    resolution.
  - Memory: ~4 KB TLS per thread that ever calls emit() (today: one page
    total, only the worker thread emits).

Verified locally on GCC 15.2 across -std=c++17/20/23 with both
ENABLE_DLT=ON and =OFF. The dev box has no libdlt.so.2 installed, so
compat_smoke now also exercises the new disable path:
  "[dlt] bridge disabled: library-not-found — libdlt.so.2: cannot
   open shared object file: No such file or directory"
followed by compat_smoke OK. The CI matrix should remain green for the
same reason.

Not in this change (deferred):
  - Vendor-specific single-shot symbol names. The probe currently only
    looks for "dlt_log_string"; vendor builds that export, e.g.,
    "dlt_user_log_string" will fall through to the guarded path. Add
    additional probes once the target deployments are known.
  - Runtime test for the ScratchOverrun path. Triggering it requires a
    deliberately mis-laid-out libdlt build, which the smoke harness
    cannot fabricate. The guards are in production; verifying them
    needs an integration test against an actual broken libdlt.

Signed-off-by: Joel Winarske <joel.winarske@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant