Malkovsky · May 28, 2026 · May 24, 2026
diff --git a/AGENTS.md b/AGENTS.md
@@ -8,7 +8,15 @@ Current implementations include BitVector, RmM Tree, and LOUDS Tree. Planned add
 
 ## Skills
 
-./.kilo/skills/ contains several project-specific skills, use them when appropriate
+Shared C++ agent skills live in `agentic/cpp/skills`. Pixie-specific examples
+for those skills live in `agentic/local/cpp/skills`.
+Shared C++ agent commands live in `agentic/cpp/commands`. Pixie-specific
+commands or command notes live in `agentic/local/cpp/commands`.
+
+When a task matches a skill, read:
+
+1. `agentic/cpp/skills/<skill>/SKILL.md`
+2. `agentic/local/cpp/skills/<skill>/EXAMPLES.md`, if present
 
 ## Architecture
 

diff --git a/agentic/cpp/README.md b/agentic/cpp/README.md
@@ -0,0 +1,15 @@
+# Shared C++ Agent Skills
+
+This subtree contains reusable C++ agent skills and related commands.
+
+Keep this tree generic:
+
+- Do not add project-specific benchmark names, CMake options, or paths.
+- Keep reusable scripts beside the skills that use them.
+- Put project-specific examples in the consuming repository under
+  `agentic/local/cpp/skills/<skill-name>/EXAMPLES.md`.
+
+When using a skill in a project, read:
+
+1. `agentic/cpp/skills/<skill-name>/SKILL.md`
+2. `agentic/local/cpp/skills/<skill-name>/EXAMPLES.md`, if present
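
The two-step reading order above can be sketched as a small lookup helper. This is an illustrative sketch only: the function name `skill_files` and the `repo_root` parameter are hypothetical and not part of this change; only the two paths come from the README.

```python
from pathlib import Path


def skill_files(skill: str, repo_root: Path) -> list[Path]:
    """Return the files to read for a skill, in reading order.

    Hypothetical helper: the shared SKILL.md always comes first, and the
    project-local EXAMPLES.md is appended only when it exists.
    """
    shared = repo_root / "agentic" / "cpp" / "skills" / skill / "SKILL.md"
    local = repo_root / "agentic" / "local" / "cpp" / "skills" / skill / "EXAMPLES.md"

    files = [shared]
    if local.is_file():  # "if present" in the README
        files.append(local)
    return files
```

Keeping the shared file first means the generic workflow is read before any repository-specific overrides or examples.
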
diff --git a/agentic/cpp/commands/README.md b/agentic/cpp/commands/README.md
@@ -0,0 +1,6 @@
+# Shared C++ Agent Commands
+
+Reusable command definitions for C++ projects belong here.
+
+Keep project-specific commands in the consuming repository under
+`agentic/local/cpp/commands`.
diff --git a/agentic/cpp/commands/benchmarks-affected.md b/agentic/cpp/commands/benchmarks-affected.md
@@ -0,0 +1,34 @@
+---
+description: Scan current branch and report impacted benchmark targets/functions.
+---
+
+# Benchmarks Affected
+
+Identify which benchmark binaries and benchmark functions are affected by changes on the current branch.
+
+Use the `benchmarks-affected` skill as the single source of truth for workflow details and guardrails.
+Do not duplicate or override the skill instructions in this command.
+
+## Inputs
+
+- Optional `--baseline <ref>` (default: `main`)
+- Optional `--compile-commands <path>`
+- Optional `--no-include-working-tree`
+- Optional `--format <text|json>` (default: `text`)
+
+## Workflow
+
+1. Execute the `benchmarks-affected` skill workflow.
+2. Pass through command inputs to the analyzer invocation defined by the skill.
+3. Report results with these sections:
+   - Changed files
+   - Affected benchmark targets
+   - Affected benchmark functions
+   - Suggested `--benchmark_filter` regex
+   - Any warnings/failures
+
+## Output rules
+
+1. If `affected_benchmarks` is non-empty, prioritize those names.
+2. If `affected_benchmarks` is empty but benchmark targets are affected, mark the result as partial and include target-level impact.
+3. Do not run full benchmark suites in this command; this command is for impact discovery only.
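
The output rules in `benchmarks-affected.md` amount to a three-way decision on the analyzer report. A minimal sketch, assuming the report is a parsed JSON object carrying the `affected_benchmarks` and `affected_benchmark_targets` fields named by the skill; the function name and the `"full"` / `"none"` labels are hypothetical, and only "partial" appears in the command text.

```python
def classify_scope(report: dict) -> str:
    """Classify how precisely the analyzer scoped the impact.

    Hypothetical helper; field names follow the skill's documented output.
    """
    benchmarks = report.get("affected_benchmarks") or []
    targets = report.get("affected_benchmark_targets") or []

    if benchmarks:
        return "full"      # rule 1: prioritize exact benchmark names
    if targets:
        return "partial"   # rule 2: only target-level impact is known
    return "none"          # nothing benchmark-related was affected
```
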
diff --git a/agentic/cpp/commands/perf-review.md b/agentic/cpp/commands/perf-review.md
@@ -0,0 +1,149 @@
+---
+description: Benchmark-driven PR performance review versus target branch
+---
+
+# Perf Review Workflow
+
+You are performing a performance review for the current PR branch.
+
+Non-negotiable requirements:
+1. Benchmark timing plus profiling data is the highest-priority judgment tool.
+2. Compare source branch versus target branch and report relevant benchmark metric changes.
+3. Provide analysis and a final verdict: does the PR improve performance or not.
+
+## Inputs
+
+- Optional argument `--target <branch>`: target branch override.
+- Optional argument `--filter <regex>`: benchmark filter regex.
+- Optional argument `--no-counters`: disable hardware-counter collection.
+
+If arguments are omitted:
+- Default target branch to PR base branch from `gh pr view --json baseRefName` when available.
+- Otherwise, fall back to `main` as the target branch.
+
+Filter handling:
+- If `--filter` is provided, pass it through.
+- Else use the filter produced by `benchmarks-affected` through `benchmarks-compare-revisions`.
+- If no filter can be derived, run a conservative full-binary comparison for the impacted binaries.
+
+## Step 1 - Resolve branches and hashes
+
+1. Resolve contender from current checkout (`HEAD`) and compute short hash.
+2. Resolve baseline branch using precedence: `--target` -> PR base from `gh pr view --json baseRefName` -> `main`.
+3. Resolve baseline short hash.
+4. Print branch/hash mapping before benchmark execution.
+
+## Step 2 - Run timing and hardware-counter comparison via skill (single source of truth)
+
+Use `benchmarks-compare-revisions` as the single source of truth for revision builds, benchmark scope, compare.py flow, retry policy, and guardrails.
+
+Pass-through inputs:
+- Baseline ref/hash from Step 1.
+- Contender ref/hash from Step 1.
+- Optional `--filter` override.
+- Counter mode: default on (`COLLECT_COUNTERS=1`) on Linux, disabled when `--no-counters` is provided.
+
+Consume outputs from `benchmarks-compare-revisions`:
+- Baseline and contender benchmark JSON artifacts.
+- compare.py output per binary.
+- Effective filter used.
+- Scope metadata from `benchmarks-affected` (`affected_benchmark_targets`, `affected_benchmarks`) when available.
+- `counters_available` status and, when unavailable, explicit reason.
+- Baseline and contender counter JSON artifacts (when available).
+- Derived counter metrics per benchmark (IPC, cache miss rate, branch mispredict rate).
+- Counter anomaly list and ready-to-embed counter summary table.
+
+Execution guardrails:
+- Run benchmarks sequentially.
+- No background jobs (`nohup`, `&`).
+- Use Release timing builds only.
+- If the timing comparison fails, return a blocked verdict with the exact failure points.
+
+## Step 3 - Consume delegated hardware-counter outputs
+
+Hardware-counter collection is delegated to `benchmarks-compare-revisions`.
+
+Pass-through inputs:
+- `COLLECT_COUNTERS=1` by default on Linux (unless `--no-counters` is provided).
+- Same baseline/contender refs and effective filter used in Step 2.
+
+Consume outputs:
+- Counter preflight result.
+- Counter JSON artifacts for both revisions.
+- Derived metrics (IPC, cache miss rate, branch mispredict rate).
+- Anomaly list and counter summary table for report embedding.
+
+If counters are unavailable (`counters_available=false`), continue with timing-only review and explicitly mark profiling as unavailable in the report.
+
+## Step 4 - Analyze timing and counter data
+
+Timing classification per benchmark entry:
+- Improvement: time delta < -5%
+- Regression: time delta > +5%
+- Neutral: time delta between -5% and +5%, inclusive
+
+Aggregate per binary:
+- Number of improvements/regressions/neutral
+- Net average percentage change
+- Largest regression and largest improvement
+
+Counter correlation:
+- Use skill-provided hardware counter summary and anomaly list to explain major timing changes.
+- Do not recompute derived counter metrics in this command.
+
+Judgment priority:
+- Base verdict primarily on benchmark timing comparison.
+- Use counter data as explanatory evidence and confidence signal.
+
+Noise-control expectations:
+- Include at least one control benchmark family expected to be unaffected by the code change.
+- Treat isolated swings without a pattern as noise unless they are reproduced across related sizes/fill ratios.
+
+## Step 5 - Produce final markdown report
+
+Return a structured markdown report with this shape:
+
+```markdown
+## Performance Review: <contender_branch> vs <baseline_branch>
+
+### Configuration
+- Baseline: <branch> (<hash>)
+- Contender: <branch> (<hash>)
+- Platform: <os/arch>
+- Benchmarks run: <binary list>
+- Filter: <regex or none>
+- Hardware counters: available / unavailable
+
+### Timing Summary
+| Binary | Improvements | Regressions | Neutral | Net Change |
+|---|---:|---:|---:|---:|
+| ... | N | N | N | +/-X% |
+
+### Detailed Timing Results
+<Annotated compare.py outputs by binary>
+
+### Hardware Counter Profile (if available)
+| Benchmark | IPC (base->new) | Cache Miss Rate (base->new) | Branch Mispredict (base->new) |
+|---|---:|---:|---:|
+| ... | X.XX -> Y.YY | A.A% -> B.B% | C.C% -> D.D% |
+
+### Key Findings
+- <Most important regressions/improvements>
+- <Counter-based explanations for key timing shifts>
+
+### Verdict
+**[IMPROVES PERFORMANCE | REGRESSES PERFORMANCE | NO SIGNIFICANT CHANGE]**
+
+<1-2 sentence justification grounded in benchmark metrics, with profiling context if available>
+```
+
+Verdict rules:
+- `IMPROVES PERFORMANCE`: improvements outnumber regressions, no severe regression (>10%), and net average change is favorable.
+- `REGRESSES PERFORMANCE`: any severe regression (>10%) or regressions dominate with net unfavorable average.
+- `NO SIGNIFICANT CHANGE`: mostly neutral changes or mixed results that approximately cancel out.
+
+## Failure Handling
+
+- If required builds fail or timing comparison cannot run, output a blocked review with exact failure points and no misleading verdict.
+- If only profiling fails (`counters_available=false` from delegated skill output), continue with timing-based verdict and explicitly list profiling limitation.
+- If JSON output is invalid/truncated, discard it and rerun that benchmark command once with a tighter filter and explicit output redirection.
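
The Step 4 timing classification and the Step 5 verdict rules in `perf-review.md` can be sketched as follows. This is a sketch under stated assumptions, not the command's implementation: the function names are hypothetical, "regressions dominate" is read as "regressions outnumber improvements", and a "favorable" net change is read as a negative mean time delta.

```python
from statistics import mean

NEUTRAL_BAND_PCT = 5.0   # +/-5% threshold from Step 4
SEVERE_PCT = 10.0        # severe-regression threshold from the verdict rules


def classify(delta_pct: float) -> str:
    """Classify one benchmark entry by its time delta in percent."""
    if delta_pct < -NEUTRAL_BAND_PCT:
        return "improvement"
    if delta_pct > NEUTRAL_BAND_PCT:
        return "regression"
    return "neutral"


def verdict(deltas_pct: list[float]) -> str:
    """Derive a verdict from per-benchmark time deltas (assumed reading)."""
    if not deltas_pct:
        # No timing data means a blocked review, not a verdict.
        raise ValueError("no timing data; report a blocked review instead")

    labels = [classify(d) for d in deltas_pct]
    improvements = labels.count("improvement")
    regressions = labels.count("regression")
    net = mean(deltas_pct)
    severe = any(d > SEVERE_PCT for d in deltas_pct)

    # Regression rules are checked first: a severe regression rules out
    # an "improves" verdict regardless of the other entries.
    if severe or (regressions > improvements and net > 0):
        return "REGRESSES PERFORMANCE"
    if improvements > regressions and net < 0:
        return "IMPROVES PERFORMANCE"
    return "NO SIGNIFICANT CHANGE"
```

For example, deltas of -8%, -6% and +1% give two improvements, no regressions and a negative net change, so this reading yields `IMPROVES PERFORMANCE`; replacing the +1% entry with +12% trips the severe-regression rule instead.
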
diff --git a/agentic/cpp/commands/ping.md b/agentic/cpp/commands/ping.md
@@ -0,0 +1,7 @@
+---
+description: Test command that replies with pong
+---
+
+Respond with exactly `pong`.
+Do not add any other words.
+Do not add quotes or punctuation.
diff --git a/agentic/cpp/skills/benchmarks-affected/SKILL.md b/agentic/cpp/skills/benchmarks-affected/SKILL.md
@@ -0,0 +1,81 @@
+---
+name: benchmarks-affected
+description: Analyze current branch versus a baseline and extract affected benchmark targets and benchmark functions using compile_commands and clang AST.
+---
+
+# Benchmarks Affected Skill
+
+Use this skill to identify exactly which benchmark binaries and benchmark functions are affected by code changes on the current branch.
+
+It implements a two-stage workflow:
+
+1. `compile_commands.json` analysis to determine affected compile targets.
+2. Clang AST analysis to determine affected benchmark functions.
+
+## Goal
+
+Given `HEAD` and a baseline branch (default `main`), produce:
+
+- Changed files.
+- Affected targets (with emphasis on benchmark targets).
+- Exact benchmark functions impacted by the changes.
+- A ready-to-use Google Benchmark filter regex.
+
+## Prerequisites
+
+1. Build tree with benchmarks enabled and compile database exported. Use the
+repository's normal benchmark-enabling CMake options:
+
+```bash
+BUILD_SUFFIX=local
+cmake -B build/benchmarks-all_${BUILD_SUFFIX} \
+  -DCMAKE_BUILD_TYPE=Release \
+  -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
+cmake --build build/benchmarks-all_${BUILD_SUFFIX} --config Release -j
+```
+
+2. `clang++` must be available on `PATH` (used for AST dump).
+
+For repository-specific invocations, check
+`agentic/local/cpp/skills/benchmarks-affected/EXAMPLES.md` when present.
+
+## Run
+
+```bash
+python3 agentic/cpp/skills/benchmarks-affected/analyze_benchmarks_affected.py \
+  --baseline main \
+  --compile-commands build/benchmarks-all_local/compile_commands.json \
+  --format json
+```
+
+If `--compile-commands` is omitted, the script auto-selects the most recently modified `build/**/compile_commands.json`.
+Working tree changes are included by default. Use `--no-include-working-tree` to restrict analysis to `<baseline>...HEAD` only.
+
+## Output
+
+The analyzer reports:
+
+- `affected_targets`: impacted CMake targets inferred from compile dependency analysis.
+- `affected_benchmark_targets`: subset of benchmark binaries impacted.
+- `affected_benchmarks`: precise benchmark function names from AST-level call analysis.
+- `suggested_filter_regex`: regex to pass as `--benchmark_filter`.
+
+## How to Use Findings
+
+1. Build only impacted benchmark binaries where feasible.
+2. Run benchmark binaries with the suggested filter:
+
+```bash
+FILTER='^(BM_Foo|BM_Bar)(/|$)'
+BENCH_CPU=${BENCH_CPU:-0}
+taskset -c "${BENCH_CPU}" build/benchmarks-all_local/benchmarks --benchmark_filter="${FILTER}"
+```
+
+3. If the impact mapping is broad or uncertain, run the full binary for the selected benchmark target(s).
+
+## Guardrails
+
+1. Keep the baseline comparison as a merge-base-style diff: `<baseline>...HEAD`.
+2. Use Release binaries for timing runs.
+3. If the AST parse fails for a translation unit (TU), still trust the compile-target impact and mark the benchmark-function scope as partial.
+4. If benchmark infra (`CMakeLists.txt`, benchmark source layout) changed, fall back to conservative benchmark selection.
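
The filter shape used in the skill's example, `^(BM_Foo|BM_Bar)(/|$)`, can be built from a list of benchmark function names roughly as below. This is a hedged sketch: `build_benchmark_filter` is a hypothetical name, and the diff does not show how `analyze_benchmarks_affected.py` actually produces `suggested_filter_regex`. Escaping follows Python's `re` rules, which may not match Google Benchmark's regex dialect for unusual names.

```python
import re


def build_benchmark_filter(names: list[str]) -> str:
    """Build an anchored alternation over benchmark function names.

    Hypothetical helper. The trailing (/|$) matches either the bare name
    or the name followed by an argument suffix such as /1024, so BM_Foo
    does not also select BM_FooBar.
    """
    unique = list(dict.fromkeys(names))  # de-duplicate, keep input order
    if not unique:
        raise ValueError("no benchmark names; nothing to filter on")
    alternation = "|".join(re.escape(name) for name in unique)
    return f"^({alternation})(/|$)"
```

The result is meant to be passed as `--benchmark_filter`, as in the skill's own `taskset` example.
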