Merged
10 changes: 9 additions & 1 deletion AGENTS.md
@@ -8,7 +8,15 @@ Current implementations include BitVector, RmM Tree, and LOUDS Tree. Planned add

## Skills

./.kilo/skills/ contains several project-specific skills, use them when appropriate
Shared C++ agent skills live in `agentic/cpp/skills`. Pixie-specific examples
for those skills live in `agentic/local/cpp/skills`.
Shared C++ agent commands live in `agentic/cpp/commands`. Pixie-specific
commands or command notes live in `agentic/local/cpp/commands`.

When a task matches a skill, read:

1. `agentic/cpp/skills/<skill>/SKILL.md`
2. `agentic/local/cpp/skills/<skill>/EXAMPLES.md`, if present

## Architecture

15 changes: 15 additions & 0 deletions agentic/cpp/README.md
@@ -0,0 +1,15 @@
# Shared C++ Agent Skills

This subtree contains reusable C++ agent skills and related commands.

Keep this tree generic:

- Do not add project-specific benchmark names, CMake options, or paths.
- Keep reusable scripts beside the skills that use them.
- Put project-specific examples in the consuming repository under
`agentic/local/cpp/skills/<skill-name>/EXAMPLES.md`.

When using a skill in a project, read:

1. `agentic/cpp/skills/<skill-name>/SKILL.md`
2. `agentic/local/cpp/skills/<skill-name>/EXAMPLES.md`, if present
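
The reading order above can be expressed as a small lookup. This is a minimal sketch that assumes paths are resolved from the repository root. `skill_docs` is a hypothetical helper and is not part of this tree.

```python
from pathlib import Path
from typing import List


def skill_docs(repo_root: Path, skill: str) -> List[Path]:
    """Return the documents to read for `skill`, in reading order.

    The shared SKILL.md always comes first. The project-specific
    EXAMPLES.md is appended only when it exists.
    """
    shared = repo_root / "agentic" / "cpp" / "skills" / skill / "SKILL.md"
    local = repo_root / "agentic" / "local" / "cpp" / "skills" / skill / "EXAMPLES.md"
    return [shared, local] if local.is_file() else [shared]
```
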
6 changes: 6 additions & 0 deletions agentic/cpp/commands/README.md
@@ -0,0 +1,6 @@
# Shared C++ Agent Commands

Reusable command definitions for C++ projects belong here.

Keep project-specific commands in the consuming repository under
`agentic/local/cpp/commands`.
34 changes: 34 additions & 0 deletions agentic/cpp/commands/benchmarks-affected.md
@@ -0,0 +1,34 @@
---
description: Scan the current branch and report impacted benchmark targets and functions.
---

# Benchmarks Affected

Identify which benchmark binaries and benchmark functions are affected by changes on the current branch.

Use the `benchmarks-affected` skill as the single source of truth for workflow details and guardrails.
Do not duplicate or override the skill instructions in this command.

## Inputs

- Optional `--baseline <ref>` (default: `main`)
- Optional `--compile-commands <path>`
- Optional `--no-include-working-tree`
- Optional `--format <text|json>` (default: `text`)

## Workflow

1. Execute the `benchmarks-affected` skill workflow.
2. Pass through command inputs to the analyzer invocation defined by the skill.
3. Report results with these sections:
   - Changed files
   - Affected benchmark targets
   - Affected benchmark functions
   - Suggested `--benchmark_filter` regex
   - Any warnings or failures

## Output rules

1. If `affected_benchmarks` is non-empty, prioritize those names.
2. If `affected_benchmarks` is empty but benchmark targets are affected, mark the result as partial and include the target-level impact.
3. Do not run full benchmark suites in this command; it is for impact discovery only.
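
The output rules above can be sketched as follows. The sketch assumes the analyzer's JSON has been parsed into a dict using the key names from the `benchmarks-affected` skill. `report_scope` and its scope labels are hypothetical.

```python
from typing import List, Tuple


def report_scope(result: dict) -> Tuple[str, List[str]]:
    """Apply the output rules to a parsed analyzer result.

    Returns (scope, names). Function-level names win. Otherwise the
    target-level names are returned and the result is marked partial.
    """
    functions = result.get("affected_benchmarks") or []
    targets = result.get("affected_benchmark_targets") or []
    if functions:
        # Rule 1: function-level impact takes priority.
        return "functions", list(functions)
    if targets:
        # Rule 2: only target-level impact is known, so the result is partial.
        return "partial", list(targets)
    return "none", []
```
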
149 changes: 149 additions & 0 deletions agentic/cpp/commands/perf-review.md
@@ -0,0 +1,149 @@
---
description: Benchmark-driven PR performance review versus target branch
---

# Perf Review Workflow

You are performing a performance review for the current PR branch.

Non-negotiable requirements:
1. Benchmark timing, supported by profiling data, is the highest-priority basis for judgment.
2. Compare the source branch against the target branch and report changes in the relevant benchmark metrics.
3. Provide an analysis and a final verdict on whether the PR improves performance.

## Inputs

- Optional argument `--target <branch>`: target branch override.
- Optional argument `--filter <regex>`: benchmark filter regex.
- Optional argument `--no-counters`: disable hardware-counter collection.

If arguments are omitted:
- Default the target branch to the PR base branch from `gh pr view --json baseRefName`, when available.
- Otherwise, fall back to `main`.

Filter handling:
- If `--filter` is provided, pass it through.
- Otherwise, use the filter produced by `benchmarks-affected` via `benchmarks-compare-revisions`.
- If no filter can be derived, run a conservative full-binary comparison for each impacted binary.
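
The filter precedence above reduces to a short chain. In this sketch, `effective_filter` is a hypothetical helper, and `None` stands for "no filter, compare the full binary".

```python
from typing import Optional


def effective_filter(cli_filter: Optional[str], derived_filter: Optional[str]) -> Optional[str]:
    """Choose the benchmark filter using the documented precedence.

    Returns None when no filter applies, meaning a conservative
    full-binary comparison should run for each impacted binary.
    """
    if cli_filter:
        return cli_filter  # an explicit --filter always wins
    if derived_filter:
        return derived_filter  # filter derived by benchmarks-affected
    return None
```
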

## Step 1 - Resolve branches and hashes

1. Resolve the contender from the current checkout (`HEAD`) and compute its short hash.
2. Resolve the baseline branch using this precedence: `--target` -> PR base from `gh pr view --json baseRefName` -> `main`.
3. Resolve the baseline short hash.
4. Print the branch/hash mapping before running any benchmarks.
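
Step 1 can be sketched with `gh` and `git` called through `subprocess`. This assumes both tools are installed and that the baseline ref exists locally. All three helper names are hypothetical.

```python
import json
import subprocess
from typing import Optional


def pr_base_branch() -> Optional[str]:
    """Return the PR base branch reported by `gh`, or None if unavailable."""
    try:
        out = subprocess.run(
            ["gh", "pr", "view", "--json", "baseRefName"],
            capture_output=True, text=True, check=True,
        ).stdout
        return json.loads(out).get("baseRefName") or None
    except (OSError, subprocess.CalledProcessError, json.JSONDecodeError):
        return None  # no gh, no PR for this branch, or unexpected output


def resolve_baseline(target: Optional[str], pr_base: Optional[str]) -> str:
    """Apply the precedence: --target, then the PR base branch, then main."""
    return target or pr_base or "main"


def short_hash(ref: str) -> str:
    """Resolve a ref to its abbreviated commit hash."""
    return subprocess.run(
        ["git", "rev-parse", "--short", ref],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
```

Typical use is `baseline = resolve_baseline(args.target, pr_base_branch())`, followed by `short_hash(baseline)` and `short_hash("HEAD")`.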

## Step 2 - Run timing and hardware-counter comparison via skill (single source of truth)

Use `benchmarks-compare-revisions` as the single source of truth for revision builds, benchmark scope, compare.py flow, retry policy, and guardrails.

Pass-through inputs:
- Baseline ref/hash from Step 1.
- Contender ref/hash from Step 1.
- Optional `--filter` override.
- Counter mode: enabled by default on Linux (`COLLECT_COUNTERS=1`); disabled when `--no-counters` is provided.
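
The counter-mode rule can be sketched as an environment builder. The sketch assumes `benchmarks-compare-revisions` reads `COLLECT_COUNTERS` from the environment and treats `"0"` as disabled. That second point is not stated by the skill. `counter_env` is hypothetical.

```python
import os
import sys
from typing import Dict


def counter_env(no_counters: bool, platform: str = sys.platform) -> Dict[str, str]:
    """Return a copy of the environment with COLLECT_COUNTERS set.

    Counters are on by default on Linux and off when --no-counters is
    given or on other platforms. Using "0" for "off" is an assumption.
    """
    env = dict(os.environ)
    enabled = platform.startswith("linux") and not no_counters
    env["COLLECT_COUNTERS"] = "1" if enabled else "0"
    return env
```
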

Consume outputs from `benchmarks-compare-revisions`:
- Baseline and contender benchmark JSON artifacts.
- compare.py output per binary.
- Effective filter used.
- Scope metadata from `benchmarks-affected` (`affected_benchmark_targets`, `affected_benchmarks`) when available.
- `counters_available` status and, when unavailable, explicit reason.
- Baseline and contender counter JSON artifacts (when available).
- Derived counter metrics per benchmark (IPC, cache miss rate, branch mispredict rate).
- Counter anomaly list and ready-to-embed counter summary table.

Execution guardrails:
- Run benchmarks sequentially.
- No background jobs (`nohup`, `&`).
- Use Release timing builds only.
- If the timing comparison fails, return a blocked verdict with the exact failure points.

## Step 3 - Consume delegated hardware-counter outputs

Hardware-counter collection is delegated to `benchmarks-compare-revisions`.

Pass-through inputs:
- `COLLECT_COUNTERS=1` by default on Linux (unless `--no-counters` is provided).
- Same baseline/contender refs and effective filter used in Step 2.

Consume outputs:
- Counter preflight result.
- Counter JSON artifacts for both revisions.
- Derived metrics (IPC, cache miss rate, branch mispredict rate).
- Anomaly list and counter summary table for report embedding.

If counters are unavailable (`counters_available=false`), continue with a timing-only review and explicitly mark profiling as unavailable in the report.

## Step 4 - Analyze timing and counter data

Timing classification per benchmark entry:
- Improvement: time delta < -5%
- Regression: time delta > +5%
- Neutral: between -5% and +5% (inclusive)
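
The classification thresholds can be written as a function. Deltas are in percent, and negative values mean the contender is faster. Google Benchmark's `compare.py` prints relative differences as fractions (for example, `-0.0600` for a 6% speedup), so multiply those by 100 first. `classify` is a hypothetical name.

```python
def classify(time_delta_pct: float, threshold_pct: float = 5.0) -> str:
    """Classify one benchmark entry by its relative time change in percent."""
    if time_delta_pct < -threshold_pct:
        return "improvement"
    if time_delta_pct > threshold_pct:
        return "regression"
    return "neutral"  # exactly -5% or +5% counts as neutral
```
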

Aggregate per binary:
- Counts of improvements, regressions, and neutral entries
- Net average percentage change
- Largest regression and largest improvement
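
The per-binary aggregation can be sketched as below. The input is assumed to map benchmark names to time deltas in percent. The net change is a plain arithmetic mean, which is an assumption, since the command does not name the averaging method. `BinarySummary` and `summarize` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Optional, Tuple


@dataclass
class BinarySummary:
    improvements: int
    regressions: int
    neutral: int
    net_avg_pct: float
    largest_regression: Optional[Tuple[str, float]]
    largest_improvement: Optional[Tuple[str, float]]


def summarize(deltas: Dict[str, float], threshold_pct: float = 5.0) -> BinarySummary:
    """Aggregate per-benchmark time deltas (percent, negative = faster)."""
    regressions = {n: d for n, d in deltas.items() if d > threshold_pct}
    improvements = {n: d for n, d in deltas.items() if d < -threshold_pct}
    return BinarySummary(
        improvements=len(improvements),
        regressions=len(regressions),
        neutral=len(deltas) - len(regressions) - len(improvements),
        net_avg_pct=mean(deltas.values()) if deltas else 0.0,
        # Extremes are taken only among entries that crossed the threshold.
        largest_regression=max(regressions.items(), key=lambda kv: kv[1], default=None),
        largest_improvement=min(improvements.items(), key=lambda kv: kv[1], default=None),
    )
```
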

Counter correlation:
- Use the skill-provided hardware-counter summary and anomaly list to explain major timing changes.
- Do not recompute derived counter metrics in this command.

Judgment priority:
- Base the verdict primarily on the benchmark timing comparison.
- Use counter data as explanatory evidence and as a confidence signal.

Noise-control expectations:
- Include at least one control benchmark family expected to be unaffected by the code change.
- Treat isolated swings with no consistent pattern as noise unless they reproduce across related sizes/fill ratios.
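
One way to apply the "reproduced across related sizes" rule is to group Google Benchmark names by family, meaning the part before the first `/`. A shift then counts only if several family members move in the same direction. This is a heuristic sketch, not the command's defined procedure. The `min_hits` parameter and the `BM_Rank`/`BM_Select` names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set


def reproduced_shifts(deltas: Dict[str, float], threshold_pct: float = 5.0,
                      min_hits: int = 2) -> Set[str]:
    """Return benchmarks whose beyond-threshold shift repeats within their family.

    A family is the name before the first '/', e.g. 'BM_Rank' for
    'BM_Rank/1024/50'. Isolated swings are dropped as noise.
    """
    by_family = defaultdict(list)
    for name, delta in deltas.items():
        by_family[name.split("/", 1)[0]].append((name, delta))

    kept: Set[str] = set()
    for entries in by_family.values():
        slower = [n for n, d in entries if d > threshold_pct]
        faster = [n for n, d in entries if d < -threshold_pct]
        for group in (slower, faster):
            if len(group) >= min_hits:  # same direction in several sizes
                kept.update(group)
    return kept
```
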

## Step 5 - Produce final markdown report

Return a structured markdown report with this shape:

```markdown
## Performance Review: <contender_branch> vs <baseline_branch>

### Configuration
- Baseline: <branch> (<hash>)
- Contender: <branch> (<hash>)
- Platform: <os/arch>
- Benchmarks run: <binary list>
- Filter: <regex or none>
- Hardware counters: available / unavailable

### Timing Summary
| Binary | Improvements | Regressions | Neutral | Net Change |
|---|---:|---:|---:|---:|
| ... | N | N | N | +/-X% |

### Detailed Timing Results
<Annotated compare.py outputs by binary>

### Hardware Counter Profile (if available)
| Benchmark | IPC (base->new) | Cache Miss Rate (base->new) | Branch Mispredict (base->new) |
|---|---:|---:|---:|
| ... | X.XX -> Y.YY | A.A% -> B.B% | C.C% -> D.D% |

### Key Findings
- <Most important regressions/improvements>
- <Counter-based explanations for key timing shifts>

### Verdict
**[IMPROVES PERFORMANCE | REGRESSES PERFORMANCE | NO SIGNIFICANT CHANGE]**

<1-2 sentence justification grounded in benchmark metrics, with profiling context if available>
```

Verdict rules:
- `IMPROVES PERFORMANCE`: improvements outnumber regressions, there is no severe regression (>10%), and the net average change is favorable.
- `REGRESSES PERFORMANCE`: there is any severe regression (>10%), or regressions dominate and the net average change is unfavorable.
- `NO SIGNIFICANT CHANGE`: changes are mostly neutral, or results are mixed and approximately cancel out.
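
The verdict rules can be sketched as a function over time deltas in percent. This sketch makes three interpretations that are assumptions, not part of the rules. "Dominate" means regressions outnumber improvements. "Favorable" means a negative net time delta. Only slowdowns beyond 10% count as severe. `verdict` is hypothetical.

```python
from typing import Sequence


def verdict(deltas_pct: Sequence[float], threshold_pct: float = 5.0,
            severe_pct: float = 10.0) -> str:
    """Map time deltas (percent, negative = faster) to a verdict label."""
    if not deltas_pct:
        raise ValueError("no timing data: report a blocked review instead")
    improvements = sum(d < -threshold_pct for d in deltas_pct)
    regressions = sum(d > threshold_pct for d in deltas_pct)
    net = sum(deltas_pct) / len(deltas_pct)
    severe = any(d > severe_pct for d in deltas_pct)

    if severe or (regressions > improvements and net > 0):
        return "REGRESSES PERFORMANCE"
    if improvements > regressions and net < 0:
        return "IMPROVES PERFORMANCE"  # severe regressions were ruled out above
    return "NO SIGNIFICANT CHANGE"
```
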

## Failure Handling

- If required builds fail or the timing comparison cannot run, output a blocked review with the exact failure points and no misleading verdict.
- If only profiling fails (`counters_available=false` in the delegated skill output), continue with a timing-based verdict and explicitly list the profiling limitation.
- If JSON output is invalid or truncated, discard it and rerun that benchmark command once, with a tighter filter and explicit output redirection.
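
A validity check for benchmark JSON can be sketched as below. It assumes Google Benchmark's JSON layout, with a top-level `benchmarks` list whose entries carry a `name`. `load_benchmark_json` is hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def load_benchmark_json(path: Path) -> Optional[dict]:
    """Return parsed Google Benchmark JSON, or None if unusable.

    Truncated files normally fail to parse. Parsed files without a
    non-empty list of named benchmark entries are also rejected.
    """
    try:
        data = json.loads(path.read_text())
    except (OSError, json.JSONDecodeError):
        return None
    benchmarks = data.get("benchmarks") if isinstance(data, dict) else None
    if not isinstance(benchmarks, list) or not benchmarks:
        return None
    if not all(isinstance(b, dict) and "name" in b for b in benchmarks):
        return None
    return data
```

For the rerun, Google Benchmark's `--benchmark_out=<file>` and `--benchmark_out_format=json` flags write results to a file directly, so the output does not depend on shell redirection of stdout.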
7 changes: 7 additions & 0 deletions agentic/cpp/commands/ping.md
@@ -0,0 +1,7 @@
---
description: Test command that replies with pong
---

Respond with exactly `pong`.
Do not add any other words.
Do not add quotes or punctuation.
81 changes: 81 additions & 0 deletions agentic/cpp/skills/benchmarks-affected/SKILL.md
@@ -0,0 +1,81 @@
---
name: benchmarks-affected
description: Analyze current branch versus a baseline and extract affected benchmark targets and benchmark functions using compile_commands and clang AST.
---

# Benchmarks Affected Skill

Use this skill to identify exactly which benchmark binaries and benchmark functions are affected by code changes on the current branch.

It implements a two-stage workflow:

1. `compile_commands.json` analysis to determine affected compile targets.
2. Clang AST analysis to determine affected benchmark functions.
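
Stage 1 can be approximated by matching changed source files against `compile_commands.json` entries and reading the target name from the object path. The sketch relies on a convention of CMake's Makefile and Ninja generators: objects go under `CMakeFiles/<target>.dir/`. It handles only directly changed source files on POSIX paths. It is not the analyzer's actual implementation, and `targets_for_changed_sources` is hypothetical.

```python
import os
import re
from pathlib import Path
from typing import Dict, Iterable, Set

# Assumption: object paths look like CMakeFiles/<target>.dir/...
_TARGET_RE = re.compile(r"CMakeFiles/([^/]+)\.dir/")


def targets_for_changed_sources(entries: Iterable[dict], changed: Set[Path]) -> Dict[str, Set[str]]:
    """Map each CMake target to the changed translation units it compiles.

    `entries` are parsed compile_commands.json records; `changed` holds
    absolute paths. Changed headers would need include-dependency data
    (for example, from the compiler's -MM output) and are not handled.
    """
    hits: Dict[str, Set[str]] = {}
    for entry in entries:
        # "file" may be relative to "directory"; normalize away "..".
        source = Path(os.path.normpath(os.path.join(entry["directory"], entry["file"])))
        if source not in changed:
            continue
        text = entry.get("output") or entry.get("command") or " ".join(entry.get("arguments", []))
        match = _TARGET_RE.search(text)
        if match:
            hits.setdefault(match.group(1), set()).add(str(source))
    return hits
```
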

## Goal

Given `HEAD` and a baseline branch (default `main`), produce:

- Changed files.
- Affected targets (with emphasis on benchmark targets).
- Exact benchmark functions impacted by the changes.
- A ready-to-use Google Benchmark filter regex.

## Prerequisites

1. A build tree with benchmarks enabled and the compile database exported. Use the
   repository's normal benchmark-enabling CMake options:

   ```bash
   BUILD_SUFFIX=local
   # Append the repository's benchmark-enabling option(s) to the configure command.
   cmake -B build/benchmarks-all_${BUILD_SUFFIX} \
     -DCMAKE_BUILD_TYPE=Release \
     -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
   cmake --build build/benchmarks-all_${BUILD_SUFFIX} --config Release -j
   ```

2. `clang++` must be available on `PATH` (it is used for the AST dump).
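
Stage 2 works on clang's JSON AST, which `clang++ -fsyntax-only -Xclang -ast-dump=json <file>` prints when invoked with the TU's flags from `compile_commands.json`. The sketch below finds benchmark functions that directly reference a changed symbol. It rests on three assumptions: benchmark functions use the `BM_` prefix from the examples, only direct references count, and `MemberExpr` nodes carry the member `name`. It is not the analyzer's actual call analysis, and both helper names are hypothetical.

```python
from typing import Iterator, Set


def _walk(node: dict) -> Iterator[dict]:
    """Yield a clang JSON AST node and all of its descendants."""
    yield node
    for child in node.get("inner", []):
        yield from _walk(child)


def benchmarks_referencing(ast: dict, changed_symbols: Set[str], prefix: str = "BM_") -> Set[str]:
    """Return benchmark functions whose bodies reference a changed symbol."""
    affected: Set[str] = set()
    for node in _walk(ast):
        if node.get("kind") != "FunctionDecl" or not node.get("name", "").startswith(prefix):
            continue
        referenced = set()
        for inner in _walk(node):
            if inner.get("kind") == "DeclRefExpr":
                # Free functions and variables: referencedDecl carries the name.
                referenced.add(inner.get("referencedDecl", {}).get("name", ""))
            elif inner.get("kind") == "MemberExpr":
                referenced.add(inner.get("name", ""))  # member functions/fields
        if referenced & changed_symbols:
            affected.add(node["name"])
    return affected
```
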

For repository-specific invocations, check
`agentic/local/cpp/skills/benchmarks-affected/EXAMPLES.md` when present.

## Run

```bash
python3 agentic/cpp/skills/benchmarks-affected/analyze_benchmarks_affected.py \
  --baseline main \
  --compile-commands build/benchmarks-all_local/compile_commands.json \
  --format json
```

If `--compile-commands` is omitted, the script auto-selects the most recently modified `build/**/compile_commands.json`.
Working tree changes are included by default. Use `--no-include-working-tree` to restrict analysis to `<baseline>...HEAD` only.
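
The documented default for `--compile-commands` can be mirrored as below. This reproduces the described behavior and is not the analyzer's code. `newest_compile_commands` is hypothetical, and it assumes it runs from the repository root.

```python
from pathlib import Path
from typing import Optional


def newest_compile_commands(root: Path = Path(".")) -> Optional[Path]:
    """Return the most recently modified build/**/compile_commands.json, if any."""
    build = root / "build"
    if not build.is_dir():
        return None
    candidates = build.glob("**/compile_commands.json")
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)
```
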

## Output

The analyzer reports:

- `affected_targets`: impacted CMake targets inferred from compile dependency analysis.
- `affected_benchmark_targets`: subset of benchmark binaries impacted.
- `affected_benchmarks`: precise benchmark function names from AST-level call analysis.
- `suggested_filter_regex`: regex to pass as `--benchmark_filter`.
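
A filter in the same shape as the `^(BM_Foo|BM_Bar)(/|$)` example can be built from benchmark names. The trailing `(/|$)` matches a whole family (`BM_Foo`, `BM_Foo/64`) without matching `BM_FooBar`. This sketch assumes identifier-style names. Python's `re.escape` may escape characters that the benchmark library's regex engine treats differently. `benchmark_filter` is hypothetical.

```python
import re
from typing import Iterable


def benchmark_filter(names: Iterable[str]) -> str:
    """Build a --benchmark_filter regex that matches whole benchmark families."""
    unique = sorted(set(names))
    if not unique:
        raise ValueError("no benchmark names to filter on")
    return "^(" + "|".join(re.escape(n) for n in unique) + ")(/|$)"
```
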

## How to Use Findings

1. Build only the impacted benchmark binaries where feasible.
2. Run the benchmark binaries with the suggested filter:

   ```bash
   FILTER='^(BM_Foo|BM_Bar)(/|$)'
   BENCH_CPU=${BENCH_CPU:-0}
   taskset -c "${BENCH_CPU}" build/benchmarks-all_local/benchmarks --benchmark_filter="${FILTER}"
   ```

3. If the impact mapping is broad or uncertain, run the full binary for the selected benchmark target(s).

## Guardrails

1. Diff against the merge base, using a three-dot range: `<baseline>...HEAD`.
2. Use Release binaries for timing runs.
3. If the AST parse fails for a TU, still trust the compile-target impact and mark the benchmark-function scope as partial.
4. If benchmark infrastructure (`CMakeLists.txt`, benchmark source layout) changed, fall back to conservative benchmark selection.
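
Guardrails 1 and 4 can be sketched as below. `git diff --name-only <baseline>...HEAD` lists changes since the merge base. `git diff --name-only HEAD` adds staged and unstaged edits to tracked files, but not untracked ones. The infrastructure check only looks at CMake files, so source-layout changes such as moved directories would need a separate rule. Both helper names are hypothetical.

```python
import subprocess
from pathlib import PurePosixPath
from typing import Iterable, List, Set


def _git_changed(*args: str) -> Set[str]:
    out = subprocess.run(["git", "diff", "--name-only", *args],
                         capture_output=True, text=True, check=True).stdout
    return {line for line in out.splitlines() if line}


def changed_files(baseline: str = "main", include_working_tree: bool = True) -> List[str]:
    """List files changed since the merge base of `baseline` and HEAD."""
    files = _git_changed(f"{baseline}...HEAD")  # three-dot: merge-base diff
    if include_working_tree:
        files |= _git_changed("HEAD")  # tracked working-tree and index edits
    return sorted(files)


def touches_benchmark_infra(files: Iterable[str]) -> bool:
    """True if any CMake file changed, signalling conservative selection."""
    for f in files:
        p = PurePosixPath(f)
        if p.name == "CMakeLists.txt" or p.suffix == ".cmake":
            return True
    return False
```
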