From a5729af3832ae86cf13828341d8f722733cf79e4 Mon Sep 17 00:00:00 2001
From: Dmitry Teryaev <doudmitry@gmail.com>
Date: Sun, 21 Jun 2026 23:42:42 +0300
Subject: [PATCH 1/2] docs(plans): execution plan for init/increment perf
 (PR-P1..PR-P3)

Adds plans/active/PLAN-INIT-INCREMENT-PERF.md and the companion
plans/AGENT-PROMPTS-INIT-INCREMENT-PERF.md implementing the approved proposal
propose/active/INIT-INCREMENT-PERF-PROPOSE.md.

Three PRs:
- PR-P1: bulk in-memory-pyarrow COPY FROM for the full rebuild path; equivalence
  harness is the merge gate.
- PR-P2: same primitive for the incremental path (Route-MERGE dedup retained).
- PR-P3: lifespan-cached LayeredIgnore (ContextKey) + is_ignored _mega memo.

No production code. Stacks behind proposal PR #338.

Co-Authored-By: Claude <noreply@anthropic.com>
---
 plans/AGENT-PROMPTS-INIT-INCREMENT-PERF.md | 356 +++++++++++++++++
 plans/active/PLAN-INIT-INCREMENT-PERF.md   | 432 +++++++++++++++++++++
 2 files changed, 788 insertions(+)
 create mode 100644 plans/AGENT-PROMPTS-INIT-INCREMENT-PERF.md
 create mode 100644 plans/active/PLAN-INIT-INCREMENT-PERF.md

diff --git a/plans/AGENT-PROMPTS-INIT-INCREMENT-PERF.md b/plans/AGENT-PROMPTS-INIT-INCREMENT-PERF.md
new file mode 100644
index 00000000..c2e6e12d
--- /dev/null
+++ b/plans/AGENT-PROMPTS-INIT-INCREMENT-PERF.md
@@ -0,0 +1,356 @@
+# Agent task prompts — Faster init/increment (PR-P1 → PR-P3)
+
+Status: **active**. One self-contained prompt per PR. Copy the prompt verbatim
+into the agent, attach the files in its `@-files` block, and let it execute.
+
+**Workflow per PR:**
+
+1. Create the branch named in the prompt off the stated base.
+2. Read the cited plan section in full **before** writing code.
+3. Implement step-by-step; run the listed tests after each step.
+4. Run the sentinel greps — every "must return zero" line must be empty.
+5. Paste the manual-evidence output into the PR description.
+6. Open a PR with the exact title in the Definition of Done.
+
+**Universal rules for every prompt:**
+
+- Use only `.venv/bin/python` and `.venv/bin/pip` (never system python/pip).
+- `server.py` is stdio — never write to stdout from anything reachable by a tool handler.
+- Do not add a cocoindex dependency outside `java_index_flow_lancedb.py`.
+- The plan is the source of truth — if this prompt and the plan disagree, the plan wins.
+- Do not touch any file outside the prompt's `@-files` + the test files it names. If you think an adjacent file must change, **stop and ask** — don't ship it.
+- Do not loosen any existing test assertion to make it pass.
+- Breaking changes are allowed; no compatibility shims.
+
+---
+
+## PR-P1 — Bulk `COPY FROM` for the full rebuild path
+
+**Branch:** `perf/bulk-graph-writes-p1` off `master`.
+**Base:** `master`.
+**Plan section:** `plans/active/PLAN-INIT-INCREMENT-PERF.md` § PR-P1 (read this first).
+**Estimated diff size:** medium (one module + tests + a fixture).
+
+**Attach (`@-files`):**
+
+- `@build_ast_graph.py` (the full-rebuild write path: `_write_nodes`, `_write_edges`, `_write_routes_and_exposes`, `write_ladybug`, `_node_row`, `_SCHEMA_*`, `_callee_declaring_role_at_write`, `_populate_declares_rows`, `_populate_overrides_rows`)
+- `@propose/active/INIT-INCREMENT-PERF-PROPOSE.md` (design + staging invariants)
+- `@tests/test_ast_graph_build.py` (existing regression net + where the new tests go)
+- `@tests/_builders.py` (graph-build helpers: `build_ladybug_full_into`, etc.)
+
+**Prompt:**
+
+````
+You are implementing PR-P1 from `plans/active/PLAN-INIT-INCREMENT-PERF.md`.
+
+Read the **PR-P1** section of the plan in full before writing any code, plus the
+proposal's "Staging invariants" and "Equivalence" paragraphs. The plan is the
+source of truth — if this prompt and the plan disagree, the plan wins.
+
+## Scope
+
+Implement PR-P1 exactly as specified: replace the per-row `conn.execute` writes
+in the **full rebuild path** (`write_ladybug`) with bulk in-memory-pyarrow
+`COPY FROM`. Concretely:
+
+1. **Step-1 spike (first commit):** confirm the exact REL `COPY FROM` column
+   order + `pa.Table.from_pylist` LIST typing with a throwaway 2-node + 1-edge
+   toy. Record the working incantation in the `_bulk_copy` docstring.
+2. Add `_bulk_copy(conn, table_name, columns, rows)` + the `*_COLUMNS` /
+   `_REL_*_COLUMNS` constants (column order matches `_SCHEMA_*`; REL tables list
+   FROM/TO first).
+3. Convert `_write_nodes` (full path) to stage all node rows then
+   `_bulk_copy(conn, "Symbol", NODE_COLUMNS, rows)`. Delete `_CREATE_SYMBOL`.
+4. Convert `_write_edges` to per-edge-type row staging (applying the SAME
+   `seen_calls`/`seen_ucs` dedup and SAME `_callee_declaring_role_at_write`
+   lookup, at staging time) then `_bulk_copy` each REL table. Delete the dead
+   `_CREATE_EXT/IMPL/INJ/DECL/OVERRIDES/CALL/UNRESOLVED/UNRESOLVED_AT` strings.
+5. Convert `_write_routes_and_exposes` (+ Client/Producer/HTTP_CALLS/ASYNC_CALLS)
+   to bulk; delete the dead `_CREATE_ROUTE/EXPOSES/CLIENT/DECLARES_CLIENT/
+   PRODUCER/DECLARES_PRODUCER/HTTP_CALL/ASYNC_CALL` strings.
+6. Convert the `GraphMeta` write (`:3472-3473`) to a single-row `_bulk_copy`.
+7. Generate + commit `tests/fixtures/graph_baseline_bank_chat.json` from the
+   **last per-row build** before you remove the per-row path.
+8. Add the four named tests.
+
+## Out of scope (do NOT touch)
+
+- The incremental path: `_write_nodes_merge`, `_MERGE_SYMBOL`, `_delete_file_scope`,
+  `incremental_rebuild`, and the pass5/6 `MERGE (r:Route)` (`:3819-3821`) stay
+  exactly as-is. PR-P1 is full-rebuild only.
+- `java_index_flow_lancedb.py`, `path_filtering.py`, `server.py`, `search_lancedb.py`.
+- Any schema (`_SCHEMA_*` DDL), ontology, or re-index change.
+- CSV or Parquet-file staging (pyarrow in-memory only).
+- Loosening any existing test.
+
+If you find yourself wanting to touch any of the above, **stop and ask**.
+
+## Deliverables
+
+1. `_bulk_copy` helper + column-order constants in `build_ast_graph.py`.
+2. `write_ladybug` full path bulk-loads all node tables then all REL tables.
+3. Dead `_CREATE_*` / `_CREATE_SYMBOL` strings removed.
+4. `tests/fixtures/graph_baseline_bank_chat.json` committed.
+5. Four new tests in `tests/test_ast_graph_build.py`.
+
+## Tests
+
+Run, all must pass:
+```
+.venv/bin/python -m pytest tests/test_ast_graph_build.py -v
+.venv/bin/python -m pytest tests/test_incremental_graph.py tests/test_bank_chat_brownfield_integration.py tests/test_call_edges_e2e.py -q
+.venv/bin/ruff check .
+```
+
+Sentinel greps — **must all return zero** (no output):
+```
+grep -n "_CREATE_SYMBOL\b" build_ast_graph.py
+grep -nE "conn\.execute\(_CREATE_(EXT|IMPL|INJ|DECL|OVERRIDES|CALL|UNRESOLVED|ROUTE|EXPOSES|CLIENT|DECLARES_CLIENT|PRODUCER|DECLARES_PRODUCER|HTTP_CALL|ASYNC_CALL)" build_ast_graph.py
+```
+
+Sentinel greps — **must be non-zero** (guards against over-deletion):
+```
+grep -n "_MERGE_SYMBOL\b" build_ast_graph.py        # incremental path, kept
+grep -n "MERGE (r:Route" build_ast_graph.py          # pass5/6 dedup, kept
+grep -n "COPY .*FROM \$rows" build_ast_graph.py      # bulk path present
+```
+
+## Manual evidence (paste in PR description)
+
+Build the fixture via the bulk path and inspect meta:
+```bash
+rm -rf /tmp/p1 && .venv/bin/python build_ast_graph.py \
+  --source-root tests/bank-chat-system \
+  --ladybug-path /tmp/p1/code_graph.lbug --verbose
+.venv/bin/java-codebase-rag meta --source-root tests/bank-chat-system --index-dir /tmp/p1
+```
+Expected: meta `counts_json` and node/edge counts **identical** to a pre-PR
+per-row build (paste both side by side). Note the graph-write phase timing from
+the `JCIRAG_PROGRESS` lines vs the pre-PR baseline.
+
+## Definition of Done
+
+- [ ] Step-1 spike result recorded in `_bulk_copy` docstring.
+- [ ] Full path bulk-loads nodes-then-edges; no per-row `CREATE` remains in the full path.
+- [ ] `_CREATE_SYMBOL` and the dead `_CREATE_*` edge/node strings deleted.
+- [ ] All four new tests pass; full regression suites pass unchanged.
+- [ ] All "must return zero" sentinel greps are empty; all "non-zero" greps hit.
+- [ ] `.venv/bin/ruff check .` clean.
+- [ ] Benchmark (before/after graph-write phase) pasted in the PR description.
+- [ ] PR title: `perf(graph): bulk COPY FROM for the full rebuild path (PR-P1)`.
+````
+
+---
+
+## PR-P2 — Bulk write for the incremental path
+
+**Branch:** `perf/bulk-graph-writes-p2` off PR-P1's branch (or `master` if PR-P1 has merged).
+**Base:** PR-P1 merged.
+**Plan section:** `plans/active/PLAN-INIT-INCREMENT-PERF.md` § PR-P2 (read this first).
+**Estimated diff size:** small-medium.
+
+**Attach (`@-files`):**
+
+- `@build_ast_graph.py` (`_write_nodes_merge`, `_MERGE_SYMBOL`, `_delete_file_scope`, `incremental_rebuild`, `_bulk_copy` + `_REL_*_COLUMNS` from PR-P1)
+- `@tests/test_incremental_graph.py` (regression net)
+
+**Prompt:**
+
+````
+You are implementing PR-P2 from `plans/active/PLAN-INIT-INCREMENT-PERF.md`.
+
+Read the **PR-P2** section in full. It depends on PR-P1's `_bulk_copy` primitive
+and `_REL_*_COLUMNS` constants, which are already merged. The plan wins if this
+prompt disagrees.
+
+## Scope
+
+Apply PR-P1's bulk primitive to the incremental path:
+
+1. Convert `_write_nodes_merge` (`:817`, uses `_MERGE_SYMBOL`) to stage rows then
+   `_bulk_copy(conn, "Symbol", NODE_COLUMNS, rows)`. The incremental path is
+   delete-then-insert (`_delete_file_scope` already removed the old scope), so
+   plain `COPY` insert into the cleaned scope is correct. Delete `_MERGE_SYMBOL`;
+   remove `_write_nodes_impl` if it has no remaining caller.
+2. Convert the incremental edge re-emit to per-type staging + `_bulk_copy`,
+   scoped to the re-emitted files (same pattern as PR-P1's `_write_edges`).
+3. **Retain the pass5/6 `MERGE (r:Route)` dedup** (`:3819-3821`) verbatim and add
+   a one-line comment explaining it is intentionally kept (routes written during
+   the scoped step must MERGE, not duplicate, against the global step). Do NOT
+   bulk-convert the Route writes that this MERGE guards.
+4. Add the two named tests.
+
+## Out of scope (do NOT touch)
+
+- The full-rebuild path (already bulk in PR-P1).
+- Route dedup semantics — keep the `MERGE (r:Route)` exactly.
+- `_delete_file_scope`, `incremental_rebuild` algorithm, dependent-expansion,
+  crash-marker logic.
+- Anything outside `build_ast_graph.py` + `tests/test_incremental_graph.py`.
+
+If you find yourself wanting to touch any of the above, **stop and ask**.
+
+## Deliverables
+
+1. Incremental node re-emit via `_bulk_copy`; `_MERGE_SYMBOL` deleted.
+2. Incremental edge re-emit via per-type `_bulk_copy`.
+3. `MERGE (r:Route)` retained + commented.
+4. Two new tests in `tests/test_incremental_graph.py`.
+
+## Tests
+
+Run, all must pass:
+```
+.venv/bin/python -m pytest tests/test_incremental_graph.py -v
+.venv/bin/python -m pytest tests/test_ast_graph_build.py -q
+.venv/bin/ruff check .
+```
+
+Sentinel greps — **must return zero**:
+```
+grep -n "_MERGE_SYMBOL\b" build_ast_graph.py
+```
+
+Sentinel greps — **must be non-zero** (Route dedup still present; bulk still used):
+```
+grep -n "MERGE (r:Route" build_ast_graph.py
+grep -n "COPY .*FROM \$rows" build_ast_graph.py
+```
+
+## Manual evidence (paste in PR description)
+
+Single-file change equivalence:
+```bash
+# set up an index, touch one file, increment, then full-rebuild the same state
+# and diff the graphs (node count, per-type edge counts, GraphMeta counters).
+```
+Expected: incremental(bulk) == full-rebuild(bulk) for that state. Paste the
+side-by-side counts.
+
+## Definition of Done
+
+- [ ] Incremental node/edge re-emit uses `_bulk_copy`; `_MERGE_SYMBOL` deleted.
+- [ ] `MERGE (r:Route)` retained and commented.
+- [ ] `test_incremental_bulk_write_equivalent_to_full_rebuild`,
+      `test_incremental_route_merge_dedup_preserved` pass; full
+      `tests/test_incremental_graph.py` passes unchanged.
+- [ ] Sentinel greps pass (zero where required, non-zero where required).
+- [ ] `.venv/bin/ruff check .` clean.
+- [ ] PR title: `perf(graph): bulk COPY FROM for the incremental path (PR-P2)`.
+````
+
+---
+
+## PR-P3 — Cached `LayeredIgnore` + `is_ignored` memo
+
+**Branch:** `perf/cached-ignore-p3` off `master`.
+**Base:** `master` (independent of PR-P1/P2).
+**Plan section:** `plans/active/PLAN-INIT-INCREMENT-PERF.md` § PR-P3 (read this first).
+**Estimated diff size:** small.
+
+**Attach (`@-files`):**
+
+- `@java_index_flow_lancedb.py` (`ContextKey` defs `:60-72`, `coco_lifespan` provide sites `~:287-306`, `process_java_file`/`process_sql_file`/`process_yaml_file` `:344`/`:416`/`:464`)
+- `@path_filtering.py` (`LayeredIgnore`, `_mega`, `_mega_build_for_rel`, `is_ignored`)
+- `@tests/test_lancedb_e2e.py` (ignore test to keep green)
+
+**Prompt:**
+
+````
+You are implementing PR-P3 from `plans/active/PLAN-INIT-INCREMENT-PERF.md`.
+
+Read the **PR-P3** section in full. It is independent of PR-P1/P2. The plan wins
+if this prompt disagrees.
+
+## Scope
+
+1. Define `IGNORE = coco.ContextKey[LayeredIgnore]("java_lance_layered_ignore")`
+   alongside `PROJECT_ROOT`/`EMBEDDER`/`LANCE_DB` (`:60-72`), reusing the SAME
+   `_ck_params` (`detect_change` vs `tracked`) detection block.
+2. In `coco_lifespan`, add `builder.provide(IGNORE, LayeredIgnore(root))` next to
+   the other `builder.provide(...)` calls — built **once** per flow run.
+3. In `process_java_file` (`:344`), `process_sql_file` (`:416`), `process_yaml_file`
+   (`:464`): add `ignore = coco.use_context(IGNORE)` and replace
+   `LayeredIgnore(project_root).is_ignored((project_root / file.file_path.path).resolve())`
+   with `ignore.is_ignored((project_root / file.file_path.path).resolve())`.
+   Keep `project_root` (still used for path resolution and `_parse_and_enrich_java`).
+4. In `path_filtering.py` `LayeredIgnore`: add `self._mega_cache` in `__init__`
+   and memoize `_mega(rel)` keyed by `Path(rel_project).parent.as_posix()` (mega
+   depends only on the directory — `_mega_build_for_rel` reads `dir_parts` only).
+   `is_ignored`/`diagnose_dict` call `_mega` unchanged and benefit transparently.
+5. Add the three named tests.
+
+## Out of scope (do NOT touch)
+
+- `build_ast_graph.py` and the graph write path (PR-P1/P2 own that).
+- The ignore *decision* logic (`_mega_build_for_rel`, `_winning_row`, negation
+  scanning) — only memoize, do not alter semantics.
+- Any schema, ontology, or re-index change.
+- Loosening any existing test.
+
+If you find yourself wanting to touch any of the above, **stop and ask**.
+
+## Deliverables
+
+1. `IGNORE` ContextKey defined + provided once in `coco_lifespan`.
+2. The three `process_*_file` functions consume it (no per-file construction).
+3. `_mega` memoized by directory in `LayeredIgnore`.
+4. Three new tests.
+
+## Tests
+
+Run, all must pass:
+```
+.venv/bin/python -m pytest tests/test_lancedb_e2e.py -q
+.venv/bin/python -m pytest tests -q -k "ignore or path_filter or vectors_progress"
+.venv/bin/ruff check .
+```
+Heavy (only if you can run cocoindex e2e locally):
+```
+JAVA_CODEBASE_RAG_RUN_HEAVY=1 .venv/bin/python -m pytest tests/test_lancedb_e2e.py -q
+```
+
+Sentinel greps — **must return zero**:
+```
+grep -nE "LayeredIgnore\(project_root\)" java_index_flow_lancedb.py
+```
+
+Sentinel greps — **must be non-zero**:
+```
+grep -n "coco.use_context(IGNORE)" java_index_flow_lancedb.py   # 3 sites
+grep -n "_mega_cache" path_filtering.py                          # memo present
+```
+
+## Manual evidence (paste in PR description)
+
+Show the ignore object is built once: with `JAVA_CODEBASE_RAG_RUN_HEAVY=1`, run
+the flow over a small corpus and log `id(ignore)` per file (temporary instrumentation),
+confirm a single object id across all files. Then confirm a micro-benchmark of
+`is_ignored` over N files drops with the `_mega` cache (same-directory files hit
+the cache).
+
+## Definition of Done
+
+- [ ] `IGNORE` ContextKey defined (version-detected) + provided once in `coco_lifespan`.
+- [ ] The three `process_*_file` consume it; no per-file `LayeredIgnore(project_root)`.
+- [ ] `_mega` memoized by directory; `is_ignored`/`diagnose_dict` results unchanged.
+- [ ] `test_is_ignored_mega_caches_by_directory`,
+      `test_layered_ignore_memo_preserves_decisions`,
+      `test_layered_ignore_provided_once_per_flow` pass.
+- [ ] Existing ignore + vectors-progress tests pass unchanged.
+- [ ] Sentinel greps pass.
+- [ ] `.venv/bin/ruff check .` clean.
+- [ ] PR title: `perf(vectors): lifespan-cached LayeredIgnore + is_ignored memo (PR-P3)`.
+````
+
+---
+
+## Notes for the orchestrator
+
+- **Landing order:** PR-P1 → PR-P2 (P2 depends on P1's `_bulk_copy` +
+  `_REL_*_COLUMNS`). PR-P3 is independent and can go first, last, or between.
+- **Review between PRs:** request code review after each PR lands (see
+  `superpowers:requesting-code-review`) — the equivalence harness is the gate
+  for P1/P2; the memo-parity test is the gate for P3.
+- **Sentinel greps are binding:** a non-empty "must return zero" grep means scope
+  leaked; a empty "must be non-zero" grep means an over-deletion. Either blocks merge.
diff --git a/plans/active/PLAN-INIT-INCREMENT-PERF.md b/plans/active/PLAN-INIT-INCREMENT-PERF.md
new file mode 100644
index 00000000..60182a9e
--- /dev/null
+++ b/plans/active/PLAN-INIT-INCREMENT-PERF.md
@@ -0,0 +1,432 @@
+# Plan: Faster init/increment — bulk graph writes + cached ignore
+
+Status: **active (planning)**. This plan implements
+[`propose/active/INIT-INCREMENT-PERF-PROPOSE.md`](../../propose/active/INIT-INCREMENT-PERF-PROPOSE.md)
+(see also PR #338).
+
+Depends on: proposal PR #338 should land first (or this plan tracks against the
+proposal text directly). No code dependency on other in-flight work.
+
+## Goal
+
+- Cut `init` / `reprocess` wall-clock on a medium Java corpus from ~395s toward
+  ~120s by replacing the per-row graph write with bulk `COPY FROM` (PR-P1).
+- Extend the bulk primitive to the incremental path so `increment` on large
+  change-sets is also fast (PR-P2).
+- Remove the ~25s `LayeredIgnore` re-construction + `is_ignored` re-merge paid
+  per file in the cocoindex vectors phase (PR-P3).
+- Preserve graph contents exactly: no ontology bump, no re-index, no query-result
+  change (proven by an equivalence harness).
+
+## Principles (do not relitigate in review)
+
+- **Byte-equivalent graph.** PR-P1/P2 change only the write mechanism; the graph
+  (node/edge rows, properties, `GraphMeta` counters) must be identical to today.
+  The equivalence harness is the merge gate — a failing equivalence test blocks
+  the PR.
+- **Full rebuild path only in PR-P1.** The incremental path keeps its current
+  per-row writes until PR-P2. PR-P1 does not touch `increment`.
+- **In-memory pyarrow `COPY FROM`** is the bulk mechanism (verified: the `ladybug`
+  wrapper forwards `COPY FROM` verbatim and accepts a pyarrow/pandas param —
+  `ladybug/connection.py:337/488`). Parquet-file staging is a fallback, not the
+  default. Do not propose CSV.
+- **MPS device default is out of scope** — the flow already auto-selects MPS. Do
+  not add a device PR.
+- **No new env vars, CLI flags, or public surface.** No compatibility shims.
+
+## PR breakdown - overview
+
+| PR | Scope | Ontology bump | Areas of concern | Test buckets | Independent of |
+| --- | --- | --- | --- | --- | --- |
+| **PR-P1** | Bulk `COPY FROM` for the full rebuild path (`init`/`reprocess`) in `build_ast_graph.py` | none | REL-table `COPY FROM` column order (FROM/TO first); LIST column arrow typing; CALLS dedup + `callee_declaring_role` materialized at staging; node-before-edge load order; `GraphMeta` bulk | equivalence + determinism + baseline + full `test_ast_graph_build.py` regression | — |
+| **PR-P2** | Shared bulk primitive applied to the incremental `_delete_file_scope` → re-emit path | none | Preserve pass5/6 `MERGE (r:Route)` dedup (`build_ast_graph.py:3819-3821`); incremental delete-then-emit ordering; equivalence vs full rebuild | `test_incremental_graph.py` regression + new incremental-equivalence test | depends on **PR-P1** |
+| **PR-P3** | Hoist `LayeredIgnore` to a cocoindex `ContextKey` + memoize `is_ignored`'s `_mega` | none | `ContextKey` lifespan scoping (build once, not per file); `_mega` memo correctness (mega depends on the file's directory only — match_file stays per-file); no change to any ignore *decision* | flow ignore tests + memo unit tests + heavy vectors-progress test | independent of PR-P1, PR-P2 |
+
+Landing order: **P1 → P2**; **P3** may land in any order (independent).
+
+## Resolved design decisions
+
+| Topic | Decision |
+| --- | --- |
+| Bulk mechanism | In-memory pyarrow: `conn.execute("COPY <table> FROM $rows", {"rows": pa_table})`. Verified against `ladybug/connection.py:337` (`FROM $param`) and `:488` (pyarrow/pandas/polars accepted). |
+| REL-table column rule | First two staged columns are the FROM/TO node primary keys (`Symbol.id` / `Route.id` / etc.). Exact column *naming* for REL `COPY FROM` is confirmed in PR-P1 step 1 (a 5-line spike); the contract below assumes positional FROM/TO. |
+| `GraphMeta` MERGE (`build_ast_graph.py:3472-3473`) | Folded into PR-P1 bulk (it is in the full-rebuild path; the table is freshly created and empty, so `COPY` insert is correct). Resolves proposal Open Q1. |
+| PR-P3 cache vehicle | cocoindex `ContextKey[LayeredIgnore]` (lifespan-scoped, built once). Resolves proposal Open Q2. |
+| `is_ignored` memo | Cache `_mega(rel)` → `(mega, spec)` keyed by the file's project-relative **directory** (`Path(rel).parent.as_posix()`), because `_mega_build_for_rel` uses only `dir_parts = parts[:-1]`. `spec.match_file(rel)` stays per-file (cheap, filename-dependent). |
+
+---
+
+# PR-P1 — Bulk `COPY FROM` for the full rebuild path
+
+**Goal:** replace the per-row `conn.execute` writes in `write_ladybug` (the
+init/reprocess full-rebuild entry, `build_ast_graph.py:3893`) with bulk
+in-memory-pyarrow `COPY FROM`. Graph contents stay byte-equivalent.
+
+## File-by-file changes
+
+### 1. `build_ast_graph.py` — bulk-write primitive + staging
+
+#### 1a. New helper `_bulk_copy`
+Add near the other write helpers (after `_node_row`, ~`build_ast_graph.py:2994`):
+
+```python
+import pyarrow as pa
+
+def _bulk_copy(conn: ladybug.Connection, table_name: str, columns: list[str], rows: list[dict]) -> None:
+    """Bulk-load rows into a (node or rel) table via in-memory pyarrow COPY FROM.
+
+    `columns` fixes column order — for REL tables the first two MUST be the
+    FROM/TO node ids (kuzu requirement). Empty `rows` is a no-op.
+    """
+    if not rows:
+        return
+    tbl = pa.Table.from_pylist(rows)  # infers types from non-empty data
+    # from_pylist infers STRING / list<STRING> correctly for non-empty rows;
+    # step-1 spike confirms LIST columns (modifiers/annotations/capabilities)
+    # infer as pa.list_(pa.string()). If any column is all-empty-list, pass an
+    # explicit pa.schema derived from the _SCHEMA_* strings instead.
+    conn.execute(f"COPY {table_name} FROM $rows", {"rows": tbl})
+```
+
+Column-order constants (define next to the `_SCHEMA_*` strings, ~`:2812`), each
+matching its `_SCHEMA_*` column order exactly — for REL tables the first two
+entries are the endpoint ids:
+
+- `NODE_COLUMNS` = the `Symbol` property columns in `_SCHEMA_NODE` order.
+- `ROUTE_COLUMNS`, `CLIENT_COLUMNS`, `PRODUCER_COLUMNS`, `GRAPHMETA_COLUMNS`,
+  `UNRESOLVED_CALL_SITE_COLUMNS` — same, from their `_SCHEMA_*`.
+- For each REL table: `_REL_*_COLUMNS` = `["FROM", "TO", <props in _SCHEMA_* order>]`
+  for `EXTENDS/IMPLEMENTS/INJECTS/DECLARES/OVERRIDES/CALLS/UNRESOLVED_AT/EXPOSES/
+  DECLARES_CLIENT/DECLARES_PRODUCER/HTTP_CALLS/ASYNC_CALLS`.
+
+> **Step-1 spike (mandatory, first commit of PR-P1):** confirm (a) the exact
+> REL `COPY FROM` column naming — build a 2-node + 1-edge toy, `COPY CALLS FROM
+> $rows` with columns `["FROM","TO","source_file","resolved",...]`, assert the
+> edge lands with correct endpoints; (b) `pa.Table.from_pylist` type inference
+> for the `modifiers`/`annotations`/`capabilities` LIST columns. Record the
+> working incantation in the helper docstring. If inference fails on empty
+> lists, build an explicit `pa.schema` map keyed by table.
+
+#### 1b. Convert `_write_nodes` (full path) to bulk
+`_write_nodes` (`:3096`) currently calls `_write_nodes_impl(..., symbol_query=_CREATE_SYMBOL)`
+which loops `conn.execute(_CREATE_SYMBOL, _node_row(...))` per node. Replace the
+full-path body: iterate packages / files / types / members exactly as
+`_write_nodes_impl` does, but append each `_node_row(...)` to a list, then call
+`_bulk_copy(conn, "Symbol", NODE_COLUMNS, node_rows)`. Keep the role/capability
+resolution (`resolve_role_and_capabilities`) and `tables.type_role_by_node_id`
+population identical (those are pure in-memory steps before staging).
+
+`_write_nodes_impl` + `_MERGE_SYMBOL` stay — they are the **incremental** path
+(`_write_nodes_merge`, `:817`) used until PR-P2. `_CREATE_SYMBOL` becomes dead
+(only `_write_nodes` used it) → delete it.
+
+#### 1c. Convert `_write_edges` to bulk
+`_write_edges` (`:3244`) loops per edge, calling `conn.execute(_CREATE_EXT,
+{...})` etc. with two dedup sets (`seen_calls` `:3282-3288`, `seen_ucs`
+`:3317-3321`) and a `callee_declaring_role` lookup (`_callee_declaring_role_at_write`,
+`:1647`/`:3302`). Restructure to accumulate **per-edge-type row lists**, applying
+the *same* dedup and *same* `callee_declaring_role` materialization, then
+`_bulk_copy` each REL table once:
+
+```python
+extends_rows: list[dict] = []
+...
+for ...:
+    extends_rows.append({"FROM": src_id, "TO": dst_id, "source_file": ..., "dst_name": ..., "dst_fqn": ..., "resolved": ...})
+...
+calls_rows.append(...)  # only when key not in seen_calls; include "callee_declaring_role"
+...
+_bulk_copy(conn, "EXTENDS", _REL_EXTENDS_COLUMNS, extends_rows)
+_bulk_copy(conn, "CALLS", _REL_CALLS_COLUMNS, calls_rows)   # CALLS_COLUMNS ends with callee_declaring_role
+...
+```
+
+The dedup sets and `_callee_declaring_role_at_write` calls are computed
+**before** appending (i.e. at staging), exactly preserving current semantics.
+`_CREATE_EXT`/`_CREATE_IMPL`/`_CREATE_INJ`/`_CREATE_DECL`/`_CREATE_OVERRIDES`/
+`_CREATE_CALL`/`_CREATE_UNRESOLVED`/`_CREATE_UNRESOLVED_AT` become dead → delete.
+
+`_populate_declares_rows` (`:3189`) and `_populate_overrides_rows` (`:3206`) are
+pure in-memory population — unchanged; they feed the DECLARES/OVERRIDES row lists.
+
+#### 1d. Convert `_write_routes_and_exposes` (and Client/Producer/HTTP_CALLS/ASYNC_CALLS) to bulk
+Same pattern (`~`:3349-3408`): stage `Route`/`Client`/`Producer` node rows and
+`EXPOSES`/`DECLARES_CLIENT`/`DECLARES_PRODUCER`/`HTTP_CALLS`/`ASYNC_CALLS` edge
+rows (with FROM/TO ids), then `_bulk_copy` each. The pass5/6 `MERGE (r:Route)`
+dedup (`:3819-3821`) is **incremental-only** and untouched in PR-P1; in the full
+path routes are emitted once into a fresh table, so plain `COPY` insert is
+correct. Delete the now-dead `_CREATE_ROUTE`/`_CREATE_EXPOSES`/`_CREATE_CLIENT`/
+`_CREATE_DECLARES_CLIENT`/`_CREATE_PRODUCER`/`_CREATE_DECLARES_PRODUCER`/
+`_CREATE_HTTP_CALL`/`_CREATE_ASYNC_CALL` strings.
+
+#### 1e. Convert `GraphMeta` write to bulk
+The `MERGE (m:GraphMeta {key: $k})` loop (`:3472-3473`) becomes a single-row
+`_bulk_copy(conn, "GraphMeta", GRAPHMETA_COLUMNS, [meta_row])` (table is freshly
+created and empty in a full rebuild). Keep the row contents (ontology_version,
+built_at, source_root, parse_errors, counts_json) identical.
+
+#### 1f. `write_ladybug` load order (node-before-edge)
+`write_ladybug` (`:3893`) already calls `_write_nodes` → `_write_edges` →
+`_write_routes_and_exposes` → GraphMeta. Preserve this order: all node tables
+(`Symbol`, `Route`, `Client`, `Producer`, `UnresolvedCallSite`, `GraphMeta`) are
+bulk-loaded before any REL table, so endpoint rows exist when kuzu validates REL
+`COPY FROM`. Add an inline comment stating the ordering invariant.
+
+### 2. `tests/test_ast_graph_build.py` — equivalence harness + regression
+
+Add an equivalence harness and a committed baseline. The existing 26 tests in
+this file (e.g. `test_schema_has_all_expected_tables`,
+`test_graph_meta_present_and_versioned`, `test_each_edge_type_populated`,
+`test_pass3_callee_declaring_role_bank_annotated_types`,
+`test_pass3_unresolved_call_site_emitted`, `test_pass3_known_external_calls_preserved`)
+must continue to pass unchanged — they ARE the regression net.
+
+Add a committed baseline artifact `tests/fixtures/graph_baseline_bank_chat.json`
+(node count, per-type edge counts, `GraphMeta` counters, and N=3 sampled edge
+property rows per type incl. `source_file` and CALLS `callee_declaring_role`).
+Generated once during PR-P1 from the **last per-row build** before its removal,
+committed, and regenerated only when `ontology_version` changes.
+
+### 3. `scripts/bench_init_graph_write.py` (new, dev-only) — benchmark
+A small script that times `init` (or `build_ast_graph.py --source-root
+<corpus>`) on a medium corpus and prints the graph-write phase delta vs the
+pre-PR baseline. Not shipped in the package; lives under `scripts/`. Documents
+the measured speedup in the PR description.
+
+## Tests for PR-P1
+
+1. `test_bulk_write_graph_matches_per_row_baseline` — build `tests/bank-chat-system`
+   via the bulk path, assert node count, per-type edge counts, `GraphMeta`
+   counters, and the sampled edge property rows equal `tests/fixtures/graph_baseline_bank_chat.json`.
+2. `test_bulk_write_is_deterministic_double_build` — build bank-chat twice to two
+   DB paths via the bulk path, assert identical node count, per-type edge counts,
+   `GraphMeta` counters, and a query battery (`MATCH (s:Symbol) RETURN ...`,
+   `neighbors`-shaped reads). Models on existing
+   `test_29_determinism_pass4_route_ids` / `test_overrides_edge_set_deterministic_double_build`.
+3. `test_bulk_write_preserves_calls_dedup_and_callee_declaring_role` — assert
+   CALLS rows are deduped by `(src,dst,argc,line)` and carry the correct
+   `callee_declaring_role` (reuse the `@Service` callee assertion from
+   `test_pass3_callee_declaring_role_bank_annotated_types` against a bulk build).
+4. `test_bulk_write_empty_rel_table_is_noop` — a corpus with no `EXTENDS` edges
+   must not error (`_bulk_copy` no-ops on empty rows) and the table is empty.
+
+**Must-still-pass (regression, do not loosen):** the full `tests/test_ast_graph_build.py`,
+`tests/test_bank_chat_brownfield_integration.py`, `tests/test_call_edges_e2e.py`,
+and `tests/test_incremental_graph.py` suites (incremental path untouched).
+
+## Definition of done (PR-P1)
+
+- [ ] `_bulk_copy` helper added; step-1 spike result recorded in its docstring.
+- [ ] `write_ladybug` full path writes all node tables then all REL tables via `_bulk_copy`; no per-row `CREATE` remains in the full path.
+- [ ] Dead `_CREATE_*` strings + `_CREATE_SYMBOL` deleted.
+- [ ] `test_bulk_write_graph_matches_per_row_baseline`,
+      `test_bulk_write_is_deterministic_double_build`,
+      `test_bulk_write_preserves_calls_dedup_and_callee_declaring_role`,
+      `test_bulk_write_empty_rel_table_is_noop` added and pass.
+- [ ] Full existing graph/edge/brownfield/incremental test suites pass unchanged.
+- [ ] `.venv/bin/ruff check .` clean.
+- [ ] Benchmark numbers (before/after graph-write phase) pasted in the PR description.
+- [ ] PR title: `perf(graph): bulk COPY FROM for the full rebuild path (PR-P1)`.
+
+## Implementation step list
+
+| # | Step | File(s) | Done when |
+| --- | --- | --- | --- |
+| 1 | Spike: confirm REL `COPY FROM` column order + `from_pylist` LIST typing | `build_ast_graph.py` (throwaway) | toy 2-node+1-edge `COPY CALLS FROM $rows` lands with correct endpoints; result in `_bulk_copy` docstring |
+| 2 | Add `_bulk_copy` + column-order constants | `build_ast_graph.py` | helper + `_REL_*_COLUMNS`/`*_COLUMNS` defined |
+| 3 | Convert `_write_nodes` (full) to bulk; delete `_CREATE_SYMBOL` | `build_ast_graph.py` | node tables bulk-loaded; Symbol count matches baseline |
+| 4 | Convert `_write_edges` to per-type staging + bulk; delete dead `_CREATE_*` edge strings | `build_ast_graph.py` | edge counts match baseline; dedup + callee_declaring_role preserved |
+| 5 | Convert `_write_routes_and_exposes` (+ Client/Producer/HTTP_CALLS/ASYNC_CALLS) to bulk | `build_ast_graph.py` | route/client/producer/call-edge counts match baseline |
+| 6 | Convert GraphMeta write to bulk | `build_ast_graph.py` | `test_graph_meta_present_and_versioned` passes |
+| 7 | Generate + commit baseline fixture | `tests/fixtures/graph_baseline_bank_chat.json` | produced from last per-row build before removal |
+| 8 | Add 4 equivalence/determinism tests | `tests/test_ast_graph_build.py` | all pass |
+| 9 | Run full regression + ruff; capture benchmark | repo | suites green; numbers in PR description |
+
+---
+
+# PR-P2 — Bulk write for the incremental path
+
+**Goal:** apply the PR-P1 bulk primitive to the incremental rebuild path so
+`increment` on a large change-set is also fast. **Depends on PR-P1** (`_bulk_copy`
++ column constants + the staging pattern).
+
+## File-by-file changes
+
+### 1. `build_ast_graph.py` — incremental path bulk conversion
+- `_write_nodes_merge` (`:817`, uses `_MERGE_SYMBOL`): the incremental path is
+  delete-then-insert (`_delete_file_scope`, `:673`, removes changed/dependent
+  files' nodes+edges first), so re-emit is into a cleaned scope → convert the
+  per-row `_MERGE_SYMBOL` loop to `_bulk_copy(conn, "Symbol", NODE_COLUMNS, rows)`.
+  `_MERGE_SYMBOL` becomes dead → delete; `_write_nodes_impl` is removed if no
+  caller remains (it does not, after P1+P2).
+- Incremental edge re-emit: apply the same per-type staging + `_bulk_copy` as
+  PR-P1's `_write_edges`, scoped to the re-emitted files.
+- **Preserve the pass5/6 `MERGE (r:Route)` dedup** (`:3819-3821`): routes written
+  during the scoped step must still MERGE (not duplicate) against routes from the
+  global step. Keep those specific `MERGE` statements; only convert the
+  non-Route, non-dedup writes to bulk. Add a comment that this MERGE is
+  intentionally retained.
+- `incremental_rebuild` (`:3535`): no algorithmic change; it calls into the
+  converted write helpers.
+
+### 2. `tests/test_incremental_graph.py` — incremental equivalence
+The existing 28 tests (e.g. `test_incremental_single_file_change`,
+`test_incremental_new_file`, `test_incremental_deleted_file`,
+`test_incremental_phantom_nodes_preserved`, `test_incremental_dependent_expansion`,
+`test_incremental_expansion_cap_fallback`, `test_incremental_crash_marker_*`,
+`test_incremental_no_changes_is_noop`, `test_delete_file_scope_preserves_dependent_nodes`)
+must pass unchanged.
+
+## Tests for PR-P2
+
+1. `test_incremental_bulk_write_equivalent_to_full_rebuild` — make a single-file
+   change, run `increment` (bulk incremental), then build the same state via a
+   full rebuild (bulk full), assert identical node count, per-type edge counts,
+   and `GraphMeta` counters.
+2. `test_incremental_route_merge_dedup_preserved` — a corpus where pass5/6 would
+   re-emit an existing route; assert no duplicate `Route` rows after `increment`
+   (the retained `MERGE (r:Route)` still dedups).
+
+**Must-still-pass:** full `tests/test_incremental_graph.py`.
+
+## Definition of done (PR-P2)
+
+- [ ] Incremental node/edge re-emit uses `_bulk_copy`; `_MERGE_SYMBOL` deleted.
+- [ ] pass5/6 `MERGE (r:Route)` dedup retained and commented.
+- [ ] `test_incremental_bulk_write_equivalent_to_full_rebuild`,
+      `test_incremental_route_merge_dedup_preserved` added and pass.
+- [ ] Full `tests/test_incremental_graph.py` passes unchanged.
+- [ ] `.venv/bin/ruff check .` clean.
+- [ ] PR title: `perf(graph): bulk COPY FROM for the incremental path (PR-P2)`.
+
+## Implementation step list
+
+| # | Step | File(s) | Done when |
+| --- | --- | --- | --- |
+| 1 | Convert `_write_nodes_merge` to bulk; delete `_MERGE_SYMBOL` | `build_ast_graph.py` | incremental node re-emit via `_bulk_copy` |
+| 2 | Convert incremental edge re-emit to per-type staging + bulk | `build_ast_graph.py` | incremental edge counts match full rebuild |
+| 3 | Retain + comment the pass5/6 `MERGE (r:Route)` dedup | `build_ast_graph.py` | no duplicate Route rows after increment |
+| 4 | Add incremental-equivalence + route-dedup tests | `tests/test_incremental_graph.py` | both pass; full suite green |
+| 5 | ruff + full regression | repo | clean + green |
+
+---
+
+# PR-P3 — Cached `LayeredIgnore` (+ `is_ignored` memo) as a `ContextKey`
+
+**Goal:** remove the ~25s paid per-flow-run from re-constructing
+`LayeredIgnore(project_root)` per file and re-merging `_mega(spec)` per file.
+**Independent** of PR-P1/P2.
+
+## File-by-file changes
+
+### 1. `java_index_flow_lancedb.py` — `ContextKey` for the ignore object
+- Define `IGNORE = coco.ContextKey[LayeredIgnore]("java_lance_layered_ignore")`
+  alongside the existing `PROJECT_ROOT`/`EMBEDDER`/`LANCE_DB` definitions
+  (`:60-72`), using the **same** `_ck_params` (`detect_change` vs `tracked`)
+  detection block so it works across cocoindex versions.
+- In `coco_lifespan` (where `builder.provide(PROJECT_ROOT, root)` /
+  `builder.provide(EMBEDDER, embedder)` are called, `~`:287-306`), add
+  `builder.provide(IGNORE, LayeredIgnore(root))` — built **once** per flow run.
+- In `process_java_file` (`:344`), `process_sql_file` (`:416`),
+  `process_yaml_file` (`:464`): replace
+  `if LayeredIgnore(project_root).is_ignored((project_root / file.file_path.path).resolve()):`
+  (`:351`/`:423`/`:471`) with
+  `ignore = coco.use_context(IGNORE)` then
+  `if ignore.is_ignored((project_root / file.file_path.path).resolve()):`.
+  `project_root` stays (still needed for `_parse_and_enrich_java` and path
+  resolution).
+
+### 2. `path_filtering.py` — memoize `_mega` by directory
+- Add an instance cache `self._mega_cache: dict[str, tuple[list[str], GitIgnoreSpec, list]] = {}`
+  in `LayeredIgnore.__init__`.
+- In `LayeredIgnore._mega` (`:334`): key on
+  `dir_rel = Path(rel_project).parent.as_posix()`. If present, return the cached
+  `(mega, spec, meta)`; else compute via `_mega_build_for_rel` + compile, store,
+  return. Correctness rests on `_mega_build_for_rel` reading only
+  `dir_parts = parts[:-1]` (verified: it never reads the filename).
+- `is_ignored` (`:345`) and `diagnose_dict` (`:377`) keep calling `_mega(rel)`
+  unchanged — they transparently benefit from the cache.
+- Add a module/env knob `LAYERED_IGNORE_MEGA_CACHE` only if a disable path is
+  needed for debugging — **default enabled**, no new public surface otherwise.
+
+## Tests for PR-P3
+
+1. `test_is_ignored_mega_caches_by_directory` — build a `LayeredIgnore` over a
+   fixture with nested ignore files; call `is_ignored` for several files in the
+   same directory; assert `_mega` is computed once (spy on `_mega_build_for_rel`
+   or count `GitIgnoreSpec.from_lines` calls) and that decisions match the
+   uncached path.
+2. `test_layered_ignore_memo_preserves_decisions` — for a corpus of paths
+   (including nested `.java-codebase-rag/ignore` + `.gitignore` with negations),
+   assert `LayeredIgnore(...).is_ignored(p)` is identical with and without the
+   memo cache (correctness invariant).
+3. `test_layered_ignore_provided_once_per_flow` (HEAVY,
+   `JAVA_CODEBASE_RAG_RUN_HEAVY=1`) — run the real cocoindex flow over a small
+   corpus; assert the `LayeredIgnore` provided via `IGNORE` is a single instance
+   (identity check), not reconstructed per file.
+
+**Must-still-pass:** `tests/test_lancedb_e2e.py::test_lancedb_ignore_file_reduces_indexed_java_files`,
+any `tests/test_path_filtering*` / ignore tests, and the heavy
+`test_flow_emits_vectors_progress_per_file`.
+
+## Definition of done (PR-P3)
+
+- [ ] `IGNORE` `ContextKey` defined (version-detected) + provided once in `coco_lifespan`.
+- [ ] The three `process_*_file` functions consume it; no per-file `LayeredIgnore(project_root)`.
+- [ ] `_mega` memoized by directory; `is_ignored`/`diagnose_dict` unchanged in result.
+- [ ] `test_is_ignored_mega_caches_by_directory`,
+      `test_layered_ignore_memo_preserves_decisions`,
+      `test_layered_ignore_provided_once_per_flow` added and pass.
+- [ ] Existing ignore + vectors-progress tests pass unchanged.
+- [ ] `.venv/bin/ruff check .` clean.
+- [ ] PR title: `perf(vectors): lifespan-cached LayeredIgnore + is_ignored memo (PR-P3)`.
+
+## Implementation step list
+
+| # | Step | File(s) | Done when |
+| --- | --- | --- | --- |
+| 1 | Add `IGNORE` ContextKey (version-detected) + provide in lifespan | `java_index_flow_lancedb.py` | `IGNORE` resolvable in process_*_file |
+| 2 | Switch the three `process_*_file` to `coco.use_context(IGNORE)` | `java_index_flow_lancedb.py` | no per-file construction |
+| 3 | Add `_mega` dirname cache | `path_filtering.py` | repeated same-dir calls hit cache |
+| 4 | Add memo + correctness + once-per-flow tests | `tests/` | all pass |
+| 5 | ruff + regression (incl. heavy ignore/vectors tests) | repo | clean + green |
+
+---
+
+# Cross-PR risks and mitigations
+
+| # | Risk | Severity | Mitigation |
+| --- | --- | --- | --- |
+| 1 | REL-table `COPY FROM` column ordering/naming differs from assumption | High | PR-P1 step-1 spike locks the exact incantation before any conversion; recorded in `_bulk_copy` docstring. |
+| 2 | LIST columns (`modifiers`/`annotations`/`capabilities`) mis-type under `from_pylist` | Medium | Spike asserts LIST inference; explicit `pa.schema` fallback documented. Equivalence test catches any row diff. |
+| 3 | CALLS dedup / `callee_declaring_role` drift when moving to staging | High | `test_bulk_write_preserves_calls_dedup_and_callee_declaring_role` + the sampled-edge baseline check; semantics computed identically, just earlier. |
+| 4 | Bulk write changes observable query results | High | Equivalence + determinism tests + full existing graph/edge/brownfield suites are the gate. |
+| 5 | PR-P2 breaks the pass5/6 Route dedup | Medium | `MERGE (r:Route)` retained by name; `test_incremental_route_merge_dedup_preserved` guards it. |
+| 6 | PR-P3 `_mega` memo returns stale rules if dirParts logic changes | Low | `_mega_build_for_rel` reads only `dir_parts`; `test_layered_ignore_memo_preserves_decisions` asserts parity with uncached path. |
+| 7 | `ContextKey` lifespan differs across cocoindex versions | Low | Reuse the existing `_ck_params` `detect_change`/`tracked` detection block used by `PROJECT_ROOT`. |
+
+# Out of scope
+
+- MPS embedding default (already auto-selected — see proposal Out of scope).
+- ANN vector index (#337) and `watch` mode (#336).
+- Replacing/restructuring the cocoindex flow; changing embedding model/dim.
+- Parallelizing graph analysis passes (pass1–pass6).
+- Converting the incremental path in PR-P1 (it is PR-P2).
+- Parquet-file or CSV bulk paths (pyarrow in-memory only).
+
+# Whole-plan done definition
+
+1. `init`/`reprocess` graph-write phase on the medium corpus drops from ~321s to
+   tens of seconds (benchmark in PR-P1 description).
+2. `increment` on a large change-set uses the bulk path (PR-P2).
+3. The vectors phase pays no per-file `LayeredIgnore`/`_mega` cost (PR-P3).
+4. No ontology bump (`ontology_version` stays 17); no re-index required; all
+   existing graph/edge/brownfield/incremental/ignore/vectors tests pass.
+5. Proposal moved to `propose/completed/` and this plan to `plans/completed/`
+   once all three PRs land.
+
+# Tracking
+
+- `PR-P1`: _pending_
+- `PR-P2`: _pending_ (blocked by PR-P1)
+- `PR-P3`: _pending_

From 3839f364d4cd41ae324b9bb2ef15f9a109ed1763 Mon Sep 17 00:00:00 2001
From: Dmitry Teryaev <doudmitry@gmail.com>
Date: Mon, 22 Jun 2026 00:06:44 +0300
Subject: [PATCH 2/2] docs(plans): apply review feedback to init/increment perf
 plan
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

5-lens subagent review of the plan found the PR-P1/P2 boundary was
architecturally wrong: the graph write helpers are SHARED between the full
and incremental paths, so a "full-path-only" split is impossible.
- Verified call graph: _write_edges/_write_routes_and_exposes/_write_nodes_impl/
  _write_meta are each called by BOTH paths; _write_clients_producers_and_calls
  is incremental-only (global pass5/6).
- Re-split by write-FUNCTION: PR-P1 = _bulk_copy + _write_edges (the ~250s
  prize, accelerates both paths); PR-P2 = _write_nodes_impl +
  _write_routes_and_exposes + _write_clients_producers_and_calls; PR-P3 = ignore
  cache (independent).
- GraphMeta (_write_meta) left on MERGE (shared, one row) — reverses Open Q1.
- Fixed all binding sentinel greps: PR-P1 zeros the edge _CREATE_* only;
  PR-P2 zeros node/route/client constants + _MERGE_SYMBOL only after both
  routes functions convert; PR-P3 sentinel narrowed to
  LayeredIgnore(project_root).is_ignored (the bare-constructor grep wrongly
  matched once-per-run sites :177/:569, which are correctly left alone).
- Load-order §1f corrected (UnresolvedCallSite before UNRESOLVED_AT;
  Route/Client/Producer before their edges). Test files qualified
  (test_brownfield_routes / test_mcp_v2_compose / test_vectors_progress /
  test_path_filtering). PR-P2 tests placed in TestIncrementalOrchestrator.
  Baseline flagged as equivalence anchor, not production invariant.
  PR-P1 DoD lists the four test names.

Co-Authored-By: Claude <noreply@anthropic.com>
---
 plans/AGENT-PROMPTS-INIT-INCREMENT-PERF.md | 309 +++++------
 plans/active/PLAN-INIT-INCREMENT-PERF.md   | 566 ++++++++++-----------
 2 files changed, 433 insertions(+), 442 deletions(-)

diff --git a/plans/AGENT-PROMPTS-INIT-INCREMENT-PERF.md b/plans/AGENT-PROMPTS-INIT-INCREMENT-PERF.md
index c2e6e12d..d1c602fb 100644
--- a/plans/AGENT-PROMPTS-INIT-INCREMENT-PERF.md
+++ b/plans/AGENT-PROMPTS-INIT-INCREMENT-PERF.md
@@ -3,12 +3,18 @@
 Status: **active**. One self-contained prompt per PR. Copy the prompt verbatim
 into the agent, attach the files in its `@-files` block, and let it execute.
 
+> **Revised after the PR #339 subagent review.** PRs are split by **write-function**,
+> not by path — the graph write helpers are shared between the full and incremental
+> paths, so converting one accelerates both. Sentinel greps were corrected (the
+> PR-P3 "must return zero" grep previously over-matched once-per-run sites).
+
 **Workflow per PR:**
 
 1. Create the branch named in the prompt off the stated base.
 2. Read the cited plan section in full **before** writing code.
 3. Implement step-by-step; run the listed tests after each step.
-4. Run the sentinel greps — every "must return zero" line must be empty.
+4. Run the sentinel greps — every "must return zero" line must be empty, every
+   "must be non-zero" line must hit.
 5. Paste the manual-evidence output into the PR description.
 6. Open a PR with the exact title in the Definition of Done.
 
@@ -18,13 +24,13 @@ into the agent, attach the files in its `@-files` block, and let it execute.
 - `server.py` is stdio — never write to stdout from anything reachable by a tool handler.
 - Do not add a cocoindex dependency outside `java_index_flow_lancedb.py`.
 - The plan is the source of truth — if this prompt and the plan disagree, the plan wins.
-- Do not touch any file outside the prompt's `@-files` + the test files it names. If you think an adjacent file must change, **stop and ask** — don't ship it.
+- Do not touch any file outside the prompt's `@-files` + the test files it names. If you think an adjacent file must change, **stop and ask**.
 - Do not loosen any existing test assertion to make it pass.
 - Breaking changes are allowed; no compatibility shims.
 
 ---
 
-## PR-P1 — Bulk `COPY FROM` for the full rebuild path
+## PR-P1 — Bulk `COPY FROM` for `_write_edges` (shared; the ~250s prize)
 
 **Branch:** `perf/bulk-graph-writes-p1` off `master`.
 **Base:** `master`.
@@ -33,63 +39,62 @@ into the agent, attach the files in its `@-files` block, and let it execute.
 
 **Attach (`@-files`):**
 
-- `@build_ast_graph.py` (the full-rebuild write path: `_write_nodes`, `_write_edges`, `_write_routes_and_exposes`, `write_ladybug`, `_node_row`, `_SCHEMA_*`, `_callee_declaring_role_at_write`, `_populate_declares_rows`, `_populate_overrides_rows`)
-- `@propose/active/INIT-INCREMENT-PERF-PROPOSE.md` (design + staging invariants)
-- `@tests/test_ast_graph_build.py` (existing regression net + where the new tests go)
-- `@tests/_builders.py` (graph-build helpers: `build_ladybug_full_into`, etc.)
+- `@build_ast_graph.py` (focus: `_write_edges:3244`, `_node_row:2994`, `_SCHEMA_*:2812-2940`, `_callee_declaring_role_at_write:1647`, `seen_calls:3282`, `seen_ucs:3317`, `_populate_declares_rows:3189`)
+- `@propose/active/INIT-INCREMENT-PERF-PROPOSE.md` (design; on PR #338's branch — if absent, the staging invariants are inlined in the plan §PR-P1)
+- `@tests/test_ast_graph_build.py` (regression net + where new tests go)
+- `@tests/test_incremental_graph.py` (`_write_edges` is shared — incremental regression is binding)
+- `@tests/_builders.py` (graph-build helpers)
 
 **Prompt:**
 
 ````
 You are implementing PR-P1 from `plans/active/PLAN-INIT-INCREMENT-PERF.md`.
 
-Read the **PR-P1** section of the plan in full before writing any code, plus the
-proposal's "Staging invariants" and "Equivalence" paragraphs. The plan is the
-source of truth — if this prompt and the plan disagree, the plan wins.
+Read the **PR-P1** section in full. The plan wins if this prompt disagrees.
 
-## Scope
+KEY FACT: `_write_edges` (build_ast_graph.py:3244) is a SHARED helper called by
+BOTH the full path (write_ladybug:3926) and the incremental path (:805). So
+converting it accelerates both — there is no "full-only" edge conversion. PR-P1
+converts ONLY `_write_edges` (+ adds the `_bulk_copy` primitive). Nodes, routes,
+clients/producers, and GraphMeta are PR-P2.
 
-Implement PR-P1 exactly as specified: replace the per-row `conn.execute` writes
-in the **full rebuild path** (`write_ladybug`) with bulk in-memory-pyarrow
-`COPY FROM`. Concretely:
+## Scope
 
 1. **Step-1 spike (first commit):** confirm the exact REL `COPY FROM` column
-   order + `pa.Table.from_pylist` LIST typing with a throwaway 2-node + 1-edge
+   naming + `pa.Table.from_pylist` typing with a throwaway 2-Symbol + 1-CALLS
    toy. Record the working incantation in the `_bulk_copy` docstring.
-2. Add `_bulk_copy(conn, table_name, columns, rows)` + the `*_COLUMNS` /
-   `_REL_*_COLUMNS` constants (column order matches `_SCHEMA_*`; REL tables list
-   FROM/TO first).
-3. Convert `_write_nodes` (full path) to stage all node rows then
-   `_bulk_copy(conn, "Symbol", NODE_COLUMNS, rows)`. Delete `_CREATE_SYMBOL`.
-4. Convert `_write_edges` to per-edge-type row staging (applying the SAME
+2. Add `_bulk_copy(conn, table_name, columns, rows)` + the `_REL_*_COLUMNS` /
+   column constants (REL tables list FROM/TO first; match `_SCHEMA_*` order).
+3. Convert `_write_edges` to per-edge-type row staging (apply the SAME
    `seen_calls`/`seen_ucs` dedup and SAME `_callee_declaring_role_at_write`
-   lookup, at staging time) then `_bulk_copy` each REL table. Delete the dead
-   `_CREATE_EXT/IMPL/INJ/DECL/OVERRIDES/CALL/UNRESOLVED/UNRESOLVED_AT` strings.
-5. Convert `_write_routes_and_exposes` (+ Client/Producer/HTTP_CALLS/ASYNC_CALLS)
-   to bulk; delete the dead `_CREATE_ROUTE/EXPOSES/CLIENT/DECLARES_CLIENT/
-   PRODUCER/DECLARES_PRODUCER/HTTP_CALL/ASYNC_CALL` strings.
-6. Convert the `GraphMeta` write (`:3472-3473`) to a single-row `_bulk_copy`.
-7. Generate + commit `tests/fixtures/graph_baseline_bank_chat.json` from the
-   **last per-row build** before you remove the per-row path.
-8. Add the four named tests.
+   lookup at staging, before appending), then `_bulk_copy` each REL table.
+   Bulk-load UnresolvedCallSite NODE rows before UNRESOLVED_AT edges (Symbol
+   nodes are already loaded by `_write_nodes`).
+4. Delete the dead module constants `_CREATE_EXT/IMPL/INJ/DECL/OVERRIDES/CALL`
+   and the local `_CREATE_UNRESOLVED`/`_CREATE_UNRESOLVED_AT` (defined inside
+   `_write_edges` at :3307/:3313 — removed with the rewrite).
+5. Generate + commit `tests/fixtures/graph_baseline_bank_chat.json` from the
+   last per-row `_write_edges` build before removal.
+6. Add the four named tests.
 
 ## Out of scope (do NOT touch)
 
-- The incremental path: `_write_nodes_merge`, `_MERGE_SYMBOL`, `_delete_file_scope`,
-  `incremental_rebuild`, and the pass5/6 `MERGE (r:Route)` (`:3819-3821`) stay
-  exactly as-is. PR-P1 is full-rebuild only.
-- `java_index_flow_lancedb.py`, `path_filtering.py`, `server.py`, `search_lancedb.py`.
-- Any schema (`_SCHEMA_*` DDL), ontology, or re-index change.
-- CSV or Parquet-file staging (pyarrow in-memory only).
-- Loosening any existing test.
+- Node writes (`_write_nodes`, `_write_nodes_impl`, `_write_nodes_merge`,
+  `_CREATE_SYMBOL`, `_MERGE_SYMBOL`) — PR-P2.
+- Routes/clients/producers/calls (`_write_routes_and_exposes`,
+  `_write_clients_producers_and_calls`, and their `_CREATE_*` constants
+  including `_CREATE_CLIENT`/`_CREATE_PRODUCER`/`_CREATE_ROUTE` etc.) — PR-P2.
+- `_write_meta` / GraphMeta — leave the MERGE alone (PR-P2 also leaves it).
+- `java_index_flow_lancedb.py`, `path_filtering.py`, `server.py`.
+- Any schema/ontology/re-index change. CSV or Parquet-file staging.
 
 If you find yourself wanting to touch any of the above, **stop and ask**.
 
 ## Deliverables
 
-1. `_bulk_copy` helper + column-order constants in `build_ast_graph.py`.
-2. `write_ladybug` full path bulk-loads all node tables then all REL tables.
-3. Dead `_CREATE_*` / `_CREATE_SYMBOL` strings removed.
+1. `_bulk_copy` helper + column-order constants.
+2. `_write_edges` stages per-type rows and bulk-loads (CALLS dedup + callee_declaring_role at staging; UnresolvedCallSite before UNRESOLVED_AT).
+3. Dead `_CREATE_EXT/IMPL/INJ/DECL/OVERRIDES/CALL` + locals removed.
 4. `tests/fixtures/graph_baseline_bank_chat.json` committed.
 5. Four new tests in `tests/test_ast_graph_build.py`.
 
@@ -97,93 +102,99 @@ If you find yourself wanting to touch any of the above, **stop and ask**.
 
 Run, all must pass:
 ```
-.venv/bin/python -m pytest tests/test_ast_graph_build.py -v
-.venv/bin/python -m pytest tests/test_incremental_graph.py tests/test_bank_chat_brownfield_integration.py tests/test_call_edges_e2e.py -q
+.venv/bin/python -m pytest tests/test_ast_graph_build.py tests/test_incremental_graph.py -v
+.venv/bin/python -m pytest tests/test_bank_chat_brownfield_integration.py tests/test_call_edges_e2e.py -q
 .venv/bin/ruff check .
 ```
 
-Sentinel greps — **must all return zero** (no output):
+Sentinel greps — **must return zero**:
 ```
-grep -n "_CREATE_SYMBOL\b" build_ast_graph.py
-grep -nE "conn\.execute\(_CREATE_(EXT|IMPL|INJ|DECL|OVERRIDES|CALL|UNRESOLVED|ROUTE|EXPOSES|CLIENT|DECLARES_CLIENT|PRODUCER|DECLARES_PRODUCER|HTTP_CALL|ASYNC_CALL)" build_ast_graph.py
+grep -nE "_CREATE_(EXT|IMPL|INJ|DECL|OVERRIDES|CALL|UNRESOLVED|UNRESOLVED_AT)\b" build_ast_graph.py
 ```
 
-Sentinel greps — **must be non-zero** (guards against over-deletion):
+Sentinel greps — **must be non-zero** (guards against over-deletion; these belong to PR-P2 / are retained):
 ```
-grep -n "_MERGE_SYMBOL\b" build_ast_graph.py        # incremental path, kept
-grep -n "MERGE (r:Route" build_ast_graph.py          # pass5/6 dedup, kept
-grep -n "COPY .*FROM \$rows" build_ast_graph.py      # bulk path present
+grep -n "_MERGE_SYMBOL\b" build_ast_graph.py            # node upsert, kept until PR-P2
+grep -n "_CREATE_CLIENT\b" build_ast_graph.py           # routes/clients, PR-P2
+grep -n "MERGE (r:Route" build_ast_graph.py             # pass5/6 dedup, kept
+grep -n "COPY .*FROM \$rows" build_ast_graph.py         # bulk path present
 ```
 
 ## Manual evidence (paste in PR description)
 
-Build the fixture via the bulk path and inspect meta:
 ```bash
 rm -rf /tmp/p1 && .venv/bin/python build_ast_graph.py \
   --source-root tests/bank-chat-system \
   --ladybug-path /tmp/p1/code_graph.lbug --verbose
 .venv/bin/java-codebase-rag meta --source-root tests/bank-chat-system --index-dir /tmp/p1
 ```
-Expected: meta `counts_json` and node/edge counts **identical** to a pre-PR
-per-row build (paste both side by side). Note the graph-write phase timing from
-the `JCIRAG_PROGRESS` lines vs the pre-PR baseline.
+Expected: meta `counts_json` + node/edge counts identical to a pre-PR per-row
+build (paste both). Note the graph-write phase timing from `JCIRAG_PROGRESS`
+lines vs the pre-PR baseline.
 
 ## Definition of Done
 
 - [ ] Step-1 spike result recorded in `_bulk_copy` docstring.
-- [ ] Full path bulk-loads nodes-then-edges; no per-row `CREATE` remains in the full path.
-- [ ] `_CREATE_SYMBOL` and the dead `_CREATE_*` edge/node strings deleted.
-- [ ] All four new tests pass; full regression suites pass unchanged.
-- [ ] All "must return zero" sentinel greps are empty; all "non-zero" greps hit.
-- [ ] `.venv/bin/ruff check .` clean.
-- [ ] Benchmark (before/after graph-write phase) pasted in the PR description.
-- [ ] PR title: `perf(graph): bulk COPY FROM for the full rebuild path (PR-P1)`.
+- [ ] `_write_edges` stages per-type rows (CALLS dedup + callee_declaring_role at staging); UnresolvedCallSite bulk-loaded before UNRESOLVED_AT.
+- [ ] `_CREATE_EXT/IMPL/INJ/DECL/OVERRIDES/CALL` + local `_CREATE_UNRESOLVED/_UNRESOLVED_AT` deleted.
+- [ ] `test_bulk_write_edges_match_per_row_baseline`, `test_bulk_write_is_deterministic_double_build`, `test_bulk_write_preserves_calls_dedup_and_callee_declaring_role`, `test_bulk_write_empty_rel_table_is_noop` pass.
+- [ ] Full `test_ast_graph_build.py` + `test_incremental_graph.py` pass unchanged.
+- [ ] Sentinel greps: zero where required, non-zero where required.
+- [ ] `.venv/bin/ruff check .` clean; benchmark in PR description.
+- [ ] PR title: `perf(graph): bulk COPY FROM for _write_edges (PR-P1)`.
 ````
 
 ---
 
-## PR-P2 — Bulk write for the incremental path
+## PR-P2 — Bulk write for nodes + routes/clients/producers/calls
 
-**Branch:** `perf/bulk-graph-writes-p2` off PR-P1's branch (or `master` if PR-P1 has merged).
+**Branch:** `perf/bulk-graph-writes-p2` off PR-P1's branch (or `master` if PR-P1 merged).
 **Base:** PR-P1 merged.
 **Plan section:** `plans/active/PLAN-INIT-INCREMENT-PERF.md` § PR-P2 (read this first).
-**Estimated diff size:** small-medium.
+**Estimated diff size:** medium.
 
 **Attach (`@-files`):**
 
-- `@build_ast_graph.py` (`_write_nodes_merge`, `_MERGE_SYMBOL`, `_delete_file_scope`, `incremental_rebuild`, `_bulk_copy` + `_REL_*_COLUMNS` from PR-P1)
-- `@tests/test_incremental_graph.py` (regression net)
+- `@build_ast_graph.py` (`_write_nodes_impl:3029`, `_write_nodes:3096`, `_write_nodes_merge:817`, `_CREATE_SYMBOL`, `_MERGE_SYMBOL`, `_write_routes_and_exposes:3338`, `_write_clients_producers_and_calls:3810`, the Route `MERGE (r:Route` at `:3819`, `_write_meta:3421`, `_bulk_copy` + `_REL_*_COLUMNS` from PR-P1)
+- `@tests/test_incremental_graph.py` (regression; add to `TestIncrementalOrchestrator`)
 
 **Prompt:**
 
 ````
 You are implementing PR-P2 from `plans/active/PLAN-INIT-INCREMENT-PERF.md`.
-
-Read the **PR-P2** section in full. It depends on PR-P1's `_bulk_copy` primitive
-and `_REL_*_COLUMNS` constants, which are already merged. The plan wins if this
-prompt disagrees.
+Read the **PR-P2** section in full. It reuses PR-P1's `_bulk_copy` +
+`_REL_*_COLUMNS`. The plan wins.
 
 ## Scope
 
-Apply PR-P1's bulk primitive to the incremental path:
-
-1. Convert `_write_nodes_merge` (`:817`, uses `_MERGE_SYMBOL`) to stage rows then
-   `_bulk_copy(conn, "Symbol", NODE_COLUMNS, rows)`. The incremental path is
-   delete-then-insert (`_delete_file_scope` already removed the old scope), so
-   plain `COPY` insert into the cleaned scope is correct. Delete `_MERGE_SYMBOL`;
-   remove `_write_nodes_impl` if it has no remaining caller.
-2. Convert the incremental edge re-emit to per-type staging + `_bulk_copy`,
-   scoped to the re-emitted files (same pattern as PR-P1's `_write_edges`).
-3. **Retain the pass5/6 `MERGE (r:Route)` dedup** (`:3819-3821`) verbatim and add
-   a one-line comment explaining it is intentionally kept (routes written during
-   the scoped step must MERGE, not duplicate, against the global step). Do NOT
-   bulk-convert the Route writes that this MERGE guards.
-4. Add the two named tests.
+1. Convert `_write_nodes_impl` (:3029, the shared workhorse called by
+   `_write_nodes` full + `_write_nodes_merge` incremental) to stage Symbol rows
+   then `_bulk_copy(conn, "Symbol", NODE_COLUMNS, rows)`. Delete `_CREATE_SYMBOL`
+   and `_MERGE_SYMBOL` (both dead once the workhorse is bulk). Do the existing
+   `resolve_role_and_capabilities` + `type_role_by_node_id` population before
+   staging, unchanged.
+2. Convert `_write_routes_and_exposes` (:3338, shared) to per-table staging +
+   `_bulk_copy` for Route/EXPOSES/Client/Producer/DECLARES_CLIENT/DECLARES_PRODUCER/
+   HTTP_CALLS/ASYNC_CALLS (keep the existing `_file_by_node_id`/`_file_by_client_id`/
+   `_file_by_producer_id` source_file resolution). Bulk-load Route/Client/Producer
+   NODES before the EXPOSES/DECLARES_*/HTTP_CALLS/ASYNC_CALLS edges. Delete
+   `_CREATE_ROUTE`/`_CREATE_EXPOSES`.
+3. Convert `_write_clients_producers_and_calls` (:3810, incremental-only global
+   pass5/6) Client/Producer/edge writes to per-type staging + `_bulk_copy` (keep
+   the `member_by_id`/`client_by_id`/`producer_by_id` resolution). **Retain the
+   `MERGE (r:Route {id:$id}) …` dedup (:3819-3828) verbatim** + add a one-line
+   comment it is intentionally kept. Now delete the 6 shared constants —
+   `_CREATE_CLIENT`/`_CREATE_PRODUCER`/`_CREATE_DECLARES_CLIENT`/
+   `_CREATE_DECLARES_PRODUCER`/`_CREATE_HTTP_CALL`/`_CREATE_ASYNC_CALL` — which
+   are dead only after BOTH routes/exposes and clients_producers functions convert.
+4. Leave `_write_meta` (:3421) and its `MERGE (m:GraphMeta …)` UNTOUCHED.
+5. Add the two named tests as methods of `TestIncrementalOrchestrator` in
+   `tests/test_incremental_graph.py`.
 
 ## Out of scope (do NOT touch)
 
-- The full-rebuild path (already bulk in PR-P1).
-- Route dedup semantics — keep the `MERGE (r:Route)` exactly.
+- `_write_edges` (done in PR-P1).
+- `_write_meta` / GraphMeta MERGE — leave it.
 - `_delete_file_scope`, `incremental_rebuild` algorithm, dependent-expansion,
   crash-marker logic.
 - Anything outside `build_ast_graph.py` + `tests/test_incremental_graph.py`.
@@ -192,28 +203,30 @@ If you find yourself wanting to touch any of the above, **stop and ask**.
 
 ## Deliverables
 
-1. Incremental node re-emit via `_bulk_copy`; `_MERGE_SYMBOL` deleted.
-2. Incremental edge re-emit via per-type `_bulk_copy`.
-3. `MERGE (r:Route)` retained + commented.
-4. Two new tests in `tests/test_incremental_graph.py`.
+1. `_write_nodes_impl` bulk; `_CREATE_SYMBOL` + `_MERGE_SYMBOL` deleted.
+2. `_write_routes_and_exposes` bulk; `_CREATE_ROUTE`/`_CREATE_EXPOSES` deleted.
+3. `_write_clients_producers_and_calls` Client/Producer/edges bulk; `MERGE (r:Route)` retained + commented; 6 shared `_CREATE_*` deleted.
+4. `_write_meta` untouched.
+5. Two new tests in `TestIncrementalOrchestrator`.
 
 ## Tests
 
 Run, all must pass:
 ```
-.venv/bin/python -m pytest tests/test_incremental_graph.py -v
-.venv/bin/python -m pytest tests/test_ast_graph_build.py -q
+.venv/bin/python -m pytest tests/test_incremental_graph.py tests/test_ast_graph_build.py -v
 .venv/bin/ruff check .
 ```
 
 Sentinel greps — **must return zero**:
 ```
-grep -n "_MERGE_SYMBOL\b" build_ast_graph.py
+grep -nE "_CREATE_(SYMBOL|ROUTE|EXPOSES|CLIENT|PRODUCER|DECLARES_CLIENT|DECLARES_PRODUCER|HTTP_CALL|ASYNC_CALL)\b" build_ast_graph.py
+grep -nE "_MERGE_SYMBOL\b" build_ast_graph.py
 ```
 
-Sentinel greps — **must be non-zero** (Route dedup still present; bulk still used):
+Sentinel greps — **must be non-zero** (Route dedup + GraphMeta MERGE retained; bulk present):
 ```
 grep -n "MERGE (r:Route" build_ast_graph.py
+grep -n "MERGE (m:GraphMeta" build_ast_graph.py
 grep -n "COPY .*FROM \$rows" build_ast_graph.py
 ```
 
@@ -221,22 +234,21 @@ grep -n "COPY .*FROM \$rows" build_ast_graph.py
 
 Single-file change equivalence:
 ```bash
-# set up an index, touch one file, increment, then full-rebuild the same state
-# and diff the graphs (node count, per-type edge counts, GraphMeta counters).
+# set up an index, touch one file, increment (bulk), then full-rebuild the same
+# state (bulk) and diff graphs (node count, per-type edge counts, GraphMeta).
 ```
-Expected: incremental(bulk) == full-rebuild(bulk) for that state. Paste the
-side-by-side counts.
+Expected: incremental(bulk) == full-rebuild(bulk) for that state. Paste side-by-side counts.
 
 ## Definition of Done
 
-- [ ] Incremental node/edge re-emit uses `_bulk_copy`; `_MERGE_SYMBOL` deleted.
-- [ ] `MERGE (r:Route)` retained and commented.
-- [ ] `test_incremental_bulk_write_equivalent_to_full_rebuild`,
-      `test_incremental_route_merge_dedup_preserved` pass; full
-      `tests/test_incremental_graph.py` passes unchanged.
-- [ ] Sentinel greps pass (zero where required, non-zero where required).
+- [ ] `_write_nodes_impl` bulk; `_CREATE_SYMBOL` + `_MERGE_SYMBOL` deleted.
+- [ ] `_write_routes_and_exposes` bulk (Route/Client/Producer before edges); `_CREATE_ROUTE`/`_CREATE_EXPOSES` deleted.
+- [ ] `_write_clients_producers_and_calls` Client/Producer/edges bulk; `MERGE (r:Route)` retained + commented; 6 shared `_CREATE_*` deleted.
+- [ ] `_write_meta` untouched.
+- [ ] `test_incremental_bulk_write_equivalent_to_full_rebuild`, `test_incremental_route_merge_dedup_preserved` (both in `TestIncrementalOrchestrator`) pass; full `test_incremental_graph.py` + `test_ast_graph_build.py` green.
+- [ ] Sentinel greps pass.
 - [ ] `.venv/bin/ruff check .` clean.
-- [ ] PR title: `perf(graph): bulk COPY FROM for the incremental path (PR-P2)`.
+- [ ] PR title: `perf(graph): bulk COPY FROM for nodes, routes, clients/producers (PR-P2)`.
 ````
 
 ---
@@ -250,58 +262,63 @@ side-by-side counts.
 
 **Attach (`@-files`):**
 
-- `@java_index_flow_lancedb.py` (`ContextKey` defs `:60-72`, `coco_lifespan` provide sites `~:287-306`, `process_java_file`/`process_sql_file`/`process_yaml_file` `:344`/`:416`/`:464`)
-- `@path_filtering.py` (`LayeredIgnore`, `_mega`, `_mega_build_for_rel`, `is_ignored`)
-- `@tests/test_lancedb_e2e.py` (ignore test to keep green)
+- `@java_index_flow_lancedb.py` (`ContextKey` defs `:60-72`, `coco_lifespan` provide sites `:287-306`, `process_java_file:345`/`process_sql_file:417`/`process_yaml_file:465`)
+- `@path_filtering.py` (`LayeredIgnore`, `_mega:334`, `_mega_build_for_rel:193`, `is_ignored:345`, `diagnose_dict:377`)
+- `@tests/test_path_filtering.py` (where the two memo unit tests go)
+- `@tests/test_lancedb_e2e.py` (HEAVY once-per-flow test)
 
 **Prompt:**
 
 ````
 You are implementing PR-P3 from `plans/active/PLAN-INIT-INCREMENT-PERF.md`.
 
-Read the **PR-P3** section in full. It is independent of PR-P1/P2. The plan wins
-if this prompt disagrees.
+Read the **PR-P3** section in full. Independent of PR-P1/P2. The plan wins.
+
+KEY FACT: `LayeredIgnore(project_root)` appears at FIVE sites in
+java_index_flow_lancedb.py — :177, :351, :423, :471, :569. PR-P3 converts ONLY
+the three `process_*_file` sites (:351/:423/:471). The other two (:177 in
+`_approximate_vectors_total`, :569 in the app_main pre-walk) call
+`cocoindex_excluded_patterns()` ONCE PER RUN — leave them alone.
 
 ## Scope
 
 1. Define `IGNORE = coco.ContextKey[LayeredIgnore]("java_lance_layered_ignore")`
-   alongside `PROJECT_ROOT`/`EMBEDDER`/`LANCE_DB` (`:60-72`), reusing the SAME
+   alongside `PROJECT_ROOT`/`EMBEDDER`/`LANCE_DB` (:60-72), reusing the SAME
    `_ck_params` (`detect_change` vs `tracked`) detection block.
-2. In `coco_lifespan`, add `builder.provide(IGNORE, LayeredIgnore(root))` next to
-   the other `builder.provide(...)` calls — built **once** per flow run.
-3. In `process_java_file` (`:344`), `process_sql_file` (`:416`), `process_yaml_file`
-   (`:464`): add `ignore = coco.use_context(IGNORE)` and replace
+2. In `coco_lifespan` (:287-306), add `builder.provide(IGNORE, LayeredIgnore(root))`
+   — built ONCE per flow run.
+3. In `process_java_file`/`process_sql_file`/`process_yaml_file`: add
+   `ignore = coco.use_context(IGNORE)` and replace
    `LayeredIgnore(project_root).is_ignored((project_root / file.file_path.path).resolve())`
    with `ignore.is_ignored((project_root / file.file_path.path).resolve())`.
-   Keep `project_root` (still used for path resolution and `_parse_and_enrich_java`).
+   Keep `project_root`. DO NOT touch :177 or :569.
 4. In `path_filtering.py` `LayeredIgnore`: add `self._mega_cache` in `__init__`
-   and memoize `_mega(rel)` keyed by `Path(rel_project).parent.as_posix()` (mega
-   depends only on the directory — `_mega_build_for_rel` reads `dir_parts` only).
-   `is_ignored`/`diagnose_dict` call `_mega` unchanged and benefit transparently.
-5. Add the three named tests.
+   and memoize `_mega(rel)` keyed by `Path(rel_project).parent.as_posix()`
+   (`_mega_build_for_rel` reads only `dir_parts`, so this is correct).
+5. Add the three named tests in the right files.
 
 ## Out of scope (do NOT touch)
 
-- `build_ast_graph.py` and the graph write path (PR-P1/P2 own that).
+- `build_ast_graph.py` and the graph write path (PR-P1/P2).
 - The ignore *decision* logic (`_mega_build_for_rel`, `_winning_row`, negation
-  scanning) — only memoize, do not alter semantics.
-- Any schema, ontology, or re-index change.
-- Loosening any existing test.
+  scanning) — only memoize.
+- Sites :177 and :569.
+- Any schema/ontology/re-index change. Loosening any existing test.
 
 If you find yourself wanting to touch any of the above, **stop and ask**.
 
 ## Deliverables
 
-1. `IGNORE` ContextKey defined + provided once in `coco_lifespan`.
-2. The three `process_*_file` functions consume it (no per-file construction).
+1. `IGNORE` ContextKey (version-detected) + provided once in `coco_lifespan`.
+2. The three `process_*_file` consume it; :177/:569 untouched.
 3. `_mega` memoized by directory in `LayeredIgnore`.
-4. Three new tests.
+4. Three tests: two in `tests/test_path_filtering.py`, one (HEAVY) in `tests/test_lancedb_e2e.py`.
 
 ## Tests
 
 Run, all must pass:
 ```
-.venv/bin/python -m pytest tests/test_lancedb_e2e.py -q
+.venv/bin/python -m pytest tests/test_path_filtering.py tests/test_lancedb_e2e.py -q
 .venv/bin/python -m pytest tests -q -k "ignore or path_filter or vectors_progress"
 .venv/bin/ruff check .
 ```
@@ -310,9 +327,9 @@ Heavy (only if you can run cocoindex e2e locally):
 JAVA_CODEBASE_RAG_RUN_HEAVY=1 .venv/bin/python -m pytest tests/test_lancedb_e2e.py -q
 ```
 
-Sentinel greps — **must return zero**:
+Sentinel greps — **must return zero** (matches ONLY the 3 process sites; :177/:569 use the bare constructor + `cocoindex_excluded_patterns`, not `.is_ignored`):
 ```
-grep -nE "LayeredIgnore\(project_root\)" java_index_flow_lancedb.py
+grep -nE "LayeredIgnore\(project_root\)\.is_ignored" java_index_flow_lancedb.py
 ```
 
 Sentinel greps — **must be non-zero**:
@@ -323,20 +340,17 @@ grep -n "_mega_cache" path_filtering.py                          # memo present
 
 ## Manual evidence (paste in PR description)
 
-Show the ignore object is built once: with `JAVA_CODEBASE_RAG_RUN_HEAVY=1`, run
-the flow over a small corpus and log `id(ignore)` per file (temporary instrumentation),
-confirm a single object id across all files. Then confirm a micro-benchmark of
-`is_ignored` over N files drops with the `_mega` cache (same-directory files hit
-the cache).
+With `JAVA_CODEBASE_RAG_RUN_HEAVY=1`, run the flow over a small corpus and log
+`id(ignore)` per file (temporary instrumentation) — confirm a single object id
+across all files. Then micro-benchmark `is_ignored` over N files: same-directory
+files hit the `_mega` cache.
 
 ## Definition of Done
 
-- [ ] `IGNORE` ContextKey defined (version-detected) + provided once in `coco_lifespan`.
-- [ ] The three `process_*_file` consume it; no per-file `LayeredIgnore(project_root)`.
+- [ ] `IGNORE` ContextKey (version-detected) + provided once in `coco_lifespan`.
+- [ ] The three `process_*_file` consume it; :177/:569 untouched.
 - [ ] `_mega` memoized by directory; `is_ignored`/`diagnose_dict` results unchanged.
-- [ ] `test_is_ignored_mega_caches_by_directory`,
-      `test_layered_ignore_memo_preserves_decisions`,
-      `test_layered_ignore_provided_once_per_flow` pass.
+- [ ] `test_is_ignored_mega_caches_by_directory`, `test_layered_ignore_memo_preserves_decisions` (in `tests/test_path_filtering.py`), `test_layered_ignore_provided_once_per_flow` (in `tests/test_lancedb_e2e.py`, HEAVY) pass.
 - [ ] Existing ignore + vectors-progress tests pass unchanged.
 - [ ] Sentinel greps pass.
 - [ ] `.venv/bin/ruff check .` clean.
@@ -347,10 +361,13 @@ the cache).
 
 ## Notes for the orchestrator
 
-- **Landing order:** PR-P1 → PR-P2 (P2 depends on P1's `_bulk_copy` +
-  `_REL_*_COLUMNS`). PR-P3 is independent and can go first, last, or between.
-- **Review between PRs:** request code review after each PR lands (see
-  `superpowers:requesting-code-review`) — the equivalence harness is the gate
-  for P1/P2; the memo-parity test is the gate for P3.
-- **Sentinel greps are binding:** a non-empty "must return zero" grep means scope
-  leaked; a empty "must be non-zero" grep means an over-deletion. Either blocks merge.
+- **Landing order:** PR-P1 → PR-P2 (P2 needs `_bulk_copy` + `_REL_*_COLUMNS`).
+  PR-P3 is independent (can go first, last, or between).
+- **Shared-helper awareness:** `_write_edges`, `_write_routes_and_exposes`,
+  `_write_nodes_impl`, `_write_meta` are each called by BOTH paths. Converting
+  one accelerates both — so `test_incremental_graph.py` is a binding gate for
+  PR-P1 and PR-P2, not just PR-P2.
+- **Review between PRs** (`superpowers:requesting-code-review`): the equivalence
+  harness gates P1/P2; the memo-parity test gates P3.
+- **Sentinel greps are binding:** a non-empty "must return zero" grep = scope
+  leak; an empty "must be non-zero" grep = over-deletion. Either blocks merge.
diff --git a/plans/active/PLAN-INIT-INCREMENT-PERF.md b/plans/active/PLAN-INIT-INCREMENT-PERF.md
index 60182a9e..b6cff842 100644
--- a/plans/active/PLAN-INIT-INCREMENT-PERF.md
+++ b/plans/active/PLAN-INIT-INCREMENT-PERF.md
@@ -1,46 +1,64 @@
 # Plan: Faster init/increment — bulk graph writes + cached ignore
 
 Status: **active (planning)**. This plan implements
-[`propose/active/INIT-INCREMENT-PERF-PROPOSE.md`](../../propose/active/INIT-INCREMENT-PERF-PROPOSE.md)
-(see also PR #338).
-
-Depends on: proposal PR #338 should land first (or this plan tracks against the
-proposal text directly). No code dependency on other in-flight work.
+[`propose/active/INIT-INCREMENT-PERF-PROPOSE.md`](../../propose/active/INIT-INCREMENT-PERF-PROPOSE.md).
+The proposal lands via PR #338; this plan lands via PR #339 and **stacks behind
+#338** (the proposal file is on #338's branch until it merges). The staging
+invariants are inlined below so this plan is self-contained if #338 has not
+merged yet.
+
+Depends on: PR #338 (proposal) — non-blocking for reading this plan (key
+invariants inlined), but the proposal should merge first.
+
+> **Revised after a 5-lens subagent review (PR #339 thread).** The original
+> draft assumed `_write_edges` / `_write_routes_and_exposes` / `_write_meta`
+> were full-rebuild-only and split PRs by *path*. They are **shared by both
+> paths** (verified: `_write_edges` `build_ast_graph.py:3244` is called at
+> `:3926` full + `:805` incremental; `_write_routes_and_exposes` `:3338` at
+> `:3930` full + `:811` incremental; `_write_meta` `:3421` at `:3933` full +
+> `:3733` incremental). PRs are now split by **write-function**, not by path,
+> and GraphMeta is left on MERGE (not bulk-converted).
 
 ## Goal
 
-- Cut `init` / `reprocess` wall-clock on a medium Java corpus from ~395s toward
-  ~120s by replacing the per-row graph write with bulk `COPY FROM` (PR-P1).
-- Extend the bulk primitive to the incremental path so `increment` on large
-  change-sets is also fast (PR-P2).
+- Cut `init` / `reprocess` / `increment` graph-write wall-clock by replacing
+  per-row `conn.execute` writes with bulk `COPY FROM` — the ~81% init lever.
 - Remove the ~25s `LayeredIgnore` re-construction + `is_ignored` re-merge paid
-  per file in the cocoindex vectors phase (PR-P3).
-- Preserve graph contents exactly: no ontology bump, no re-index, no query-result
-  change (proven by an equivalence harness).
+  per file in the cocoindex vectors phase.
+- Preserve graph contents exactly: no ontology bump, no re-index, no
+  query-result change (proven by an equivalence harness).
 
 ## Principles (do not relitigate in review)
 
-- **Byte-equivalent graph.** PR-P1/P2 change only the write mechanism; the graph
-  (node/edge rows, properties, `GraphMeta` counters) must be identical to today.
-  The equivalence harness is the merge gate — a failing equivalence test blocks
-  the PR.
-- **Full rebuild path only in PR-P1.** The incremental path keeps its current
-  per-row writes until PR-P2. PR-P1 does not touch `increment`.
-- **In-memory pyarrow `COPY FROM`** is the bulk mechanism (verified: the `ladybug`
-  wrapper forwards `COPY FROM` verbatim and accepts a pyarrow/pandas param —
-  `ladybug/connection.py:337/488`). Parquet-file staging is a fallback, not the
+- **Byte-equivalent graph.** Every PR changes only the write mechanism; the
+  graph (node/edge rows, properties, `GraphMeta` counters) must be identical to
+  today. The equivalence harness + the full `test_incremental_graph.py` suite
+  are the merge gate.
+- **Split by write-function, not by path.** The graph write helpers are shared
+  between the full path (`write_ladybug:3893`) and the incremental path
+  (`incremental_rebuild:3535`): `_write_edges`, `_write_routes_and_exposes`,
+  `_write_nodes_impl`, `_write_meta` are each called by BOTH. Converting a
+  shared helper accelerates **both** paths at once — there is no "full-only"
+  conversion for edges/routes. `_write_clients_producers_and_calls:3810` is
+  **incremental-only** (the global pass5/6 step; sole caller `:3716`).
+- **GraphMeta stays on MERGE.** `_write_meta:3421` is shared and recomputes
+  counters before a single `MERGE (m:GraphMeta {key:$k})` (`:3472`). It is one
+  row and not worth the risk — **do not bulk-convert it**. (This reverses the
+  proposal's Open Q1 recommendation.)
+- **In-memory pyarrow `COPY FROM`** is the bulk mechanism (verified: the
+  `ladybug` wrapper forwards `COPY FROM` and accepts a pyarrow param —
+  `ladybug/connection.py:337`/`:488`). Parquet-file is a fallback, not the
   default. Do not propose CSV.
-- **MPS device default is out of scope** — the flow already auto-selects MPS. Do
-  not add a device PR.
+- **MPS device default is out of scope** — the flow already auto-selects MPS.
 - **No new env vars, CLI flags, or public surface.** No compatibility shims.
 
 ## PR breakdown - overview
 
-| PR | Scope | Ontology bump | Areas of concern | Test buckets | Independent of |
+| PR | Scope (write-functions converted) | Ontology bump | Areas of concern | Test buckets | Independent of |
 | --- | --- | --- | --- | --- | --- |
-| **PR-P1** | Bulk `COPY FROM` for the full rebuild path (`init`/`reprocess`) in `build_ast_graph.py` | none | REL-table `COPY FROM` column order (FROM/TO first); LIST column arrow typing; CALLS dedup + `callee_declaring_role` materialized at staging; node-before-edge load order; `GraphMeta` bulk | equivalence + determinism + baseline + full `test_ast_graph_build.py` regression | — |
-| **PR-P2** | Shared bulk primitive applied to the incremental `_delete_file_scope` → re-emit path | none | Preserve pass5/6 `MERGE (r:Route)` dedup (`build_ast_graph.py:3819-3821`); incremental delete-then-emit ordering; equivalence vs full rebuild | `test_incremental_graph.py` regression + new incremental-equivalence test | depends on **PR-P1** |
-| **PR-P3** | Hoist `LayeredIgnore` to a cocoindex `ContextKey` + memoize `is_ignored`'s `_mega` | none | `ContextKey` lifespan scoping (build once, not per file); `_mega` memo correctness (mega depends on the file's directory only — match_file stays per-file); no change to any ignore *decision* | flow ignore tests + memo unit tests + heavy vectors-progress test | independent of PR-P1, PR-P2 |
+| **PR-P1** | Add `_bulk_copy`; convert **`_write_edges`** (shared → both paths' Symbol→Symbol edges + UnresolvedCallSite/UNRESOLVED_AT) | none | REL-table `COPY FROM` column order (FROM/TO first); CALLS dedup + `callee_declaring_role` materialized at staging; UnresolvedCallSite loaded before UNRESOLVED_AT | equivalence + determinism + baseline; full `test_ast_graph_build.py` **and** `test_incremental_graph.py` (shared helper) | — |
+| **PR-P2** | Convert **`_write_nodes_impl`** (shared nodes), **`_write_routes_and_exposes`** (shared routes/clients/producers/calls), **`_write_clients_producers_and_calls`** (incremental-only global; Route MERGE preserved) | none | Route/Client/Producer nodes loaded before EXPOSES/DECLARES_*/HTTP_CALLS/ASYNC_CALLS; the 6 client/producer `_CREATE_*` constants are shared with the incremental-only function — delete only after BOTH convert; pass5/6 Route MERGE retained | `test_incremental_graph.py` regression + new incremental-equivalence test | depends on **PR-P1** (`_bulk_copy`) |
+| **PR-P3** | Hoist `LayeredIgnore` to a cocoindex `ContextKey` + memoize `is_ignored`'s `_mega` by directory | none | `ContextKey` lifespan scoping; `_mega` memo correctness (mega depends on directory only); leave the once-per-run sites (`:177`, `:569`) alone; no change to any ignore *decision* | `tests/test_path_filtering.py` memo tests + `tests/test_lancedb_e2e.py` (HEAVY) once-per-flow | independent of PR-P1, PR-P2 |
 
 Landing order: **P1 → P2**; **P3** may land in any order (independent).
 
@@ -48,336 +66,291 @@ Landing order: **P1 → P2**; **P3** may land in any order (independent).
 
 | Topic | Decision |
 | --- | --- |
-| Bulk mechanism | In-memory pyarrow: `conn.execute("COPY <table> FROM $rows", {"rows": pa_table})`. Verified against `ladybug/connection.py:337` (`FROM $param`) and `:488` (pyarrow/pandas/polars accepted). |
-| REL-table column rule | First two staged columns are the FROM/TO node primary keys (`Symbol.id` / `Route.id` / etc.). Exact column *naming* for REL `COPY FROM` is confirmed in PR-P1 step 1 (a 5-line spike); the contract below assumes positional FROM/TO. |
-| `GraphMeta` MERGE (`build_ast_graph.py:3472-3473`) | Folded into PR-P1 bulk (it is in the full-rebuild path; the table is freshly created and empty, so `COPY` insert is correct). Resolves proposal Open Q1. |
-| PR-P3 cache vehicle | cocoindex `ContextKey[LayeredIgnore]` (lifespan-scoped, built once). Resolves proposal Open Q2. |
-| `is_ignored` memo | Cache `_mega(rel)` → `(mega, spec)` keyed by the file's project-relative **directory** (`Path(rel).parent.as_posix()`), because `_mega_build_for_rel` uses only `dir_parts = parts[:-1]`. `spec.match_file(rel)` stays per-file (cheap, filename-dependent). |
+| Bulk mechanism | In-memory pyarrow: `conn.execute("COPY <table> FROM $rows", {"rows": pa_table})`. Verified `ladybug/connection.py:337` (`FROM $param`) + `:488` (pyarrow accepted). |
+| REL-table column rule | First two staged columns are the FROM/TO node primary keys. Exact column naming for REL `COPY FROM` locked by the PR-P1 step-1 spike. |
+| GraphMeta (`_write_meta`) | **Leave on MERGE.** Shared helper, one row, recomputes counters (`build_ast_graph.py:3421`). Reverses proposal Open Q1. |
+| `_write_nodes_impl` (shared workhorse) | Converted in PR-P2; both `_write_nodes` (full, `_CREATE_SYMBOL`) and `_write_nodes_merge` (incremental, `_MERGE_SYMBOL`) call it, so converting it once kills both constants. |
+| PR-P3 cache vehicle | cocoindex `ContextKey[LayeredIgnore]` (lifespan-scoped, built once). |
+| `is_ignored` memo | Cache `_mega(rel)` → `(mega, spec, meta)` keyed by the file's project-relative **directory** (`Path(rel).parent.as_posix()`); `_mega_build_for_rel` reads only `dir_parts = parts[:-1]` (`path_filtering.py:226-227`), so this is correct. `spec.match_file(rel)` stays per-file. |
+| Sites `:177` / `:569` | Left alone — both call `cocoindex_excluded_patterns()` **once per run** (the `_approximate_vectors_total` helper and the app_main pre-walk), not per-file. |
 
 ---
 
-# PR-P1 — Bulk `COPY FROM` for the full rebuild path
+# PR-P1 — Bulk `COPY FROM` for `_write_edges` (shared; the ~250s prize)
 
-**Goal:** replace the per-row `conn.execute` writes in `write_ladybug` (the
-init/reprocess full-rebuild entry, `build_ast_graph.py:3893`) with bulk
-in-memory-pyarrow `COPY FROM`. Graph contents stay byte-equivalent.
+**Goal:** add the bulk primitive and convert the shared `_write_edges` helper.
+Because `_write_edges` is called by both paths, this accelerates the
+Symbol→Symbol edges + UnresolvedCallSite writes for **both** `init` and
+`increment` (the largest graph-write cost: ~250s of ~321s).
 
 ## File-by-file changes
 
-### 1. `build_ast_graph.py` — bulk-write primitive + staging
-
-#### 1a. New helper `_bulk_copy`
-Add near the other write helpers (after `_node_row`, ~`build_ast_graph.py:2994`):
+### 1. `build_ast_graph.py` — `_bulk_copy` primitive + `_write_edges` conversion
 
+#### 1a. New helper `_bulk_copy` (add near `_node_row`, ~`:2994`)
 ```python
 import pyarrow as pa
 
-def _bulk_copy(conn: ladybug.Connection, table_name: str, columns: list[str], rows: list[dict]) -> None:
-    """Bulk-load rows into a (node or rel) table via in-memory pyarrow COPY FROM.
+def _bulk_copy(conn, table_name, columns, rows):
+    """Bulk-load rows into a node/rel table via in-memory pyarrow COPY FROM.
 
-    `columns` fixes column order — for REL tables the first two MUST be the
-    FROM/TO node ids (kuzu requirement). Empty `rows` is a no-op.
+    `columns` fixes column order; for REL tables the first two MUST be the
+    FROM/TO node primary keys (kuzu requirement). Empty `rows` is a no-op.
     """
     if not rows:
         return
-    tbl = pa.Table.from_pylist(rows)  # infers types from non-empty data
-    # from_pylist infers STRING / list<STRING> correctly for non-empty rows;
-    # step-1 spike confirms LIST columns (modifiers/annotations/capabilities)
-    # infer as pa.list_(pa.string()). If any column is all-empty-list, pass an
-    # explicit pa.schema derived from the _SCHEMA_* strings instead.
+    tbl = pa.Table.from_pylist(rows)
     conn.execute(f"COPY {table_name} FROM $rows", {"rows": tbl})
 ```
+Column-order constants next to the `_SCHEMA_*` strings (`~:2812`), each matching
+its `_SCHEMA_*` order; for REL tables the first two entries are the endpoint ids:
+`_REL_EXTENDS_COLUMNS = ["FROM","TO","source_file","dst_name","dst_fqn","resolved"]`,
+`_REL_CALLS_COLUMNS = ["FROM","TO","source_file","callee_declaring_role", <…resolved props>]`,
+`UNRESOLVED_CALL_SITE_COLUMNS`, `_REL_UNRESOLVED_AT_COLUMNS = ["FROM","TO","source_file"]`, etc.
+
+> **Step-1 spike (mandatory, first commit of PR-P1):** confirm (a) the exact REL
+> `COPY FROM` column naming — toy 2-Symbol + 1-CALLS `COPY CALLS FROM $rows`
+> with `["FROM","TO",…]`, assert the edge lands with correct endpoints and
+> `callee_declaring_role`; (b) `pa.Table.from_pylist` type inference for any LIST
+> columns this helper touches. Record the working incantation in the docstring.
+> (On kuzu 0.11.3 + the repo's pybind backend, all-empty LIST columns infer as
+> `list<null>` and are accepted — confirmed by review — so an explicit pa.schema
+> is a fallback, not required.)
+
+#### 1b. Convert `_write_edges` (`:3244`, shared) to per-type staging + bulk
+Today it loops `conn.execute(_CREATE_EXT, {…})` etc. with two dedup sets —
+`seen_calls` (`:3282-3288`, key `(src_id,dst_id,arg_count,call_site_line)` —
+verified) and `seen_ucs` (`:3317-3321`) — and a `callee_declaring_role` lookup
+(`_callee_declaring_role_at_write`, `:1647`/`:3302`). Restructure to accumulate
+per-edge-type row lists, applying the **same** dedup and **same**
+`callee_declaring_role` materialization **before appending**, then `_bulk_copy`
+each REL table once. REL row dicts get `FROM`/`TO` = src/dst node ids plus the
+properties in `_SCHEMA_*` order.
+
+**Within-helper load order (kuzu validates endpoint existence):** Symbol nodes
+are already loaded by `_write_nodes` (called before `_write_edges` in both
+paths). So bulk-load the **UnresolvedCallSite node rows before the UNRESOLVED_AT
+edge rows** (UNRESOLVED_AT is Symbol→UnresolvedCallSite). The Symbol→Symbol edges
+(EXTENDS/IMPLEMENTS/INJECTS/DECLARES/OVERRIDES/CALLS) reference only already-loaded
+Symbol nodes.
 
-Column-order constants (define next to the `_SCHEMA_*` strings, ~`:2812`), each
-matching its `_SCHEMA_*` column order exactly — for REL tables the first two
-entries are the endpoint ids:
-
-- `NODE_COLUMNS` = the `Symbol` property columns in `_SCHEMA_NODE` order.
-- `ROUTE_COLUMNS`, `CLIENT_COLUMNS`, `PRODUCER_COLUMNS`, `GRAPHMETA_COLUMNS`,
-  `UNRESOLVED_CALL_SITE_COLUMNS` — same, from their `_SCHEMA_*`.
-- For each REL table: `_REL_*_COLUMNS` = `["FROM", "TO", <props in _SCHEMA_* order>]`
-  for `EXTENDS/IMPLEMENTS/INJECTS/DECLARES/OVERRIDES/CALLS/UNRESOLVED_AT/EXPOSES/
-  DECLARES_CLIENT/DECLARES_PRODUCER/HTTP_CALLS/ASYNC_CALLS`.
-
-> **Step-1 spike (mandatory, first commit of PR-P1):** confirm (a) the exact
-> REL `COPY FROM` column naming — build a 2-node + 1-edge toy, `COPY CALLS FROM
-> $rows` with columns `["FROM","TO","source_file","resolved",...]`, assert the
-> edge lands with correct endpoints; (b) `pa.Table.from_pylist` type inference
-> for the `modifiers`/`annotations`/`capabilities` LIST columns. Record the
-> working incantation in the helper docstring. If inference fails on empty
-> lists, build an explicit `pa.schema` map keyed by table.
-
-#### 1b. Convert `_write_nodes` (full path) to bulk
-`_write_nodes` (`:3096`) currently calls `_write_nodes_impl(..., symbol_query=_CREATE_SYMBOL)`
-which loops `conn.execute(_CREATE_SYMBOL, _node_row(...))` per node. Replace the
-full-path body: iterate packages / files / types / members exactly as
-`_write_nodes_impl` does, but append each `_node_row(...)` to a list, then call
-`_bulk_copy(conn, "Symbol", NODE_COLUMNS, node_rows)`. Keep the role/capability
-resolution (`resolve_role_and_capabilities`) and `tables.type_role_by_node_id`
-population identical (those are pure in-memory steps before staging).
-
-`_write_nodes_impl` + `_MERGE_SYMBOL` stay — they are the **incremental** path
-(`_write_nodes_merge`, `:817`) used until PR-P2. `_CREATE_SYMBOL` becomes dead
-(only `_write_nodes` used it) → delete it.
-
-#### 1c. Convert `_write_edges` to bulk
-`_write_edges` (`:3244`) loops per edge, calling `conn.execute(_CREATE_EXT,
-{...})` etc. with two dedup sets (`seen_calls` `:3282-3288`, `seen_ucs`
-`:3317-3321`) and a `callee_declaring_role` lookup (`_callee_declaring_role_at_write`,
-`:1647`/`:3302`). Restructure to accumulate **per-edge-type row lists**, applying
-the *same* dedup and *same* `callee_declaring_role` materialization, then
-`_bulk_copy` each REL table once:
-
-```python
-extends_rows: list[dict] = []
-...
-for ...:
-    extends_rows.append({"FROM": src_id, "TO": dst_id, "source_file": ..., "dst_name": ..., "dst_fqn": ..., "resolved": ...})
-...
-calls_rows.append(...)  # only when key not in seen_calls; include "callee_declaring_role"
-...
-_bulk_copy(conn, "EXTENDS", _REL_EXTENDS_COLUMNS, extends_rows)
-_bulk_copy(conn, "CALLS", _REL_CALLS_COLUMNS, calls_rows)   # CALLS_COLUMNS ends with callee_declaring_role
-...
-```
-
-The dedup sets and `_callee_declaring_role_at_write` calls are computed
-**before** appending (i.e. at staging), exactly preserving current semantics.
 `_CREATE_EXT`/`_CREATE_IMPL`/`_CREATE_INJ`/`_CREATE_DECL`/`_CREATE_OVERRIDES`/
-`_CREATE_CALL`/`_CREATE_UNRESOLVED`/`_CREATE_UNRESOLVED_AT` become dead → delete.
-
-`_populate_declares_rows` (`:3189`) and `_populate_overrides_rows` (`:3206`) are
-pure in-memory population — unchanged; they feed the DECLARES/OVERRIDES row lists.
-
-#### 1d. Convert `_write_routes_and_exposes` (and Client/Producer/HTTP_CALLS/ASYNC_CALLS) to bulk
-Same pattern (`~`:3349-3408`): stage `Route`/`Client`/`Producer` node rows and
-`EXPOSES`/`DECLARES_CLIENT`/`DECLARES_PRODUCER`/`HTTP_CALLS`/`ASYNC_CALLS` edge
-rows (with FROM/TO ids), then `_bulk_copy` each. The pass5/6 `MERGE (r:Route)`
-dedup (`:3819-3821`) is **incremental-only** and untouched in PR-P1; in the full
-path routes are emitted once into a fresh table, so plain `COPY` insert is
-correct. Delete the now-dead `_CREATE_ROUTE`/`_CREATE_EXPOSES`/`_CREATE_CLIENT`/
-`_CREATE_DECLARES_CLIENT`/`_CREATE_PRODUCER`/`_CREATE_DECLARES_PRODUCER`/
-`_CREATE_HTTP_CALL`/`_CREATE_ASYNC_CALL` strings.
-
-#### 1e. Convert `GraphMeta` write to bulk
-The `MERGE (m:GraphMeta {key: $k})` loop (`:3472-3473`) becomes a single-row
-`_bulk_copy(conn, "GraphMeta", GRAPHMETA_COLUMNS, [meta_row])` (table is freshly
-created and empty in a full rebuild). Keep the row contents (ontology_version,
-built_at, source_root, parse_errors, counts_json) identical.
-
-#### 1f. `write_ladybug` load order (node-before-edge)
-`write_ladybug` (`:3893`) already calls `_write_nodes` → `_write_edges` →
-`_write_routes_and_exposes` → GraphMeta. Preserve this order: all node tables
-(`Symbol`, `Route`, `Client`, `Producer`, `UnresolvedCallSite`, `GraphMeta`) are
-bulk-loaded before any REL table, so endpoint rows exist when kuzu validates REL
-`COPY FROM`. Add an inline comment stating the ordering invariant.
-
-### 2. `tests/test_ast_graph_build.py` — equivalence harness + regression
-
-Add an equivalence harness and a committed baseline. The existing 26 tests in
-this file (e.g. `test_schema_has_all_expected_tables`,
-`test_graph_meta_present_and_versioned`, `test_each_edge_type_populated`,
+`_CREATE_CALL` (module constants) become dead → delete. `_CREATE_UNRESOLVED` and
+`_CREATE_UNRESOLVED_AT` are **locals** defined inside `_write_edges` (`:3307`/`:3313`),
+removed when the function is rewritten to bulk. `_populate_declares_rows`
+(`:3189`) / `_populate_overrides_rows` (`:3206`) are pure in-memory population —
+unchanged.
+
+### 2. `tests/test_ast_graph_build.py` — equivalence harness + baseline
+The existing 26 tests are the regression net (e.g.
+`test_schema_has_all_expected_tables`, `test_each_edge_type_populated`,
 `test_pass3_callee_declaring_role_bank_annotated_types`,
-`test_pass3_unresolved_call_site_emitted`, `test_pass3_known_external_calls_preserved`)
-must continue to pass unchanged — they ARE the regression net.
-
-Add a committed baseline artifact `tests/fixtures/graph_baseline_bank_chat.json`
-(node count, per-type edge counts, `GraphMeta` counters, and N=3 sampled edge
-property rows per type incl. `source_file` and CALLS `callee_declaring_role`).
-Generated once during PR-P1 from the **last per-row build** before its removal,
-committed, and regenerated only when `ontology_version` changes.
+`test_pass3_unresolved_call_site_emitted`,
+`test_pass3_known_external_calls_preserved`). Add a committed baseline
+`tests/fixtures/graph_baseline_bank_chat.json` (node count, per-type edge counts,
+`GraphMeta` counters, and N=3 sampled edge property rows per type incl.
+`source_file` and CALLS `callee_declaring_role`). **It is an equivalence anchor,
+not a production invariant** — regenerated from the last per-row build before
+removal, and regenerated only when `ontology_version` changes (it does not in
+this plan). Asserting invariants here is acceptable because PR-P1 is a
+behavior-preserving write-mechanism swap.
 
 ### 3. `scripts/bench_init_graph_write.py` (new, dev-only) — benchmark
-A small script that times `init` (or `build_ast_graph.py --source-root
-<corpus>`) on a medium corpus and prints the graph-write phase delta vs the
-pre-PR baseline. Not shipped in the package; lives under `scripts/`. Documents
-the measured speedup in the PR description.
+Times `init`/`build_ast_graph.py` on a medium corpus; prints the graph-write
+phase delta. Not packaged; documents the measured speedup in the PR description.
 
 ## Tests for PR-P1
 
-1. `test_bulk_write_graph_matches_per_row_baseline` — build `tests/bank-chat-system`
+1. `test_bulk_write_edges_match_per_row_baseline` — build `tests/bank-chat-system`
    via the bulk path, assert node count, per-type edge counts, `GraphMeta`
-   counters, and the sampled edge property rows equal `tests/fixtures/graph_baseline_bank_chat.json`.
+   counters, and sampled edge rows equal `graph_baseline_bank_chat.json`.
 2. `test_bulk_write_is_deterministic_double_build` — build bank-chat twice to two
-   DB paths via the bulk path, assert identical node count, per-type edge counts,
-   `GraphMeta` counters, and a query battery (`MATCH (s:Symbol) RETURN ...`,
-   `neighbors`-shaped reads). Models on existing
-   `test_29_determinism_pass4_route_ids` / `test_overrides_edge_set_deterministic_double_build`.
-3. `test_bulk_write_preserves_calls_dedup_and_callee_declaring_role` — assert
-   CALLS rows are deduped by `(src,dst,argc,line)` and carry the correct
-   `callee_declaring_role` (reuse the `@Service` callee assertion from
-   `test_pass3_callee_declaring_role_bank_annotated_types` against a bulk build).
+   DBs via the bulk path, assert identical counts + query battery. Models on
+   `tests/test_brownfield_routes.py::test_29_determinism_pass4_route_ids` and
+   `tests/test_mcp_v2_compose.py::test_overrides_edge_set_deterministic_double_build`.
+3. `test_bulk_write_preserves_calls_dedup_and_callee_declaring_role` — CALLS rows
+   deduped by `(src,dst,argc,line)`; carry correct `callee_declaring_role`
+   (reuse the `@Service` callee assertion against a bulk build).
 4. `test_bulk_write_empty_rel_table_is_noop` — a corpus with no `EXTENDS` edges
-   must not error (`_bulk_copy` no-ops on empty rows) and the table is empty.
+   must not error (`_bulk_copy` no-ops on empty rows).
 
-**Must-still-pass (regression, do not loosen):** the full `tests/test_ast_graph_build.py`,
-`tests/test_bank_chat_brownfield_integration.py`, `tests/test_call_edges_e2e.py`,
-and `tests/test_incremental_graph.py` suites (incremental path untouched).
+**Must-still-pass (regression — `_write_edges` is shared, so both paths):** full
+`tests/test_ast_graph_build.py`, `tests/test_incremental_graph.py` (28 tests),
+`tests/test_bank_chat_brownfield_integration.py`, `tests/test_call_edges_e2e.py`.
 
 ## Definition of done (PR-P1)
 
-- [ ] `_bulk_copy` helper added; step-1 spike result recorded in its docstring.
-- [ ] `write_ladybug` full path writes all node tables then all REL tables via `_bulk_copy`; no per-row `CREATE` remains in the full path.
-- [ ] Dead `_CREATE_*` strings + `_CREATE_SYMBOL` deleted.
-- [ ] `test_bulk_write_graph_matches_per_row_baseline`,
+- [ ] `_bulk_copy` helper added; step-1 spike result in its docstring.
+- [ ] `_write_edges` stages per-type rows (CALLS dedup + `callee_declaring_role` at staging) and bulk-loads UnresolvedCallSite before UNRESOLVED_AT.
+- [ ] `_CREATE_EXT/IMPL/INJ/DECL/OVERRIDES/CALL` deleted; local `_CREATE_UNRESOLVED/_UNRESOLVED_AT` gone with the rewrite.
+- [ ] `test_bulk_write_edges_match_per_row_baseline`,
       `test_bulk_write_is_deterministic_double_build`,
       `test_bulk_write_preserves_calls_dedup_and_callee_declaring_role`,
-      `test_bulk_write_empty_rel_table_is_noop` added and pass.
-- [ ] Full existing graph/edge/brownfield/incremental test suites pass unchanged.
+      `test_bulk_write_empty_rel_table_is_noop` pass.
+- [ ] Full `test_ast_graph_build.py` + `test_incremental_graph.py` pass unchanged.
 - [ ] `.venv/bin/ruff check .` clean.
-- [ ] Benchmark numbers (before/after graph-write phase) pasted in the PR description.
-- [ ] PR title: `perf(graph): bulk COPY FROM for the full rebuild path (PR-P1)`.
+- [ ] Sentinel greps pass (see AGENT-PROMPTS); benchmark numbers in PR description.
+- [ ] PR title: `perf(graph): bulk COPY FROM for _write_edges (PR-P1)`.
 
 ## Implementation step list
 
 | # | Step | File(s) | Done when |
 | --- | --- | --- | --- |
-| 1 | Spike: confirm REL `COPY FROM` column order + `from_pylist` LIST typing | `build_ast_graph.py` (throwaway) | toy 2-node+1-edge `COPY CALLS FROM $rows` lands with correct endpoints; result in `_bulk_copy` docstring |
-| 2 | Add `_bulk_copy` + column-order constants | `build_ast_graph.py` | helper + `_REL_*_COLUMNS`/`*_COLUMNS` defined |
-| 3 | Convert `_write_nodes` (full) to bulk; delete `_CREATE_SYMBOL` | `build_ast_graph.py` | node tables bulk-loaded; Symbol count matches baseline |
-| 4 | Convert `_write_edges` to per-type staging + bulk; delete dead `_CREATE_*` edge strings | `build_ast_graph.py` | edge counts match baseline; dedup + callee_declaring_role preserved |
-| 5 | Convert `_write_routes_and_exposes` (+ Client/Producer/HTTP_CALLS/ASYNC_CALLS) to bulk | `build_ast_graph.py` | route/client/producer/call-edge counts match baseline |
-| 6 | Convert GraphMeta write to bulk | `build_ast_graph.py` | `test_graph_meta_present_and_versioned` passes |
-| 7 | Generate + commit baseline fixture | `tests/fixtures/graph_baseline_bank_chat.json` | produced from last per-row build before removal |
-| 8 | Add 4 equivalence/determinism tests | `tests/test_ast_graph_build.py` | all pass |
-| 9 | Run full regression + ruff; capture benchmark | repo | suites green; numbers in PR description |
+| 1 | Spike: REL `COPY FROM` column order + `from_pylist` typing | `build_ast_graph.py` (throwaway) | toy CALLS edge lands with correct endpoints; result in `_bulk_copy` docstring |
+| 2 | Add `_bulk_copy` + `_REL_*_COLUMNS`/column constants | `build_ast_graph.py` | helper + constants defined |
+| 3 | Convert `_write_edges` to per-type staging + bulk; load UnresolvedCallSite before UNRESOLVED_AT | `build_ast_graph.py` | edge counts match baseline; dedup + callee_declaring_role preserved |
+| 4 | Delete dead `_CREATE_EXT/IMPL/INJ/DECL/OVERRIDES/CALL` + locals | `build_ast_graph.py` | sentinel greps pass |
+| 5 | Generate + commit baseline | `tests/fixtures/graph_baseline_bank_chat.json` | from last per-row `_write_edges` build |
+| 6 | Add 4 tests | `tests/test_ast_graph_build.py` | all pass |
+| 7 | Full regression (ast_graph_build + incremental) + ruff; benchmark | repo | green; numbers in PR description |
 
 ---
 
-# PR-P2 — Bulk write for the incremental path
+# PR-P2 — Bulk write for nodes + routes/clients/producers/calls
 
-**Goal:** apply the PR-P1 bulk primitive to the incremental rebuild path so
-`increment` on a large change-set is also fast. **Depends on PR-P1** (`_bulk_copy`
-+ column constants + the staging pattern).
+**Goal:** convert the remaining graph writes. **Depends on PR-P1's `_bulk_copy`.**
 
 ## File-by-file changes
 
-### 1. `build_ast_graph.py` — incremental path bulk conversion
-- `_write_nodes_merge` (`:817`, uses `_MERGE_SYMBOL`): the incremental path is
-  delete-then-insert (`_delete_file_scope`, `:673`, removes changed/dependent
-  files' nodes+edges first), so re-emit is into a cleaned scope → convert the
-  per-row `_MERGE_SYMBOL` loop to `_bulk_copy(conn, "Symbol", NODE_COLUMNS, rows)`.
-  `_MERGE_SYMBOL` becomes dead → delete; `_write_nodes_impl` is removed if no
-  caller remains (it does not, after P1+P2).
-- Incremental edge re-emit: apply the same per-type staging + `_bulk_copy` as
-  PR-P1's `_write_edges`, scoped to the re-emitted files.
-- **Preserve the pass5/6 `MERGE (r:Route)` dedup** (`:3819-3821`): routes written
-  during the scoped step must still MERGE (not duplicate) against routes from the
-  global step. Keep those specific `MERGE` statements; only convert the
-  non-Route, non-dedup writes to bulk. Add a comment that this MERGE is
-  intentionally retained.
-- `incremental_rebuild` (`:3535`): no algorithmic change; it calls into the
-  converted write helpers.
+### 1. `build_ast_graph.py`
+
+#### 1a. Convert `_write_nodes_impl` (`:3029`, shared workhorse)
+It is called by `_write_nodes` (`:3103`, full, `_CREATE_SYMBOL`) and
+`_write_nodes_merge` (`:825`, incremental, `_MERGE_SYMBOL`). Convert its body to
+stage all Symbol rows (packages/files/types/members, with the existing
+`resolve_role_and_capabilities` + `type_role_by_node_id` population done before
+staging) then `_bulk_copy(conn, "Symbol", NODE_COLUMNS, rows)`. Both wrappers now
+share one bulk path; `_CREATE_SYMBOL` and `_MERGE_SYMBOL` become dead → delete
+both. (`_write_nodes_impl`'s per-row loop is gone; the two wrappers may collapse
+to one, but keep both names if they differ in non-write setup — minimize churn.)
+> Note: this removes the ~40-line node-row-building duplication that a
+> full-path-only conversion would have created — converting the shared workhorse
+> avoids it entirely.
+
+#### 1b. Convert `_write_routes_and_exposes` (`:3338`, shared) to bulk
+Today it loops over `routes_rows`/`exposes_rows`/`client_rows`/`declares_client_rows`/
+`producer_rows`/`declares_producer_rows`/`http_call_rows`/`async_call_rows`, calling
+`_CREATE_ROUTE`/`_CREATE_EXPOSES`/`_CREATE_CLIENT`/`_CREATE_DECLARES_CLIENT`/
+`_CREATE_PRODUCER`/`_CREATE_DECLARES_PRODUCER`/`_CREATE_HTTP_CALL`/`_CREATE_ASYNC_CALL`
+with the existing `_file_by_node_id`/`_file_by_client_id`/`_file_by_producer_id`
+source_file resolution. Stage each table's rows (applying that resolution) and
+`_bulk_copy`. **Load order within the helper:** bulk-load Route/Client/Producer
+NODE rows before the EXPOSES/DECLARES_CLIENT/DECLARES_PRODUCER/HTTP_CALLS/
+ASYNC_CALLS edges (those edges reference those nodes + already-loaded Symbol).
+`_CREATE_ROUTE`/`_CREATE_EXPOSES` become dead → delete.
+
+#### 1c. Convert `_write_clients_producers_and_calls` (`:3810`, incremental-only) to bulk
+This is the global pass5/6 step (sole caller `incremental_rebuild:3716`). It
+writes Route (via `MERGE (r:Route {id:$id}) …` to dedup against the scoped step,
+`:3819-3828`), Client/Producer nodes (`_CREATE_CLIENT`/`_CREATE_PRODUCER`), and
+DECLARES_CLIENT/DECLARES_PRODUCER/HTTP_CALLS/ASYNC_CALLS edges. Convert the
+Client/Producer/edge writes to per-type staging + `_bulk_copy` (same
+`member_by_id`/`client_by_id`/`producer_by_id` source_file resolution).
+**Retain the `MERGE (r:Route …)` dedup verbatim** — routes written during the
+scoped step must not duplicate; add a one-line comment that it is intentionally
+kept. (Route upsert stays MERGE; only the Client/Producer/edge writes bulk.)
+Now that BOTH `_write_routes_and_exposes` and `_write_clients_producers_and_calls`
+are converted, the 6 shared constants — `_CREATE_CLIENT`/`_CREATE_PRODUCER`/
+`_CREATE_DECLARES_CLIENT`/`_CREATE_DECLARES_PRODUCER`/`_CREATE_HTTP_CALL`/
+`_CREATE_ASYNC_CALL` — are dead → delete all six.
+
+#### 1d. `_write_meta` — UNCHANGED
+Leave `_write_meta` (`:3421`) on its `MERGE (m:GraphMeta …)`. Do not touch.
 
 ### 2. `tests/test_incremental_graph.py` — incremental equivalence
-The existing 28 tests (e.g. `test_incremental_single_file_change`,
-`test_incremental_new_file`, `test_incremental_deleted_file`,
-`test_incremental_phantom_nodes_preserved`, `test_incremental_dependent_expansion`,
-`test_incremental_expansion_cap_fallback`, `test_incremental_crash_marker_*`,
-`test_incremental_no_changes_is_noop`, `test_delete_file_scope_preserves_dependent_nodes`)
-must pass unchanged.
+Add the new tests as **methods of `TestIncrementalOrchestrator`** (same class as
+`test_incremental_single_file_change`, `:230`-ish). The existing 28 tests must
+pass unchanged.
 
 ## Tests for PR-P2
 
-1. `test_incremental_bulk_write_equivalent_to_full_rebuild` — make a single-file
-   change, run `increment` (bulk incremental), then build the same state via a
-   full rebuild (bulk full), assert identical node count, per-type edge counts,
-   and `GraphMeta` counters.
-2. `test_incremental_route_merge_dedup_preserved` — a corpus where pass5/6 would
-   re-emit an existing route; assert no duplicate `Route` rows after `increment`
-   (the retained `MERGE (r:Route)` still dedups).
+1. `test_incremental_bulk_write_equivalent_to_full_rebuild` (in
+   `TestIncrementalOrchestrator`) — single-file change → `increment` (bulk) →
+   full rebuild of that state (bulk) → assert identical node count, per-type edge
+   counts, `GraphMeta` counters.
+2. `test_incremental_route_merge_dedup_preserved` (in
+   `TestIncrementalOrchestrator`) — a corpus where pass5/6 re-emits an existing
+   route → no duplicate `Route` rows after `increment` (the retained MERGE
+   dedups).
 
-**Must-still-pass:** full `tests/test_incremental_graph.py`.
+**Must-still-pass:** full `tests/test_incremental_graph.py`, `test_ast_graph_build.py`.
 
 ## Definition of done (PR-P2)
 
-- [ ] Incremental node/edge re-emit uses `_bulk_copy`; `_MERGE_SYMBOL` deleted.
-- [ ] pass5/6 `MERGE (r:Route)` dedup retained and commented.
-- [ ] `test_incremental_bulk_write_equivalent_to_full_rebuild`,
-      `test_incremental_route_merge_dedup_preserved` added and pass.
-- [ ] Full `tests/test_incremental_graph.py` passes unchanged.
-- [ ] `.venv/bin/ruff check .` clean.
-- [ ] PR title: `perf(graph): bulk COPY FROM for the incremental path (PR-P2)`.
+- [ ] `_write_nodes_impl` bulk; `_CREATE_SYMBOL` + `_MERGE_SYMBOL` deleted.
+- [ ] `_write_routes_and_exposes` bulk; `_CREATE_ROUTE`/`_CREATE_EXPOSES` deleted.
+- [ ] `_write_clients_producers_and_calls` Client/Producer/edge writes bulk; `MERGE (r:Route)` retained + commented; 6 shared `_CREATE_*` deleted.
+- [ ] `_write_meta` untouched.
+- [ ] Both new tests pass (in `TestIncrementalOrchestrator`); full incremental + ast_graph_build suites green.
+- [ ] Sentinel greps pass; `.venv/bin/ruff check .` clean.
+- [ ] PR title: `perf(graph): bulk COPY FROM for nodes, routes, clients/producers (PR-P2)`.
 
 ## Implementation step list
 
 | # | Step | File(s) | Done when |
 | --- | --- | --- | --- |
-| 1 | Convert `_write_nodes_merge` to bulk; delete `_MERGE_SYMBOL` | `build_ast_graph.py` | incremental node re-emit via `_bulk_copy` |
-| 2 | Convert incremental edge re-emit to per-type staging + bulk | `build_ast_graph.py` | incremental edge counts match full rebuild |
-| 3 | Retain + comment the pass5/6 `MERGE (r:Route)` dedup | `build_ast_graph.py` | no duplicate Route rows after increment |
-| 4 | Add incremental-equivalence + route-dedup tests | `tests/test_incremental_graph.py` | both pass; full suite green |
+| 1 | Convert `_write_nodes_impl` to bulk; delete `_CREATE_SYMBOL`+`_MERGE_SYMBOL` | `build_ast_graph.py` | node counts match baseline; both wrappers share bulk |
+| 2 | Convert `_write_routes_and_exposes` to bulk; load Route/Client/Producer before edges; delete `_CREATE_ROUTE`/`_CREATE_EXPOSES` | `build_ast_graph.py` | route/client/producer/call counts match |
+| 3 | Convert `_write_clients_producers_and_calls` Client/Producer/edges to bulk; retain Route MERGE; delete 6 shared `_CREATE_*` | `build_ast_graph.py` | no duplicate routes; sentinel greps pass |
+| 4 | Add 2 tests to `TestIncrementalOrchestrator` | `tests/test_incremental_graph.py` | both pass; full suite green |
 | 5 | ruff + full regression | repo | clean + green |
 
 ---
 
 # PR-P3 — Cached `LayeredIgnore` (+ `is_ignored` memo) as a `ContextKey`
 
-**Goal:** remove the ~25s paid per-flow-run from re-constructing
-`LayeredIgnore(project_root)` per file and re-merging `_mega(spec)` per file.
-**Independent** of PR-P1/P2.
+**Goal:** remove the ~25s from re-constructing `LayeredIgnore(project_root)` per
+file and re-merging `_mega` per file. **Independent** of PR-P1/P2.
 
 ## File-by-file changes
 
-### 1. `java_index_flow_lancedb.py` — `ContextKey` for the ignore object
+### 1. `java_index_flow_lancedb.py`
 - Define `IGNORE = coco.ContextKey[LayeredIgnore]("java_lance_layered_ignore")`
-  alongside the existing `PROJECT_ROOT`/`EMBEDDER`/`LANCE_DB` definitions
-  (`:60-72`), using the **same** `_ck_params` (`detect_change` vs `tracked`)
-  detection block so it works across cocoindex versions.
-- In `coco_lifespan` (where `builder.provide(PROJECT_ROOT, root)` /
-  `builder.provide(EMBEDDER, embedder)` are called, `~`:287-306`), add
-  `builder.provide(IGNORE, LayeredIgnore(root))` — built **once** per flow run.
-- In `process_java_file` (`:344`), `process_sql_file` (`:416`),
-  `process_yaml_file` (`:464`): replace
-  `if LayeredIgnore(project_root).is_ignored((project_root / file.file_path.path).resolve()):`
-  (`:351`/`:423`/`:471`) with
-  `ignore = coco.use_context(IGNORE)` then
-  `if ignore.is_ignored((project_root / file.file_path.path).resolve()):`.
-  `project_root` stays (still needed for `_parse_and_enrich_java` and path
-  resolution).
-
-### 2. `path_filtering.py` — memoize `_mega` by directory
-- Add an instance cache `self._mega_cache: dict[str, tuple[list[str], GitIgnoreSpec, list]] = {}`
+  alongside `PROJECT_ROOT`/`EMBEDDER`/`LANCE_DB` (`:60-72`), reusing the SAME
+  `_ck_params` (`detect_change` vs `tracked`) detection block.
+- In `coco_lifespan` (`:287-306`), add `builder.provide(IGNORE, LayeredIgnore(root))`
+  — built **once** per flow run.
+- In `process_java_file` (`:345`), `process_sql_file` (`:417`), `process_yaml_file`
+  (`:465`): add `ignore = coco.use_context(IGNORE)` and replace
+  `LayeredIgnore(project_root).is_ignored((project_root / file.file_path.path).resolve())`
+  (`:351`/`:423`/`:471`) with `ignore.is_ignored((project_root / file.file_path.path).resolve())`.
+  Keep `project_root` (still used for path resolution + `_parse_and_enrich_java`).
+- **Leave `:177` and `:569` alone** — they call `cocoindex_excluded_patterns()`
+  once per run (the `_approximate_vectors_total` helper and the app_main
+  pre-walk), not per file.
+
+### 2. `path_filtering.py`
+- Add `self._mega_cache: dict[str, tuple[list[str], GitIgnoreSpec, list]] = {}`
   in `LayeredIgnore.__init__`.
-- In `LayeredIgnore._mega` (`:334`): key on
-  `dir_rel = Path(rel_project).parent.as_posix()`. If present, return the cached
-  `(mega, spec, meta)`; else compute via `_mega_build_for_rel` + compile, store,
-  return. Correctness rests on `_mega_build_for_rel` reading only
-  `dir_parts = parts[:-1]` (verified: it never reads the filename).
-- `is_ignored` (`:345`) and `diagnose_dict` (`:377`) keep calling `_mega(rel)`
-  unchanged — they transparently benefit from the cache.
-- Add a module/env knob `LAYERED_IGNORE_MEGA_CACHE` only if a disable path is
-  needed for debugging — **default enabled**, no new public surface otherwise.
+- In `_mega` (`:334`): key on `Path(rel_project).parent.as_posix()`; return cached
+  `(mega, spec, meta)` if present, else compute via `_mega_build_for_rel` +
+  `GitIgnoreSpec.from_lines`, store, return. Correctness rests on
+  `_mega_build_for_rel` reading only `dir_parts` (`:226-227`).
+- `is_ignored` (`:345`) and `diagnose_dict` (`:377`) call `_mega` unchanged and
+  benefit transparently (both consume the full `(mega, spec, meta)` tuple).
 
 ## Tests for PR-P3
 
-1. `test_is_ignored_mega_caches_by_directory` — build a `LayeredIgnore` over a
-   fixture with nested ignore files; call `is_ignored` for several files in the
-   same directory; assert `_mega` is computed once (spy on `_mega_build_for_rel`
-   or count `GitIgnoreSpec.from_lines` calls) and that decisions match the
-   uncached path.
-2. `test_layered_ignore_memo_preserves_decisions` — for a corpus of paths
-   (including nested `.java-codebase-rag/ignore` + `.gitignore` with negations),
-   assert `LayeredIgnore(...).is_ignored(p)` is identical with and without the
-   memo cache (correctness invariant).
-3. `test_layered_ignore_provided_once_per_flow` (HEAVY,
-   `JAVA_CODEBASE_RAG_RUN_HEAVY=1`) — run the real cocoindex flow over a small
-   corpus; assert the `LayeredIgnore` provided via `IGNORE` is a single instance
-   (identity check), not reconstructed per file.
+1. `test_is_ignored_mega_caches_by_directory` — in `tests/test_path_filtering.py`;
+   assert `_mega` is computed once per directory (spy on `_mega_build_for_rel`)
+   and decisions match the uncached path.
+2. `test_layered_ignore_memo_preserves_decisions` — in `tests/test_path_filtering.py`;
+   for a corpus with nested ignore + gitignore negations, assert `is_ignored` is
+   identical with and without the cache.
+3. `test_layered_ignore_provided_once_per_flow` — in `tests/test_lancedb_e2e.py`
+  (HEAVY, `JAVA_CODEBASE_RAG_RUN_HEAVY=1`); run the real flow, assert a single
+  `LayeredIgnore` instance (identity check), not per-file.
 
 **Must-still-pass:** `tests/test_lancedb_e2e.py::test_lancedb_ignore_file_reduces_indexed_java_files`,
-any `tests/test_path_filtering*` / ignore tests, and the heavy
-`test_flow_emits_vectors_progress_per_file`.
+`tests/test_path_filtering.py`, and the heavy
+`tests/test_vectors_progress.py::test_flow_emits_vectors_progress_per_file`.
 
 ## Definition of done (PR-P3)
 
-- [ ] `IGNORE` `ContextKey` defined (version-detected) + provided once in `coco_lifespan`.
-- [ ] The three `process_*_file` functions consume it; no per-file `LayeredIgnore(project_root)`.
-- [ ] `_mega` memoized by directory; `is_ignored`/`diagnose_dict` unchanged in result.
-- [ ] `test_is_ignored_mega_caches_by_directory`,
-      `test_layered_ignore_memo_preserves_decisions`,
-      `test_layered_ignore_provided_once_per_flow` added and pass.
-- [ ] Existing ignore + vectors-progress tests pass unchanged.
+- [ ] `IGNORE` ContextKey (version-detected) + provided once in `coco_lifespan`.
+- [ ] The three `process_*_file` consume it; sites `:177`/`:569` untouched.
+- [ ] `_mega` memoized by directory; `is_ignored`/`diagnose_dict` results unchanged.
+- [ ] The 3 named tests pass; existing ignore/vectors-progress tests pass unchanged.
 - [ ] `.venv/bin/ruff check .` clean.
 - [ ] PR title: `perf(vectors): lifespan-cached LayeredIgnore + is_ignored memo (PR-P3)`.
 
@@ -385,10 +358,10 @@ any `tests/test_path_filtering*` / ignore tests, and the heavy
 
 | # | Step | File(s) | Done when |
 | --- | --- | --- | --- |
-| 1 | Add `IGNORE` ContextKey (version-detected) + provide in lifespan | `java_index_flow_lancedb.py` | `IGNORE` resolvable in process_*_file |
-| 2 | Switch the three `process_*_file` to `coco.use_context(IGNORE)` | `java_index_flow_lancedb.py` | no per-file construction |
+| 1 | Add `IGNORE` ContextKey (version-detected) + provide in lifespan | `java_index_flow_lancedb.py` | resolvable in process_*_file |
+| 2 | Switch the three `process_*_file` to `coco.use_context(IGNORE)` | `java_index_flow_lancedb.py` | only `:351`/`:423`/`:471` changed; `:177`/`:569` untouched |
 | 3 | Add `_mega` dirname cache | `path_filtering.py` | repeated same-dir calls hit cache |
-| 4 | Add memo + correctness + once-per-flow tests | `tests/` | all pass |
+| 4 | Add memo + correctness tests in `test_path_filtering.py`; once-per-flow in `test_lancedb_e2e.py` | `tests/` | all pass |
 | 5 | ruff + regression (incl. heavy ignore/vectors tests) | repo | clean + green |
 
 ---
@@ -397,32 +370,33 @@ any `tests/test_path_filtering*` / ignore tests, and the heavy
 
 | # | Risk | Severity | Mitigation |
 | --- | --- | --- | --- |
-| 1 | REL-table `COPY FROM` column ordering/naming differs from assumption | High | PR-P1 step-1 spike locks the exact incantation before any conversion; recorded in `_bulk_copy` docstring. |
-| 2 | LIST columns (`modifiers`/`annotations`/`capabilities`) mis-type under `from_pylist` | Medium | Spike asserts LIST inference; explicit `pa.schema` fallback documented. Equivalence test catches any row diff. |
-| 3 | CALLS dedup / `callee_declaring_role` drift when moving to staging | High | `test_bulk_write_preserves_calls_dedup_and_callee_declaring_role` + the sampled-edge baseline check; semantics computed identically, just earlier. |
-| 4 | Bulk write changes observable query results | High | Equivalence + determinism tests + full existing graph/edge/brownfield suites are the gate. |
-| 5 | PR-P2 breaks the pass5/6 Route dedup | Medium | `MERGE (r:Route)` retained by name; `test_incremental_route_merge_dedup_preserved` guards it. |
-| 6 | PR-P3 `_mega` memo returns stale rules if dirParts logic changes | Low | `_mega_build_for_rel` reads only `dir_parts`; `test_layered_ignore_memo_preserves_decisions` asserts parity with uncached path. |
-| 7 | `ContextKey` lifespan differs across cocoindex versions | Low | Reuse the existing `_ck_params` `detect_change`/`tracked` detection block used by `PROJECT_ROOT`. |
+| 1 | REL-table `COPY FROM` column order/naming differs from assumption | High | PR-P1 step-1 spike locks it before conversion. |
+| 2 | Shared-helper conversion changes incremental output | High | `_write_edges`/`_write_routes_and_exposes`/`_write_nodes_impl` are shared → full `test_incremental_graph.py` (28 tests) is a merge gate for P1 and P2, plus the new incremental-equivalence test. |
+| 3 | CALLS dedup / `callee_declaring_role` drift when moved to staging | High | `test_bulk_write_preserves_calls_dedup_and_callee_declaring_role` + sampled-edge baseline. |
+| 4 | Deleting a `_CREATE_*` constant still used by the other path | High | PR-P2 deletes the 6 client/producer constants only AFTER both `_write_routes_and_exposes` and `_write_clients_producers_and_calls` are converted (same PR). Sentinel greps enforce. |
+| 5 | pass5/6 Route dedup breaks | Medium | `MERGE (r:Route)` retained by name; `test_incremental_route_merge_dedup_preserved` guards it. |
+| 6 | PR-P3 `_mega` memo returns stale rules | Low | `_mega_build_for_rel` reads only `dir_parts`; `test_layered_ignore_memo_preserves_decisions` asserts parity. |
+| 7 | PR-P3 sentinel over-matches `:177`/`:569` | Medium | Sentinel is `LayeredIgnore\(project_root\)\.is_ignored` (matches only the 3 process sites), NOT the bare constructor. |
+| 8 | `ContextKey` lifespan differs across cocoindex versions | Low | Reuse the existing `_ck_params` `detect_change`/`tracked` detection block. |
 
 # Out of scope
 
-- MPS embedding default (already auto-selected — see proposal Out of scope).
-- ANN vector index (#337) and `watch` mode (#336).
+- MPS embedding default (already auto-selected).
+- GraphMeta bulk conversion (left on MERGE — shared, one row).
+- ANN index (#337) and `watch` mode (#336).
 - Replacing/restructuring the cocoindex flow; changing embedding model/dim.
 - Parallelizing graph analysis passes (pass1–pass6).
-- Converting the incremental path in PR-P1 (it is PR-P2).
 - Parquet-file or CSV bulk paths (pyarrow in-memory only).
+- Converting `:177`/`:569` (once-per-run setup, not hot paths).
 
 # Whole-plan done definition
 
-1. `init`/`reprocess` graph-write phase on the medium corpus drops from ~321s to
-   tens of seconds (benchmark in PR-P1 description).
-2. `increment` on a large change-set uses the bulk path (PR-P2).
-3. The vectors phase pays no per-file `LayeredIgnore`/`_mega` cost (PR-P3).
-4. No ontology bump (`ontology_version` stays 17); no re-index required; all
+1. `init`/`reprocess`/`increment` graph-write phase on the medium corpus drops
+   from ~321s to tens of seconds (benchmark in PR-P1; completed in PR-P2).
+2. The vectors phase pays no per-file `LayeredIgnore`/`_mega` cost (PR-P3).
+3. No ontology bump (`ontology_version` stays 17); no re-index required; all
    existing graph/edge/brownfield/incremental/ignore/vectors tests pass.
-5. Proposal moved to `propose/completed/` and this plan to `plans/completed/`
+4. Proposal moved to `propose/completed/` and this plan to `plans/completed/`
    once all three PRs land.
 
 # Tracking