feat(code-index): experimental DuckDB + tree-sitter code-index plugin#58
Conversation
Add @vymalo/opencode-code-index (private, experimental, not published): a personal OpenCode plugin that indexes a repo into an embedded DuckDB store and exposes code_* tools — code_symbol, code_callers, code_callees, code_references, code_blast_radius, plus index_refresh / index_status. The index is content-addressed by git blob and scoped per branch (a branch is a path→blob manifest), so branch/worktree switches re-index only the delta and blast_radius stays branch-correct. Names resolve at query time against the active manifest. The call graph is built with tree-sitter and is deliberately sound but partial (no type info — generic obj.method() is dropped); edges carry a confidence tag (name | this | typed) for a future language-server/SCIP enrichment tier. Engine and resolution choices were validated by throwaway spikes before building (DuckDB recursive CTE for blast_radius; tree-sitter extraction on real source). - packages/opencode-code-index: src + 50 vitest tests (98% stmts, 85% branches) - pnpm-workspace: allow tree-sitter native builds - docs/code-index.md (reference) + plans/code-index.md (design) - README / CLAUDE.md / AGENTS.md / CHANGELOG: document the experimental plugin Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
There was a problem hiding this comment.
Code Review
This pull request introduces an experimental, private OpenCode plugin, @vymalo/opencode-code-index, which indexes repositories into an embedded DuckDB store using a tree-sitter symbol graph to expose structural code-intelligence tools. Feedback on these changes focuses on enhancing robustness and reliability. Key recommendations include coordinating concurrent indexing requests using a promise map to prevent race conditions, validating that environment-provided paths are absolute, resolving relative database paths against the active worktree root, wrapping database operations in transactions to ensure atomicity, and separating file-reading from database-writing to prevent masked errors.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| interface WorktreeContext { | ||
| repo: GitRepo; | ||
| store: CodeIndexStore; | ||
| dbPath: string; | ||
| /** Branches already indexed (or confirmed indexed) in this process. */ | ||
| ensured: Set<string>; | ||
| } |
There was a problem hiding this comment.
Add an indexingPromises map to WorktreeContext to track active indexing operations per branch and prevent concurrent indexing race conditions.
| interface WorktreeContext { | |
| repo: GitRepo; | |
| store: CodeIndexStore; | |
| dbPath: string; | |
| /** Branches already indexed (or confirmed indexed) in this process. */ | |
| ensured: Set<string>; | |
| } | |
| interface WorktreeContext { | |
| repo: GitRepo; | |
| store: CodeIndexStore; | |
| dbPath: string; | |
| /** Branches already indexed (or confirmed indexed) in this process. */ | |
| ensured: Set<string>; | |
| /** Active indexing promise per branch to prevent concurrent indexing. */ | |
| indexingPromises: Map<string, Promise<void>>; | |
| } |
| async function ensureIndexed(ctx: WorktreeContext, force = false): Promise<string> { | ||
| const branch = await ctx.repo.currentBranch(); | ||
| if (!force && ctx.ensured.has(branch)) { | ||
| return branch; | ||
| } | ||
| const status = await ctx.store.status(branch); | ||
| if (force || status.files === 0) { | ||
| await indexRepo(ctx.repo, ctx.store, { | ||
| extensions: deps.options.extensions, | ||
| logger: deps.logger | ||
| }); | ||
| } | ||
| ctx.ensured.add(branch); | ||
| return branch; | ||
| } |
There was a problem hiding this comment.
Coordinate concurrent indexing requests for the same branch using a promise map. Parallel tool execution by AI agents can trigger ensureIndexed concurrently, leading to race conditions, PRIMARY KEY violations, or database locking errors.
async function ensureIndexed(ctx: WorktreeContext, force = false): Promise<string> {
const branch = await ctx.repo.currentBranch();
if (!force && ctx.ensured.has(branch)) {
return branch;
}
let promise = ctx.indexingPromises.get(branch);
if (!promise) {
const performIndex = async () => {
const status = await ctx.store.status(branch);
if (force || status.files === 0) {
await indexRepo(ctx.repo, ctx.store, {
extensions: deps.options.extensions,
logger: deps.logger
});
}
ctx.ensured.add(branch);
};
promise = performIndex().finally(() => {
ctx.indexingPromises.delete(branch);
});
ctx.indexingPromises.set(branch, promise);
}
await promise;
return branch;
}| import { homedir, platform } from "node:os"; | ||
| import { join } from "node:path"; |
There was a problem hiding this comment.
Import isAbsolute from node:path to validate that environment-provided directory paths are absolute before using them.
| import { homedir, platform } from "node:os"; | |
| import { join } from "node:path"; | |
| import { homedir, platform } from "node:os"; | |
| import { isAbsolute, join } from "node:path"; |
References
- When reading directory paths from environment variables (such as APPDATA or XDG_STATE_HOME), validate that the path is absolute (e.g., using isAbsolute) before using it. This prevents empty or relative environment values from resolving paths relative to the process current working directory (CWD).
| export function cacheDir(): string { | ||
| const home = homedir(); | ||
| switch (platform()) { | ||
| case "darwin": | ||
| return join(home, "Library", "Caches", NAMESPACE); | ||
| case "win32": | ||
| return join(process.env.LOCALAPPDATA ?? join(home, "AppData", "Local"), NAMESPACE); | ||
| default: | ||
| return join(process.env.XDG_CACHE_HOME ?? join(home, ".cache"), NAMESPACE); | ||
| } | ||
| } |
There was a problem hiding this comment.
Validate that the directory paths read from environment variables (LOCALAPPDATA and XDG_CACHE_HOME) are absolute using isAbsolute before using them. This prevents empty or relative environment values from resolving paths relative to the process current working directory (CWD).
| export function cacheDir(): string { | |
| const home = homedir(); | |
| switch (platform()) { | |
| case "darwin": | |
| return join(home, "Library", "Caches", NAMESPACE); | |
| case "win32": | |
| return join(process.env.LOCALAPPDATA ?? join(home, "AppData", "Local"), NAMESPACE); | |
| default: | |
| return join(process.env.XDG_CACHE_HOME ?? join(home, ".cache"), NAMESPACE); | |
| } | |
| } | |
| export function cacheDir(): string { | |
| const home = homedir(); | |
| switch (platform()) { | |
| case "darwin": | |
| return join(home, "Library", "Caches", NAMESPACE); | |
| case "win32": { | |
| const appData = process.env.LOCALAPPDATA; | |
| const base = appData && isAbsolute(appData) ? appData : join(home, "AppData", "Local"); | |
| return join(base, NAMESPACE); | |
| } | |
| default: { | |
| const xdgCache = process.env.XDG_CACHE_HOME; | |
| const base = xdgCache && isAbsolute(xdgCache) ? xdgCache : join(home, ".cache"); | |
| return join(base, NAMESPACE); | |
| } | |
| } | |
| } |
References
- When reading directory paths from environment variables (such as APPDATA or XDG_STATE_HOME), validate that the path is absolute (e.g., using isAbsolute) before using it. This prevents empty or relative environment values from resolving paths relative to the process current working directory (CWD).
| import { mkdir } from "node:fs/promises"; | ||
| import { dirname } from "node:path"; |
There was a problem hiding this comment.
Import isAbsolute and resolve from node:path to handle relative user-provided database paths and resolve them against the worktree root.
| import { mkdir } from "node:fs/promises"; | |
| import { dirname } from "node:path"; | |
| import { mkdir } from "node:fs/promises"; | |
| import { dirname, isAbsolute, resolve } from "node:path"; |
| async function buildContext(worktree: string): Promise<WorktreeContext> { | ||
| const repo = makeRepo(worktree); | ||
| if (!(await repo.isRepo())) { | ||
| throw new Error(NOT_A_REPO); | ||
| } | ||
| const repoId = await repo.repoId(); | ||
| const dbPath = deps.options.dbPath ?? defaultDbPath(repoId); | ||
| if (dbPath !== ":memory:") { | ||
| await mkdir(dirname(dbPath), { recursive: true }); | ||
| } | ||
| const store = await openStore(dbPath); | ||
| return { repo, store, dbPath, ensured: new Set() }; | ||
| } |
There was a problem hiding this comment.
Resolve relative user-provided dbPath against the active worktree root instead of the process current working directory (CWD) to ensure consistency across different directories.
async function buildContext(worktree: string): Promise<WorktreeContext> {
const repo = makeRepo(worktree);
if (!(await repo.isRepo())) {
throw new Error(NOT_A_REPO);
}
const repoId = await repo.repoId();
let dbPath = deps.options.dbPath ?? defaultDbPath(repoId);
if (dbPath !== ":memory:" && !isAbsolute(dbPath)) {
dbPath = resolve(worktree, dbPath);
}
if (dbPath !== ":memory:") {
await mkdir(dirname(dbPath), { recursive: true });
}
const store = await openStore(dbPath);
return { repo, store, dbPath, ensured: new Set(), indexingPromises: new Map() };
}| /** Insert a parsed blob (blob row + its symbols + its edges) in one shot. */ | ||
| async insertBlob(blobSha: string, lang: string, extraction: Extraction): Promise<void> { | ||
| await this.conn.run("INSERT INTO blob VALUES (?, ?)", [blobSha, lang]); | ||
| for (const d of extraction.defs) { | ||
| await this.conn.run("INSERT INTO symbol VALUES (?, ?, ?, ?)", [ | ||
| blobSha, | ||
| d.name, | ||
| d.kind, | ||
| d.line | ||
| ]); | ||
| } | ||
| for (const e of extraction.refs) { | ||
| await this.conn.run("INSERT INTO ref VALUES (?, ?, ?, ?, ?, ?)", [ | ||
| blobSha, | ||
| e.caller, | ||
| e.dstName, | ||
| e.kind, | ||
| e.line, | ||
| e.confidence | ||
| ]); | ||
| } | ||
| } |
There was a problem hiding this comment.
Wrap the multiple database insert statements in a transaction to ensure atomicity. If any insert fails or the process is interrupted, the database won't be left in a corrupted or partially populated state.
/** Insert a parsed blob (blob row + its symbols + its edges) in one shot. */
async insertBlob(blobSha: string, lang: string, extraction: Extraction): Promise<void> {
await this.conn.run("BEGIN TRANSACTION");
try {
await this.conn.run("INSERT INTO blob VALUES (?, ?)", [blobSha, lang]);
for (const d of extraction.defs) {
await this.conn.run("INSERT INTO symbol VALUES (?, ?, ?, ?)", [
blobSha,
d.name,
d.kind,
d.line
]);
}
for (const e of extraction.refs) {
await this.conn.run("INSERT INTO ref VALUES (?, ?, ?, ?, ?, ?)", [
blobSha,
e.caller,
e.dstName,
e.kind,
e.line,
e.confidence
]);
}
await this.conn.run("COMMIT");
} catch (err) {
await this.conn.run("ROLLBACK");
throw err;
}
}| /** Replace a branch/root's manifest with `entries` (delete-then-insert). */ | ||
| async replaceManifest(branch: string, root: string, entries: ManifestEntry[]): Promise<void> { | ||
| await this.conn.run("DELETE FROM manifest WHERE branch = ? AND root = ?", [branch, root]); | ||
| for (const e of entries) { | ||
| await this.conn.run("INSERT INTO manifest VALUES (?, ?, ?, ?)", [ | ||
| branch, | ||
| root, | ||
| e.path, | ||
| e.blobSha | ||
| ]); | ||
| } | ||
| } |
There was a problem hiding this comment.
Wrap the delete-then-insert manifest replacement in a transaction to ensure atomicity and prevent leaving the branch manifest in a deleted or partially written state on failure.
/** Replace a branch/root's manifest with 'entries' (delete-then-insert). */
async replaceManifest(branch: string, root: string, entries: ManifestEntry[]): Promise<void> {
await this.conn.run("BEGIN TRANSACTION");
try {
await this.conn.run("DELETE FROM manifest WHERE branch = ? AND root = ?", [branch, root]);
for (const e of entries) {
await this.conn.run("INSERT INTO manifest VALUES (?, ?, ?, ?)", [
branch,
root,
e.path,
e.blobSha
]);
}
await this.conn.run("COMMIT");
} catch (err) {
await this.conn.run("ROLLBACK");
throw err;
}
}| try { | ||
| const source = await repo.readBlob(entry.blobSha); | ||
| const extraction = extractFromSource(source, lang); | ||
| await store.insertBlob(entry.blobSha, lang, extraction); | ||
| indexed++; | ||
| } catch (err) { | ||
| // Record the blob with no symbols so we don't retry a file we can't parse. | ||
| await store.insertBlob(entry.blobSha, lang, { defs: [], refs: [] }); | ||
| options.logger?.warn("code_index_blob_failed", { | ||
| path: entry.path, | ||
| blob: entry.blobSha, | ||
| error: err instanceof Error ? err.message : String(err) | ||
| }); | ||
| } |
There was a problem hiding this comment.
Separate the read/parse operations from the database write operation in the try-catch block. If a fatal database error occurs during insertBlob, attempting to write to the database again in the catch block is highly likely to fail or mask the real database issue as a parsing failure.
let extraction: Extraction;
try {
const source = await repo.readBlob(entry.blobSha);
extraction = extractFromSource(source, lang);
} catch (err) {
options.logger?.warn("code_index_blob_failed", {
path: entry.path,
blob: entry.blobSha,
error: err instanceof Error ? err.message : String(err)
});
// Record the blob with no symbols so we don't retry a file we can't parse.
try {
await store.insertBlob(entry.blobSha, lang, { defs: [], refs: [] });
} catch (dbErr) {
options.logger?.error("code_index_fallback_write_failed", {
blob: entry.blobSha,
error: dbErr instanceof Error ? dbErr.message : String(dbErr)
});
}
continue;
}
await store.insertBlob(entry.blobSha, lang, extraction);
indexed++;…ginConfig OpenCode has no typed `pluginConfig` field; plugin options are passed via the `[specifier, options]` tuple inside the `plugin` array (delivered as the 2nd-arg PluginOptions, which is what this plugin actually reads). The previous example used a non-existent `pluginConfig` block keyed by package name. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
demo/demo.mjs indexes the current branch's HEAD tree into a throwaway DuckDB file (OS temp dir, cleaned up) and drives the real code_* tools exactly as OpenCode would, printing the model-facing text. Wired as the `demo` package script and documented with sample output in docs/code-index.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Record the three alternative-closing choices behind the code-index plugin in the MADR-style ADR format: - 0002: embedded DuckDB over a graph-DB server (Neo4j/Qdrant; Cozo abandoned) - 0003: content-addressed-by-blob + per-branch manifest indexing model - 0004: tree-sitter-only "sound but partial" call-graph resolution Index updated; docs/code-index.md cross-links them. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rn-cf5900 # Conflicts: # CHANGELOG.md
Apply the Gemini review findings on #58 (all verified as real, not false positives): - tools.ts: guard concurrent indexing. OpenCode runs tools in parallel, so two first-touch calls on a branch could both run indexRepo (duplicate INSERTs / lock contention). A per-branch in-flight promise (`startIndex`) now de-dupes concurrent runs; index_refresh shares the same guard. - config.ts: only honor LOCALAPPDATA / XDG_CACHE_HOME when absolute (the `??` idiom doesn't catch ""), else fall back to home — no CWD-relative cache dir. - tools.ts: resolve a relative dbPath override against the worktree, not the CWD. - store.ts: wrap insertBlob and replaceManifest in a transaction (BEGIN/COMMIT/ ROLLBACK) so a mid-write failure leaves no partial rows. - indexer.ts: separate read/parse from the DB write so a genuine DB error propagates (fatal) instead of being masked as a parse failure and retried. Tests: +4 (concurrent-index de-dup, relative-dbPath resolution, env-path fallback, transaction rollback) + a fallback-write-failure case. 56 tests, coverage 98% stmts / 85% branches. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
✅ AI Governance check passedThis PR declares AI usage, references a source of truth, and provides verification evidence. Thank you. |
Review addressed — all five findings were real (verified, not false positives)Merged
Added 5 tests (concurrent-index de-dup, relative-dbPath resolution, env-path fallback, transaction rollback, fallback-write-failure). 56 tests, 98% stmts / 85% branches. Full pre-push gate green. (Per the AI Governance doctrine just adopted in #59: findings treated as claims and verified against the cited lines before acting — here all five held up.) |
1. Summary
Adds
@vymalo/opencode-code-index— a personal, experimental, private (not published) OpenCode plugin that indexes a repo into an embedded DuckDB store and exposes structuralcode_*tools:code_symbol,code_callers,code_callees,code_references,code_blast_radius, plusindex_refresh/index_status.Source of truth: the design doc
plans/code-index.md— this PR implements it.path→blobmanifest); branch/worktree switches re-index only the delta andblast_radiusstays branch-correct (names resolve at query time).confidence-tagged edges for a future typed tier).docs/code-index.md,plans/code-index.md, ADR-0002/0003/0004, a runnabledemo/.2. Intent
Source of truth / design rationale: https://github.com/vymalo/opencode-oauth2/blob/claude/priceless-shtern-cf5900/plans/code-index.md
3. Scope
In Scope
packages/opencode-code-index), its tests, docs/ADRs, and the runnable demo.pnpm-workspace.yaml: allow tree-sitter native builds.Out of Scope
docs_search/ embeddings) — designed inplans/code-index.md§10, deferred.4. Verification
I verified this change by:
blast_radiusthrough real tree-sitter + DuckDB).demo/demo.mjs.Commands run:
Results:
6. Risk Assessment
Risk level: Low — a private, opt-in, non-published plugin; not wired into any other package or CI gate. Adds native tree-sitter build deps (DuckDB ships prebuilt).
7. AI Usage Declaration
AI was used for:
Human verification:
🤖 Generated with Claude Code