fix(config): consistent index_dir/source_root resolution for CLI and MCP #316
Merged
…urce_root

A config file living in a subdirectory of the Java tree resolved inconsistently between the CLI and the MCP server, so no single index_dir value worked for both. Two compounding bugs:

1. _resolve_index_dir_path resolved a YAML `index_dir` relative to the already-resolved `source_root`, while `source_root` itself resolved relative to the config file's directory. A `../` in index_dir was re-applied on top of source_root and overshot by one level (the "init indexes ~/" symptom). YAML index_dir now resolves against the config file's directory, the same base as source_root. CLI/env index_dir and the default stay source_root-relative (unchanged).

2. server.main() passed source_root=_project_root() (the walk-up-discovered config dir) to resolve_operator_config, routing into the branch that treats it as an explicit override and skips the YAML source_root field. The CLI passes source_root=None, which honors the field -- so the same config produced a different effective root for init vs MCP (the "mcp can't find the index" symptom). main() now passes _source_root_for_operator_config() (env-or-None), so the MCP server honors YAML source_root exactly like the CLI; JAVA_CODEBASE_RAG_SOURCE_ROOT still wins when set.

With both fixes a config in my-context/ next to

    source_root: ../
    index_dir: ../.java-codebase-rag

resolves identically for init and the MCP server.

Docs: CONFIGURATION.md index_dir base comment + tips updated. No ontology/embedding change; existing indexes remain valid.

Co-Authored-By: Claude <noreply@anthropic.com>
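The one-level overshoot described in bug 1 can be reproduced with plain path arithmetic. The sketch below is not the project's `_resolve_index_dir_path`; the `resolve` helper and the `/home/user/MyProject/my-context` layout are hypothetical stand-ins chosen to mirror the commit message, and paths are collapsed lexically without touching the filesystem.

```python
import posixpath
from pathlib import PurePosixPath


def resolve(base: PurePosixPath, value: str) -> PurePosixPath:
    """Join `value` onto `base` and collapse `..` segments lexically."""
    return PurePosixPath(posixpath.normpath(base / value))


# Hypothetical layout: the config file lives in a subdirectory of the Java tree.
config_dir = PurePosixPath("/home/user/MyProject/my-context")

# source_root has always resolved against the config file's directory.
source_root = resolve(config_dir, "../")

yaml_index_dir = "../.java-codebase-rag"

# Old behaviour: index_dir resolved against the already-resolved source_root,
# so the `../` is applied a second time.
old_index_dir = resolve(source_root, yaml_index_dir)

# New behaviour: index_dir resolves against the same base as source_root.
new_index_dir = resolve(config_dir, yaml_index_dir)

print("source_root:", source_root)
print("old index_dir:", old_index_dir)
print("new index_dir:", new_index_dir)
```

Under these assumptions the old base lands the index in the home directory, one level above the Java tree, while the new base keeps it inside the tree.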
HumanBean17 added a commit that referenced this pull request on Jun 14, 2026
… index

run_update passed the discovered config dir as an explicit source_root to resolve_operator_config, routing it into the branch that SKIPS the YAML source_root field. With a config living in a subdir next to `source_root: ../`, update then indexed that subdir (no Java) against the real index one level up, so cocoindex treated every indexed file as removed and deleted them: the "Updating index (Lance + graph)..." hang, and the ever-growing Lance `_deletions` + 1000s+ increment after a ctrl+C left cocoindex.db mid-reconcile.

This is the same bug class #316 fixed for the MCP server (its docstring warns that a non-None source_root skips the YAML field); run_update was the last production caller still passing a discovered dir. Pass source_root=None so the YAML source_root is honored exactly like increment/init/reprocess. run_install is unaffected (it passes the user-confirmed Java root).

Adds a regression test mirroring the reported layout (config in my-project-context/, source_root: ../, real index one level up) that captures the env handed to cocoindex and asserts SOURCE_ROOT resolves to the YAML root, not the config dir.

No schema, ontology, embedding, or env-var change. Existing indexes remain valid; no reindex required.

Co-Authored-By: Claude <noreply@anthropic.com>
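Why scanning the wrong root makes every indexed file look removed can be illustrated with a set difference. This is only a conceptual sketch, not how cocoindex reconciles internally; `files_to_delete` and the file names are hypothetical.

```python
def files_to_delete(indexed: set[str], found_on_disk: set[str]) -> set[str]:
    """Anything already indexed but missing from the current scan is treated as removed."""
    return indexed - found_on_disk


# Files already present in the real index (illustrative names only).
indexed = {"src/main/java/App.java", "src/main/java/Util.java"}

# Scan of the YAML source_root: the Java tree is there, nothing is removed.
scan_of_yaml_root = {"src/main/java/App.java", "src/main/java/Util.java"}

# Scan of the config subdirectory: it holds only the config file, no Java.
scan_of_config_dir: set[str] = set()

print(sorted(files_to_delete(indexed, scan_of_yaml_root)))
print(sorted(files_to_delete(indexed, scan_of_config_dir)))
```

With the config subdirectory as the scan root, the difference is the whole index, which matches the mass deletion described in the commit message.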
HumanBean17 added a commit that referenced this pull request on Jun 14, 2026
… index (#320)
Problem

A `.java-codebase-rag.yml` living in a subdirectory of the Java tree (e.g. `my-project-context/`) resolved its relative paths inconsistently between the CLI (`init`/`increment`/`reprocess`) and the MCP server. No single `index_dir` value worked for both.

Given the layout: …

- `init`, `source_root: ../` + `index_dir: ../.java-codebase-rag`: `~/` (one level too high)
- …, `source_root: ../` + `index_dir: .java-codebase-rag`: …

After this PR, both cases resolve identically and correctly: `source_root=MyProject`, `index_dir=MyProject/.java-codebase-rag`.

Root cause
Two compounding bugs in config resolution.

1. `index_dir` and `source_root` used different bases (`java_codebase_rag/config.py`)

`source_root` resolved relative to the config file's directory (`config_dir`), but `index_dir` resolved relative to the already-resolved `source_root`. So a `../` written in `index_dir` (intended relative to the config file) was re-applied on top of `source_root` and overshot by one level → `~/.java-codebase-rag`. The docs even contradicted themselves (`CONFIGURATION.md` said source_root is "relative to the config file's parent directory" but index_dir is "relative to source_root").

Fix: a YAML `index_dir` now resolves against `config_dir`, the same base as `source_root`. CLI/env `index_dir` and the default `./.java-codebase-rag` stay `source_root`-relative (unchanged), so the common case (config at project root) is unaffected.

2. The MCP server ignored the YAML `source_root` field (`server.py`)

`main()` called `resolve_operator_config(source_root=_project_root())`, and `_project_root()` returns the walk-up-discovered config dir (non-None). A non-None `source_root` routes into the "explicit override" branch that skips the YAML `source_root` field. The CLI passes `source_root=None`, which honors the field, so the same config file produced a different effective root for `init` vs the MCP server.

Fix: `main()` now passes `_source_root_for_operator_config()` (`JAVA_CODEBASE_RAG_SOURCE_ROOT`-or-None). When the env override is unset, the MCP server runs the same walk-up + YAML-`source_root`-honoring path as the CLI. `JAVA_CODEBASE_RAG_SOURCE_ROOT` still wins when set. `_project_root()` is kept for the `_resolve_lancedb_uri()` fallback only.

Verification
- … `resolve_operator_config` before and after the fix; `init` and MCP now agree.
- `tests/test_config.py`: 2 new tests (YAML `index_dir` resolves against config dir, for both `../` and bare forms).
- `tests/test_mcp_server_project_root.py`: 3 new tests (`_source_root_for_operator_config()` env-or-None semantics + init/MCP parity regression).
- `.venv/bin/ruff check .`: clean.
- `.venv/bin/python -m pytest tests` (no `JAVA_CODEBASE_RAG_RUN_HEAVY`): 774 passed, 11 skipped.

User-visible behaviour changes
- A YAML `index_dir` written relative to the config file now resolves against the config file's directory (previously: against `source_root`). This is a breaking change for configs where the config file lives in a subdirectory and a relative `index_dir` was specified, but the old behaviour was already inconsistent between `init` and MCP, so any such config was already broken for one of the two. The common case (config at project root, or no explicit `index_dir`) is unchanged.
- The MCP server now honors the YAML `source_root` field it previously ignored (fixes the init/MCP divergence; also fixes the `source_root` used for microservice/scope detection).

Scope / impact
- … `mcp.json.example` and the README zero-env-var note remain accurate.
- No `propose/active/` doc: this is a bounded 2-file bugfix (plus docs + tests) with the approach pre-approved in the investigation, not a feature/schema change.

🤖 Generated with Claude Code
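The init/MCP parity described under root cause 2 can be sketched in a few lines. The functions below are simplified, hypothetical stand-ins for `resolve_operator_config` and `_source_root_for_operator_config()`, not the project's code; only the environment variable name comes from the PR. Treating an empty value as unset is an assumption, and the environment is passed in as a mapping so the example stays self-contained.

```python
import posixpath
from pathlib import PurePosixPath
from typing import Mapping, Optional

ENV_VAR = "JAVA_CODEBASE_RAG_SOURCE_ROOT"  # env var named in the PR


def source_root_from_env(env: Mapping[str, str]) -> Optional[PurePosixPath]:
    """Env-or-None: the override if set (assumed: and non-empty), else None."""
    value = env.get(ENV_VAR)
    return PurePosixPath(value) if value else None


def effective_source_root(
    config_dir: PurePosixPath,
    yaml_source_root: str,
    explicit: Optional[PurePosixPath],
) -> PurePosixPath:
    """Hypothetical reduction of the two branches the PR describes."""
    if explicit is not None:
        # "Explicit override" branch: the YAML source_root field is skipped.
        return explicit
    # Otherwise honor the YAML field, relative to the config file's directory.
    return PurePosixPath(posixpath.normpath(config_dir / yaml_source_root))


config_dir = PurePosixPath("/home/user/MyProject/my-project-context")

# CLI: passes None, so the YAML field is honored.
cli_root = effective_source_root(config_dir, "../", None)
# MCP before the fix: passes the discovered config dir as an explicit override.
old_mcp_root = effective_source_root(config_dir, "../", config_dir)
# MCP after the fix: env-or-None, here with the env var unset.
new_mcp_root = effective_source_root(config_dir, "../", source_root_from_env({}))
# The env override still wins when set.
env_root = effective_source_root(
    config_dir, "../", source_root_from_env({ENV_VAR: "/srv/java"})
)

print(cli_root, old_mcp_root, new_mcp_root, env_root)
```

In this reduction the pre-fix MCP call returns the config subdirectory while the CLI returns the Java tree; passing env-or-None removes the divergence without changing what an explicit environment override does.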