12 changes: 6 additions & 6 deletions README.md
@@ -84,6 +84,8 @@ If vector hits come back and graph expansion adds neighbor symbols, the install

## Wire into an MCP host

The server discovers your project automatically: it walks up from cwd looking for `.java-codebase-rag.yml` (or `.yaml`), like git finds `.git`. No env vars required if you have a YAML config in your project tree. For full precedence details, see [`docs/CONFIGURATION.md`](./docs/CONFIGURATION.md).

### Claude Code

With the package installed, the console script `java-codebase-rag-mcp` is on your `PATH`. Register it project-scoped:
@@ -92,7 +94,7 @@ With the package installed, the console script `java-codebase-rag-mcp` is on you
claude mcp add --transport stdio java-codebase-rag -- java-codebase-rag-mcp
```

Then set env vars (`JAVA_CODEBASE_RAG_INDEX_DIR`, `JAVA_CODEBASE_RAG_SOURCE_ROOT`, `SBERT_MODEL`, …) in `.mcp.json` or your shell profile. For a project-scoped `.mcp.json` template, see [`mcp.json.example`](./mcp.json.example). Official docs: [Claude Code settings](https://docs.anthropic.com/en/docs/claude-code/settings).
No env vars needed — the server walks up from cwd to find `.java-codebase-rag.yml`. For a minimal `.mcp.json` template, see [`mcp.json.example`](./mcp.json.example). Official docs: [Claude Code settings](https://docs.anthropic.com/en/docs/claude-code/settings).

### Claude Desktop

@@ -102,16 +104,14 @@ Edit `claude_desktop_config.json` (macOS: `~/Library/Application Support/Claude/
{
"mcpServers": {
"java-codebase-rag": {
"command": "java-codebase-rag-mcp",
"env": {
"JAVA_CODEBASE_RAG_INDEX_DIR": "/ABSOLUTE/PATH/TO/.java-codebase-rag",
"JAVA_CODEBASE_RAG_SOURCE_ROOT": "/ABSOLUTE/PATH/TO/your-java-project"
}
"command": "java-codebase-rag-mcp"
}
}
}
```

The server discovers the project via walk-up from the cwd of the MCP host process. If your Java project is not the cwd, either set `JAVA_CODEBASE_RAG_SOURCE_ROOT` in the `env` block or add a `source_root` field to `.java-codebase-rag.yml` (see [`docs/CONFIGURATION.md`](./docs/CONFIGURATION.md)).

See [`mcp.json.example`](./mcp.json.example) for the same shape in `.mcp.json` (Claude Code project-scoped) form.
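
When the host cannot start the server inside the Java project, the `env` override mentioned above might look like the following. This is a sketch only: it is built in Python so the structure can be checked as valid JSON, and the path is a placeholder rather than a real location.

```python
import json

# Placeholder path: replace with the absolute path to your Java project.
JAVA_PROJECT = "/ABSOLUTE/PATH/TO/your-java-project"

config = {
    "mcpServers": {
        "java-codebase-rag": {
            "command": "java-codebase-rag-mcp",
            # Only needed when the host's cwd is not inside the Java project.
            "env": {"JAVA_CODEBASE_RAG_SOURCE_ROOT": JAVA_PROJECT},
        }
    }
}

# Render the structure as the JSON text that would go into the host config.
rendered = json.dumps(config, indent=2)
print(rendered)
```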

### Driving the MCP from an agent
35 changes: 30 additions & 5 deletions docs/CONFIGURATION.md
@@ -22,6 +22,22 @@ For the architecture rationale (the GPS metaphor, three-layer design, future wor

The operator-facing surface is **six** variables (plus MCP-only `JAVA_CODEBASE_RAG_SOURCE_ROOT` below). Precedence for knobs that also exist as CLI flags or YAML entries is **CLI flag > env var > YAML > built-in default** (see [`JAVA-CODEBASE-RAG-CLI.md`](./JAVA-CODEBASE-RAG-CLI.md)).

### Source root discovery and precedence

The server and CLI resolve the effective Java source root through a precedence chain:

| Priority | Source | How it resolves |
|---|---|---|
| 1 (highest) | CLI `--source-root` | Absolute, or relative to cwd |
| 2 | `JAVA_CODEBASE_RAG_SOURCE_ROOT` env var | Absolute, or relative to cwd |
| 3 | YAML `source_root` field | **Relative to the config file directory** (not cwd) |
| 4 | Walk-up discovery | Walk from cwd upward to find `.java-codebase-rag.yml` (or `.yaml`); the config file's directory becomes the source root |
| 5 (lowest) | cwd | No config found, no YAML override |

Walk-up checks each directory from cwd upward for `.java-codebase-rag.yml` or `.java-codebase-rag.yaml`. The **first match wins** (closest to cwd). The walk stops at `$HOME` (inclusive — `$HOME` itself is checked) or the filesystem root. This mirrors how git finds `.git`.
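
The walk-up rule can be illustrated with a small self-contained sketch. The function name `walk_up` and its explicit `home` parameter are assumptions made for this demo (passing `home` in keeps the example independent of the real `$HOME`); the project's own implementation may differ in detail.

```python
import tempfile
from pathlib import Path
from typing import Optional

CONFIG_NAMES = (".java-codebase-rag.yml", ".java-codebase-rag.yaml")


def walk_up(start: Path, home: Path) -> Optional[Path]:
    """Return the closest directory at or above *start* holding a config file.

    *home* itself is checked, then the walk stops; it also stops at the
    filesystem root.
    """
    home = home.resolve()
    cur = start.resolve()
    while True:
        if any((cur / name).is_file() for name in CONFIG_NAMES):
            return cur
        if cur == home or cur.parent == cur:
            return None
        cur = cur.parent


with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp).resolve()  # stand-in for $HOME in this demo
    service = home / "monorepo" / "service-a"
    deep = service / "src" / "main" / "java"
    deep.mkdir(parents=True)
    # Two configs: one at the monorepo root, one closer to the sources.
    (home / "monorepo" / CONFIG_NAMES[0]).touch()
    (service / CONFIG_NAMES[1]).touch()

    found_from_deep = walk_up(deep, home)  # closest match wins
    found_from_monorepo = walk_up(home / "monorepo", home)
    found_from_home = walk_up(home, home)  # nothing at home, walk stops
```

Starting deep inside `service-a` selects the `service-a` config rather than the monorepo one, which is the "closest to cwd" behavior described above.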

### Variables

| Variable | Purpose |
|---|---|
| `JAVA_CODEBASE_RAG_INDEX_DIR` | Local filesystem **directory** for Lance tables, the Kuzu file `code_graph.kuzu`, and cocoindex state (`cocoindex.db`). Not a `lancedb://` or cloud URI — use a path. Default: `./.java-codebase-rag/` under the resolved Java tree root. |
@@ -31,14 +47,14 @@ The operator-facing surface is **six** variables (plus MCP-only `JAVA_CODEBASE_R
| `JAVA_CODEBASE_RAG_RUN_HEAVY` | Test gate: set to `1` / `true` / `yes` to run the slow cocoindex + Lance end-to-end test (`pytest`); not used in normal operator workflows. |
| `JAVA_CODEBASE_RAG_HINTS_ENABLED` | When `0` / `false` / `no`, suppress `hints_structured` and `advisories` from all MCP tool responses. Overridable via `.java-codebase-rag.yml` `hints.enabled`. Default: enabled. |

**MCP host launchers** also set `JAVA_CODEBASE_RAG_SOURCE_ROOT` to the Java repository root when it differs from the server process cwd (see `mcp.json.example` in the repo root).
**MCP host launchers** also set `JAVA_CODEBASE_RAG_SOURCE_ROOT` to the Java repository root when it differs from the server process cwd (see `mcp.json.example` in the repo root). When the env var is unset, the server walks up from cwd to discover the config automatically.

Only the names in the table above (plus `JAVA_CODEBASE_RAG_SOURCE_ROOT` for MCP hosts) are read as configuration. Project config belongs in **`.java-codebase-rag.yml`** (or `.yaml`).

**Paths and conventions** (for scripts and operators):

- **`JAVA_CODEBASE_RAG_INDEX_DIR`** — filesystem path to the index directory (not a URI). Lance opens this directory; Kuzu is always `<index-dir>/code_graph.kuzu`; cocoindex keeps **`cocoindex.db`** next to them.
- **Java tree root** — CLI: `--source-root` (else cwd). MCP stdio: set `JAVA_CODEBASE_RAG_SOURCE_ROOT` when the Java repo root differs from the server process cwd.
- **Java tree root** — CLI: `--source-root` (else walk-up discovery, else cwd). MCP stdio: `JAVA_CODEBASE_RAG_SOURCE_ROOT` env var (else walk-up from cwd). YAML: `source_root` field resolved relative to the config file directory.
- **`microservice_roots`** — configure only under **`microservice_roots:`** in `.java-codebase-rag.yml` (or `.yaml`).
- **Chunk context diagnostics / heavy tests** — `JAVA_CODEBASE_RAG_DEBUG_CONTEXT`, `JAVA_CODEBASE_RAG_RUN_HEAVY` (see the table above).
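
A minimal sketch of how the Java tree root rules in the list above combine, assuming the order CLI flag, then env var, then YAML `source_root`, then the config file's directory, then cwd. The function name `resolve_source_root` and its parameters are hypothetical and exist only for illustration; the example paths are made up.

```python
from pathlib import Path
from typing import Optional


def resolve_source_root(
    cli_value: Optional[str],
    env_value: Optional[str],
    yaml_value: Optional[str],
    config_dir: Optional[Path],
    cwd: Path,
) -> Path:
    """Pick the Java tree root: CLI > env var > YAML > config directory > cwd."""
    if cli_value:
        # CLI and env values are absolute, or relative to cwd.
        return (cwd / Path(cli_value).expanduser()).resolve()
    if env_value:
        return (cwd / Path(env_value).expanduser()).resolve()
    if config_dir is not None:
        if yaml_value:
            # YAML values are relative to the config file's directory, not cwd.
            return (config_dir / Path(yaml_value).expanduser()).resolve()
        return config_dir.resolve()
    return cwd.resolve()


cwd = Path("/work/monorepo/service-a/src")
config_dir = Path("/work/monorepo")  # where walk-up found the config file

from_yaml = resolve_source_root(None, None, "./service-a", config_dir, cwd)
from_env = resolve_source_root(None, "/opt/java-app", "./service-a", config_dir, cwd)
from_cli = resolve_source_root("..", "/opt/java-app", "./service-a", config_dir, cwd)
from_discovery = resolve_source_root(None, None, None, config_dir, cwd)
from_cwd = resolve_source_root(None, None, None, None, cwd)
```

Note that the same relative string means different things depending on where it comes from: `..` on the CLI is anchored at cwd, while `./service-a` in YAML is anchored at the config file's directory.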

@@ -48,16 +64,24 @@ Python package: **`java_codebase_rag`** (`python -m java_codebase_rag.cli`).

## 2. Project YAML reference (`.java-codebase-rag.yml`)

A single file at the project root (the directory you pass as `--source-root`, or cwd) holds everything that isn't an environment variable. The two accepted filenames are `.java-codebase-rag.yml` and `.java-codebase-rag.yaml`; if both exist, `.yml` wins.
A single file at the project root (the directory you pass as `--source-root`, the directory found by walk-up discovery, or cwd) holds everything that isn't an environment variable. The two accepted filenames are `.java-codebase-rag.yml` and `.java-codebase-rag.yaml`; if both exist, `.yml` wins.

**All keys are optional.** A project with no YAML at all uses built-in defaults plus env vars. Add only the keys you need.

```yaml
# .java-codebase-rag.yml — full reference, every key annotated.
# Place at the project root (same directory you pass as --source-root).
# Place at the project root (same directory you pass as --source-root),
# or anywhere above it — the server walks up from cwd to find it.

# -------- Core knobs (mirror env vars; precedence: CLI > env > YAML > default) --------

# Source root: where your Java source tree lives. When set, resolves relative to
# this config file's directory (not cwd). Useful when the config file lives outside
# the Java tree (e.g. in a monorepo root above multiple Java projects).
# When omitted, defaults to the directory containing this config file (found via walk-up).
# CLI: --source-root. Env: JAVA_CODEBASE_RAG_SOURCE_ROOT.
source_root: ./my-java-project

# Index directory: where Lance tables, code_graph.kuzu, and cocoindex.db live.
# - Tilde (`~`) is expanded; `$VAR` is NOT (use absolute paths or `~`).
# - Relative paths resolve against source_root, not cwd.
@@ -171,6 +195,7 @@ async_producer_overrides:

| Field | Expanded? | Notes |
|---|---|---|
| `source_root` | partial | `~` expanded; `$VAR` is NOT expanded. Relative paths resolve against the **config file directory** (not cwd). |
| `index_dir` | partial | `~` expanded; `$VAR` is NOT expanded. Relative paths resolve against `source_root`. |
| `embedding.model` (when path-shaped) | yes | Path-shape = starts with `/`, `./`, `../`, `~`, or contains `$`. Plain `org/name` is treated as a hub id and passed through. Applies to the value after CLI > env > YAML > default precedence. Long-lived MCP hosts also apply the same expansion when reading `SBERT_MODEL` from the process environment (so table metadata and search agree with `index_common` defaults). |
| `embedding.device` | n/a | Device strings (`cpu`, `cuda`, `mps`) aren't paths. |
@@ -179,7 +204,7 @@ async_producer_overrides:

**Tips & gotchas:**

- **The file must be at `source_root`**, not in `$HOME`. The MCP server reads `JAVA_CODEBASE_RAG_SOURCE_ROOT` to find it; the CLI uses `--source-root` (else cwd).
- **The file is discovered by walking up from cwd** — like git finds `.git`. Place it at or above your project root. The walk stops at `$HOME` (inclusive). You can also set `JAVA_CODEBASE_RAG_SOURCE_ROOT` or use `--source-root` to bypass discovery entirely.
- **Don't commit secrets** into this YAML — it sits next to your source tree and is read by every operator who clones it.
- **Rebuild after editing brownfield overrides.** Run a full `java-codebase-rag reprocess` (no flags) so Lance and Kuzu stay coherent, or use `--graph-only` / `--vectors-only` when you know only one store needs invalidation. Editing `embedding.model` requires a vector rebuild (`reprocess` or `--vectors-only`).
- **Diagnose what's loaded.** `java-codebase-rag meta` prints the resolved config and each value's `*_source` (`cli` / `env` / `yaml` / `default`) — see `embedding_model_source`, `embedding_device_source`, `index_dir_source`.
16 changes: 14 additions & 2 deletions java_codebase_rag/cli.py
@@ -16,6 +16,7 @@
from java_codebase_rag.config import (
ResolvedOperatorConfig,
describe_path_sizes,
discover_project_root,
emit_legacy_env_hints_if_present,
emit_legacy_yaml_hint_if_needed,
index_dir_has_existing_artifacts,
@@ -231,6 +232,18 @@ def _cmd_init(args: argparse.Namespace) -> int:
cfg = _resolved_from_ns(args)
_startup_hints(cfg)
cfg.apply_to_os_environ()
parent_cfg_dir = discover_project_root(cfg.source_root.parent)
if parent_cfg_dir is not None:
from java_codebase_rag.config import YAML_CONFIG_FILENAMES

for name in YAML_CONFIG_FILENAMES:
if (parent_cfg_dir / name).is_file():
print(
f"Warning: found existing config at {parent_cfg_dir / name}. "
"Creating a new project here will create a separate index.",
file=sys.stderr,
)
break
occupied, paths = index_dir_has_existing_artifacts(cfg.index_dir)
if occupied:
_emit(
@@ -521,13 +534,12 @@ def _cmd_tables(args: argparse.Namespace) -> int:


def _cmd_diagnose_ignore(args: argparse.Namespace) -> int:
import server # lazy
from path_filtering import LayeredIgnore # lazy

cfg = _resolved_from_ns(args)
_startup_hints(cfg)
cfg.apply_to_os_environ()
root = server._project_root()
root = cfg.source_root
raw = Path(args.path)
try:
abs_path = raw.resolve() if raw.is_absolute() else (root / raw).resolve()
53 changes: 49 additions & 4 deletions java_codebase_rag/config.py
@@ -115,6 +115,28 @@ def emit_legacy_yaml_hint_if_needed(source_root: Path) -> None:
return


def discover_project_root(start: Path) -> Path | None:
"""Walk from *start* upward looking for a YAML config file.

Returns the directory containing the first matching config file
(closest to *start*), or ``None`` if no config is found before
reaching ``$HOME`` (inclusive — ``$HOME`` itself is checked) or
the filesystem root.
"""
home = Path.home().resolve()
cur = start.resolve()
while True:
for name in YAML_CONFIG_FILENAMES:
if (cur / name).is_file():
return cur
if cur == home:
return None
parent = cur.parent
if parent == cur:
return None
cur = parent


def find_yaml_config_file(source_root: Path) -> Path | None:
for name in YAML_CONFIG_FILENAMES:
p = source_root / name
@@ -277,10 +299,33 @@ def resolve_operator_config(
cli_embedding_model: str | None = None,
cli_embedding_device: str | None = None,
) -> ResolvedOperatorConfig:
root = (source_root or Path.cwd()).expanduser().resolve()
yaml_dict = load_yaml_mapping(root)
# Phase 1 — find the config file directory.
if source_root is not None:
config_dir = source_root.expanduser().resolve()
else:
discovered = discover_project_root(Path.cwd())
if discovered is not None:
config_dir = discovered
else:
config_dir = Path.cwd().resolve()

yaml_dict = load_yaml_mapping(config_dir)

# Phase 2 — resolve effective source root.
env_root = os.environ.get(ENV_SOURCE_ROOT, "").strip()
if source_root is not None:
effective_root = source_root.expanduser().resolve()
elif env_root:
effective_root = Path(env_root).expanduser().resolve()
else:
yaml_sr = yaml_dict.get("source_root")
if isinstance(yaml_sr, str) and yaml_sr.strip():
effective_root = (config_dir / Path(yaml_sr.strip()).expanduser()).resolve()
else:
effective_root = config_dir

index_dir, index_src = _resolve_index_dir_path(
source_root=root, cli_index_dir=cli_index_dir, yaml_dict=yaml_dict
source_root=effective_root, cli_index_dir=cli_index_dir, yaml_dict=yaml_dict
)
model, model_src = _pick_str(
cli_val=cli_embedding_model,
@@ -304,7 +349,7 @@ def resolve_operator_config(
ku = index_dir / "code_graph.kuzu"
coco = index_dir / "cocoindex.db"
return ResolvedOperatorConfig(
source_root=root,
source_root=effective_root,
index_dir=index_dir,
kuzu_path=ku,
cocoindex_db=coco,
37 changes: 31 additions & 6 deletions mcp.json.example
@@ -2,12 +2,37 @@
"mcpServers": {
"java-codebase-rag": {
"type": "stdio",
"command": "java-codebase-rag-mcp",
"env": {
"JAVA_CODEBASE_RAG_INDEX_DIR": "/ABSOLUTE/PATH/TO/.java-codebase-rag",
"JAVA_CODEBASE_RAG_SOURCE_ROOT": "/ABSOLUTE/PATH/TO/your-java-project",
"SBERT_MODEL": "sentence-transformers/all-MiniLM-L6-v2"
}
"command": "java-codebase-rag-mcp"
}
}
}

// ──────────────────────────────────────────────────────────────────────────────
// 1. MINIMAL CONFIG — no env vars required
//
// Requires a `.java-codebase-rag.yml` in (or above) your Java project root.
// The server walks up from cwd to find the config file (like git finds .git).
// Run `java-codebase-rag init` from the project root first to create the index.
//
// Claude Code: drop this as `.mcp.json` in your project root.
// Claude Desktop: paste into ~/Library/Application Support/Claude/claude_desktop_config.json
// and add `"cwd": "/path/to/your-java-project"` inside the server block.
// ──────────────────────────────────────────────────────────────────────────────

// ──────────────────────────────────────────────────────────────────────────────
// 2. FULL CONFIG — explicit env vars (works without .java-codebase-rag.yml)
//
// {
// "mcpServers": {
// "java-codebase-rag": {
// "type": "stdio",
// "command": "java-codebase-rag-mcp",
// "env": {
// "JAVA_CODEBASE_RAG_INDEX_DIR": "/ABSOLUTE/PATH/TO/.java-codebase-rag",
// "JAVA_CODEBASE_RAG_SOURCE_ROOT": "/ABSOLUTE/PATH/TO/your-java-project",
// "SBERT_MODEL": "sentence-transformers/all-MiniLM-L6-v2"
// }
// }
// }
// }
// ──────────────────────────────────────────────────────────────────────────────
13 changes: 9 additions & 4 deletions server.py
@@ -16,7 +16,7 @@
emit_vectors_finish,
emit_vectors_start,
)
from java_codebase_rag.config import emit_legacy_env_hints_if_present, resolved_sbert_model_for_process_env, resolve_operator_config
from java_codebase_rag.config import discover_project_root, emit_legacy_env_hints_if_present, resolved_sbert_model_for_process_env, resolve_operator_config
from kuzu_queries import KuzuGraph, resolve_kuzu_path
from mcp.server.fastmcp import FastMCP
from pydantic import BaseModel, Field
@@ -94,7 +94,7 @@ class IndexInfoOutput(BaseModel):
def _resolve_lancedb_uri() -> str:
raw = os.environ.get("JAVA_CODEBASE_RAG_INDEX_DIR", "").strip()
if not raw:
raw = str((Path.cwd() / ".java-codebase-rag").resolve())
raw = str((_project_root() / ".java-codebase-rag").resolve())
p = Path(raw).expanduser()
if not str(raw).startswith(("s3://", "gs://", "az://")):
try:
@@ -108,6 +108,9 @@ def _project_root() -> Path:
env = os.environ.get("JAVA_CODEBASE_RAG_SOURCE_ROOT", "").strip()
if env:
return Path(env).expanduser().resolve()
discovered = discover_project_root(Path.cwd())
if discovered is not None:
return discovered
return Path.cwd().resolve()


@@ -575,8 +578,10 @@ def main() -> None:

# Load YAML config and apply embedding settings to environment
# This ensures SBERT_MODEL and SBERT_DEVICE from .java-codebase-rag.yml are available
# before any tool handler runs (same behavior as CLI path)
cfg = resolve_operator_config(source_root=_project_root())
# before any tool handler runs (same behavior as CLI path).
# Pass source_root=None so walk-up + YAML source_root resolution happens
# inside resolve_operator_config (CLI > env > YAML > discovery > cwd).
cfg = resolve_operator_config(source_root=None)
cfg.apply_to_os_environ()
mcp_v2.set_hints_enabled(cfg.hints_enabled)
