feat: add disk-based embedding cache#331

Open
BeamNawapat wants to merge 3 commits into zilliztech:master from BeamNawapat:pr/embedding-cache

Conversation

@BeamNawapat
Contributor

Summary

Add a transparent, disk-based embedding cache to skip redundant embedding API calls when re-indexing the same code. On a re-index, only chunks whose content has not been embedded before hit the API; cached chunks load from disk in milliseconds.

Motivation

Embedding API calls are the slowest and most expensive step in indexing. When a codebase is re-indexed (after force: true, switching machines, or recovering from a failed run), every chunk is sent through the embedding provider again — even if its content is byte-identical to a previous run.

In practice this wastes:

  • API quota / cost — large monorepos can re-burn $1–$5 per re-index
  • Latency — re-indexing a 10k-file repo against VoyageAI takes minutes purely for embeddings
  • Provider rate limits — easy to hit on Gemini / OpenAI free tiers

A small content-addressed cache keyed by SHA256(content) per (provider, dimension) eliminates this waste entirely for unchanged chunks.
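As a sketch of the scheme just described (illustrative names, not the exact code in this PR), the content-addressed key and on-disk path could be derived like this:

```typescript
import * as crypto from "crypto";
import * as os from "os";
import * as path from "path";

// Illustrative sketch of the content-addressed key scheme: one file per
// SHA256(content), namespaced by {provider}_{dimension}. The function name
// is hypothetical; the PR's actual implementation lives in embedding-cache.ts.
function cachePathFor(provider: string, dimension: number, content: string): string {
    const hash = crypto.createHash("sha256").update(content).digest("hex");
    const root = path.join(os.homedir(), ".context", "embedding-cache", `${provider}_${dimension}`);
    // Two-character prefix subdirectory keeps any single directory small.
    return path.join(root, hash.slice(0, 2), `${hash}.json`);
}
```

Identical chunk content always maps to the same file, so a re-index resolves unchanged chunks from disk without touching the provider.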

Changes

  • packages/core/src/embedding/embedding-cache.ts (new, ~130 LOC): EmbeddingCache class with get, set, getBatch, and cleanup methods. Storage: ~/.context/embedding-cache/{provider}_{dimension}/XX/{sha256}.json (hierarchical layout avoids single-directory overflow). No external dependencies — uses Node fs / crypto / path / os.
  • packages/core/src/embedding/index.ts — Export EmbeddingCache.
  • packages/core/src/context.ts — Initialize EmbeddingCache in constructor keyed by ${provider}_${dimension}. New private cachedEmbedBatch() wraps embedding.embedBatch(): returns cached vectors instantly, only sends uncached chunks to the API. Indexing path now calls cachedEmbedBatch() instead of embedding.embedBatch() directly. Async TTL cleanup runs once on startup (non-blocking).
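The wrapper's control flow can be sketched roughly as follows (simplified: a Map stands in for the disk cache and a callback for the provider's embedBatch(); the real method in context.ts differs in details):

```typescript
type Vector = number[];

// Sketch of the cachedEmbedBatch() flow described above: answer what we can
// from cache, send only the misses to the API, then write the fresh vectors
// back so the next run hits. `cache` and `embedBatch` are stand-ins.
async function cachedEmbedBatch(
    contents: string[],
    cache: Map<string, Vector>,                        // stand-in for EmbeddingCache
    embedBatch: (texts: string[]) => Promise<Vector[]> // stand-in for the API call
): Promise<Vector[]> {
    const results: Vector[] = new Array(contents.length);
    const uncachedIndices: number[] = [];

    contents.forEach((text, i) => {
        const hit = cache.get(text);
        if (hit !== undefined) results[i] = hit;
        else uncachedIndices.push(i);
    });

    if (uncachedIndices.length > 0) {
        const fresh = await embedBatch(uncachedIndices.map(i => contents[i]));
        uncachedIndices.forEach((origIdx, j) => {
            results[origIdx] = fresh[j];
            cache.set(contents[origIdx], fresh[j]); // write-through for next run
        });
    }
    return results;
}
```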

Behavior:

  • Per-model isolation prevents cross-contamination when switching providers (e.g., voyage-code-3 vs text-embedding-3-small).
  • Best-effort design: any cache I/O error falls back to a normal API call. Corrupted JSON, missing files, and permission errors all degrade gracefully.
  • Hit rate logged per batch: [Cache] 75% hit (3/4 cached, 1 embedded).
  • Stale entries auto-removed on startup based on EMBEDDING_CACHE_MAX_AGE_DAYS (default 30).
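The best-effort read path amounts to something like this (a sketch assuming each cache file is JSON of the form {"v": [...], "d": n}, the format noted later for reviewers; safeGet is a hypothetical name):

```typescript
import * as fs from "fs";

// Sketch of a best-effort cache read: any I/O or parse failure (missing file,
// corrupted JSON, permission error) returns null, so the caller falls back to
// a normal API call instead of crashing the indexing run.
function safeGet(cachePath: string, expectedDim: number): number[] | null {
    try {
        const data = JSON.parse(fs.readFileSync(cachePath, "utf-8"));
        // Shape check: reject anything that isn't a vector of the expected dimension.
        if (!Array.isArray(data.v) || data.d !== expectedDim) return null;
        return data.v;
    } catch {
        return null; // degrade gracefully: treat as a cache miss
    }
}
```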

Configuration

| Env var | Default | Purpose |
| --- | --- | --- |
| EMBEDDING_CACHE | true | Enable/disable. Set to false to opt out completely. |
| EMBEDDING_CACHE_DIR | ~/.context/embedding-cache | Storage location. |
| EMBEDDING_CACHE_MAX_AGE_DAYS | 30 | TTL for cleanup-on-startup. Set 0 to disable cleanup. |
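A sketch of how these variables might be parsed (defaults as documented above; the actual parsing lives inside the cache constructor and cleanup path, and the function name here is hypothetical):

```typescript
import * as os from "os";
import * as path from "path";

interface CacheConfig {
    enabled: boolean;
    dir: string;
    maxAgeDays: number;
}

// Illustrative parsing of the three env vars, with the documented defaults.
function cacheConfig(env: Record<string, string | undefined> = process.env): CacheConfig {
    return {
        // Anything other than the literal string "false" leaves the cache on.
        enabled: (env.EMBEDDING_CACHE ?? "true").toLowerCase() !== "false",
        dir: env.EMBEDDING_CACHE_DIR ?? path.join(os.homedir(), ".context", "embedding-cache"),
        // 0 disables cleanup-on-startup entirely.
        maxAgeDays: parseInt(env.EMBEDDING_CACHE_MAX_AGE_DAYS ?? "30", 10),
    };
}
```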

Usage

Zero-config — cache is on by default. Re-indexing the same content shows the hit rate:

[Context] 💾 Embedding cache enabled for model: VoyageAI_1024
[Cache] ✅ All 47 embeddings from cache
[Cache] 88% hit (44/50 cached, 6 embedded)

Disable temporarily:

EMBEDDING_CACHE=false npx @zilliz/claude-context-mcp@latest

Move cache to a shared location:

EMBEDDING_CACHE_DIR=/mnt/team-cache/embeddings npx @zilliz/claude-context-mcp@latest

Test plan

  • pnpm build passes (core + mcp)
  • Index a small repo, then re-index → cache hit rate should be ~100%
  • Edit one file, re-index → only changed chunks re-embedded
  • Switch EMBEDDING_PROVIDER → new cache directory created, old one untouched
  • EMBEDDING_CACHE=false → no cache directory created, no [Cache] logs
  • Delete ~/.context/embedding-cache/ mid-run → next batch falls back to API gracefully

Notes for reviewers

  • Cache files are JSON {"v": [vector...], "d": dimension} — small (~6KB per 1024-dim float vector) but consider compression in a follow-up if storage becomes a concern.
  • No write locking; concurrent indexers writing the same key would race, but the result is functionally identical so this is intentionally not guarded.
  • Old cache entries from previous models are not deleted on provider switch (only TTL cleanup applies). Trade-off: simpler logic vs slightly larger disk usage. A clear_cache MCP tool could be added in a follow-up.
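A quick back-of-the-envelope check of the per-file size noted in the first bullet, using dummy data rounded to 4 decimals (real embedding values serialize at similar lengths):

```typescript
// Rough size check for the cache-file format {"v": [...], "d": n} with a
// dummy 1024-dim vector. Exact size depends on how many digits each float
// serializes to, but it lands in the single-digit-KB range.
const vector = Array.from({ length: 1024 }, (_, i) => Number(Math.sin(i).toFixed(4)));
const payload = JSON.stringify({ v: vector, d: 1024 });
const sizeKB = payload.length / 1024;
```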

Cache embedding vectors to ~/.context/embedding-cache/ keyed by
SHA256(content) per model. On re-index, only uncached chunks hit
the API — cached chunks load from disk instantly.

Logs cache hit rate per batch. Disable with EMBEDDING_CACHE=false.
Delete cached embeddings not modified in 30 days (configurable via
EMBEDDING_CACHE_MAX_AGE_DAYS). Runs async on startup, non-blocking.
Removes empty prefix directories after cleanup.
Copilot AI review requested due to automatic review settings April 25, 2026 09:14

Copilot AI left a comment


Pull request overview

Adds a transparent, disk-based embedding cache to reduce redundant embedding API calls during re-indexing by persisting embeddings keyed by content hash and routing batch embedding through the cache.

Changes:

  • Introduces EmbeddingCache for disk-backed get/set/getBatch/cleanup of embeddings.
  • Exports the cache from the embedding module and wires it into Context to cache embedBatch() results.
  • Adds startup cache initialization + background cleanup and logs cache hit rates.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| packages/core/src/embedding/index.ts | Exports the new EmbeddingCache entrypoint. |
| packages/core/src/embedding/embedding-cache.ts | Implements the disk-backed embedding cache and TTL cleanup. |
| packages/core/src/context.ts | Instantiates the cache and routes chunk batch embedding through it. |


Comment thread packages/core/src/context.ts Outdated
}

// Initialize embedding cache
const cacheModel = `${this.embedding.getProvider()}_${this.embedding.getDimension()}`;
Comment thread packages/core/src/context.ts Outdated
Comment on lines +597 to +602
const uncachedTexts = uncachedIndices.map(i => contents[i]);
const newEmbeddings = await this.embedding.embedBatch(uncachedTexts);

for (let j = 0; j < uncachedIndices.length; j++) {
    results[uncachedIndices[j]] = newEmbeddings[j];
    this.embeddingCache.set(contents[uncachedIndices[j]], newEmbeddings[j]);
}

private getCachePath(contentHash: string): string {
    const prefix = contentHash.slice(0, 2);
    return path.join(this.cacheDir, prefix, contentHash.slice(0, 12) + '.json');
}

async cleanup(maxAgeDays?: number): Promise<void> {
    if (!this.enabled) return;

    const days = maxAgeDays ?? parseInt(envManager.get('EMBEDDING_CACHE_MAX_AGE_DAYS') || '30', 10);
Comment on lines +104 to +121
const prefixDirs = fs.readdirSync(this.cacheDir);
for (const prefix of prefixDirs) {
    const prefixPath = path.join(this.cacheDir, prefix);
    if (!fs.statSync(prefixPath).isDirectory()) continue;

    const files = fs.readdirSync(prefixPath);
    for (const file of files) {
        const filePath = path.join(prefixPath, file);
        const stat = fs.statSync(filePath);
        if (stat.mtimeMs < cutoff) {
            fs.unlinkSync(filePath);
            deleted++;
        }
    }

    // Remove empty prefix dirs
    if (fs.readdirSync(prefixPath).length === 0) {
        fs.rmdirSync(prefixPath);
    }
}
Comment on lines +51 to +52
const data = JSON.parse(fs.readFileSync(cachePath, 'utf-8'));
return { vector: data.v, dimension: data.d };
- Use full SHA256 (64 chars) in filename, not 12-char prefix (collision risk)
- Validate JSON shape in get() (Array.isArray, dimension match) and pass
  expectedDimension to constructor for stricter cross-model isolation
- cleanup() now uses fs.promises (truly async, no event-loop block)
- cleanup() guards maxAgeDays <= 0 / non-finite (prevents purge-everything)
- updateEmbedding() now reinitializes the cache so model switches don't
  serve stale vectors from the previous model
- cachedEmbedBatch() dedupes duplicate strings within a single batch so
  identical chunks don't each hit the API
@BeamNawapat
Contributor Author

Addressed Copilot review feedback in ea99ede:

  • Full SHA256 in filename (no truncation/collision risk)
  • get() validates JSON shape (Array.isArray, dimension match)
  • cleanup() rewritten with fs.promises (truly async, no event-loop block) and guards maxAgeDays <= 0
  • updateEmbedding() reinitializes the cache so model switches can't serve stale vectors
  • cachedEmbedBatch() dedupes duplicate strings within a single batch so identical chunks share one API call

All checks still green.
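For reference, the in-batch dedupe described in the last bullet could look roughly like this (an illustrative sketch with hypothetical names, not the actual ea99ede diff):

```typescript
// Sketch: collapse duplicate strings in a batch so each unique text is
// embedded once, then fan the vectors back out to the original positions.
async function embedDeduped(
    contents: string[],
    embedBatch: (texts: string[]) => Promise<number[][]>
): Promise<number[][]> {
    const firstIndex = new Map<string, number>(); // text -> index in `unique`
    const unique: string[] = [];
    for (const text of contents) {
        if (!firstIndex.has(text)) {
            firstIndex.set(text, unique.length);
            unique.push(text);
        }
    }
    const vectors = await embedBatch(unique);
    return contents.map(text => vectors[firstIndex.get(text)!]);
}
```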

