# ICM (Infinite Context Memory)

Permanent memory for AI agents. Single binary, zero dependencies, MCP native.

ICM gives your AI agent a real memory — not a note-taking tool, not a context manager, a memory.
```
┌──────────────────────┬─────────────────────────┐
│  MEMORIES (Topics)   │   MEMOIRS (Knowledge)   │
│                      │                         │
│ Episodic, temporal   │  Permanent, structured  │
│                      │                         │
│ ┌───┐ ┌───┐ ┌───┐    │ ┌───┐                   │
│ │ m │ │ m │ │ m │    │ │ C │──depends_on──┐    │
│ └─┬─┘ └─┬─┘ └─┬─┘    │ └───┘              │    │
│   │decay│     │      │   │ refines      ┌─▼─┐  │
│   ▼     ▼     ▼      │ ┌─▼─┐            │ C │  │
│ weight decreases     │ │ C │──part_of──>└───┘  │
│ over time unless     │ └───┘                   │
│ accessed/critical    │  Concepts + Relations   │
├──────────────────────┴─────────────────────────┤
│           SQLite + FTS5 + sqlite-vec           │
│   Hybrid search: BM25 (30%) + cosine (70%)     │
└────────────────────────────────────────────────┘
```
Two memory models, plus a feedback loop:

- Memories — store/recall with temporal decay by importance. Critical memories never fade; low-importance ones decay naturally. Filter by topic or keyword.
- Memoirs — permanent knowledge graphs. Concepts linked by typed relations (`depends_on`, `contradicts`, `superseded_by`, ...). Filter by label.
- Feedback — record corrections when AI predictions are wrong. Search past mistakes before making new predictions. Closed-loop learning.
```sh
# Homebrew (macOS / Linux)
brew tap rtk-ai/tap && brew install icm

# Quick install
curl -fsSL https://raw.githubusercontent.com/rtk-ai/icm/main/install.sh | sh

# From source
cargo install --path crates/icm-cli
```

```sh
# Auto-detect and configure all supported tools
icm init
```

Configures 14 tools in one command:
| Tool | Config file | Format |
|---|---|---|
| Claude Code | `~/.claude.json` | JSON |
| Claude Desktop | `~/Library/.../claude_desktop_config.json` | JSON |
| Cursor | `~/.cursor/mcp.json` | JSON |
| Windsurf | `~/.codeium/windsurf/mcp_config.json` | JSON |
| VS Code / Copilot | `~/Library/.../Code/User/mcp.json` | JSON |
| Gemini Code Assist | `~/.gemini/settings.json` | JSON |
| Zed | `~/.zed/settings.json` | JSON |
| Amp | `~/.config/amp/settings.json` | JSON |
| Amazon Q | `~/.aws/amazonq/mcp.json` | JSON |
| Cline | VS Code globalStorage | JSON |
| Roo Code | VS Code globalStorage | JSON |
| Kilo Code | VS Code globalStorage | JSON |
| OpenAI Codex CLI | `~/.codex/config.toml` | TOML |
| OpenCode | `~/.config/opencode/opencode.json` | JSON |
Or manually:

```sh
# Claude Code
claude mcp add icm -- icm serve

# Compact mode (shorter responses, saves tokens)
claude mcp add icm -- icm serve --compact

# Any MCP client: command = "icm", args = ["serve"]
```

`icm init --mode skill` installs slash commands and rules for Claude Code (`/recall`, `/remember`), Cursor (`.mdc` rule), Roo Code (`.md` rule), and Amp (`/icm-recall`, `/icm-remember`).
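For clients that take a raw JSON MCP configuration, the generic `command = "icm", args = ["serve"]` entry maps to something like the sketch below. The `mcpServers` key follows the convention used by most MCP clients; check your client's docs for the exact schema.

```json
{
  "mcpServers": {
    "icm": {
      "command": "icm",
      "args": ["serve"]
    }
  }
}
```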
`icm init --mode hook` installs all 3 extraction layers as Claude Code hooks.

Claude Code hooks:
| Hook | Event | What it does |
|---|---|---|
| `icm hook pre` | PreToolUse | Auto-allow icm CLI commands (no permission prompt) |
| `icm hook post` | PostToolUse | Extract facts from tool output every 15 calls |
| `icm hook compact` | PreCompact | Extract memories from transcript before context compression |
| `icm hook prompt` | UserPromptSubmit | Inject recalled context at the start of each prompt |
OpenCode plugin (auto-installed to `~/.config/opencode/plugins/icm.js`):

| OpenCode event | ICM Layer | What it does |
|---|---|---|
| `tool.execute.after` | Layer 0 | Extract facts from tool output |
| `experimental.session.compacting` | Layer 1 | Extract from conversation before compaction |
| `session.created` | Layer 2 | Recall context at session start |
ICM can be used via CLI (icm commands) or MCP server (icm serve). Both access the same database.
| | CLI | MCP |
|---|---|---|
| Latency | ~30ms (direct binary) | ~50ms (JSON-RPC stdio) |
| Token cost | 0 (hook-based, invisible) | ~20-50 tokens/call (tool schema) |
| Setup | `icm init --mode hook` | `icm init --mode mcp` |
| Works with | Claude Code, OpenCode (via hooks/plugins) | All 14 MCP-compatible tools |
| Auto-extraction | Yes (hooks trigger `icm extract`) | Yes (MCP tools call store) |
| Best for | Power users, token savings | Universal compatibility |
```sh
icm dashboard   # or: icm tui
```

Interactive TUI with 5 tabs: Overview, Topics, Memories, Health, Memoirs. Keyboard navigation (vim-style: `j`/`k`, `g`/`G`, Tab, `1`-`5`), live search (`/`), auto-refresh.

Requires the `tui` feature (enabled by default). To build without it: `cargo install --path crates/icm-cli --no-default-features --features embeddings`.
```sh
# Store
icm store -t "my-project" -c "Use PostgreSQL for the main DB" -i high -k "db,postgres"

# Recall
icm recall "database choice"
icm recall "auth setup" --topic "my-project" --limit 10
icm recall "architecture" --keyword "postgres"

# Manage
icm forget <memory-id>
icm consolidate --topic "my-project"
icm topics
icm stats

# Extract facts from text (rule-based, zero LLM cost)
echo "The parser uses Pratt algorithm" | icm extract -p my-project
```
```sh
# Create a memoir
icm memoir create -n "system-architecture" -d "System design decisions"

# Add concepts with labels
icm memoir add-concept -m "system-architecture" -n "auth-service" \
  -d "Handles JWT tokens and OAuth2 flows" -l "domain:auth,type:service"

# Link concepts
icm memoir link -m "system-architecture" --from "api-gateway" --to "auth-service" -r depends-on

# Search with label filter
icm memoir search -m "system-architecture" "authentication"
icm memoir search -m "system-architecture" "service" --label "domain:auth"

# Inspect neighborhood
icm memoir inspect -m "system-architecture" "auth-service" -D 2

# Export graph (formats: json, dot, ascii, ai)
icm memoir export -m "system-architecture" -f ascii  # Box-drawing with confidence bars
icm memoir export -m "system-architecture" -f dot    # Graphviz DOT (color = confidence level)
icm memoir export -m "system-architecture" -f ai     # Markdown optimized for LLM context
icm memoir export -m "system-architecture" -f json   # Structured JSON with all metadata

# Generate SVG visualization
icm memoir export -m "system-architecture" -f dot | dot -Tsvg > graph.svg
```

| Tool | Description |
|---|---|
| `icm_memory_store` | Store with auto-dedup (>85% similarity → update instead of duplicate) |
| `icm_memory_recall` | Search by query, filter by topic and/or keyword |
| `icm_memory_update` | Edit a memory in-place (content, importance, keywords) |
| `icm_memory_forget` | Delete a memory by ID |
| `icm_memory_consolidate` | Merge all memories of a topic into one summary |
| `icm_memory_list_topics` | List all topics with counts |
| `icm_memory_stats` | Global memory statistics |
| `icm_memory_health` | Per-topic hygiene audit (staleness, consolidation needs) |
| `icm_memory_embed_all` | Backfill embeddings for vector search |
| Tool | Description |
|---|---|
| `icm_memoir_create` | Create a new memoir (knowledge container) |
| `icm_memoir_list` | List all memoirs |
| `icm_memoir_show` | Show memoir details and all concepts |
| `icm_memoir_add_concept` | Add a concept with labels |
| `icm_memoir_refine` | Update a concept's definition |
| `icm_memoir_search` | Full-text search, optionally filtered by label |
| `icm_memoir_search_all` | Search across all memoirs |
| `icm_memoir_link` | Create typed relation between concepts |
| `icm_memoir_inspect` | Inspect concept and graph neighborhood (BFS) |
| `icm_memoir_export` | Export graph (json, dot, ascii, ai) with confidence levels |
| Tool | Description |
|---|---|
| `icm_feedback_record` | Record a correction when an AI prediction was wrong |
| `icm_feedback_search` | Search past corrections to inform future predictions |
| `icm_feedback_stats` | Feedback statistics: total count, breakdown by topic, most applied |
`part_of` · `depends_on` · `related_to` · `contradicts` · `refines` · `alternative_to` · `caused_by` · `instance_of` · `superseded_by`
Episodic memory (Topics) captures decisions, errors, preferences. Each memory has a weight that decays over time based on importance:
| Importance | Decay | Prune | Behavior |
|---|---|---|---|
| `critical` | none | never | Never forgotten, never pruned |
| `high` | slow (0.5x rate) | never | Fades slowly, never auto-deleted |
| `medium` | normal | yes | Standard decay, pruned when weight < threshold |
| `low` | fast (2x rate) | yes | Quickly forgotten |
Decay is access-aware: frequently recalled memories decay slower (`decay / (1 + access_count × 0.1)`). Decay is applied automatically on recall (if >24h since the last decay pass).
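As a quick illustration of the access-aware formula (the input numbers below are invented for the example; only the `1 + access_count × 0.1` dampening comes from the formula above):

```shell
# Sketch of access-aware decay: effective_decay = base_decay / (1 + access_count * 0.1)
base_decay=0.10     # hypothetical per-interval decay for a medium-importance memory
access_count=5      # the memory has been recalled 5 times
awk -v d="$base_decay" -v a="$access_count" \
  'BEGIN { printf "%.4f\n", d / (1 + a * 0.1) }'   # prints 0.0667
```

Five recalls cut the effective decay rate by a third, so frequently used memories persist longer without being marked `critical`.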
Memory hygiene is built-in:
- Auto-dedup: storing content >85% similar to an existing memory in the same topic updates it instead of creating a duplicate
- Consolidation hints: when a topic exceeds 7 entries, `icm_memory_store` warns the caller to consolidate
- Health audit: `icm_memory_health` reports per-topic entry count, average weight, stale entries, and consolidation needs
- No silent data loss: critical and high-importance memories are never auto-pruned
Semantic memory (Memoirs) captures structured knowledge as a graph. Concepts are permanent — they get refined, never decayed. Use superseded_by to mark obsolete facts instead of deleting them.
With embeddings enabled, ICM uses hybrid search:
- FTS5 BM25 (30%) — full-text keyword matching
- Cosine similarity (70%) — semantic vector search via sqlite-vec
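A toy sketch of how the two signals combine under the 30/70 weighting above (the two input scores are invented, and real normalization of BM25 scores may differ from this simplification):

```shell
# Hybrid score = 0.3 * BM25 (normalized) + 0.7 * cosine similarity
bm25=0.80    # hypothetical normalized keyword-match score
cosine=0.60  # hypothetical semantic similarity score
awk -v b="$bm25" -v c="$cosine" \
  'BEGIN { printf "%.2f\n", 0.3 * b + 0.7 * c }'   # prints 0.66
```

The weighting means a strong semantic match can outrank an exact keyword hit, which is the point of enabling embeddings.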
Default model: `intfloat/multilingual-e5-base` (768d, 100+ languages). Configurable in your config file:

```toml
[embeddings]
# enabled = false  # Disable entirely (no model download)
model = "intfloat/multilingual-e5-base"          # 768d, multilingual (default)
# model = "intfloat/multilingual-e5-small"       # 384d, multilingual (lighter)
# model = "intfloat/multilingual-e5-large"       # 1024d, multilingual (best accuracy)
# model = "Xenova/bge-small-en-v1.5"             # 384d, English-only (fastest)
# model = "jinaai/jina-embeddings-v2-base-code"  # 768d, code-optimized
```

To skip the embedding model download entirely, use either of these:

```sh
icm --no-embeddings serve      # CLI flag
ICM_NO_EMBEDDINGS=1 icm serve  # Environment variable
```

Or set `enabled = false` in your config file. ICM then falls back to FTS5 keyword search (still works, just without semantic matching).
Changing the model automatically re-creates the vector index (existing embeddings are cleared and can be regenerated with icm_memory_embed_all).
Single SQLite file. No external services, no network dependency.
```
~/Library/Application Support/dev.icm.icm/memories.db   # macOS
~/.local/share/dev.icm.icm/memories.db                  # Linux
C:\Users\<user>\AppData\Local\icm\icm\data\memories.db  # Windows
```
```sh
icm config  # Show active config
```

Config file location (platform-specific, or `$ICM_CONFIG`):

```
~/Library/Application Support/dev.icm.icm/config.toml       # macOS
~/.config/icm/config.toml                                   # Linux
C:\Users\<user>\AppData\Roaming\icm\icm\config\config.toml  # Windows
```

See `config/default.toml` for all options.
ICM extracts memories automatically via three layers:
```
Layer 0: Pattern hooks    Layer 1: PreCompact       Layer 2: UserPromptSubmit
  (zero LLM cost)           (zero LLM cost)             (zero LLM cost)
┌───────────────────┐     ┌───────────────────┐     ┌───────────────────┐
│ PostToolUse hook  │     │ PreCompact hook   │     │ UserPromptSubmit  │
│                   │     │                   │     │                   │
│ • Bash errors     │     │ Context about to  │     │ User sends prompt │
│ • git commits     │     │ be compressed →   │     │ → icm recall      │
│ • config changes  │     │ extract memories  │     │ → inject context  │
│ • decisions       │     │ from transcript   │     │                   │
│ • preferences     │     │ before they're    │     │ Agent starts with │
│ • learnings       │     │ lost forever      │     │ relevant memories │
│ • constraints     │     │                   │     │ already loaded    │
│                   │     │ Same patterns +   │     │                   │
│ Rule-based, no LLM│     │ --store-raw fallbk│     │                   │
└───────────────────┘     └───────────────────┘     └───────────────────┘
```
| Layer | Status | LLM cost | Hook command | Description |
|---|---|---|---|---|
| Layer 0 | Implemented | 0 | `icm hook post` | Rule-based keyword extraction from tool output |
| Layer 1 | Implemented | 0 | `icm hook compact` | Extract from transcript before context compression |
| Layer 2 | Implemented | 0 | `icm hook prompt` | Inject recalled memories on each user prompt |
All 3 layers are installed automatically by icm init --mode hook.
```
ICM Benchmark (1000 memories, 384d embeddings)
──────────────────────────────────────────────────────────
Store (no embeddings)     1000 ops    34.2 ms    34.2 µs/op
Store (with embeddings)   1000 ops    51.6 ms    51.6 µs/op
FTS5 search                100 ops     4.7 ms    46.6 µs/op
Vector search (KNN)        100 ops    59.0 ms   590.0 µs/op
Hybrid search              100 ops    95.1 ms   951.1 µs/op
Decay (batch)                1 op      5.8 ms     5.8 ms/op
──────────────────────────────────────────────────────────
```

Apple M1 Pro, in-memory SQLite, single-threaded. `icm bench --count 1000`
Multi-session workflow with a real Rust project (12 files, ~550 lines). Sessions 2+ show the biggest gains as ICM recalls instead of re-reading files.
```
ICM Agent Benchmark (10 sessions, model: haiku, 3 runs averaged)
══════════════════════════════════════════════════════════════════
                     Without ICM    With ICM    Delta
Session 2 (recall)
  Turns                    5.7          4.0      -29%
  Context (input)         99.9k        67.5k     -32%
  Cost                   $0.0298      $0.0249    -17%
Session 3 (recall)
  Turns                    3.3          2.0      -40%
  Context (input)         74.7k        41.6k     -44%
  Cost                   $0.0249      $0.0194    -22%
══════════════════════════════════════════════════════════════════
```

`icm bench-agent --sessions 10 --model haiku`
Agent recalls specific facts from a dense technical document across sessions. Session 1 reads and memorizes; sessions 2+ answer 10 factual questions without the source text.
```
ICM Recall Benchmark (10 questions, model: haiku, 5 runs averaged)
══════════════════════════════════════════════════════════════════════
                        No ICM    With ICM
──────────────────────────────────────────────────────────────────────
Average score             5%         68%
Questions passed         0/10        5/10
══════════════════════════════════════════════════════════════════════
```

`icm bench-recall --model haiku`
Same test with local models — pure context injection, no tool use needed.
```
Model          Params    No ICM    With ICM    Delta
─────────────────────────────────────────────────────────
qwen2.5:14b      14B       4%        97%       +93%
mistral:7b        7B       4%        93%       +89%
llama3.1:8b       8B       4%        93%       +89%
qwen2.5:7b        7B       4%        90%       +86%
phi4:14b         14B       6%        79%       +73%
llama3.2:3b       3B       0%        76%       +76%
gemma2:9b         9B       4%        76%       +72%
qwen2.5:3b        3B       2%        58%       +56%
─────────────────────────────────────────────────────────
```

`scripts/bench-ollama.sh qwen2.5:14b`
Standard academic benchmark — 500 questions across 6 memory abilities, from the LongMemEval paper (ICLR 2025).
```
LongMemEval Results — ICM (oracle variant, 500 questions)
════════════════════════════════════════════════════════════════
Category                      Retrieval    Answer (Sonnet)
────────────────────────────────────────────────────────────────
single-session-user             100.0%          91.4%
temporal-reasoning              100.0%          85.0%
single-session-assistant        100.0%          83.9%
multi-session                   100.0%          81.2%
knowledge-update                100.0%          80.8%
single-session-preference       100.0%          50.0%
────────────────────────────────────────────────────────────────
OVERALL                         100.0%          82.0%
════════════════════════════════════════════════════════════════
```
- Retrieval = does ICM find the right information? 100% across all categories.
- Answer = can the LLM produce the correct answer from retrieved context? Depends on the LLM, not ICM.
- The retrieval score is the ICM benchmark. The answer score reflects the downstream LLM capability.
`scripts/bench-longmemeval.py --judge claude --workers 8`
All benchmarks use real API calls — no mocks, no simulated responses, no cached answers.
- Agent benchmark: Creates a real Rust project in a tempdir. Runs N sessions with `claude -p --output-format json`. Without ICM: empty MCP config. With ICM: real MCP server + auto-extraction + context injection.
- Knowledge retention: Uses a fictional technical document (the "Meridian Protocol"). Scores answers by keyword matching against expected facts. 120s timeout per invocation.
- Isolation: Each run uses its own tempdir and fresh SQLite DB. No session persistence.
| Document | Description |
|---|---|
| Technical Architecture | Crate structure, search pipeline, decay model, sqlite-vec integration, testing |
| User Guide | Installation, topic organization, consolidation, extraction, troubleshooting |
| Product Overview | Use cases, benchmarks, comparison with alternatives |
Source-Available — Free for individuals and teams ≤ 20 people. Enterprise license required for larger organizations. Contact: contact@rtk-ai.app
