archon-memory-core

Every memory system degrades. Ours endures.

Agents remember what users tell them — until a user changes their mind. "My dog's name is Max" today, "actually it's Milo" tomorrow. Both sit in memory. At query time, the retriever returns whichever scores higher, and the agent confidently contradicts itself. The Agentic Memory Benchmark v2.3 injects this class of contradictory fact daily across a simulated 90-day horizon and measures which systems resolve the contradiction — and which drown in it.

At 250-query scale, 3-seed mean, over 90 simulated days:

archon-memory-core (with consolidation): 99.2% top-1 — flat from day 7 to day 90

Same retriever, no consolidation: 49.2% — collapses after 13 days of accumulated contradictions

LangChain 32k-token buffer (top-1): 0.0% — answer exists in context, but top-ranked chunk is noise

Naive word-overlap: 0.8% — resolves almost none of the contradictions

Full results, seeds, and reproducible harness: benchmark/amb_v2/results/v2.3/large/STATUS.md. Preregistered: benchmark/amb_v2/PREREGISTERED.md.

Supersede-aware consolidation. When a new fact contradicts an old one, the old fact is archived with a link to what replaced it — not left to compete at retrieval time.
Ranked top-1 retrieval. The chunk the LLM actually attends to is earned, not handed out by recency.
Credentials never decay. Type-aware salience keeps high-value facts retrievable after any volume of noise.
Replay any recall. Trace every retrieval event back to its source chunks — answer "why did it remember that?"
Local-first. Runs entirely on ChromaDB + Ollama. Your memory never leaves your machine unless you opt in.
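The supersede idea can be illustrated with a toy in-memory sketch. It assumes contradictions are detected through a shared `subject` key. That is a deliberate simplification: the consolidator described below resolves contradictions with a local LLM. `Fact`, `SupersedeStore`, and `subject` are hypothetical names, not this package's API.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass


@dataclass
class Fact:
    id: int
    subject: str                      # hypothetical contradiction key, e.g. "user.dog_name"
    text: str
    superseded_by: int | None = None  # link to the replacing fact once archived

    @property
    def active(self) -> bool:
        return self.superseded_by is None


class SupersedeStore:
    """Toy store (hypothetical): a newer fact on the same subject archives the older one."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.facts: dict[int, Fact] = {}

    def add(self, subject: str, text: str) -> Fact:
        new = Fact(next(self._ids), subject, text)
        for old in self.facts.values():
            if old.subject == subject and old.active:
                old.superseded_by = new.id  # archive with a link; never delete
        self.facts[new.id] = new
        return new

    def active_facts(self, subject: str) -> list[Fact]:
        """What retrieval competes over: only facts nothing has superseded."""
        return [f for f in self.facts.values() if f.subject == subject and f.active]

    def history(self, subject: str) -> list[Fact]:
        """Full chain, oldest first (dicts keep insertion order)."""
        return [f for f in self.facts.values() if f.subject == subject]


toy = SupersedeStore()
toy.add("user.dog_name", "My dog's name is Max")
milo = toy.add("user.dog_name", "Actually, my dog's name is Milo")
print([f.text for f in toy.active_facts("user.dog_name")])
# ["Actually, my dog's name is Milo"]
```

Only the newest fact competes at query time, while `history()` still records what was replaced and by which fact.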

Apache 2.0. pip install archon-memory-core. Python ≥ 3.10.

Alternatives you might be evaluating

Looking at this space? Here's how archon-memory-core compares to the other options:

vs Letta (formerly MemGPT) — Letta is a full agent runtime; this is a library.
vs Mem0 — both are Apache 2.0 libraries; we resolve contradictions at retrieval, Mem0 reconciles at write.
vs Zep — Zep is a temporal knowledge graph server; this is in-process.
vs LangGraph Store — LangGraph Store is a KV primitive; this is an opinionated memory policy.

Why this exists

Every production agent hits the same wall. The naive approach — dump everything into a vector store, retrieve by cosine — works on day one. By month three you're drowning in stale context, duplicated noise, and contradictory facts that silently return the wrong answer.

LangChain's buffer expires by design. Mem0 reconciles at write time but resolves contradictions only partially. MemGPT's consolidation only runs on GPT-4. archon-memory-core is the answer to the question none of them asked: what if memory got better the longer you used it?

See Why not just use a bigger context window? for the cost/quality math against the most common alternative.

Quickstart

pip install archon-memory-core

from archon_memory_core import MemoryStore

store = MemoryStore()
store.add("The production API key lives in the keychain", type="credential")
store.add("Project uses Python 3.12 with uv for lockfile management", type="technical")

results = store.search("where is the API key?")
print(results[0].text)
# "The production API key lives in the keychain"

Async-first (recommended for agents)

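import asyncio

from archon_memory_core import AsyncMemoryStore

async def main():
    store = AsyncMemoryStore()
    await store.add("User prefers terse responses", type="personal")
    results = await store.search("user communication preferences")
    print(results[0].text)

asyncio.run(main())  # inside an existing event loop (e.g. an agent), just await main()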

With LangChain

from langchain.agents import AgentExecutor
from archon_memory_core.integrations.langchain import AgentMemoryStore

memory = AgentMemoryStore()
agent = AgentExecutor(..., memory=memory)

With LlamaIndex

from llama_index.core.agent import ReActAgent
from archon_memory_core.integrations.llamaindex import AgentMemoryStore

memory = AgentMemoryStore()
agent = ReActAgent.from_tools(..., memory=memory)

See docs/INTEGRATIONS.md for the full adapter reference.

The Benchmark — AMB v2.3

The Agentic Memory Benchmark v2.3 is the longitudinal, preregistered test that separates systems that remember from systems that merely store. Contradictory facts are injected every day for 90 simulated days; the primary metric is top-1 accuracy — whether the chunk the LLM actually attends to contains the answer.

v2.3 large-scale results (250 queries × 2,300 confusers · 3-seed mean)

System	Mode	Day 7	Day 14	Day 30	Day 60	Day 90
archon-memory-core	tuned (with consolidation)	99.3%	99.2%	99.2%	99.2%	99.2%
archon-memory-core	stock (retrieval only)	70.2%	49.2%	49.2%	49.2%	49.2%
LangChain 32k buffer	any-in-context	100%	100%	100%	100%	56%
LangChain 32k buffer	top-1	0.0%	0.0%	0.0%	0.0%	0.0%
Naive word-overlap	—	5.0%	0.0%	0.0%	0.0%	0.8%

Standard deviation ≤ 0.01 on every cell. Seeds: 42, 43, 44. Full per-seed breakdown and raw JSON: benchmark/amb_v2/results/v2.3/large/STATUS.md.

The LangChain split is the v2.3 thesis. The answer exists in the 32k-token buffer right up to day 60 — but the top-ranked chunk (what the LLM attends to) is the most recent addition, which is usually a confuser. Context length ≠ memory without ranking.
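A toy illustration of why the two LangChain rows diverge: the same buffer counts as a hit under any-in-context scoring and as a miss under top-1 when the top slot goes to the newest chunk. The chunk texts and helper names are made up; this is not the AMB harness.

```python
# Toy buffer in insertion order: the answer arrived first, confusers after it.
buffer = [
    "The production API key lives in the keychain",
    "The staging API key was rotated last week",   # confuser
    "Someone asked where the API key docs live",   # confuser (most recent)
]
answer = "keychain"


def any_in_context(chunks: list[str], answer: str) -> bool:
    """Hit if the answer appears anywhere in the buffer."""
    return any(answer in c for c in chunks)


def top1_by_recency(chunks: list[str]) -> str:
    """Without ranking, the newest chunk ends up in the top slot."""
    return chunks[-1]


print(any_in_context(buffer, answer))      # True
print(answer in top1_by_recency(buffer))   # False
```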

AMB is an open leaderboard. Mem0, MemGPT, Letta, pgvector pipelines, custom builds — submit against the preregistered harness. See benchmark/LEADERBOARD.md. Mem0 adapter is on the roadmap; any framework can be contributed as a PR.

Reproduce

git clone https://github.com/atw4757-byte/archon-memory-core
cd archon-memory-core/benchmark/amb_v2
make bench                    # runs the full v2.3 grid (~20 min on M-series)

Seeds, scenarios, confusers, and adapter code are all in the repo. Preregistered protocol: PREREGISTERED.md.

See Why not just use a bigger context window? for the cost/quality math.

Architecture

store.add(text, type, source, agent)
  ├── ChromaDB upsert (always)
  └── Hindsight retain (optional, graceful fallback)

store.search(query, n, type, since, agent)
  ├── 1. Cosine retrieval (4x candidate pool)
  ├── 2. Salience + recency scoring (adaptive per query type)
  ├── 3. Cross-encoder re-ranking (ms-marco-MiniLM, optional)
  ├── 4. MMR diversity selection (λ=0.7)
  ├── 5. Atomic fact augmentation
  └── 6. Dynamic tail pruning

WorkingMemory (4-7 slots, Miller's Law)
  └── flush() → long-term store

Nightly Consolidation (local Mistral/Qwen via Ollama)
  ├── Cluster by source + type + entity co-occurrence
  ├── Compress clusters into semantic facts
  ├── Resolve contradictions toward newer truth
  └── Archive originals (soft delete, never hard delete)

MemoryGraph      — entity extraction + 2-hop expansion
ForgettingPolicy — salience decay + stale detection + health scoring
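Steps 1 and 4 of the search pipeline can be sketched in plain Python. The sketch over-fetches 4x candidates by cosine similarity and then selects greedily with maximal marginal relevance at λ=0.7. The toy vectors and function names are hypothetical. The real pipeline also applies salience scoring, optional re-ranking, fact augmentation, and pruning between these steps.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def candidate_pool(query: list[float], docs: dict[str, list[float]],
                   n: int, factor: int = 4) -> list[str]:
    """Step 1: over-fetch factor * n candidates, ranked by cosine similarity."""
    ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
    return ranked[: factor * n]


def mmr(query: list[float], docs: dict[str, list[float]],
        candidates: list[str], n: int, lam: float = 0.7) -> list[str]:
    """Step 4: greedily pick n ids, trading relevance (lam) against redundancy (1 - lam)."""
    selected: list[str] = []
    remaining = list(candidates)

    def score(d: str) -> float:
        relevance = cosine(query, docs[d])
        redundancy = max((cosine(docs[d], docs[s]) for s in selected), default=0.0)
        return lam * relevance - (1 - lam) * redundancy

    while remaining and len(selected) < n:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy embeddings: "a_dup" is a near-copy of "a"; "b" covers a second aspect of the query.
docs = {
    "a": [1.0, 0.0, 0.0],
    "a_dup": [1.0, 0.02, 0.0],
    "b": [0.0, 1.0, 0.0],
    "c": [0.0, 0.0, 1.0],
}
query = [1.0, 0.8, 0.0]

pool = candidate_pool(query, docs, n=2)
print(pool[:2])                     # plain cosine top-2: ['a_dup', 'a']
print(mmr(query, docs, pool, n=2))  # MMR top-2: ['a_dup', 'b']
```

With λ=0.7, relevance still dominates. A near-duplicate is only displaced when something nearly as relevant and clearly different is available.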

How it compares

Feature	archon-memory-core	LangChain	Naive Vector	Mem0	MemGPT
Nightly consolidation	Local LLM	—	—	Partial	GPT-4 only
Active forgetting	Yes	—	—	—	—
Contradiction resolution	Yes, logged	—	—	Partial	Partial
Salience scoring	Type + access + graph	—	—	Partial	—
Entity graph	Yes	—	—	—	—
Agent namespacing	Yes	—	—	—	—
Replay / observability	Yes	—	—	—	—
Eval harness included	AMB (200 queries)	—	—	—	—
Self-maintenance cron	Yes	—	—	—	—
Runs fully local	Ollama + ChromaDB	Partial	Yes	—	—
License	Apache 2.0	MIT	—	Apache 2.0	Apache 2.0

Own a system on this list and disagree? Submit a correction.

Advanced usage

Working Memory

from archon_memory_core import WorkingMemory, MemoryStore

store = MemoryStore()
wm = WorkingMemory(max_slots=7)
wm.add("User prefers terse responses")
wm.flush(store)  # end-of-session persistence
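The bounded-slot idea behind WorkingMemory (4-7 slots) might behave as in the sketch below. This is a hypothetical stand-in, not the package's `WorkingMemory`. The overflow policy shown here, where the oldest slot spills to a pending list instead of being dropped, is an assumption, and the long-term store is modeled as a plain list.

```python
from collections import deque


class ToyWorkingMemory:
    """Hypothetical bounded buffer; not archon-memory-core's WorkingMemory."""

    def __init__(self, max_slots: int = 7) -> None:
        if not 4 <= max_slots <= 7:
            raise ValueError("max_slots should be between 4 and 7")
        self.max_slots = max_slots
        self.slots: deque = deque()
        self.spilled: list[str] = []  # items evicted before flush (assumed policy)

    def add(self, text: str) -> None:
        if len(self.slots) == self.max_slots:
            self.spilled.append(self.slots.popleft())  # oldest item leaves first
        self.slots.append(text)

    def flush(self, long_term: list[str]) -> int:
        """Move spilled and live items to long-term storage; return how many moved."""
        items = self.spilled + list(self.slots)
        long_term.extend(items)
        self.spilled.clear()
        self.slots.clear()
        return len(items)


wm = ToyWorkingMemory(max_slots=4)
for i in range(6):
    wm.add(f"note {i}")
print(list(wm.slots))        # ['note 2', 'note 3', 'note 4', 'note 5']
long_term: list[str] = []
print(wm.flush(long_term))   # 6
```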

Consolidation (requires Ollama)

from archon_memory_core import Consolidator, MemoryStore

store = MemoryStore()
consolidator = Consolidator(store, min_cluster=3)
report = consolidator.run(dry_run=True)  # preview only: nothing is archived
print(f"Would consolidate {report['clusters_viable']} clusters")

report = consolidator.run()  # real run: originals are soft-deleted, never hard-deleted
print(f"Archived {report['archived']} chunks into {report['consolidated']} facts")
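The dry-run/real-run split can be pictured with the toy sketch below. It groups active chunks by (source, type) and treats groups of at least `min_cluster` members as viable. It then "compresses" each viable group through a stand-in `summarize` callable, where the real consolidator calls a local LLM, and soft-deletes the originals. It ignores entity co-occurrence, and `Chunk`, `viable_clusters`, and `consolidate` are hypothetical names. Only the report keys mirror those printed above.

```python
from collections import defaultdict
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str
    type: str
    archived: bool = False


def viable_clusters(chunks: list[Chunk], min_cluster: int = 3) -> dict[tuple[str, str], list[Chunk]]:
    """Group active chunks by (source, type); keep groups with >= min_cluster members."""
    groups: dict[tuple[str, str], list[Chunk]] = defaultdict(list)
    for c in chunks:
        if not c.archived:
            groups[(c.source, c.type)].append(c)
    return {k: v for k, v in groups.items() if len(v) >= min_cluster}


def consolidate(chunks: list[Chunk], summarize: Callable[[list[str]], str],
                min_cluster: int = 3, dry_run: bool = False) -> dict:
    clusters = viable_clusters(chunks, min_cluster)
    report = {"clusters_viable": len(clusters), "archived": 0, "consolidated": 0, "dry_run": dry_run}
    if dry_run:
        return report
    for (source, type_), members in clusters.items():
        chunks.append(Chunk(summarize([m.text for m in members]), source, type_))
        for m in members:
            m.archived = True  # soft delete: the original stays in the list
        report["archived"] += len(members)
        report["consolidated"] += 1
    return report


chunks = [Chunk(f"standup note {i}", "slack", "session") for i in range(3)]
chunks.append(Chunk("API key is in the keychain", "notes", "credential"))
join = lambda texts: " / ".join(texts)  # stand-in for an LLM summary

preview = consolidate(chunks, join, dry_run=True)
report = consolidate(chunks, join)
print(preview["clusters_viable"], report["archived"], report["consolidated"])  # 1 3 1
```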

Eval Against Your Data

from archon_memory_core import MemoryEval, MemoryStore

store = MemoryStore()
ev = MemoryEval(store)
ev.add_query("Where is the API key?", expected_facts=["keychain"], type="credential")

report = ev.run(n=5, version="my-config")  # version labels this run for comparison
print(f"Score: {report['composite']}/10")
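One plausible way to turn `expected_facts` into a 0-10 score is shown below. For each query it takes the fraction of expected facts found as substrings in the top-n result texts, averages those fractions, and scales the result to 10. This is a hypothetical scoring rule for illustration; the package's `composite` may weigh other signals.

```python
def composite_score(results_by_query: dict[str, list[str]],
                    expected: dict[str, list[str]]) -> float:
    """Hypothetical: mean fraction of expected facts found in each query's results, scaled to 0-10."""
    per_query = []
    for query, facts in expected.items():
        joined = " ".join(results_by_query.get(query, [])).lower()
        found = sum(f.lower() in joined for f in facts)
        per_query.append(found / len(facts))
    return round(10 * sum(per_query) / len(per_query), 2) if per_query else 0.0


expected = {
    "Where is the API key?": ["keychain"],
    "Which Python setup?": ["3.12", "uv"],
}
results = {
    "Where is the API key?": ["The production API key lives in the keychain"],
    "Which Python setup?": ["Project uses Python 3.12"],  # misses "uv"
}
print(composite_score(results, expected))  # 7.5
```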

Agent Namespacing

from archon_memory_core import MemoryStore

store = MemoryStore()
store.add("Project uses Python 3.12", type="technical")                # shared
store.add("Internal scratchpad", type="session", agent="researcher")   # agent-private

results = store.search("Python version", agent="researcher")  # sees shared + researcher
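The visibility rule in the comment above ("sees shared + researcher") reduces to a one-line filter. In this sketch, `Record` and `visible_to` are hypothetical. Treating a search with no agent as "shared only" is an assumption, since the README does not say what an un-namespaced search returns.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Record:
    text: str
    agent: str | None = None  # None = shared across all agents


def visible_to(records: list[Record], agent: str | None) -> list[Record]:
    """Shared records plus the requesting agent's private ones (assumed semantics)."""
    return [r for r in records if r.agent is None or r.agent == agent]


records = [
    Record("Project uses Python 3.12"),
    Record("Internal scratchpad", agent="researcher"),
    Record("Draft reply", agent="writer"),
]
print([r.text for r in visible_to(records, "researcher")])
# ['Project uses Python 3.12', 'Internal scratchpad']
```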

Valid Chunk Types

VALID_TYPES = {
    "fact", "personal", "professional", "credential", "financial",
    "goal", "project_status", "technical", "session", "task",
    "observation", "dream", "lesson",
}

credential and lesson never decay. session decays aggressively after 30 days.
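The decay rules above can be sketched as a type-aware salience function. The README fixes only two things: `credential` and `lesson` never decay, and `session` decays aggressively after 30 days. The 3-day session half-life and the 90-day default half-life are hypothetical numbers chosen to make "aggressively" concrete, and so is the constant-until-day-30 shape.

```python
NEVER_DECAY = {"credential", "lesson"}
SESSION_GRACE_DAYS = 30         # from the README: session decays aggressively after 30 days
SESSION_HALF_LIFE_DAYS = 3.0    # hypothetical
DEFAULT_HALF_LIFE_DAYS = 90.0   # hypothetical default for all other types


def decayed_salience(base: float, chunk_type: str, age_days: float) -> float:
    """Exponential half-life decay, with per-type exemptions and a session grace period."""
    if chunk_type in NEVER_DECAY:
        return base
    if chunk_type == "session":
        if age_days <= SESSION_GRACE_DAYS:
            return base
        return base * 0.5 ** ((age_days - SESSION_GRACE_DAYS) / SESSION_HALF_LIFE_DAYS)
    return base * 0.5 ** (age_days / DEFAULT_HALF_LIFE_DAYS)


print(decayed_salience(1.0, "credential", 10_000))  # 1.0
print(decayed_salience(1.0, "session", 33))         # 0.5
print(decayed_salience(1.0, "fact", 90))            # 0.5
```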

Installation

pip install archon-memory-core                     # core
pip install "archon-memory-core[reranker]"         # + cross-encoder
pip install "archon-memory-core[graph]"            # + entity graph
pip install "archon-memory-core[langchain]"        # + LangChain adapter
pip install "archon-memory-core[llamaindex]"       # + LlamaIndex adapter
pip install "archon-memory-core[all]"              # everything

Requirements: Python ≥ 3.10, chromadb ≥ 0.5.0. Optional: Ollama with mistral:latest or qwen2.5:7b for consolidation.

Operations

Scheduling nightly consolidation

Consolidation is designed to run as a nightly job (cron, launchd, systemd timer). Two foot-guns to avoid:

1. Python version in the scheduler's PATH. On macOS, cron runs with a minimal PATH that resolves python3 to Apple's system /usr/bin/python3 (Python 3.9), which does not support the PEP 604 X | None syntax used in this package. The job will crash on import and fail silently. Fix by pinning the scheduler's PATH to your 3.10+ interpreter first:

# macOS crontab — put Homebrew Python ahead of Apple's system Python
PATH=/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin
30 3 * * * python3 -m archon_memory_core.consolidate >> ~/.local/log/amc-consolidate.log 2>&1

Or reference the interpreter explicitly:

30 3 * * * /opt/homebrew/bin/python3 -m archon_memory_core.consolidate >> ~/.log 2>&1

On Linux with a venv, the cleanest form is /path/to/venv/bin/python -m archon_memory_core.consolidate.

2. Monitor for silent failure. Because schedulers swallow stderr by default, consolidation can crash for weeks without notice — and memory quality degrades in the meantime (retrieval precision falls as near-duplicate chunks accumulate). Tail your log for Traceback and verify a successful completion record appears every ~24 hours. A one-line check:

tail -n 500 ~/.log | grep -q '"dry_run": false' && \
  find ~/.log -mmin -$((28*60)) | grep -q . && echo OK || echo STALE

Wire that into whatever monitoring you already have (Healthchecks.io, PagerDuty, a Slack webhook — this is a low-cardinality check, no heavy infra needed).
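If your monitoring runs Python rather than shell, the same check can be sketched as below. It also covers the Traceback case: the log counts as FAILED when a Traceback appears after the last completion record. The sketch assumes each successful run writes a line containing `"dry_run": false`, which is the same assumption the shell check makes. The function name is hypothetical.

```python
import tempfile
import time
from collections import deque
from pathlib import Path


def consolidation_status(log_path: Path, max_age_hours: float = 28, tail_lines: int = 500) -> str:
    """Return "OK", "STALE", or "FAILED" for a consolidation log (assumed log format)."""
    if not log_path.exists():
        return "STALE"
    age_hours = (time.time() - log_path.stat().st_mtime) / 3600
    with log_path.open(errors="replace") as f:
        tail = list(deque(f, maxlen=tail_lines))  # last N lines only
    last_ok = max((i for i, line in enumerate(tail) if '"dry_run": false' in line), default=-1)
    last_err = max((i for i, line in enumerate(tail) if "Traceback" in line), default=-1)
    if last_err > last_ok:
        return "FAILED"  # the most recent event is a crash
    if last_ok == -1 or age_hours > max_age_hours:
        return "STALE"   # no completion record, or the log is too old
    return "OK"


with tempfile.TemporaryDirectory() as d:
    log = Path(d) / "amc-consolidate.log"
    log.write_text('{"dry_run": false, "archived": 12}\n')
    status_ok = consolidation_status(log)
    log.write_text(log.read_text() + "Traceback (most recent call last):\n")
    status_failed = consolidation_status(log)
    status_missing = consolidation_status(Path(d) / "missing.log")
print(status_ok, status_failed, status_missing)  # OK FAILED STALE
```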

Roadmap

Q2 2026 — shipped: AMB v2.3 longitudinal benchmark (90-day simulated decay, daily contradiction injection, preregistered grid). Public leaderboard live.
Q2 2026 — in flight: Mem0 + MemGPT + Letta adapters submitted to the leaderboard. Hosted dashboard at divergencerouter.com/amc/.
Q3 2026: Pro tier (memory health dashboard, eval runs, replay debugger). See ROADMAP.md and PRICING.md.
Q4 2026: Multilingual benchmark suite. Enterprise private-VPC deploy.

Pricing

Free forever. The OSS library is complete and will remain so.

Paid tiers for observability, evals, team features, and hosted services are on the roadmap — see PRICING.md for the tier structure and ENTERPRISE.md for private-deploy details.

License

Apache 2.0. See LICENSE.

Contributing

See CONTRIBUTING.md. Benchmarks, adapters, and bug reports especially welcome.
