Your AI forgets everything when the conversation ends. soul.py fixes that.
📖 NEW: The book is out! Soul: Building AI Agents That Remember Who They Are — everything here + deep dives on identity, memory patterns, multi-agent coordination, and the philosophy of persistent AI. Get it on Amazon →
📄 Research paper: Persistent Identity in AI Agents: A Multi-Anchor Architecture for Resilient Memory and Continuity — arXiv:2604.09588 [cs.AI]. Formalizes the identity anchor concept, RAG+RLM hybrid retrieval, and the multi-anchor resilience roadmap. 18 pages.
from hybrid_agent import HybridAgent
agent = HybridAgent()
agent.ask("My name is Prahlad and I'm building an AI research lab.")
# New process. New session. Memory persists.
agent = HybridAgent()
result = agent.ask("What do you know about me?")
print(result["answer"])
# → "You're Prahlad, building an AI research lab."
No database. No server. Just markdown files and smart retrieval.
| Version | Demo | What it shows |
|---|---|---|
| v0.1 | soul.themenonlab.com | Memory persists across sessions |
| v1.0 | soulv1.themenonlab.com | Semantic RAG retrieval |
| v2.0 | soulv2.themenonlab.com | Auto query routing: RAG + RLM |
| v0.2.0 | — | Modulizer: 50% token savings, zero-deps |
| Ask Darwin | soul-book.themenonlab.com | 📖 Book companion — watch routing decisions live |
Soul: Building AI Agents That Remember Who They Are
The complete guide to persistent AI memory. Covers:
- Why agents forget (and the architectural fix)
- Identity vs Memory (SOUL.md vs MEMORY.md)
- RAG vs RLM (when to use each)
- Multi-agent memory sharing
- Darwinian evolution of agent identity
- Working code in every chapter
| Topic | Link |
|---|---|
| Getting Started | Persistent Memory for LLM Agents |
| v2.0 Architecture | RAG + RLM Hybrid — How It Works |
| Comparison | soul.py vs mem0 vs Zep vs Letta |
| Token Efficiency | v0.2.0 Modulizer — Token Savings |
| Agent Identity | Darwin: Evolution, Identity, and AI Agents |
| LangChain / LlamaIndex | soul.py Integrations Guide |
| Enterprise | Is soul.py Enterprise-Ready? |
Evaluated on LoCoMo (Snap Research) — 1,986 questions across 10 long conversations testing single-hop recall, multi-hop reasoning, open-domain knowledge, and temporal understanding.
| Config | Overall | Single-hop | Multi-hop | Open-domain | Temporal |
|---|---|---|---|---|---|
| RLM | 70.0% | 54.1% | 82.1% | 55.1% | 40.0% |
| Hybrid | 65.6% | 46.0% | 79.5% | 56.0% | 29.8% |
| Auto | 64.1% | 42.6% | 78.5% | 58.8% | 26.7% |
| Qdrant (RAG) | 63.4% | 36.5% | 78.7% | 59.4% | 27.0% |
| BM25 | 63.1% | 38.4% | 77.8% | 50.8% | 29.3% |
RLM outperforms every other configuration by 4 to 7 points overall, with the largest gains on temporal reasoning (+10 points) and single-hop recall (+8 points). The RAG-based configurations remain ahead on open-domain questions. Full methodology and per-category breakdowns at menonpg.github.io/soul-benchmarks.
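The margins quoted above can be rechecked directly from the table. The snippet below is only an arithmetic check: the numbers are copied from the table, and the helper name `margin_over_best_baseline` is made up for this illustration.

```python
# Scores (percent) copied from the LoCoMo table above.
scores = {
    "RLM":          {"overall": 70.0, "single_hop": 54.1, "multi_hop": 82.1, "open_domain": 55.1, "temporal": 40.0},
    "Hybrid":       {"overall": 65.6, "single_hop": 46.0, "multi_hop": 79.5, "open_domain": 56.0, "temporal": 29.8},
    "Auto":         {"overall": 64.1, "single_hop": 42.6, "multi_hop": 78.5, "open_domain": 58.8, "temporal": 26.7},
    "Qdrant (RAG)": {"overall": 63.4, "single_hop": 36.5, "multi_hop": 78.7, "open_domain": 59.4, "temporal": 27.0},
    "BM25":         {"overall": 63.1, "single_hop": 38.4, "multi_hop": 77.8, "open_domain": 50.8, "temporal": 29.3},
}


def margin_over_best_baseline(category):
    """RLM score minus the best score of any other configuration."""
    best_other = max(row[category] for name, row in scores.items() if name != "RLM")
    return round(scores["RLM"][category] - best_other, 1)


margins = {category: margin_over_best_baseline(category) for category in scores["RLM"]}
for category, margin in margins.items():
    print(f"{category:12s} {margin:+.1f}")
```

A negative margin means another configuration scored higher in that category, which is the case for open-domain questions.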
pip install soul-agent
pip install soul-agent[anthropic]
pip install soul-agent[openai]
pip install soul-agent[gemini] # ✅ Now available!
Large MEMORY.md files burn tokens. Modulizer splits them into indexed modules and retrieves only what's relevant.
# Split your memory into modules
soul modulize MEMORY.md
# Creates:
# modules/INDEX.md (1.7KB)
# modules/projects.md
# modules/tools.md
# ...
Two-phase retrieval:
- Read INDEX.md (always small)
- LLM picks relevant modules
- Load only those modules
Results: 47% fewer tokens on 25KB MEMORY.md. Zero infrastructure — no vector DB, no embeddings.
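To make the two-phase idea concrete, here is a minimal sketch in plain Python. It is not the Modulizer implementation: the index format, the function names `pick_modules` and `load_modules`, and the sample module contents are all assumptions, and a simple word-overlap score stands in for the LLM call that picks modules.

```python
import re


def tokenize(text):
    """Lowercase word set, used for a crude relevance score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pick_modules(query, index, max_modules=2):
    """Phase 1: read the small index and choose modules.

    Hypothetical stand-in for the LLM choice: rank modules by how many
    query words appear in their index description.
    """
    query_words = tokenize(query)
    scored = []
    for name, description in index.items():
        overlap = len(query_words & tokenize(description))
        if overlap:
            scored.append((overlap, name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:max_modules]]


def load_modules(names, modules):
    """Phase 2: load only the selected modules into the context."""
    return "\n\n".join(modules[name] for name in names)


# Made-up index and module contents, for illustration only.
index = {
    "projects.md": "active projects, research lab plans, deadlines",
    "tools.md": "tools, libraries, and editors the user has used",
}
modules = {
    "projects.md": "## Projects\n- AI research lab",
    "tools.md": "## Tools\n- Qdrant\n- Ollama",
}

selected = pick_modules("What tools have I used?", index)
context = load_modules(selected, modules)
print(selected)
```

The token savings come from phase 2: only the selected modules reach the prompt, so the cost scales with the relevant modules rather than with the whole memory file.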
from soul import Agent
agent = Agent(use_modules=True) # default when modules exist
response = agent.ask("What tools have I used?")
# Check what was loaded
stats = agent.get_memory_stats()
# {'mode': 'modules', 'modules_read': ['tools.md'], 'total_kb': 5.5}
CLI commands:
- `soul modulize <file>`: split into modules
- `soul modules list`: view modules
- `soul chat --no-modules`: disable (opt-out)
soul init # creates SOUL.md and MEMORY.md
# v0.1: simple markdown memory (great starting point)
from soul import Agent
agent = Agent(provider="anthropic")
agent.ask("Remember this.")
# v2.0 — automatic RAG + RLM routing (this repo's default)
from hybrid_agent import HybridAgent
agent = HybridAgent() # auto-detects best retrieval per query
result = agent.ask("What do you know about me?")
print(result["answer"])
print(result["route"]) # "RAG" or "RLM"
soul.py works with any LLM provider, with no SDK lock-in:
# Anthropic (default)
agent = HybridAgent(provider="anthropic") # Uses ANTHROPIC_API_KEY
# Google Gemini
agent = HybridAgent(
provider="gemini",
chat_model="gemini-2.5-pro", # or gemini-2.0-flash, gemini-2.5-flash
router_model="gemini-2.0-flash", # keep router cheap
) # Uses GEMINI_API_KEY
# OpenAI
agent = HybridAgent(provider="openai") # Uses OPENAI_API_KEY
# Local via Ollama
agent = HybridAgent(
provider="openai-compatible",
base_url="http://localhost:11434/v1",
chat_model="llama3.2",
)
| Provider | Default Model | Env Var |
|---|---|---|
| `anthropic` | claude-haiku-4-5 | ANTHROPIC_API_KEY |
| `gemini` | gemini-2.0-flash | GEMINI_API_KEY |
| `openai` | gpt-4o-mini | OPENAI_API_KEY |
| `openai-compatible` | llama3.2 | OPENAI_API_KEY (optional) |
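Before constructing an agent, it can help to check which providers have credentials available. The sketch below only restates the table above as a dictionary and inspects environment variables. `PROVIDERS` and `configured_providers` are hypothetical names for this example and are not part of the soul.py API.

```python
import os

# Provider -> (default model, required env var), taken from the table above.
# None means the key is optional (for example a local Ollama server).
PROVIDERS = {
    "anthropic": ("claude-haiku-4-5", "ANTHROPIC_API_KEY"),
    "gemini": ("gemini-2.0-flash", "GEMINI_API_KEY"),
    "openai": ("gpt-4o-mini", "OPENAI_API_KEY"),
    "openai-compatible": ("llama3.2", None),
}


def configured_providers(env=None):
    """Return providers whose required key is set in the given environment."""
    env = os.environ if env is None else env
    ready = []
    for name, (_model, key_var) in PROVIDERS.items():
        if key_var is None or env.get(key_var):
            ready.append(name)
    return ready


print(configured_providers())
```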
Don't want to manage local files? SoulMate API gives you persistent memory as a service:
from soulmate import SoulMateClient
# Sign up at soulmate-api.themenonlab.com/docs
client = SoulMateClient(
api_key="sm_live_...",
anthropic_key="sk-ant-..." # BYOK — your own Anthropic key
)
# That's it. Memory persists in the cloud.
response = client.ask("My name is Prahlad.")
response = client.ask("What's my name?") # → "Prahlad"
| Local (soul.py) | Cloud (SoulMate API) |
|---|---|
| Files on your machine | Managed cloud storage |
| You control everything | Zero infrastructure |
| Git-versioned memory | API-based, instant setup |
| Free forever | Free tier available |
Get started: soulmate-api.themenonlab.com/docs
soul.py uses two markdown files as persistent state:
| File | Purpose |
|---|---|
| `SOUL.md` | Identity: who the agent is, how it behaves |
| `MEMORY.md` | Memory: timestamped log of every exchange |
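As a rough illustration of how two markdown files can act as persistent state, the sketch below appends a timestamped exchange to a memory file and assembles a system prompt from both files. The entry layout and the function names `append_exchange` and `build_system_prompt` are assumptions for this example; the format soul.py actually writes may differ.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_exchange(memory_path, user_msg, agent_msg, now=None):
    """Append one timestamped exchange to the memory file (assumed layout)."""
    now = now or datetime.now(timezone.utc)
    entry = (
        f"\n## {now.isoformat(timespec='seconds')}\n"
        f"**User:** {user_msg}\n"
        f"**Agent:** {agent_msg}\n"
    )
    with open(memory_path, "a", encoding="utf-8") as handle:
        handle.write(entry)


def build_system_prompt(soul_path, memory_path):
    """Concatenate identity and memory into one prompt string."""
    soul = Path(soul_path).read_text(encoding="utf-8")
    memory = Path(memory_path).read_text(encoding="utf-8")
    return f"{soul}\n\n# Memory\n{memory}"


# Demo in a throwaway directory so no real files are touched.
workdir = Path(tempfile.mkdtemp())
soul_file = workdir / "SOUL.md"
memory_file = workdir / "MEMORY.md"
soul_file.write_text("You are a helpful research assistant.\n", encoding="utf-8")
memory_file.touch()

append_exchange(memory_file, "My name is Prahlad.", "Nice to meet you, Prahlad.")
prompt = build_system_prompt(soul_file, memory_file)
print(prompt)
```

Because the state is plain text on disk, a new process that calls `build_system_prompt` sees everything earlier processes appended.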
v2.0 adds a query router that automatically dispatches to the right retrieval strategy:
Your query
↓
Router (fast LLM call)
├── FOCUSED (~90%) → RAG — vector search, sub-second
└── EXHAUSTIVE (~10%) → RLM — recursive synthesis, thorough
Architecture based on: RAG + RLM: The Complete Knowledge Base Architecture
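The routing step in the diagram can be sketched as a classifier followed by a dispatch. In the sketch below a keyword heuristic stands in for the fast LLM call, and `fake_rag` and `fake_rlm` are placeholder retrievers. All names here are hypothetical and only illustrate the control flow, not the project's router.

```python
import re

# Words that suggest the query needs a sweep over all memory (assumed list).
EXHAUSTIVE_CUES = {"all", "every", "everything", "summarize", "summary", "timeline"}


def classify(query):
    """Stand-in for the router LLM: label a query FOCUSED or EXHAUSTIVE."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    return "EXHAUSTIVE" if words & EXHAUSTIVE_CUES else "FOCUSED"


def route(query, rag, rlm):
    """Dispatch to the retrieval strategy that matches the label."""
    label = classify(query)
    handler, name = (rlm, "RLM") if label == "EXHAUSTIVE" else (rag, "RAG")
    return {"route": name, "answer": handler(query)}


def fake_rag(query):
    # Placeholder for vector search over memory chunks.
    return f"[top-k chunks for: {query}]"


def fake_rlm(query):
    # Placeholder for recursive synthesis over the full memory.
    return f"[synthesis over all memory for: {query}]"


focused = route("What is my name?", fake_rag, fake_rlm)
exhaustive = route("Summarize everything we discussed.", fake_rag, fake_rlm)
print(focused["route"], exhaustive["route"])
```

Keeping the classifier cheap matters because it runs on every query, while the expensive path is only taken for the minority of exhaustive questions.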
| Branch | Description | Best for |
|---|---|---|
| `main` | v2.0: RAG + RLM hybrid (default) | Production use |
| `v2.0-rag-rlm` | Same as main, versioned | Pinning to v2 |
| `v1.0-rag` | RAG only, no RLM | Simpler setup |
| `v0.1-stable` | Pure markdown, zero deps | Learning / prototyping |
result = agent.ask("What is my name?")
result["answer"] # the response
result["route"] # "RAG" or "RLM"
result["router_ms"] # router latency
result["retrieval_ms"] # retrieval latency
result["total_ms"] # total latency
result["rag_context"] # retrieved chunks (RAG path)
result["rlm_meta"] # chunk stats (RLM path)
agent = HybridAgent(
soul_path="SOUL.md",
memory_path="MEMORY.md",
mode="auto", # "auto" | "rag" | "rlm"
qdrant_url="...", # or set QDRANT_URL env var
qdrant_api_key="...", # or QDRANT_API_KEY
azure_embedding_endpoint="...", # or AZURE_EMBEDDING_ENDPOINT
azure_embedding_key="...", # or AZURE_EMBEDDING_KEY
k=5, # RAG retrieval count
)
Falls back to BM25 (keyword search) when Qdrant or Azure embeddings are not configured.
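For readers unfamiliar with BM25, the sketch below shows the standard Okapi BM25 scoring formula in plain Python, applied to a few made-up memory lines. It illustrates what a keyword fallback computes and is not the code soul.py ships. The parameter defaults `k1=1.5` and `b=0.75` are common choices, not values taken from the project.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query, documents, k1=1.5, b=0.75):
    """Return (document indices sorted best-first, per-document scores)."""
    if not documents:
        return [], []
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / (term_freq[term] + length_norm)
        scores.append(score)

    order = sorted(range(n_docs), key=lambda i: scores[i], reverse=True)
    return order, scores


# Made-up memory entries, for illustration only.
memory = [
    "User's name is Prahlad.",
    "User is building an AI research lab.",
    "User prefers Ollama for local models.",
]
order, scores = bm25_rank("What is my name?", memory)
print(memory[order[0]])
```

Unlike vector search, this needs no embeddings or external service, which is why it works as a zero-configuration fallback. The trade-off is that it matches only on shared words, not on meaning.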
soul.py isn't just for personal memory — the same architecture works for custom knowledge bases. Combine both in a single agent:
agent = HybridAgent(
soul_path="SOUL.md",
memory_path="MEMORY.md", # Per-user memory
knowledge_dir="./knowledge", # Your corpus (docs, products, policies)
)
# Index your knowledge base once
agent.index_knowledge()
# Now the agent searches both pools
agent.ask("What's the return policy?") # → Knowledge base
agent.ask("What was I asking about earlier?") # → User memory
agent.ask("Which product fits my needs?") # → Both
Example use cases:
| Agent Type | Knowledge Base | Memory |
|---|---|---|
| Support Bot | Product docs, policies, FAQs | Customer history, preferences |
| Research Assistant | Paper corpus, methodologies | User's focus, papers read |
| Onboarding Buddy | Company handbook, org chart | New hire's role, questions |
| Book Companion | Full book content | Reader's interests, progress |
Darwin (the AI companion for the Soul book) uses exactly this pattern — the entire book indexed as knowledge, plus per-reader conversation memory.
See the Memory Architecture Patterns guide for detailed implementation patterns.
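The two-pool pattern described above can be pictured as a single search over labeled collections. The sketch below is an assumption-heavy illustration: `search_pools` is a hypothetical helper, the passages are invented sample data, and word overlap stands in for whatever retrieval the agent really uses.

```python
import re


def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_pools(query, pools, k=2):
    """Score passages from every pool and return the top k with their pool name."""
    query_words = tokens(query)
    hits = []
    for pool_name, passages in pools.items():
        for passage in passages:
            score = len(query_words & tokens(passage))
            if score:
                hits.append((score, pool_name, passage))
    hits.sort(key=lambda hit: -hit[0])  # stable sort keeps pool order on ties
    return [(pool_name, passage) for _, pool_name, passage in hits[:k]]


# Invented sample data: a shared knowledge base and one user's memory.
pools = {
    "knowledge": [
        "Return policy: items can be returned within 30 days.",
        "Shipping takes 3 to 5 business days.",
    ],
    "memory": [
        "User asked about shipping to Canada earlier.",
        "User prefers email support.",
    ],
}

policy_hits = search_pools("What's the return policy?", pools)
shipping_hits = search_pools("What was I asking about shipping?", pools)
print([pool for pool, _ in policy_hits], [pool for pool, _ in shipping_hits])
```

Tagging each hit with its pool lets the agent tell the model which passages are shared documentation and which are specific to the current user.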
Already using a framework? Drop in soul.py memory with one line:
| Framework | Package | Install |
|---|---|---|
| LangChain | langchain-soul | pip install langchain-soul |
| LlamaIndex | llamaindex-soul | pip install llamaindex-soul |
| CrewAI | crewai-soul | pip install crewai-soul |
# LangChain
from langchain_soul import SoulChatMessageHistory
history = SoulChatMessageHistory(session_id="user-123")
# LlamaIndex
from llamaindex_soul import SoulChatStore
chat_store = SoulChatStore()
# CrewAI
from crewai_soul import SoulMemory
memory = SoulMemory()
Each integration includes:
- soul-agent — RAG + RLM hybrid retrieval
- soul-schema — Database semantic layer (auto-document your tables)
- SoulMate client — Managed cloud option
Tested on the LoCoMo long-conversation memory benchmark (1,986 questions, scored by Gemini 2.0 Flash):
| System | Overall | Multi-Hop | Notes |
|---|---|---|---|
| XMem | 91.5% | 92.3% | Uses Gemini 3-flash |
| Memobase | 75.8% | 46.9% | |
| Zep | 75.1% | 66.0% | |
| soul.py (RLM) | 70.0% | 82.1% | Gemini 2.0 Flash |
| Mem0g (YC 24) | 68.4% | 47.2% | |
| Mem0 (YC 24) | 66.9% | 51.2% | |
| LangMem | 58.1% | 47.9% | |
| OpenAI | 52.9% | 42.9% | |
soul.py RLM beats Mem0, Mem0g, LangMem, and OpenAI on overall score, and its multi-hop reasoning score (82.1%) is second only to XMem (92.3%). It trails XMem, Memobase, and Zep on overall, though XMem uses a significantly more capable model.
Full results & data → · Interactive dashboard →
Those are orchestration frameworks. soul.py is a primitive — persistent identity and memory you can drop into anything you're building.
- No framework lock-in — works with any LLM provider, or with your favorite framework via integrations above
- Human-readable — SOUL.md and MEMORY.md are plain text
- Version-controllable — git diff your agent's memories
- Composable — use just the parts you need
See ROADMAP.md for planned features and how to contribute.
MIT
@software{menon2026soul,
author = {Menon, Prahlad G.},
title = {soul.py: Persistent Identity and Memory for LLM Agents},
year = {2026},
url = {https://github.com/menonpg/soul.py}
}