soul.py 🧠

Your AI forgets everything when the conversation ends. soul.py fixes that.

📖 NEW: The book is out! Soul: Building AI Agents That Remember Who They Are — everything here + deep dives on identity, memory patterns, multi-agent coordination, and the philosophy of persistent AI. Get it on Amazon →

📄 Research paper: Persistent Identity in AI Agents: A Multi-Anchor Architecture for Resilient Memory and Continuity — arXiv:2604.09588 [cs.AI]. Formalizes the identity anchor concept, RAG+RLM hybrid retrieval, and the multi-anchor resilience roadmap. 18 pages.

from hybrid_agent import HybridAgent

agent = HybridAgent()
agent.ask("My name is Prahlad and I'm building an AI research lab.")

# New process. New session. Memory persists.
agent = HybridAgent()
result = agent.ask("What do you know about me?")
print(result["answer"])
# → "You're Prahlad, building an AI research lab."

No database. No server. Just markdown files and smart retrieval.

▶ Live Demos

| Version | Demo | What it shows |
|---|---|---|
| v0.1 | soul.themenonlab.com | Memory persists across sessions |
| v1.0 | soulv1.themenonlab.com | Semantic RAG retrieval |
| v2.0 | soulv2.themenonlab.com | Auto query routing: RAG + RLM |
| v0.2.0 | - | Modulizer: 50% token savings, zero dependencies |
| Ask Darwin | soul-book.themenonlab.com | 📖 Book companion: watch routing decisions live |

📚 The Book

Soul: Building AI Agents That Remember Who They Are

The complete guide to persistent AI memory. Covers:

Why agents forget (and the architectural fix)
Identity vs Memory (SOUL.md vs MEMORY.md)
RAG vs RLM (when to use each)
Multi-agent memory sharing
Darwinian evolution of agent identity
Working code in every chapter

→ Available on Amazon

📚 Documentation & Blog

| Topic | Link |
|---|---|
| Getting Started | Persistent Memory for LLM Agents |
| v2.0 Architecture | RAG + RLM Hybrid: How It Works |
| Comparison | soul.py vs mem0 vs Zep vs Letta |
| Token Efficiency | v0.2.0 Modulizer: Token Savings |
| Agent Identity | Darwin: Evolution, Identity, and AI Agents |
| LangChain / LlamaIndex | soul.py Integrations Guide |
| Enterprise | Is soul.py Enterprise-Ready? |

📊 Benchmarks

LoCoMo Benchmark Results →

Evaluated on LoCoMo (Snap Research) — 1,986 questions across 10 long conversations testing single-hop recall, multi-hop reasoning, open-domain knowledge, and temporal understanding.

| Config | Overall | Single-hop | Multi-hop | Open-domain | Temporal |
|---|---|---|---|---|---|
| RLM | 70.0% | 54.1% | 82.1% | 55.1% | 40.0% |
| Hybrid | 65.6% | 46.0% | 79.5% | 56.0% | 29.8% |
| Auto | 64.1% | 42.6% | 78.5% | 58.8% | 26.7% |
| Qdrant (RAG) | 63.4% | 36.5% | 78.7% | 59.4% | 27.0% |
| BM25 | 63.1% | 38.4% | 77.8% | 50.8% | 29.3% |

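RLM outperforms every other configuration by 4–7 points overall, with the largest gains on temporal reasoning (+10 pts) and single-hop recall (+8 pts). It trails the Qdrant, Auto, and Hybrid configurations on open-domain questions. Full methodology and per-category breakdowns at menonpg.github.io/soul-benchmarks.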
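
The margins quoted above can be re-derived from the table. The short sketch below copies the table's numbers into a dictionary and computes RLM's lead over the best other configuration in each category; the helper name `margin_over_best_other` is made up for this illustration and is not part of soul.py.

```python
# Scores copied from the LoCoMo table above (percent correct).
scores = {
    "RLM":    {"overall": 70.0, "single_hop": 54.1, "multi_hop": 82.1, "open_domain": 55.1, "temporal": 40.0},
    "Hybrid": {"overall": 65.6, "single_hop": 46.0, "multi_hop": 79.5, "open_domain": 56.0, "temporal": 29.8},
    "Auto":   {"overall": 64.1, "single_hop": 42.6, "multi_hop": 78.5, "open_domain": 58.8, "temporal": 26.7},
    "Qdrant": {"overall": 63.4, "single_hop": 36.5, "multi_hop": 78.7, "open_domain": 59.4, "temporal": 27.0},
    "BM25":   {"overall": 63.1, "single_hop": 38.4, "multi_hop": 77.8, "open_domain": 50.8, "temporal": 29.3},
}


def margin_over_best_other(config, category):
    """Points by which `config` leads (or trails, if negative) the best other config."""
    best_other = max(v[category] for name, v in scores.items() if name != config)
    return round(scores[config][category] - best_other, 1)


margins = {cat: margin_over_best_other("RLM", cat) for cat in scores["RLM"]}
for category, points in margins.items():
    print(f"{category:12s} {points:+.1f}")
```

A negative value means RLM trails the best other configuration in that category.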

Install

pip install soul-agent
pip install soul-agent[anthropic]
pip install soul-agent[openai]
pip install soul-agent[gemini]   # ✅ Now available!

🆕 v0.2.0 — Modulizer (50% Token Savings)

Large MEMORY.md files burn tokens. Modulizer splits them into indexed modules and retrieves only what's relevant.

# Split your memory into modules
soul modulize MEMORY.md

# Creates:
# modules/INDEX.md (1.7KB)
# modules/projects.md
# modules/tools.md
# ...

Two-phase retrieval:

1. Read INDEX.md (always small)
2. LLM picks relevant modules
3. Load only those modules

Results: 47% fewer tokens on a 25 KB MEMORY.md. Zero infrastructure: no vector DB, no embeddings.
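
To make the two-phase idea concrete, here is a minimal sketch and not the Modulizer implementation. It assumes a hypothetical INDEX.md format of `- file.md: description` lines, and it replaces the LLM's module pick with a simple keyword-overlap check so the example runs offline. The function names `parse_index`, `pick_modules`, and `load_context` are illustrative only.

```python
import re
import tempfile
from pathlib import Path


def parse_index(index_text):
    """Parse lines like '- tools.md: editors, CLI tools' into {filename: description}."""
    entries = {}
    for line in index_text.splitlines():
        match = re.match(r"^-\s*(\S+\.md):\s*(.+)$", line.strip())
        if match:
            entries[match.group(1)] = match.group(2)
    return entries


def pick_modules(query, entries):
    """Stand-in for the LLM call: pick modules whose description shares a word with the query."""
    query_words = set(re.findall(r"[a-z0-9]+", query.lower()))
    picked = []
    for name, description in entries.items():
        if query_words & set(re.findall(r"[a-z0-9]+", description.lower())):
            picked.append(name)
    return picked


def load_context(modules_dir, query):
    modules_dir = Path(modules_dir)
    # Phase 1: read only the small index and decide which modules matter.
    entries = parse_index((modules_dir / "INDEX.md").read_text(encoding="utf-8"))
    chosen = pick_modules(query, entries)
    # Phase 2: load just the chosen modules.
    text = "\n\n".join((modules_dir / name).read_text(encoding="utf-8") for name in chosen)
    return chosen, text


# Demo with made-up module contents in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "INDEX.md").write_text(
        "- projects.md: research lab projects and deadlines\n"
        "- tools.md: editors, CLI tools, libraries\n",
        encoding="utf-8",
    )
    (root / "projects.md").write_text("Building an AI research lab.", encoding="utf-8")
    (root / "tools.md").write_text("Uses vim and ripgrep.", encoding="utf-8")
    chosen, context = load_context(root, "What tools have I used?")
    print(chosen)
```

Only the index and the selected module are read, which is where the token savings would come from in this scheme.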

from soul import Agent

agent = Agent(use_modules=True)  # default when modules exist
response = agent.ask("What tools have I used?")

# Check what was loaded
stats = agent.get_memory_stats()
# {'mode': 'modules', 'modules_read': ['tools.md'], 'total_kb': 5.5}

CLI commands:

soul modulize <file> — split into modules
soul modules list — view modules
soul chat --no-modules — disable (opt-out)

Full writeup →

Quickstart

soul init   # shell command: creates SOUL.md and MEMORY.md

# v0.1 — simple markdown memory (great starting point)
from soul import Agent
agent = Agent(provider="anthropic")
agent.ask("Remember this.")

# v2.0 — automatic RAG + RLM routing (this repo's default)
from hybrid_agent import HybridAgent
agent = HybridAgent()  # auto-detects best retrieval per query
result = agent.ask("What do you know about me?")
print(result["answer"])
print(result["route"])   # "RAG" or "RLM"

Multi-Provider Support

soul.py works with any LLM provider — no SDK lock-in:

from hybrid_agent import HybridAgent

# Anthropic (default)
agent = HybridAgent(provider="anthropic")  # Uses ANTHROPIC_API_KEY

# Google Gemini
agent = HybridAgent(
    provider="gemini",
    chat_model="gemini-2.5-pro",       # or gemini-2.0-flash, gemini-2.5-flash
    router_model="gemini-2.0-flash",   # keep router cheap
)  # Uses GEMINI_API_KEY

# OpenAI
agent = HybridAgent(provider="openai")  # Uses OPENAI_API_KEY

# Local via Ollama
agent = HybridAgent(
    provider="openai-compatible",
    base_url="http://localhost:11434/v1",
    chat_model="llama3.2",
)

| Provider | Default Model | Env Var |
|---|---|---|
| `anthropic` | claude-haiku-4-5 | `ANTHROPIC_API_KEY` |
| `gemini` | gemini-2.0-flash | `GEMINI_API_KEY` |
| `openai` | gpt-4o-mini | `OPENAI_API_KEY` |
| `openai-compatible` | llama3.2 | `OPENAI_API_KEY` (optional) |
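
If you want to choose a provider based on which keys are present, something like the following could work. This is a sketch under assumptions: the helper names `configured_providers` and `choose_provider` are hypothetical, and falling back to `openai-compatible` when no key is set is a design choice made for this example (the table lists its key as optional), not documented soul.py behavior.

```python
import os

# Provider-to-environment-variable mapping taken from the table above.
PROVIDER_ENV_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def configured_providers(env=None):
    """Return the hosted providers whose API key is set and non-empty."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_ENV_VARS.items() if env.get(var)]


def choose_provider(env=None, preferred=("anthropic", "gemini", "openai")):
    """Pick the first preferred provider with a key; otherwise assume a local endpoint."""
    available = configured_providers(env)
    for name in preferred:
        if name in available:
            return name
    return "openai-compatible"  # assumption: a local server such as Ollama needs no key


print(choose_provider())
```

The result can then be passed as the `provider` argument shown in the examples above.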

☁️ SoulMate API — Managed Cloud Option

Don't want to manage local files? SoulMate API gives you persistent memory as a service:

from soulmate import SoulMateClient

# Sign up at soulmate-api.themenonlab.com/docs
client = SoulMateClient(
    api_key="sm_live_...",
    anthropic_key="sk-ant-..."  # BYOK — your own Anthropic key
)

# That's it. Memory persists in the cloud.
response = client.ask("My name is Prahlad.")
response = client.ask("What's my name?")  # → "Prahlad"

| Local (soul.py) | Cloud (SoulMate API) |
|---|---|
| Files on your machine | Managed cloud storage |
| You control everything | Zero infrastructure |
| Git-versioned memory | API-based, instant setup |
| Free forever | Free tier available |

Get started: soulmate-api.themenonlab.com/docs

How it works

soul.py uses two markdown files as persistent state:

| File | Purpose |
|---|---|
| `SOUL.md` | Identity: who the agent is, how it behaves |
| `MEMORY.md` | Memory: timestamped log of every exchange |
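
The two-file idea can be illustrated with a few lines of standard-library Python. This is a minimal sketch, not soul.py's actual code: the entry layout (a timestamp heading followed by `**User:**` and `**Agent:**` lines) and the function names `append_exchange` and `build_system_prompt` are assumptions made for the example.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_exchange(memory_path, user_msg, agent_msg, now=None):
    """Append one timestamped exchange to the memory file (assumed entry format)."""
    now = now or datetime.now(timezone.utc)
    entry = (
        f"\n## {now.strftime('%Y-%m-%d %H:%M:%S')} UTC\n"
        f"**User:** {user_msg}\n"
        f"**Agent:** {agent_msg}\n"
    )
    with Path(memory_path).open("a", encoding="utf-8") as handle:
        handle.write(entry)


def build_system_prompt(soul_path, memory_path):
    """Combine identity and memory into one prompt string for the next LLM call."""
    soul = Path(soul_path).read_text(encoding="utf-8")
    memory_file = Path(memory_path)
    memory = memory_file.read_text(encoding="utf-8") if memory_file.exists() else ""
    return f"{soul.strip()}\n\n# Memory\n{memory.strip()}"


# Demo in a temporary directory with made-up file contents.
with tempfile.TemporaryDirectory() as tmp:
    soul_file = Path(tmp) / "SOUL.md"
    memory_file = Path(tmp) / "MEMORY.md"
    soul_file.write_text("You are a helpful research assistant.\n", encoding="utf-8")
    append_exchange(
        memory_file,
        "My name is Prahlad.",
        "Nice to meet you, Prahlad.",
        now=datetime(2026, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
    )
    prompt = build_system_prompt(soul_file, memory_file)
    print(prompt)
```

Because the state lives in plain files, a new process that rebuilds the prompt from the same paths sees everything earlier sessions wrote.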

v2.0 adds a query router that automatically dispatches to the right retrieval strategy:

Your query
    ↓
Router (fast LLM call)
    ├── FOCUSED  (~90%) → RAG — vector search, sub-second
    └── EXHAUSTIVE (~10%) → RLM — recursive synthesis, thorough

Architecture based on: RAG + RLM: The Complete Knowledge Base Architecture
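
The dispatch in the diagram above can be sketched as follows. In soul.py the FOCUSED/EXHAUSTIVE decision is made by a fast LLM call; here a keyword heuristic stands in for that call so the example runs offline. The cue list and the names `classify` and `route_query` are hypothetical, and the two handlers are placeholder callables.

```python
import re

# Hypothetical cue words suggesting the query needs a pass over all of memory.
EXHAUSTIVE_CUES = {"all", "every", "everything", "summarize", "summary", "list", "overall"}


def classify(query):
    """Stand-in for the router LLM call: label a query FOCUSED or EXHAUSTIVE."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    return "EXHAUSTIVE" if words & EXHAUSTIVE_CUES else "FOCUSED"


def route_query(query, rag_handler, rlm_handler):
    """Send FOCUSED queries to the RAG handler and EXHAUSTIVE ones to the RLM handler."""
    if classify(query) == "FOCUSED":
        return {"route": "RAG", "answer": rag_handler(query)}
    return {"route": "RLM", "answer": rlm_handler(query)}


# Placeholder handlers; real ones would run vector search or recursive synthesis.
focused = route_query("What is my name?", lambda q: "from RAG", lambda q: "from RLM")
broad = route_query("Summarize everything we discussed.", lambda q: "from RAG", lambda q: "from RLM")
print(focused["route"], broad["route"])
```

Keeping the router separate from the handlers means the classification step can be swapped (heuristic, small model, larger model) without touching retrieval.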

Branches

| Branch | Description | Best for |
|---|---|---|
| `main` | v2.0: RAG + RLM hybrid (default) | Production use |
| `v2.0-rag-rlm` | Same as main, versioned | Pinning to v2 |
| `v1.0-rag` | RAG only, no RLM | Simpler setup |
| `v0.1-stable` | Pure markdown, zero deps | Learning / prototyping |

v2.0 API

from hybrid_agent import HybridAgent

agent = HybridAgent()
result = agent.ask("What is my name?")

result["answer"]        # the response
result["route"]         # "RAG" or "RLM"
result["router_ms"]     # router latency
result["retrieval_ms"]  # retrieval latency
result["total_ms"]      # total latency
result["rag_context"]   # retrieved chunks (RAG path)
result["rlm_meta"]      # chunk stats (RLM path)
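
As a usage example, the helper below turns a result dictionary into a one-line log entry. It is a sketch that assumes the latency fields are numbers in milliseconds and that `rag_context` is a list of chunks; neither type is confirmed by this README. `summarize_result` is a hypothetical name, and the sample dictionary is hand-written rather than real agent output.

```python
def summarize_result(result):
    """Format the routing and latency fields of a result dict as one log line."""
    route = result["route"]
    parts = [f"route={route}"]
    for key in ("router_ms", "retrieval_ms", "total_ms"):
        if key in result:
            parts.append(f"{key}={result[key]:.0f}")
    if route == "RAG":
        # Assumption: rag_context is a list of retrieved chunks.
        parts.append(f"chunks={len(result.get('rag_context') or [])}")
    return " ".join(parts)


# Hand-written sample shaped like the fields listed above.
sample = {
    "answer": "Prahlad",
    "route": "RAG",
    "router_ms": 120.4,
    "retrieval_ms": 310.0,
    "total_ms": 980.6,
    "rag_context": ["chunk one", "chunk two"],
}
line = summarize_result(sample)
print(line)
```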

v2.0 Setup

from hybrid_agent import HybridAgent

agent = HybridAgent(
    soul_path="SOUL.md",
    memory_path="MEMORY.md",
    mode="auto",                    # "auto" | "rag" | "rlm"
    qdrant_url="...",               # or set QDRANT_URL env var
    qdrant_api_key="...",           # or QDRANT_API_KEY
    azure_embedding_endpoint="...", # or AZURE_EMBEDDING_ENDPOINT
    azure_embedding_key="...",      # or AZURE_EMBEDDING_KEY
    k=5,                            # RAG retrieval count
)

Falls back to BM25 (keyword) if Qdrant/Azure not configured.
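
For readers unfamiliar with BM25, here is a small pure-Python version of the standard Okapi BM25 scoring formula. It is an illustration of the technique, not soul.py's fallback code; the tokenizer, the parameter defaults `k1=1.5` and `b=0.75`, and the function name `bm25_rank` are choices made for this sketch.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query, documents, k1=1.5, b=0.75):
    """Return (score, doc_index) pairs sorted from best to worst match."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avg_len = (sum(len(tokens) for tokens in tokenized) / n_docs) or 1.0
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    query_terms = set(tokenize(query))
    ranked = []
    for index, tokens in enumerate(tokenized):
        term_freq = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in term_freq:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / (term_freq[term] + length_norm)
        ranked.append((score, index))
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return ranked


# Made-up memory entries for the demo.
docs = [
    "User's name is Prahlad.",
    "Building an AI research lab.",
    "Prefers Python for tooling.",
]
ranked = bm25_rank("What is the user's name?", docs)
print(docs[ranked[0][1]])
```

Keyword scoring like this needs no embeddings or external service, which is why it makes a reasonable fallback when the vector store is not configured.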

📚 Knowledge Bases + Memory

soul.py isn't just for personal memory — the same architecture works for custom knowledge bases. Combine both in a single agent:

from hybrid_agent import HybridAgent

agent = HybridAgent(
    soul_path="SOUL.md",
    memory_path="MEMORY.md",        # Per-user memory
    knowledge_dir="./knowledge",     # Your corpus (docs, products, policies)
)

# Index your knowledge base once
agent.index_knowledge()

# Now the agent searches both pools
agent.ask("What's the return policy?")         # → Knowledge base
agent.ask("What was I asking about earlier?")  # → User memory
agent.ask("Which product fits my needs?")      # → Both
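
One way to picture the "both pools" behavior is to score candidates from each pool with the same function and keep a source label on every hit. The sketch below uses plain word overlap instead of the RAG or RLM retrieval soul.py performs, and the names `overlap_score` and `search_pools`, along with the sample texts, are invented for this illustration.

```python
import re


def overlap_score(query, text):
    """Fraction of query words that also appear in the text."""
    query_words = set(re.findall(r"[a-z0-9]+", query.lower()))
    text_words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def search_pools(query, knowledge, memory, k=3):
    """Search both pools and return the top-k (score, source, text) hits."""
    candidates = [("knowledge", text) for text in knowledge]
    candidates += [("memory", text) for text in memory]
    scored = [(overlap_score(query, text), source, text) for source, text in candidates]
    hits = [hit for hit in scored if hit[0] > 0]
    hits.sort(key=lambda hit: -hit[0])
    return hits[:k]


# Made-up sample data for the demo.
knowledge = [
    "Return policy: items can be returned within 30 days.",
    "Shipping takes 3 to 5 business days.",
]
memory = ["User asked about hiking boots earlier."]

policy_hits = search_pools("What's the return policy?", knowledge, memory)
memory_hits = search_pools("Which boots did I ask about earlier?", knowledge, memory)
print(policy_hits[0][1], memory_hits[0][1])
```

Carrying the source label through to the final hits lets the agent tell the user whether an answer came from the corpus or from their own history.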

Example use cases:

| Agent Type | Knowledge Base | Memory |
|---|---|---|
| Support Bot | Product docs, policies, FAQs | Customer history, preferences |
| Research Assistant | Paper corpus, methodologies | User's focus, papers read |
| Onboarding Buddy | Company handbook, org chart | New hire's role, questions |
| Book Companion | Full book content | Reader's interests, progress |

Darwin (the AI companion for the Soul book) uses exactly this pattern — the entire book indexed as knowledge, plus per-reader conversation memory.

See the Memory Architecture Patterns guide for detailed implementation patterns.

🔌 Framework Integrations

Already using a framework? Drop in soul.py memory with one line:

| Framework | Package | Install |
|---|---|---|
| LangChain | langchain-soul | `pip install langchain-soul` |
| LlamaIndex | llamaindex-soul | `pip install llamaindex-soul` |
| CrewAI | crewai-soul | `pip install crewai-soul` |

# LangChain
from langchain_soul import SoulChatMessageHistory
history = SoulChatMessageHistory(session_id="user-123")

# LlamaIndex
from llamaindex_soul import SoulChatStore
chat_store = SoulChatStore()

# CrewAI
from crewai_soul import SoulMemory
memory = SoulMemory()

Each integration includes:

soul-agent — RAG + RLM hybrid retrieval
soul-schema — Database semantic layer (auto-document your tables)
SoulMate client — Managed cloud option

📊 Benchmarks

Tested on the LoCoMo long-conversation memory benchmark (1,986 questions, scored by Gemini 2.0 Flash):

| System | Overall | Multi-Hop | Notes |
|---|---|---|---|
| XMem | 91.5% | 92.3% | Uses Gemini 3-flash |
| Memobase | 75.8% | 46.9% | |
| Zep | 75.1% | 66.0% | |
| soul.py (RLM) | 70.0% | 82.1% | Gemini 2.0 Flash |
| Mem0g (YC 24) | 68.4% | 47.2% | |
| Mem0 (YC 24) | 66.9% | 51.2% | |
| LangMem | 58.1% | 47.9% | |
| OpenAI | 52.9% | 42.9% | |

soul.py RLM beats Mem0, Mem0g, LangMem, and OpenAI on overall score and has the second-highest multi-hop reasoning score (82.1%) of the systems tested, behind only XMem (92.3%). It trails XMem, Memobase, and Zep on overall score, though XMem uses a significantly more capable model.

Full results & data → · Interactive dashboard →

Why not LangChain / LlamaIndex / MemGPT?

Those are orchestration frameworks. soul.py is a primitive — persistent identity and memory you can drop into anything you're building.

No framework lock-in — works with any LLM provider, or with your favorite framework via integrations above
Human-readable — SOUL.md and MEMORY.md are plain text
Version-controllable — git diff your agent's memories
Composable — use just the parts you need

Roadmap

See ROADMAP.md for planned features and how to contribute.

License

MIT

Citation

@software{menon2026soul,
  author = {Menon, Prahlad G.},
  title  = {soul.py: Persistent Identity and Memory for LLM Agents},
  year   = {2026},
  url    = {https://github.com/menonpg/soul.py}
}
