A biologically inspired memory architecture that brings hippocampal memory consolidation to large language models
Model • Results • Installation • Usage • Citation
HippoFormer integrates hippocampal memory mechanisms directly into transformer architectures. Inspired by how the human hippocampus selectively consolidates important memories through Sharp Wave Ripples (SPW-Rs), HippoFormer learns to:
- Selectively tag important tokens (like the brain identifies significant events)
- Consolidate memories through priority-based replay (like sleep consolidation)
- Maintain stable representations through drift calibration
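In spirit, selective tagging amounts to a sigmoid gate over a per-token importance score. A minimal, framework-free sketch (names and scores below are illustrative, not the actual API):

```python
import math

def salience(score, threshold=0.0):
    """Sigmoid gate over an importance score; > 0.5 means 'tag this token'."""
    return 1.0 / (1.0 + math.exp(-(score - threshold)))

# Content words tend to score high, function words low (toy scores)
scores = {"The": -1.5, "capital": 1.2, "of": -2.0, "France": 2.3}
tagged = [tok for tok, s in scores.items() if salience(s) > 0.5]
print(tagged)  # → ['capital', 'France']
```

Only the tagged tokens are handed to the memory buffer for consolidation, mirroring how the hippocampus stores a sparse subset of experience.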
| Model | Parameters | WikiText-2 PPL |
|---|---|---|
| GPT-2 | 124M | 29.41 |
| Gemma-2B | 2B | ~18 |
| HippoFormer | 2B + 15M | 11.83 |
Our ablation analysis validates that both hippocampal components are essential:
| Configuration | PPL | Δ PPL |
|---|---|---|
| Full HippoFormer | 11.83 | — |
| Without Salience Gate | 39.75 | +27.92 |
| Without Memory Buffer | 89.84 | +78.01 |
| Random Salience | 89.84 | +78.01 |
| Metric | Value | Interpretation |
|---|---|---|
| Content/Function Word Ratio | 2.11x | Content words tagged more (selective memory) |
| Long-Range PPL Benefit | +6.95 | Better on late tokens (remembers context) |
| Buffer Priority | 4.9/5.0 | High-importance items retained |
| Temporal Coherence | 0.58 | Nearby tokens tagged together |
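The Long-Range PPL Benefit row compares perplexity on late tokens against early tokens of a sequence. With toy per-token log-probabilities (hypothetical values, not the project's evaluation code), the computation looks like:

```python
import math

def perplexity(log_probs):
    """PPL = exp(-mean log-probability) over a span of tokens."""
    return math.exp(-sum(log_probs) / len(log_probs))

# Toy per-token log-probs for an 8-token sequence (hypothetical values)
log_probs = [-3.2, -2.9, -3.0, -2.8, -2.1, -2.0, -1.9, -1.8]

early = perplexity(log_probs[:4])  # first half of the sequence
late = perplexity(log_probs[4:])   # second half, where memory should help
print(early - late)  # positive gap = better on late tokens
```

A positive early-minus-late gap indicates the model predicts later tokens better, consistent with the buffer carrying useful long-range context.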
```bash
# Clone repository
git clone https://github.com/Gustav-Proxi/HippoFormer.git
cd HippoFormer

# Install with training dependencies
pip install -e ".[train]"

# Or install everything
pip install -e ".[all]"
```

Requirements:
- Python 3.10+
- PyTorch 2.0+
- Transformers 4.36+
- CUDA 11.8+ (for GPU training)
```python
from hippoformer import HippoFormer, HippoFormerConfig
from transformers import AutoTokenizer
import torch

# Initialize
config = HippoFormerConfig(
    base_model_name="google/gemma-2b",
    freeze_base=True,
    use_lora=True,
)
model = HippoFormer(config)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")

# Load pretrained weights
ckpt = torch.load("pytorch_model.pt", map_location="cpu")
model.load_state_dict(ckpt["model_state_dict"], strict=False)

# Generate
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(inputs["input_ids"], max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```

```python
from huggingface_hub import hf_hub_download
import torch

# Download checkpoint
ckpt_path = hf_hub_download(
    repo_id="Gustav-Proxi/HippoFormer-Gemma2B",
    filename="pytorch_model.pt",
)

# Load
ckpt = torch.load(ckpt_path, map_location="cpu")
model.load_state_dict(ckpt["model_state_dict"], strict=False)
```

```bash
# Train on WikiText-2
python -m hippoformer.train \
    --dataset wikitext \
    --dataset_config wikitext-2-raw-v1 \
    --batch_size 8 \
    --num_epochs 3 \
    --output_dir ./outputs
```

```bash
# Run comprehensive evaluation
python -m evaluation.comprehensive_eval \
    --checkpoint ./outputs/checkpoint-step-110000/checkpoint.pt \
    --output results.json \
    --device cuda
```

The salience gate implements a dual-pathway importance-scoring mechanism inspired by hippocampal Sharp Wave Ripples:

```python
# Local pathway: token-intrinsic importance
local_scores = MLP(hidden_states)  # like single-electrode ripple detection

# Global pathway: contextual importance
global_scores = CrossAttention(hidden_states)  # like population synchrony

# Combined with learnable weighting
salience = sigmoid(w * local_scores + (1 - w) * global_scores - threshold)
```

The memory buffer is a priority-based store with multi-round replay consolidation:

```python
# Store with priority = salience * importance_weight
buffer.store(keys, values, priorities)

# Multi-round replay with exponential decay
for round in range(max_rounds):
    consolidated = replay(buffer, decay_rate ** round)
```

| Parameter | Default | Description |
|---|---|---|
| `buffer_size` | 2048 | Memory buffer capacity |
| `decay_rate` | 0.9 | Consolidation decay per round |
| `importance_range` | [1.0, 5.0] | Min/max importance weights |
| `salience_threshold` | 0.0 | Initial threshold (learned) |
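These defaults can presumably be overridden when constructing the config. A sketch assuming `HippoFormerConfig` exposes these parameters as keyword arguments (this README does not confirm the exact field names):

```python
from hippoformer import HippoFormerConfig

# Hypothetical: widen the buffer and slow the consolidation decay
config = HippoFormerConfig(
    base_model_name="google/gemma-2b",
    buffer_size=4096,              # default 2048
    decay_rate=0.95,               # default 0.9
    importance_range=(1.0, 5.0),   # min/max importance weights
    salience_threshold=0.0,        # learned during training
)
```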
HippoFormer is inspired by hippocampal memory consolidation mechanisms:
| Brain Mechanism | HippoFormer Implementation |
|---|---|
| Sharp Wave Ripples (SPW-Rs) | Salience Gate (dual-pathway detection) |
| Memory tagging | Importance weights [1.0, 5.0] |
| Sleep replay | Multi-round consolidation with decay |
| Synaptic homeostasis | Drift calibration |
Key insight: The hippocampus doesn't remember everything equally. It selectively tags important experiences and consolidates them through replay during sleep. HippoFormer brings this mechanism to transformers.
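The buffer-and-replay mapping above can be sketched in a few lines of plain Python. This is purely illustrative, not the actual `DifferentiablePriorityBuffer` (the real module is differentiable and operates on key/value tensors):

```python
import heapq

class PriorityBuffer:
    """Fixed-capacity store that evicts the lowest-priority item when full."""
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.heap = []  # min-heap of (priority, item)

    def store(self, item, priority):
        if len(self.heap) < self.capacity:
            heapq.heappush(self.heap, (priority, item))
        elif priority > self.heap[0][0]:
            heapq.heapreplace(self.heap, (priority, item))  # evict weakest

    def replay(self, decay_rate=0.9, max_rounds=3):
        """Multi-round replay: each round contributes with exponential decay."""
        weights = {}
        for r in range(max_rounds):
            for priority, item in self.heap:
                weights[item] = weights.get(item, 0.0) + priority * decay_rate ** r
        return weights

buf = PriorityBuffer(capacity=2)
for item, p in [("the", 0.1), ("Paris", 0.9), ("capital", 0.7)]:
    buf.store(item, p)
print(buf.replay())  # low-priority "the" was evicted; "Paris" weighted highest
```

Capacity-limited storage plus decayed replay is what lets high-importance items dominate the consolidated representation, echoing the buffer-priority result in the interpretability table above.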
```
HippoFormer/
├── hippoformer/
│   ├── config.py              # HippoFormerConfig
│   ├── model.py               # Main HippoFormer model
│   ├── train.py               # Training script
│   ├── losses.py              # Multi-objective losses
│   ├── salience/
│   │   └── gate.py            # SalienceGate module
│   ├── memory/
│   │   └── buffer.py          # DifferentiablePriorityBuffer
│   └── drift/
│       └── calibrator.py      # EmbeddingDriftCalibrator
├── evaluation/
│   ├── metrics.py             # PPL, BLEU, ROUGE, F1
│   ├── ablation.py            # Ablation framework
│   ├── comprehensive_eval.py  # Full evaluation suite
│   └── visualization.py       # Paper figures
├── scripts/
│   ├── runpod/                # Cloud training scripts
│   └── aws/                   # AWS deployment
└── tests/                     # Unit tests
```
```bibtex
@misc{hippoformer2025,
  title={HippoFormer: Hippocampal Memory Selection for Transformers},
  author={Vaishak Girish Kumar and Sanika},
  year={2025},
  howpublished={\url{https://github.com/Gustav-Proxi/HippoFormer}},
}
```

- Vaishak Girish Kumar (https://github.com/Gustav-Proxi)
- Sanika (https://github.com/Sanika0212)
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
- Built on Gemma by Google DeepMind
- Inspired by hippocampal memory research
- Training infrastructure on RunPod
HippoFormer — Bringing biological memory to artificial intelligence
