A biologically inspired memory architecture that brings hippocampal memory consolidation to large language models
Model • Results • Installation • Usage • Citation
HippoFormer integrates hippocampal memory mechanisms directly into transformer architectures. Inspired by how the human hippocampus selectively consolidates important memories through Sharp Wave Ripples (SPW-Rs), HippoFormer learns to:
- Selectively tag important tokens (like the brain identifies significant events)
- Consolidate memories through priority-based replay (like sleep consolidation)
- Maintain stable representations through drift calibration
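In spirit, selective tagging amounts to a sigmoid gate over a per-token importance score. A minimal, framework-free sketch (names and scores below are illustrative, not the actual API):

```python
import math

def salience(score, threshold=0.0):
    """Sigmoid gate over an importance score; > 0.5 means 'tag this token'."""
    return 1.0 / (1.0 + math.exp(-(score - threshold)))

# Content words tend to score high, function words low (toy scores)
scores = {"The": -1.5, "capital": 1.2, "of": -2.0, "France": 2.3}
tagged = [tok for tok, s in scores.items() if salience(s) > 0.5]
print(tagged)  # → ['capital', 'France']
```

Only the tagged tokens are handed to the memory buffer for consolidation, mirroring how the hippocampus stores a sparse subset of experience.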
| Model | Parameters | WikiText-2 PPL |
|---|---|---|
| GPT-2 | 124M | 29.41 |
| Gemma-2B | 2B | ~18 |
| HippoFormer | 2B + 15M | 11.83 |
Our ablation analysis validates that both hippocampal components are essential:
| Configuration | PPL | Δ PPL |
|---|---|---|
| Full HippoFormer | 11.83 | — |
| Without Salience Gate | 39.75 | +27.92 |
| Without Memory Buffer | 89.84 | +78.01 |
| Random Salience | 89.84 | +78.01 |
| Metric | Value | Interpretation |
|---|---|---|
| Content/Function Word Ratio | 2.11x | Content words tagged more (selective memory) |
| Long-Range PPL Benefit | +6.95 | Better on late tokens (remembers context) |
| Buffer Priority | 4.9/5.0 | High-importance items retained |
| Temporal Coherence | 0.58 | Nearby tokens tagged together |
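The Long-Range PPL Benefit row compares perplexity on late tokens against early tokens of a sequence. With toy per-token log-probabilities (hypothetical values, not the project's evaluation code), the computation looks like:

```python
import math

def perplexity(log_probs):
    """PPL = exp(-mean log-probability) over a span of tokens."""
    return math.exp(-sum(log_probs) / len(log_probs))

# Toy per-token log-probs for an 8-token sequence (hypothetical values)
log_probs = [-3.2, -2.9, -3.0, -2.8, -2.1, -2.0, -1.9, -1.8]

early = perplexity(log_probs[:4])  # first half of the sequence
late = perplexity(log_probs[4:])   # second half, where memory should help
print(early - late)  # positive gap = better on late tokens
```

A positive early-minus-late gap indicates the model predicts later tokens better, consistent with the buffer carrying useful long-range context.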
```bash
# Clone repository
git clone https://github.com/Gustav-Proxi/HippoFormer.git
cd HippoFormer

# Install with training dependencies
pip install -e ".[train]"

# Or install everything
pip install -e ".[all]"
```

Requirements:
- Python 3.10+
- PyTorch 2.0+
- Transformers 4.36+
- CUDA 11.8+ (for GPU training)
```python
from hippoformer import HippoFormer, HippoFormerConfig
from transformers import AutoTokenizer
import torch

# Initialize
config = HippoFormerConfig(
    base_model_name="google/gemma-2b",
    freeze_base=True,
    use_lora=True,
)
model = HippoFormer(config)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")

# Load pretrained weights
ckpt = torch.load("pytorch_model.pt", map_location="cpu")
model.load_state_dict(ckpt["model_state_dict"], strict=False)

# Generate
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(inputs["input_ids"], max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```

```python
from huggingface_hub import hf_hub_download
import torch

# Download checkpoint
ckpt_path = hf_hub_download(
    repo_id="Gustav-Proxi/HippoFormer-Gemma2B",
    filename="pytorch_model.pt",
)

# Load
ckpt = torch.load(ckpt_path, map_location="cpu")
model.load_state_dict(ckpt["model_state_dict"], strict=False)
```

```bash
# Train on WikiText-2
python -m hippoformer.train \
    --dataset wikitext \
    --dataset_config wikitext-2-raw-v1 \
    --batch_size 8 \
    --num_epochs 3 \
    --output_dir ./outputs
```

```bash
# Run comprehensive evaluation
python -m evaluation.comprehensive_eval \
    --checkpoint ./outputs/checkpoint-step-110000/checkpoint.pt \
    --output results.json \
    --device cuda
```

The salience gate implements a dual-pathway importance-scoring mechanism inspired by hippocampal Sharp Wave Ripples:

```python
# Local pathway: token-intrinsic importance
local_scores = MLP(hidden_states)  # like single-electrode ripple detection

# Global pathway: contextual importance
global_scores = CrossAttention(hidden_states)  # like population synchrony

# Combined with learnable weighting
salience = sigmoid(w * local_scores + (1 - w) * global_scores - threshold)
```

The memory buffer is a priority-based store with multi-round replay consolidation:

```python
# Store with priority = salience * importance_weight
buffer.store(keys, values, priorities)

# Multi-round replay with exponential decay
for round in range(max_rounds):
    consolidated = replay(buffer, decay_rate ** round)
```

| Parameter | Default | Description |
|---|---|---|
| `buffer_size` | 2048 | Memory buffer capacity |
| `decay_rate` | 0.9 | Consolidation decay per round |
| `importance_range` | [1.0, 5.0] | Min/max importance weights |
| `salience_threshold` | 0.0 | Initial threshold (learned) |
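These defaults can presumably be overridden when constructing the config. A sketch assuming `HippoFormerConfig` exposes these parameters as keyword arguments (this README does not confirm the exact field names):

```python
from hippoformer import HippoFormerConfig

# Hypothetical: widen the buffer and slow the consolidation decay
config = HippoFormerConfig(
    base_model_name="google/gemma-2b",
    buffer_size=4096,              # default 2048
    decay_rate=0.95,               # default 0.9
    importance_range=(1.0, 5.0),   # min/max importance weights
    salience_threshold=0.0,        # learned during training
)
```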
HippoFormer is inspired by hippocampal memory consolidation mechanisms:
| Brain Mechanism | HippoFormer Implementation |
|---|---|
| Sharp Wave Ripples (SPW-Rs) | Salience Gate (dual-pathway detection) |
| Memory tagging | Importance weights [1.0, 5.0] |
| Sleep replay | Multi-round consolidation with decay |
| Synaptic homeostasis | Drift calibration |
Key insight: The hippocampus doesn't remember everything equally. It selectively tags important experiences and consolidates them through replay during sleep. HippoFormer brings this mechanism to transformers.
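The buffer-and-replay mapping above can be sketched in a few lines of plain Python. This is purely illustrative, not the actual `DifferentiablePriorityBuffer` (the real module is differentiable and operates on key/value tensors):

```python
import heapq

class PriorityBuffer:
    """Fixed-capacity store that evicts the lowest-priority item when full."""
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.heap = []  # min-heap of (priority, item)

    def store(self, item, priority):
        if len(self.heap) < self.capacity:
            heapq.heappush(self.heap, (priority, item))
        elif priority > self.heap[0][0]:
            heapq.heapreplace(self.heap, (priority, item))  # evict weakest

    def replay(self, decay_rate=0.9, max_rounds=3):
        """Multi-round replay: each round contributes with exponential decay."""
        weights = {}
        for r in range(max_rounds):
            for priority, item in self.heap:
                weights[item] = weights.get(item, 0.0) + priority * decay_rate ** r
        return weights

buf = PriorityBuffer(capacity=2)
for item, p in [("the", 0.1), ("Paris", 0.9), ("capital", 0.7)]:
    buf.store(item, p)
print(buf.replay())  # low-priority "the" was evicted; "Paris" weighted highest
```

Capacity-limited storage plus decayed replay is what lets high-importance items dominate the consolidated representation, echoing the buffer-priority result in the interpretability table above.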
```
HippoFormer/
├── hippoformer/
│   ├── config.py              # HippoFormerConfig
│   ├── model.py               # Main HippoFormer model
│   ├── train.py               # Training script
│   ├── losses.py              # Multi-objective losses
│   ├── salience/
│   │   └── gate.py            # SalienceGate module
│   ├── memory/
│   │   └── buffer.py          # DifferentiablePriorityBuffer
│   └── drift/
│       └── calibrator.py      # EmbeddingDriftCalibrator
├── evaluation/
│   ├── metrics.py             # PPL, BLEU, ROUGE, F1
│   ├── ablation.py            # Ablation framework
│   ├── comprehensive_eval.py  # Full evaluation suite
│   └── visualization.py       # Paper figures
├── scripts/
│   ├── runpod/                # Cloud training scripts
│   └── aws/                   # AWS deployment
└── tests/                     # Unit tests
```
```bibtex
@misc{hippoformer2025,
  title={HippoFormer: Hippocampal Memory Selection for Transformers},
  author={Vaishak Girish Kumar and Sanika},
  year={2025},
  howpublished={\url{https://github.com/Gustav-Proxi/HippoFormer}},
}
```

- Vaishak Girish Kumar (https://github.com/Gustav-Proxi)
- Sanika (https://github.com/Sanika0212)
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
- Built on Gemma by Google DeepMind
- Inspired by hippocampal memory research
- Training infrastructure on RunPod
HippoFormer — Bringing biological memory to artificial intelligence
