Gleann exposes a REST API when running in server mode (gleann serve). The API provides endpoints for index management, semantic search, RAG-based Q&A, and code graph queries.
# Start the server
gleann serve --port 8080
# Open API documentation in browser
open http://localhost:8080/api/docs
# Download the OpenAPI spec
curl http://localhost:8080/api/openapi.json

When the server is running, interactive Swagger UI documentation is available at:
- Swagger UI: GET /api/docs
- OpenAPI 3.0 JSON: GET /api/openapi.json
| Method | Path | Description |
|---|---|---|
| GET | /health | Health check |
| Method | Path | Description |
|---|---|---|
| GET | /api/indexes | List all indexes |
| GET | /api/indexes/{name} | Get index metadata |
| POST | /api/indexes/{name}/build | Build index from texts/items |
| DELETE | /api/indexes/{name} | Delete an index |
| Method | Path | Description |
|---|---|---|
| POST | /api/indexes/{name}/search | Semantic/hybrid search |
| POST | /api/indexes/{name}/ask | RAG-based Q&A |
| POST | /api/search | Multi-index search |
| Method | Path | Description |
|---|---|---|
| GET | /api/graph/{name} | Graph statistics |
| POST | /api/graph/{name}/query | Query the code graph |
| POST | /api/graph/{name}/index | Trigger AST graph indexing |
The Memory Engine exposes a generic Entity / RELATES_TO graph that external
AI agents can read from and write to without coupling to gleann's internal RAG
pipeline. Each {name} corresponds to an independent KuzuDB store under
<index-dir>/<name>_memory/.
| Method | Path | Description |
|---|---|---|
| POST | /api/memory/{name}/inject | Atomically upsert nodes and edges |
| DELETE | /api/memory/{name}/nodes/{id} | Delete an entity and its incident edges |
| DELETE | /api/memory/{name}/edges | Delete a specific relationship |
| POST | /api/memory/{name}/traverse | Walk the graph N hops from a start node |
| Method | Path | Description |
|---|---|---|
| GET | /api/webhooks | List registered webhooks |
| POST | /api/webhooks | Register a webhook |
| DELETE | /api/webhooks | Delete a webhook by URL |
| Method | Path | Description |
|---|---|---|
| GET | /metrics | Prometheus-compatible metrics |
| Method | Path | Description |
|---|---|---|
| GET | /api/conversations | List saved conversations |
| GET | /api/conversations/{id} | Get conversation by ID |
| DELETE | /api/conversations/{id} | Delete a conversation |
Hierarchical BBolt memory store with three tiers (short, medium, long). Blocks stored here are automatically injected into every LLM query as system context. Supports scoped isolation (e.g. per-conversation) and character limits.
| Method | Path | Description |
|---|---|---|
| GET | /api/blocks | List memory blocks (optional ?tier=short\|medium\|long, ?scope=) |
| POST | /api/blocks | Store a new memory block (supports char_limit, scope fields) |
| DELETE | /api/blocks | Clear blocks (optional ?tier= filter) |
| DELETE | /api/blocks/{id} | Delete a specific block by ID |
| GET | /api/blocks/search?q= | Full-text search across all tiers (optional ?scope=) |
| GET | /api/blocks/context | Compiled memory context (optional ?scope=, ?format=xml) |
| GET | /api/blocks/stats | Storage statistics per tier |
Use gleann indexes as if they were OpenAI models. Compatible with any tool that speaks the OpenAI chat completions API.
| Method | Path | Description |
|---|---|---|
| GET | /v1/models | List indexes as OpenAI-compatible model objects |
| POST | /v1/chat/completions | Chat completions with automatic RAG injection |
Model naming: "gleann/<index-name>" for RAG-augmented answers, "gleann/" for pure LLM pass-through.
Custom headers: X-Gleann-Top-K (RAG result count), X-Gleann-Min-Score (score threshold).
Google's Agent-to-Agent protocol for agent discovery and inter-agent communication. Enabled by default; set GLEANN_A2A_ENABLED=false to disable.
| Method | Path | Description |
|---|---|---|
| GET | /.well-known/agent-card.json | A2A Agent Card (discovery) |
| POST | /a2a/v1/message:send | Send a message to an A2A skill |
| GET | /a2a/v1/tasks/{id} | Get task status by ID |
Built-in skills: semantic-search, ask-rag, code-analysis, memory-management.
Orchestrates all memory layers (block storage, knowledge graph, vector search) through a single interface. Simplifies agent integration by eliminating the need to call individual memory APIs.
| Method | Path | Description |
|---|---|---|
| POST | /api/memory/ingest | Store facts + relationships across memory layers |
| POST | /api/memory/recall | Query all memory layers in parallel |
| Field | Type | Description |
|---|---|---|
| facts[].content | string | Required. The fact text to store |
| facts[].tags | string[] | Searchable tags |
| facts[].tier | string | short (default), medium, or long |
| facts[].metadata | object | Arbitrary key-value metadata (e.g. {"source_file": "auth.go"}) |
| facts[].expires_in | string | TTL as Go-style duration (1h, 7d, 2w) |
| facts[].char_limit | int | Per-block character limit |
| relationships[].from | string | Source entity ID |
| relationships[].to | string | Target entity ID |
| relationships[].relation | string | Relation type (e.g. DEPENDS_ON, IMPLEMENTS) |
| relationships[].weight | float | Edge importance (default: 1.0) |
| relationships[].attributes | object | Edge metadata |
| scope | string | Isolate facts to a conversation/agent |
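The expires_in field uses Go-style duration strings extended with day and week units (Go's own time.ParseDuration stops at hours, so support for d and w is inferred from the examples in this document). A client-side sketch of that grammar, useful for computing an absolute expiry locally before ingesting:

```python
import re
from datetime import datetime, timedelta, timezone

# Unit table for the extended duration grammar ("1h", "7d", "2w").
# "d" and "w" are an extension implied by this document's examples,
# not part of Go's stdlib duration syntax — treat this as a sketch.
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}

def parse_ttl(ttl: str) -> timedelta:
    """Parse a TTL like '90d' or '1h30m' into a timedelta."""
    matches = re.findall(r"(\d+(?:\.\d+)?)([smhdw])", ttl)
    # Reject strings with leftover characters (e.g. "7 days").
    if not matches or "".join(n + u for n, u in matches) != ttl:
        raise ValueError(f"bad duration: {ttl!r}")
    seconds = sum(float(n) * _UNITS[u] for n, u in matches)
    return timedelta(seconds=seconds)

def expiry_from_now(ttl: str) -> str:
    """Absolute RFC3339 expiry for a fact stored now."""
    return (datetime.now(timezone.utc) + parse_ttl(ttl)).isoformat()
```

For example, parse_ttl("90d") matches the expires_in value used in the ingest example below.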
| Field | Type | Description |
|---|---|---|
| query | string | Required. Search text |
| layers | string[] | blocks, graph, vector (default: all) |
| top_k | int | Max results per layer (default: 5) |
| tier | string | Filter blocks by tier |
| tags | string[] | Filter blocks by tags (AND logic) |
| after | string | Filter blocks created after (RFC3339 or duration like 24h, 7d) |
| before | string | Filter blocks created before |
| relations | string[] | Filter graph edges by relation types |
| scope | string | Filter blocks by scope |
| depth | int | Graph traversal depth (default: 2) |
| format | string | json (default) or context (LLM-ready XML) |
# Store a fact with metadata and TTL
curl -X POST http://localhost:8080/api/memory/ingest \
-H 'Content-Type: application/json' \
-d '{
"facts": [{
"content": "Auth service uses JWT tokens with RS256 signing",
"tags": ["auth", "security", "jwt"],
"metadata": {"source_file": "auth/jwt.go", "confidence": "high"},
"tier": "long",
"expires_in": "90d"
}],
"relationships": [{
"from": "AuthService",
"to": "JWTValidator",
"relation": "DEPENDS_ON",
"weight": 0.9
}]
}'
# Project-scoped ingest (sets scope="project:myapp" + default index)
curl -X POST http://localhost:8080/api/memory/ingest \
-H 'Content-Type: application/json' \
-d '{
"project": "myapp",
"facts": [{"content": "The API uses rate limiting at 100 req/s"}],
"relationships": [{"from": "Gateway", "to": "RateLimiter", "relation": "USES"}]
}'
# Recall facts from the last 7 days, filtered by tag
curl -X POST http://localhost:8080/api/memory/recall \
-H 'Content-Type: application/json' \
-d '{
"query": "authentication",
"layers": ["blocks"],
"tags": ["security"],
"after": "7d",
"tier": "long"
}'
# Project-scoped recall (searches blocks with scope + vector/graph in index)
curl -X POST http://localhost:8080/api/memory/recall \
-H 'Content-Type: application/json' \
-d '{
"project": "myapp",
"query": "rate limiting",
"format": "context"
}'
# Full multi-layer recall in LLM-ready format
curl -X POST http://localhost:8080/api/memory/recall \
-H 'Content-Type: application/json' \
-d '{
"query": "how does auth work",
"format": "context",
"top_k": 10
}'
# Returns XML with <facts>, <relationships>, <relevant_documents>

Monitor and manage long-running background operations (indexing, memory consolidation, health checks).
| Method | Path | Description |
|---|---|---|
| GET | /api/tasks | List background tasks (optional ?status= filter) |
| GET | /api/tasks/{id} | Get task status by ID |
| DELETE | /api/tasks | Cleanup completed/failed tasks older than 1 hour |
curl -X POST http://localhost:8080/api/indexes/my-code/search \
-H 'Content-Type: application/json' \
-d '{
"query": "how does authentication work",
"top_k": 5,
"hybrid_alpha": 0.7,
"graph_context": true
}'

curl -X POST http://localhost:8080/api/indexes/my-code/ask \
-H 'Content-Type: application/json' \
-d '{
"question": "How is the user session managed?",
"top_k": 10
}'

Ask request fields:
| Field | Type | Default | Description |
|---|---|---|---|
| question | string | (required) | Question to answer using RAG |
| top_k | integer | 10 | Number of context passages to retrieve |
| llm_model | string | — | LLM model name |
| llm_provider | string | — | LLM provider (ollama, openai, anthropic) |
| system_prompt | string | — | Custom system prompt for the LLM |
| role | string | — | Named role (e.g. code, shell). Resolves to a system prompt from the role registry. |
| conversation_id | string | — | Continue an existing conversation by ID. Restores message history. |
| stream | boolean | false | Enable SSE streaming |
Stream tokens in real-time via Server-Sent Events:
curl -N -X POST http://localhost:8080/api/indexes/my-code/ask \
-H 'Content-Type: application/json' \
-d '{
"question": "Explain the authentication flow",
"stream": true
}'

Or use the query parameter:
curl -N -X POST 'http://localhost:8080/api/indexes/my-code/ask?stream=true' \
-H 'Content-Type: application/json' \
-d '{"question": "Explain the authentication flow"}'

Response format (Content-Type: text/event-stream):
data: {"token":"The"}
data: {"token":" authentication"}
data: {"token":" flow"}
data: {"token":" works by..."}
data: [DONE]
Each data: line contains a JSON object with a token field. The stream ends with data: [DONE]. If an error occurs mid-stream, it sends data: {"error": "..."} before [DONE].
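The same framing is straightforward to consume from any language; a minimal Python sketch of just the parsing step (transport is left to your HTTP client of choice):

```python
import json

def parse_sse_tokens(payload: str) -> str:
    """Reassemble an answer from the SSE framing documented above:
    each 'data:' line carries {"token": ...}, the stream ends with
    'data: [DONE]', and errors arrive as {"error": ...} before it."""
    out = []
    for line in payload.splitlines():
        if not line.startswith("data: "):
            continue  # skip blank separator lines
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        obj = json.loads(data)
        if "error" in obj:
            raise RuntimeError(obj["error"])
        out.append(obj["token"])
    return "".join(out)
```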
JavaScript example:
const response = await fetch('/api/indexes/my-code/ask', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ question: 'How does auth work?', stream: true })
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
const { done, value } = await reader.read();
if (done) break;
const text = decoder.decode(value);
for (const line of text.split('\n')) {
if (line.startsWith('data: ') && line !== 'data: [DONE]') {
const { token } = JSON.parse(line.slice(6));
process.stdout.write(token); // or append to UI
}
}
}

curl -X POST http://localhost:8080/api/graph/my-code/query \
-H 'Content-Type: application/json' \
-d '{
"query": "callers",
"symbol": "github.com/foo/bar.HandleLogin"
}'

curl -X POST http://localhost:8080/api/graph/my-code/query \
-H 'Content-Type: application/json' \
-d '{
"query": "impact",
"symbol": "github.com/foo/bar.UserService",
"max_depth": 3
}'

Atomically upsert Entity nodes and RELATES_TO edges. The operation is idempotent (re-submitting the same payload creates no duplicates):
curl -X POST http://localhost:8080/api/memory/project/inject \
-H 'Content-Type: application/json' \
-d '{
"nodes": [
{
"id": "req-001",
"type": "requirement",
"content": "User must be able to log in with email and password",
"attributes": {"priority": "high", "sprint": 3}
},
{
"id": "feat-jwt",
"type": "code_symbol",
"content": "JWT authentication handler"
}
],
"edges": [
{
"from": "req-001",
"to": "feat-jwt",
"relation_type": "IMPLEMENTED_BY",
"weight": 1.0
}
]
}'

Response:
{"ok": true, "nodes_sent": 2, "edges_sent": 1}

Walk the knowledge graph from a starting node up to depth hops. Returns all
reachable nodes and the intra-subgraph edges:
curl -X POST http://localhost:8080/api/memory/project/traverse \
-H 'Content-Type: application/json' \
-d '{"start_id": "req-001", "depth": 2}'

Response:
{
"nodes": [
{"id": "req-001", "type": "requirement", "content": "User must be able to log in..."},
{"id": "feat-jwt", "type": "code_symbol", "content": "JWT authentication handler"}
],
"edges": [
{"from": "req-001", "to": "feat-jwt", "relation_type": "IMPLEMENTED_BY", "weight": 1}
],
"count": 2
}

curl -X DELETE http://localhost:8080/api/memory/project/nodes/req-001

All RELATES_TO edges incident to the deleted node are removed automatically
(DETACH DELETE semantics).
curl -X DELETE http://localhost:8080/api/memory/project/edges \
-H 'Content-Type: application/json' \
-d '{
"from": "req-001",
"to": "feat-jwt",
"relation_type": "IMPLEMENTED_BY"
}'

Only the matching RELATES_TO relationship is removed; both nodes survive, and any other relation types between them remain intact.
curl -X POST http://localhost:8080/api/indexes/test-index/build \
-H 'Content-Type: application/json' \
-d '{
"texts": ["Hello world", "foo bar baz"],
"metadata": {"source": "test"}
}'

Search across multiple indexes simultaneously. Results are merged by score:
# Search specific indexes
curl -X POST http://localhost:8080/api/search \
-H 'Content-Type: application/json' \
-d '{
"query": "authentication flow",
"indexes": ["backend-code", "frontend-code", "docs"],
"top_k": 10
}'
# Search ALL available indexes
curl -X POST http://localhost:8080/api/search \
-H 'Content-Type: application/json' \
-d '{
"query": "authentication flow",
"top_k": 10
}'

Response:
{
"results": [
{
"index": "backend-code",
"text": "func HandleLogin...",
"score": 0.92,
"metadata": {"source": "auth/handler.go"},
"graph_context": {
"symbols": [{"name": "HandleLogin", "kind": "function"}]
}
},
{
"index": "docs",
"text": "Authentication uses JWT tokens...",
"score": 0.88,
"metadata": {"source": "auth.md"},
"document_context": {
"vpath": "auth.md",
"name": "Authentication Overview",
"summary": "This document explains the JWT flow..."
}
}
],
"count": 2,
"query_ms": 45
}

CLI multi-search:
# Search specific indexes (comma-separated)
gleann search backend-code,frontend-code "authentication"
# Search all indexes
gleann search --all dummy "authentication"

CLI multi-index ask (conversations work across multiple indexes):
# Ask across multiple indexes
gleann ask docs,backend-code "How does authentication work?"
# Pipe input with multi-index
cat auth.go | gleann ask backend,frontend "Review this auth handler"
# Continue a multi-index conversation
gleann ask docs,code --continue-last "What about the error handling?"
# Use a role
gleann ask my-code "Explain this module" --role explain --format markdown

Register a webhook to receive POST notifications for events:
# Register a webhook
curl -X POST http://localhost:8080/api/webhooks \
-H 'Content-Type: application/json' \
-d '{
"url": "https://your-server.com/gleann-hook",
"events": ["build_complete", "index_deleted"],
"secret": "optional-hmac-secret"
}'
# List registered webhooks
curl http://localhost:8080/api/webhooks
# Delete a webhook
curl -X DELETE http://localhost:8080/api/webhooks \
-H 'Content-Type: application/json' \
-d '{"url": "https://your-server.com/gleann-hook"}'

Webhook payload (POST to your URL):
{
"event": "build_complete",
"index": "my-code",
"count": 1250,
"buildMs": 3200,
"timestamp": "2026-03-10T14:30:00Z"
}

If a secret is configured, payloads include an X-Gleann-Signature header carrying an HMAC-SHA256 signature of the form sha256=<hex>.
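Assuming the signature is computed over the raw request body (verify this against your deployment), a receiver can check it with a constant-time comparison; a sketch:

```python
import hashlib
import hmac

def verify_signature(secret: str, body: bytes, header: str) -> bool:
    """Check an X-Gleann-Signature header ('sha256=<hex>') against the
    raw request body using HMAC-SHA256. Assumes the signature covers
    the exact bytes received — do not re-serialize the JSON first."""
    expected = "sha256=" + hmac.new(
        secret.encode(), body, hashlib.sha256
    ).hexdigest()
    # compare_digest avoids leaking timing information.
    return hmac.compare_digest(expected, header)
```

Always verify before parsing the payload, and reject requests whose header is missing or malformed.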
Supported events: build_complete, index_deleted, * (all events).
Prometheus-compatible metrics endpoint:
curl http://localhost:8080/metrics

Response (text/plain, Prometheus exposition format):
# HELP gleann_up Whether the gleann server is running.
# TYPE gleann_up gauge
gleann_up 1
# HELP gleann_search_requests_total Total search requests.
# TYPE gleann_search_requests_total counter
gleann_search_requests_total 42
# HELP gleann_search_latency_avg_ms Average search latency in milliseconds.
# TYPE gleann_search_latency_avg_ms gauge
gleann_search_latency_avg_ms 23.50
# HELP gleann_multi_search_requests_total Total multi-index search requests.
# TYPE gleann_multi_search_requests_total counter
gleann_multi_search_requests_total 5
# HELP gleann_cached_searchers Number of cached searcher instances.
# TYPE gleann_cached_searchers gauge
gleann_cached_searchers 3
Available metrics: gleann_up, gleann_uptime_seconds, gleann_search_requests_total, gleann_search_errors_total, gleann_search_latency_avg_ms, gleann_multi_search_requests_total, gleann_build_requests_total, gleann_build_errors_total, gleann_build_latency_avg_ms, gleann_ask_requests_total, gleann_delete_requests_total, gleann_webhooks_fired_total, gleann_cached_searchers.
Grafana / Prometheus integration: Point your Prometheus scraper at http://<host>:8080/metrics.
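For a quick look without a full Prometheus setup, the label-free exposition format shown above is easy to parse directly; a minimal sketch:

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse simple (label-free) Prometheus exposition output into a
    dict of metric name -> value. Skips # HELP / # TYPE comment lines.
    Sketch only: real exposition output may carry labels in braces,
    which this deliberately does not handle."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition(" ")
        metrics[name] = float(value)
    return metrics
```

Handy for ad-hoc health scripts, e.g. alerting when parse_metrics(body).get("gleann_up") != 1.0.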
Manage saved conversation history:
# List all conversations
curl http://localhost:8080/api/conversations
# Get a specific conversation by ID (full or prefix)
curl http://localhost:8080/api/conversations/a1b2c3d4
# Delete a conversation
curl -X DELETE http://localhost:8080/api/conversations/a1b2c3d4
# Ask with a role
curl -X POST http://localhost:8080/api/indexes/my-code/ask \
-H 'Content-Type: application/json' \
-d '{"question": "Review this code", "role": "code"}'
# Continue an existing conversation
curl -X POST http://localhost:8080/api/indexes/my-code/ask \
-H 'Content-Type: application/json' \
-d '{"question": "What about error handling?", "conversation_id": "a1b2c3d4..."}'

All endpoints include CORS headers for cross-origin access:
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET, POST, DELETE, OPTIONS
Access-Control-Allow-Headers: Content-Type, Authorization
All errors return JSON with a single error field:
{
"error": "index \"foo\" not found: open .../foo/meta.json: no such file or directory"
}

Common HTTP status codes:
| Code | Meaning |
|---|---|
| 200 | Success |
| 400 | Bad request (missing required fields) |
| 404 | Index or graph not found |
| 429 | Rate limit exceeded (per-IP token bucket; see GLEANN_RATE_LIMIT) — includes Retry-After: 1 header |
| 500 | Internal server error |
| 503 | Feature unavailable (e.g., graph without treesitter build tag) |
| 504 | Gateway timeout — request exceeded its deadline (see GLEANN_TIMEOUT_*_S env vars) |
The server applies per-IP token-bucket rate limiting (default: 60 req/s sustained, 120 burst). The /health and /metrics endpoints are exempt. Configure via GLEANN_RATE_LIMIT and GLEANN_RATE_BURST environment variables.
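Since 429 responses include a Retry-After header (per the table above), clients can back off accordingly. A hedged sketch of a retry wrapper, written against any response object exposing status_code and headers (e.g. a requests.Response):

```python
import time

def with_retry(call, max_attempts: int = 5):
    """Retry a request-producing callable while it reports HTTP 429,
    honoring the server's Retry-After header (defaulting to 1s).
    `call` is any zero-arg function returning a response-like object
    with .status_code and .headers — an illustrative sketch, not a
    gleann client API."""
    resp = None
    for _ in range(max_attempts):
        resp = call()
        if resp.status_code != 429:
            return resp
        delay = float(resp.headers.get("Retry-After", "1"))
        time.sleep(delay)
    return resp  # still 429 after max_attempts
```

Usage: with_retry(lambda: requests.post(url, json=payload)).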
Each endpoint has a context deadline based on its path:
| Endpoint pattern | Default timeout | Env var |
|---|---|---|
| */ask, /v1/chat/completions | 5 minutes | GLEANN_TIMEOUT_ASK_S |
| */search | 30 seconds | GLEANN_TIMEOUT_SEARCH_S |
| */build | 10 minutes | GLEANN_TIMEOUT_BUILD_S |
| All others | 60 seconds | GLEANN_TIMEOUT_DEFAULT_S |
SSE streaming endpoints (?stream=true or Accept: text/event-stream) bypass the timeout middleware; they rely on client disconnect detection instead.
curl -X POST http://localhost:8080/api/blocks \
-H 'Content-Type: application/json' \
-d '{
"content": "Project uses hexagonal architecture",
"tier": "long",
"tags": ["convention", "architecture"],
"label": "project_fact"
}'

curl -X POST http://localhost:8080/api/blocks \
-H 'Content-Type: application/json' \
-d '{
"content": "User asked about deployment strategies",
"tier": "medium",
"scope": "conv-abc123",
"label": "conversation_note"
}'

curl -X POST http://localhost:8080/api/blocks \
-H 'Content-Type: application/json' \
-d '{
"content": "Running notes that may grow over time...",
"tier": "long",
"char_limit": 2000,
"label": "rolling_notes"
}'

# All tiers
curl http://localhost:8080/api/blocks
# Only long-term memories
curl http://localhost:8080/api/blocks?tier=long
# Scoped to a conversation (global + conversation-specific)
curl 'http://localhost:8080/api/blocks?scope=conv-abc123'

# All scopes
curl 'http://localhost:8080/api/blocks/search?q=architecture'
# Scoped search
curl 'http://localhost:8080/api/blocks/search?q=architecture&scope=conv-abc123'

# Global context
curl http://localhost:8080/api/blocks/context
# Conversation-scoped context
curl 'http://localhost:8080/api/blocks/context?scope=conv-abc123'
# XML format
curl 'http://localhost:8080/api/blocks/context?format=xml'

# Delete a specific block by ID
curl -X DELETE http://localhost:8080/api/blocks/abc123

Use any OpenAI-compatible client with gleann as the backend:
# RAG-augmented: model = "gleann/<index-name>"
curl -X POST http://localhost:8080/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{
"model": "gleann/my-code",
"messages": [{"role": "user", "content": "How does auth work?"}],
"stream": false
}'
# Custom RAG parameters via headers
curl -X POST http://localhost:8080/v1/chat/completions \
-H 'Content-Type: application/json' \
-H 'X-Gleann-Top-K: 15' \
-d '{
"model": "gleann/my-docs",
"messages": [{"role": "user", "content": "Summarize the architecture"}]
}'
# List available indexes as OpenAI models
curl http://localhost:8080/v1/models

Python (OpenAI SDK):
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")
response = client.chat.completions.create(
model="gleann/my-code",
messages=[{"role": "user", "content": "Explain the auth flow"}],
)
print(response.choices[0].message.content)