Gleann exposes a REST API when running in server mode (gleann serve). The API provides endpoints for index management, semantic search, RAG-based Q&A, and code graph queries.
# Start the server
gleann serve --port 8080
# Open API documentation in browser
open http://localhost:8080/api/docs
# Download the OpenAPI spec
curl http://localhost:8080/api/openapi.json

When the server is running, interactive Swagger UI documentation is available at:
- Swagger UI: GET /api/docs
- OpenAPI 3.0 JSON: GET /api/openapi.json
| Method | Path | Description |
|---|---|---|
| GET | /health | Health check |
| Method | Path | Description |
|---|---|---|
| GET | /api/indexes | List all indexes |
| GET | /api/indexes/{name} | Get index metadata |
| POST | /api/indexes/{name}/build | Build index from texts/items |
| DELETE | /api/indexes/{name} | Delete an index |
| Method | Path | Description |
|---|---|---|
| POST | /api/indexes/{name}/search | Semantic/hybrid search |
| POST | /api/indexes/{name}/ask | RAG-based Q&A |
| POST | /api/search | Multi-index search |
| Method | Path | Description |
|---|---|---|
| GET | /api/graph/{name} | Graph statistics |
| POST | /api/graph/{name}/query | Query the code graph |
| POST | /api/graph/{name}/index | Trigger AST graph indexing |
The Memory Engine exposes a generic Entity / RELATES_TO graph that external
AI agents can read from and write to without coupling to gleann's internal RAG
pipeline. Each {name} corresponds to an independent KuzuDB store under
<index-dir>/<name>_memory/.
| Method | Path | Description |
|---|---|---|
| POST | /api/memory/{name}/inject | Atomically upsert nodes and edges |
| DELETE | /api/memory/{name}/nodes/{id} | Delete an entity and its incident edges |
| DELETE | /api/memory/{name}/edges | Delete a specific relationship |
| POST | /api/memory/{name}/traverse | Walk the graph N hops from a start node |
| Method | Path | Description |
|---|---|---|
| GET | /api/webhooks | List registered webhooks |
| POST | /api/webhooks | Register a webhook |
| DELETE | /api/webhooks | Delete a webhook by URL |
| Method | Path | Description |
|---|---|---|
| GET | /metrics | Prometheus-compatible metrics |
| Method | Path | Description |
|---|---|---|
| GET | /api/conversations | List saved conversations |
| GET | /api/conversations/{id} | Get conversation by ID |
| DELETE | /api/conversations/{id} | Delete a conversation |
Hierarchical BBolt memory store with three tiers (short, medium, long). Blocks stored here are automatically injected into every LLM query as system context. Supports scoped isolation (e.g. per-conversation) and character limits.
| Method | Path | Description |
|---|---|---|
| GET | /api/blocks | List memory blocks (optional ?tier=short\|medium\|long, ?scope=) |
| POST | /api/blocks | Store a new memory block (supports char_limit, scope fields) |
| DELETE | /api/blocks | Clear blocks (optional ?tier= filter) |
| DELETE | /api/blocks/{id} | Delete a specific block by ID |
| GET | /api/blocks/search?q= | Full-text search across all tiers (optional ?scope=) |
| GET | /api/blocks/context | Compiled memory context (optional ?scope=, ?format=xml) |
| GET | /api/blocks/stats | Storage statistics per tier |
Use gleann indexes as if they were OpenAI models. Compatible with any tool that speaks the OpenAI chat completions API.
| Method | Path | Description |
|---|---|---|
| GET | /v1/models | List indexes as OpenAI-compatible model objects |
| POST | /v1/chat/completions | Chat completions with automatic RAG injection |
Model naming: "gleann/<index-name>" for RAG-augmented answers, "gleann/" for pure LLM pass-through.
Custom headers: X-Gleann-Top-K (RAG result count), X-Gleann-Min-Score (score threshold).
Google's Agent-to-Agent protocol for agent discovery and inter-agent communication. Enabled by default; set GLEANN_A2A_ENABLED=false to disable.
| Method | Path | Description |
|---|---|---|
| GET | /.well-known/agent-card.json | A2A Agent Card (discovery) |
| POST | /a2a/v1/message:send | Send a message to an A2A skill |
| GET | /a2a/v1/tasks/{id} | Get task status by ID |
Built-in skills: semantic-search, ask-rag, code-analysis, memory-management.
Orchestrates all memory layers (block storage, knowledge graph, vector search) through a single interface. Simplifies agent integration by eliminating the need to call individual memory APIs.
| Method | Path | Description |
|---|---|---|
| POST | /api/memory/ingest | Store facts + relationships across memory layers |
| POST | /api/memory/recall | Query all memory layers in parallel |
| Field | Type | Description |
|---|---|---|
| facts[].content | string | Required. The fact text to store |
| facts[].tags | string[] | Searchable tags |
| facts[].tier | string | short (default), medium, or long |
| facts[].metadata | object | Arbitrary key-value metadata (e.g. {"source_file": "auth.go"}) |
| facts[].expires_in | string | TTL as Go-style duration (1h, 7d, 2w) |
| facts[].char_limit | int | Per-block character limit |
| relationships[].from | string | Source entity ID |
| relationships[].to | string | Target entity ID |
| relationships[].relation | string | Relation type (e.g. DEPENDS_ON, IMPLEMENTS) |
| relationships[].weight | float | Edge importance (default: 1.0) |
| relationships[].attributes | object | Edge metadata |
| scope | string | Isolate facts to a conversation/agent |
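The expires_in field uses Go-style duration strings extended with day and week units (Go's own time.ParseDuration stops at hours, so support for d and w is inferred from the examples in this document). A client-side sketch of that grammar, useful for computing an absolute expiry locally before ingesting:

```python
import re
from datetime import datetime, timedelta, timezone

# Unit table for the extended duration grammar ("1h", "7d", "2w").
# "d" and "w" are an extension implied by this document's examples,
# not part of Go's stdlib duration syntax — treat this as a sketch.
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}

def parse_ttl(ttl: str) -> timedelta:
    """Parse a TTL like '90d' or '1h30m' into a timedelta."""
    matches = re.findall(r"(\d+(?:\.\d+)?)([smhdw])", ttl)
    # Reject strings with leftover characters (e.g. "7 days").
    if not matches or "".join(n + u for n, u in matches) != ttl:
        raise ValueError(f"bad duration: {ttl!r}")
    seconds = sum(float(n) * _UNITS[u] for n, u in matches)
    return timedelta(seconds=seconds)

def expiry_from_now(ttl: str) -> str:
    """Absolute RFC3339 expiry for a fact stored now."""
    return (datetime.now(timezone.utc) + parse_ttl(ttl)).isoformat()
```

For example, parse_ttl("90d") matches the expires_in value used in the ingest example below.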
| Field | Type | Description |
|---|---|---|
| query | string | Required. Search text |
| layers | string[] | blocks, graph, vector (default: all) |
| top_k | int | Max results per layer (default: 5) |
| tier | string | Filter blocks by tier |
| tags | string[] | Filter blocks by tags (AND logic) |
| after | string | Filter blocks created after (RFC3339 or duration like 24h, 7d) |
| before | string | Filter blocks created before |
| relations | string[] | Filter graph edges by relation types |
| scope | string | Filter blocks by scope |
| depth | int | Graph traversal depth (default: 2) |
| format | string | json (default) or context (LLM-ready XML) |
# Store a fact with metadata and TTL
curl -X POST http://localhost:8080/api/memory/ingest \
-H 'Content-Type: application/json' \
-d '{
"facts": [{
"content": "Auth service uses JWT tokens with RS256 signing",
"tags": ["auth", "security", "jwt"],
"metadata": {"source_file": "auth/jwt.go", "confidence": "high"},
"tier": "long",
"expires_in": "90d"
}],
"relationships": [{
"from": "AuthService",
"to": "JWTValidator",
"relation": "DEPENDS_ON",
"weight": 0.9
}]
}'
# Project-scoped ingest (sets scope="project:myapp" + default index)
curl -X POST http://localhost:8080/api/memory/ingest \
-H 'Content-Type: application/json' \
-d '{
"project": "myapp",
"facts": [{"content": "The API uses rate limiting at 100 req/s"}],
"relationships": [{"from": "Gateway", "to": "RateLimiter", "relation": "USES"}]
}'
# Recall facts from the last 7 days, filtered by tag
curl -X POST http://localhost:8080/api/memory/recall \
-H 'Content-Type: application/json' \
-d '{
"query": "authentication",
"layers": ["blocks"],
"tags": ["security"],
"after": "7d",
"tier": "long"
}'
# Project-scoped recall (searches blocks with scope + vector/graph in index)
curl -X POST http://localhost:8080/api/memory/recall \
-H 'Content-Type: application/json' \
-d '{
"project": "myapp",
"query": "rate limiting",
"format": "context"
}'
# Full multi-layer recall in LLM-ready format
curl -X POST http://localhost:8080/api/memory/recall \
-H 'Content-Type: application/json' \
-d '{
"query": "how does auth work",
"format": "context",
"top_k": 10
}'
# Returns XML with <facts>, <relationships>, <relevant_documents>

Monitor and manage long-running background operations (indexing, memory consolidation, health checks).
| Method | Path | Description |
|---|---|---|
| GET | /api/tasks | List background tasks (optional ?status= filter) |
| GET | /api/tasks/{id} | Get task status by ID |
| DELETE | /api/tasks | Cleanup completed/failed tasks older than 1 hour |
curl -X POST http://localhost:8080/api/indexes/my-code/search \
-H 'Content-Type: application/json' \
-d '{
"query": "how does authentication work",
"top_k": 5,
"hybrid_alpha": 0.7,
"graph_context": true
}'

curl -X POST http://localhost:8080/api/indexes/my-code/ask \
-H 'Content-Type: application/json' \
-d '{
"question": "How is the user session managed?",
"top_k": 10
}'

Ask request fields:
| Field | Type | Default | Description |
|---|---|---|---|
| question | string | (required) | Question to answer using RAG |
| top_k | integer | 10 | Number of context passages to retrieve |
| llm_model | string | — | LLM model name |
| llm_provider | string | — | LLM provider (ollama, openai, anthropic) |
| system_prompt | string | — | Custom system prompt for the LLM |
| role | string | — | Named role (e.g. code, shell). Resolves to a system prompt from the role registry. |
| conversation_id | string | — | Continue an existing conversation by ID. Restores message history. |
| stream | boolean | false | Enable SSE streaming |
Stream tokens in real-time via Server-Sent Events:
curl -N -X POST http://localhost:8080/api/indexes/my-code/ask \
-H 'Content-Type: application/json' \
-d '{
"question": "Explain the authentication flow",
"stream": true
}'

Or use the query parameter:
curl -N -X POST 'http://localhost:8080/api/indexes/my-code/ask?stream=true' \
-H 'Content-Type: application/json' \
-d '{"question": "Explain the authentication flow"}'

Response format (Content-Type: text/event-stream):
data: {"token":"The"}
data: {"token":" authentication"}
data: {"token":" flow"}
data: {"token":" works by..."}
data: [DONE]
Each data: line contains a JSON object with a token field. The stream ends with data: [DONE]. If an error occurs mid-stream, it sends data: {"error": "..."} before [DONE].
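The same framing is straightforward to consume from any language; a minimal Python sketch of just the parsing step (transport is left to your HTTP client of choice):

```python
import json

def parse_sse_tokens(payload: str) -> str:
    """Reassemble an answer from the SSE framing documented above:
    each 'data:' line carries {"token": ...}, the stream ends with
    'data: [DONE]', and errors arrive as {"error": ...} before it."""
    out = []
    for line in payload.splitlines():
        if not line.startswith("data: "):
            continue  # skip blank separator lines
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        obj = json.loads(data)
        if "error" in obj:
            raise RuntimeError(obj["error"])
        out.append(obj["token"])
    return "".join(out)
```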
JavaScript example:
const response = await fetch('/api/indexes/my-code/ask', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ question: 'How does auth work?', stream: true })
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
const { done, value } = await reader.read();
if (done) break;
const text = decoder.decode(value);
for (const line of text.split('\n')) {
if (line.startsWith('data: ') && line !== 'data: [DONE]') {
const { token } = JSON.parse(line.slice(6));
process.stdout.write(token); // or append to UI
}
}
}

curl -X POST http://localhost:8080/api/graph/my-code/query \
-H 'Content-Type: application/json' \
-d '{
"query": "callers",
"symbol": "github.com/foo/bar.HandleLogin"
}'

curl -X POST http://localhost:8080/api/graph/my-code/query \
-H 'Content-Type: application/json' \
-d '{
"query": "impact",
"symbol": "github.com/foo/bar.UserService",
"max_depth": 3
}'

Atomically upsert Entity nodes and RELATES_TO edges. The operation is idempotent (re-submitting the same payload creates no duplicates):
curl -X POST http://localhost:8080/api/memory/project/inject \
-H 'Content-Type: application/json' \
-d '{
"nodes": [
{
"id": "req-001",
"type": "requirement",
"content": "User must be able to log in with email and password",
"attributes": {"priority": "high", "sprint": 3}
},
{
"id": "feat-jwt",
"type": "code_symbol",
"content": "JWT authentication handler"
}
],
"edges": [
{
"from": "req-001",
"to": "feat-jwt",
"relation_type": "IMPLEMENTED_BY",
"weight": 1.0
}
]
}'

Response:
{"ok": true, "nodes_sent": 2, "edges_sent": 1}

Walk the knowledge graph from a starting node up to depth hops. Returns all
reachable nodes and the intra-subgraph edges:
curl -X POST http://localhost:8080/api/memory/project/traverse \
-H 'Content-Type: application/json' \
-d '{"start_id": "req-001", "depth": 2}'

Response:
{
"nodes": [
{"id": "req-001", "type": "requirement", "content": "User must be able to log in..."},
{"id": "feat-jwt", "type": "code_symbol", "content": "JWT authentication handler"}
],
"edges": [
{"from": "req-001", "to": "feat-jwt", "relation_type": "IMPLEMENTED_BY", "weight": 1}
],
"count": 2
}

curl -X DELETE http://localhost:8080/api/memory/project/nodes/req-001

All RELATES_TO edges incident to the deleted node are removed automatically
(DETACH DELETE semantics).
curl -X DELETE http://localhost:8080/api/memory/project/edges \
-H 'Content-Type: application/json' \
-d '{
"from": "req-001",
"to": "feat-jwt",
"relation_type": "IMPLEMENTED_BY"
}'

Only the matching RELATES_TO relationship is removed; both nodes survive, and any other relation types between them remain intact.
curl -X POST http://localhost:8080/api/indexes/test-index/build \
-H 'Content-Type: application/json' \
-d '{
"texts": ["Hello world", "foo bar baz"],
"metadata": {"source": "test"}
}'

Search across multiple indexes simultaneously. Results are merged by score:
# Search specific indexes
curl -X POST http://localhost:8080/api/search \
-H 'Content-Type: application/json' \
-d '{
"query": "authentication flow",
"indexes": ["backend-code", "frontend-code", "docs"],
"top_k": 10
}'
# Search ALL available indexes
curl -X POST http://localhost:8080/api/search \
-H 'Content-Type: application/json' \
-d '{
"query": "authentication flow",
"top_k": 10
}'

Response:
{
"results": [
{
"index": "backend-code",
"text": "func HandleLogin...",
"score": 0.92,
"metadata": {"source": "auth/handler.go"},
"graph_context": {
"symbols": [{"name": "HandleLogin", "kind": "function"}]
}
},
{
"index": "docs",
"text": "Authentication uses JWT tokens...",
"score": 0.88,
"metadata": {"source": "auth.md"},
"document_context": {
"vpath": "auth.md",
"name": "Authentication Overview",
"summary": "This document explains the JWT flow..."
}
}
],
"count": 2,
"query_ms": 45
}

CLI multi-search:
# Search specific indexes (comma-separated)
gleann search backend-code,frontend-code "authentication"
# Search all indexes
gleann search --all dummy "authentication"

CLI multi-index ask (conversations work across multiple indexes):
# Ask across multiple indexes
gleann ask docs,backend-code "How does authentication work?"
# Pipe input with multi-index
cat auth.go | gleann ask backend,frontend "Review this auth handler"
# Continue a multi-index conversation
gleann ask docs,code --continue-last "What about the error handling?"
# Use a role
gleann ask my-code "Explain this module" --role explain --format markdown

Register a webhook to receive POST notifications for events:
# Register a webhook
curl -X POST http://localhost:8080/api/webhooks \
-H 'Content-Type: application/json' \
-d '{
"url": "https://your-server.com/gleann-hook",
"events": ["build_complete", "index_deleted"],
"secret": "optional-hmac-secret"
}'
# List registered webhooks
curl http://localhost:8080/api/webhooks
# Delete a webhook
curl -X DELETE http://localhost:8080/api/webhooks \
-H 'Content-Type: application/json' \
-d '{"url": "https://your-server.com/gleann-hook"}'

Webhook payload (POST to your URL):
{
"event": "build_complete",
"index": "my-code",
"count": 1250,
"buildMs": 3200,
"timestamp": "2026-03-10T14:30:00Z"
}

If a secret is configured, payloads include an X-Gleann-Signature header carrying an HMAC-SHA256 signature of the form sha256=<hex>.
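Assuming the signature is computed over the raw request body (verify this against your deployment), a receiver can check it with a constant-time comparison; a sketch:

```python
import hashlib
import hmac

def verify_signature(secret: str, body: bytes, header: str) -> bool:
    """Check an X-Gleann-Signature header ('sha256=<hex>') against the
    raw request body using HMAC-SHA256. Assumes the signature covers
    the exact bytes received — do not re-serialize the JSON first."""
    expected = "sha256=" + hmac.new(
        secret.encode(), body, hashlib.sha256
    ).hexdigest()
    # compare_digest avoids leaking timing information.
    return hmac.compare_digest(expected, header)
```

Always verify before parsing the payload, and reject requests whose header is missing or malformed.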
Supported events: build_complete, index_deleted, * (all events).
Prometheus-compatible metrics endpoint:
curl http://localhost:8080/metrics

Response (text/plain, Prometheus exposition format):
# HELP gleann_up Whether the gleann server is running.
# TYPE gleann_up gauge
gleann_up 1
# HELP gleann_search_requests_total Total search requests.
# TYPE gleann_search_requests_total counter
gleann_search_requests_total 42
# HELP gleann_search_latency_avg_ms Average search latency in milliseconds.
# TYPE gleann_search_latency_avg_ms gauge
gleann_search_latency_avg_ms 23.50
# HELP gleann_multi_search_requests_total Total multi-index search requests.
# TYPE gleann_multi_search_requests_total counter
gleann_multi_search_requests_total 5
# HELP gleann_cached_searchers Number of cached searcher instances.
# TYPE gleann_cached_searchers gauge
gleann_cached_searchers 3
Available metrics: gleann_up, gleann_uptime_seconds, gleann_search_requests_total, gleann_search_errors_total, gleann_search_latency_avg_ms, gleann_multi_search_requests_total, gleann_build_requests_total, gleann_build_errors_total, gleann_build_latency_avg_ms, gleann_ask_requests_total, gleann_delete_requests_total, gleann_webhooks_fired_total, gleann_cached_searchers.
Grafana / Prometheus integration: Point your Prometheus scraper at http://<host>:8080/metrics.
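For a quick look without a full Prometheus setup, the label-free exposition format shown above is easy to parse directly; a minimal sketch:

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse simple (label-free) Prometheus exposition output into a
    dict of metric name -> value. Skips # HELP / # TYPE comment lines.
    Sketch only: real exposition output may carry labels in braces,
    which this deliberately does not handle."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition(" ")
        metrics[name] = float(value)
    return metrics
```

Handy for ad-hoc health scripts, e.g. alerting when parse_metrics(body).get("gleann_up") != 1.0.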
Manage saved conversation history:
# List all conversations
curl http://localhost:8080/api/conversations
# Get a specific conversation by ID (full or prefix)
curl http://localhost:8080/api/conversations/a1b2c3d4
# Delete a conversation
curl -X DELETE http://localhost:8080/api/conversations/a1b2c3d4
# Ask with a role
curl -X POST http://localhost:8080/api/indexes/my-code/ask \
-H 'Content-Type: application/json' \
-d '{"question": "Review this code", "role": "code"}'
# Continue an existing conversation
curl -X POST http://localhost:8080/api/indexes/my-code/ask \
-H 'Content-Type: application/json' \
-d '{"question": "What about error handling?", "conversation_id": "a1b2c3d4..."}'

All endpoints include CORS headers for cross-origin access:
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET, POST, DELETE, OPTIONS
Access-Control-Allow-Headers: Content-Type, Authorization
All errors return JSON with a single error field:
{
"error": "index \"foo\" not found: open .../foo/meta.json: no such file or directory"
}

Common HTTP status codes:
| Code | Meaning |
|---|---|
| 200 | Success |
| 400 | Bad request (missing required fields) |
| 404 | Index or graph not found |
| 429 | Rate limit exceeded (per-IP token bucket; see GLEANN_RATE_LIMIT) — includes Retry-After: 1 header |
| 500 | Internal server error |
| 503 | Feature unavailable (e.g., graph without treesitter build tag) |
| 504 | Gateway timeout — request exceeded its deadline (see GLEANN_TIMEOUT_*_S env vars) |
The server applies per-IP token-bucket rate limiting (default: 60 req/s sustained, 120 burst). The /health and /metrics endpoints are exempt. Configure via GLEANN_RATE_LIMIT and GLEANN_RATE_BURST environment variables.
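Since 429 responses include a Retry-After header (per the table above), clients can back off accordingly. A hedged sketch of a retry wrapper, written against any response object exposing status_code and headers (e.g. a requests.Response):

```python
import time

def with_retry(call, max_attempts: int = 5):
    """Retry a request-producing callable while it reports HTTP 429,
    honoring the server's Retry-After header (defaulting to 1s).
    `call` is any zero-arg function returning a response-like object
    with .status_code and .headers — an illustrative sketch, not a
    gleann client API."""
    resp = None
    for _ in range(max_attempts):
        resp = call()
        if resp.status_code != 429:
            return resp
        delay = float(resp.headers.get("Retry-After", "1"))
        time.sleep(delay)
    return resp  # still 429 after max_attempts
```

Usage: with_retry(lambda: requests.post(url, json=payload)).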
Each endpoint has a context deadline based on its path:
| Endpoint pattern | Default timeout | Env var |
|---|---|---|
| */ask, /v1/chat/completions | 5 minutes | GLEANN_TIMEOUT_ASK_S |
| */search | 30 seconds | GLEANN_TIMEOUT_SEARCH_S |
| */build | 10 minutes | GLEANN_TIMEOUT_BUILD_S |
| All others | 60 seconds | GLEANN_TIMEOUT_DEFAULT_S |
SSE streaming endpoints (?stream=true or Accept: text/event-stream) bypass the timeout middleware; they rely on client disconnect detection instead.
curl -X POST http://localhost:8080/api/blocks \
-H 'Content-Type: application/json' \
-d '{
"content": "Project uses hexagonal architecture",
"tier": "long",
"tags": ["convention", "architecture"],
"label": "project_fact"
}'

curl -X POST http://localhost:8080/api/blocks \
-H 'Content-Type: application/json' \
-d '{
"content": "User asked about deployment strategies",
"tier": "medium",
"scope": "conv-abc123",
"label": "conversation_note"
}'

curl -X POST http://localhost:8080/api/blocks \
-H 'Content-Type: application/json' \
-d '{
"content": "Running notes that may grow over time...",
"tier": "long",
"char_limit": 2000,
"label": "rolling_notes"
}'

# All tiers
curl http://localhost:8080/api/blocks
# Only long-term memories
curl http://localhost:8080/api/blocks?tier=long
# Scoped to a conversation (global + conversation-specific)
curl 'http://localhost:8080/api/blocks?scope=conv-abc123'

# All scopes
curl 'http://localhost:8080/api/blocks/search?q=architecture'
# Scoped search
curl 'http://localhost:8080/api/blocks/search?q=architecture&scope=conv-abc123'

# Global context
curl http://localhost:8080/api/blocks/context
# Conversation-scoped context
curl 'http://localhost:8080/api/blocks/context?scope=conv-abc123'
# XML format
curl 'http://localhost:8080/api/blocks/context?format=xml'

# Delete a specific block by ID
curl -X DELETE http://localhost:8080/api/blocks/abc123

Use any OpenAI-compatible client with gleann as the backend:
# RAG-augmented: model = "gleann/<index-name>"
curl -X POST http://localhost:8080/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{
"model": "gleann/my-code",
"messages": [{"role": "user", "content": "How does auth work?"}],
"stream": false
}'
# Custom RAG parameters via headers
curl -X POST http://localhost:8080/v1/chat/completions \
-H 'Content-Type: application/json' \
-H 'X-Gleann-Top-K: 15' \
-d '{
"model": "gleann/my-docs",
"messages": [{"role": "user", "content": "Summarize the architecture"}]
}'
# List available indexes as OpenAI models
curl http://localhost:8080/v1/models

Python (OpenAI SDK):
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")
response = client.chat.completions.create(
model="gleann/my-code",
messages=[{"role": "user", "content": "Explain the auth flow"}],
)
print(response.choices[0].message.content)