Gleann REST API Reference

Gleann exposes a REST API when running in server mode (gleann serve). The API provides endpoints for index management, semantic search, RAG-based Q&A, and code graph queries.

Quick Start

# Start the server
gleann serve --port 8080

# Open API documentation in browser
open http://localhost:8080/api/docs

# Download the OpenAPI spec
curl http://localhost:8080/api/openapi.json

Interactive Documentation

When the server is running, interactive Swagger UI documentation is available at:

  • Swagger UI: GET /api/docs
  • OpenAPI 3.0 JSON: GET /api/openapi.json

Endpoints

Health

Method Path Description
GET /health Health check

Index Management

Method Path Description
GET /api/indexes List all indexes
GET /api/indexes/{name} Get index metadata
POST /api/indexes/{name}/build Build index from texts/items
DELETE /api/indexes/{name} Delete an index

Search & RAG

Method Path Description
POST /api/indexes/{name}/search Semantic/hybrid search
POST /api/indexes/{name}/ask RAG-based Q&A
POST /api/search Multi-index search

Code Graph (requires treesitter build)

Method Path Description
GET /api/graph/{name} Graph statistics
POST /api/graph/{name}/query Query the code graph
POST /api/graph/{name}/index Trigger AST graph indexing

Memory Engine (Generic Knowledge Graph)

The Memory Engine exposes a generic Entity / RELATES_TO graph that external AI agents can read from and write to without coupling to gleann's internal RAG pipeline. Each {name} corresponds to an independent KuzuDB store under <index-dir>/<name>_memory/.

Method Path Description
POST /api/memory/{name}/inject Atomically upsert nodes and edges
DELETE /api/memory/{name}/nodes/{id} Delete an entity and its incident edges
DELETE /api/memory/{name}/edges Delete a specific relationship
POST /api/memory/{name}/traverse Walk the graph N hops from a start node

Webhooks

Method Path Description
GET /api/webhooks List registered webhooks
POST /api/webhooks Register a webhook
DELETE /api/webhooks Delete a webhook by URL

Metrics

Method Path Description
GET /metrics Prometheus-compatible metrics

Conversations

Method Path Description
GET /api/conversations List saved conversations
GET /api/conversations/{id} Get conversation by ID
DELETE /api/conversations/{id} Delete a conversation

Memory Blocks (Long-term Memory)

Hierarchical BBolt memory store with three tiers (short, medium, long). Blocks stored here are automatically injected into every LLM query as system context. Supports scoped isolation (e.g. per-conversation) and character limits.

Method Path Description
GET /api/blocks List memory blocks (optional ?tier=short|medium|long, ?scope=)
POST /api/blocks Store a new memory block (supports char_limit, scope fields)
DELETE /api/blocks Clear blocks (optional ?tier= filter)
DELETE /api/blocks/{id} Delete a specific block by ID
GET /api/blocks/search?q= Full-text search across all tiers (optional ?scope=)
GET /api/blocks/context Compiled memory context (optional ?scope=, ?format=xml)
GET /api/blocks/stats Storage statistics per tier

OpenAI-Compatible Proxy

Use gleann indexes as if they were OpenAI models. Compatible with any tool that speaks the OpenAI chat completions API.

Method Path Description
GET /v1/models List indexes as OpenAI-compatible model objects
POST /v1/chat/completions Chat completions with automatic RAG injection

Model naming: "gleann/<index-name>" for RAG-augmented answers, "gleann/" for pure LLM pass-through.

Custom headers: X-Gleann-Top-K (RAG result count), X-Gleann-Min-Score (score threshold).

A2A Protocol (Agent-to-Agent)

Google's Agent-to-Agent protocol for agent discovery and inter-agent communication. Enabled by default; set GLEANN_A2A_ENABLED=false to disable.

Method Path Description
GET /.well-known/agent-card.json A2A Agent Card (discovery)
POST /a2a/v1/message:send Send a message to an A2A skill
GET /a2a/v1/tasks/{id} Get task status by ID

Built-in skills: semantic-search, ask-rag, code-analysis, memory-management.

Unified Memory API

Orchestrates all memory layers (block storage, knowledge graph, vector search) through a single interface. Simplifies agent integration by eliminating the need to call individual memory APIs.

Method Path Description
POST /api/memory/ingest Store facts + relationships across memory layers
POST /api/memory/recall Query all memory layers in parallel

Ingest Request Fields

Field Type Description
facts[].content string Required. The fact text to store
facts[].tags string[] Searchable tags
facts[].tier string short (default), medium, or long
facts[].metadata object Arbitrary key-value metadata (e.g. {"source_file": "auth.go"})
facts[].expires_in string TTL as Go duration (1h, 7d, 2w)
facts[].char_limit int Per-block character limit
relationships[].from string Source entity ID
relationships[].to string Target entity ID
relationships[].relation string Relation type (e.g. DEPENDS_ON, IMPLEMENTS)
relationships[].weight float Edge importance (default: 1.0)
relationships[].attributes object Edge metadata
scope string Isolate facts to a conversation/agent

Recall Request Fields

Field Type Description
query string Required. Search text
layers string[] blocks, graph, vector (default: all)
top_k int Max results per layer (default: 5)
tier string Filter blocks by tier
tags string[] Filter blocks by tags (AND logic)
after string Filter blocks created after (RFC3339 or duration like 24h, 7d)
before string Filter blocks created before
relations string[] Filter graph edges by relation types
scope string Filter blocks by scope
depth int Graph traversal depth (default: 2)
format string json (default) or context (LLM-ready XML)

Unified Memory Examples

# Store a fact with metadata and TTL
curl -X POST http://localhost:8080/api/memory/ingest \
  -H 'Content-Type: application/json' \
  -d '{
    "facts": [{
      "content": "Auth service uses JWT tokens with RS256 signing",
      "tags": ["auth", "security", "jwt"],
      "metadata": {"source_file": "auth/jwt.go", "confidence": "high"},
      "tier": "long",
      "expires_in": "90d"
    }],
    "relationships": [{
      "from": "AuthService",
      "to": "JWTValidator",
      "relation": "DEPENDS_ON",
      "weight": 0.9
    }]
  }'

# Project-scoped ingest (sets scope="project:myapp" + default index)
curl -X POST http://localhost:8080/api/memory/ingest \
  -H 'Content-Type: application/json' \
  -d '{
    "project": "myapp",
    "facts": [{"content": "The API uses rate limiting at 100 req/s"}],
    "relationships": [{"from": "Gateway", "to": "RateLimiter", "relation": "USES"}]
  }'

# Recall facts from the last 7 days, filtered by tag
curl -X POST http://localhost:8080/api/memory/recall \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "authentication",
    "layers": ["blocks"],
    "tags": ["security"],
    "after": "7d",
    "tier": "long"
  }'

# Project-scoped recall (searches blocks with scope + vector/graph in index)
curl -X POST http://localhost:8080/api/memory/recall \
  -H 'Content-Type: application/json' \
  -d '{
    "project": "myapp",
    "query": "rate limiting",
    "format": "context"
  }'

# Full multi-layer recall in LLM-ready format
curl -X POST http://localhost:8080/api/memory/recall \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "how does auth work",
    "format": "context",
    "top_k": 10
  }'
# Returns XML with <facts>, <relationships>, <relevant_documents>

Background Tasks

Monitor and manage long-running background operations (indexing, memory consolidation, health checks).

Method Path Description
GET /api/tasks List background tasks (optional ?status= filter)
GET /api/tasks/{id} Get task status by ID
DELETE /api/tasks Cleanup completed/failed tasks older than 1 hour

Examples

Search

curl -X POST http://localhost:8080/api/indexes/my-code/search \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "how does authentication work",
    "top_k": 5,
    "hybrid_alpha": 0.7,
    "graph_context": true
  }'

Ask (RAG)

curl -X POST http://localhost:8080/api/indexes/my-code/ask \
  -H 'Content-Type: application/json' \
  -d '{
    "question": "How is the user session managed?",
    "top_k": 10
  }'

Ask request fields:

Field Type Default Description
question string (required) Question to answer using RAG
top_k integer 10 Number of context passages to retrieve
llm_model string LLM model name
llm_provider string LLM provider (ollama, openai, anthropic)
system_prompt string Custom system prompt for the LLM
role string Named role (e.g. code, shell). Resolves to a system prompt from the role registry.
conversation_id string Continue an existing conversation by ID. Restores message history.
stream boolean false Enable SSE streaming

Ask with SSE Streaming

Stream tokens in real-time via Server-Sent Events:

curl -N -X POST http://localhost:8080/api/indexes/my-code/ask \
  -H 'Content-Type: application/json' \
  -d '{
    "question": "Explain the authentication flow",
    "stream": true
  }'

Or use the query parameter:

curl -N -X POST 'http://localhost:8080/api/indexes/my-code/ask?stream=true' \
  -H 'Content-Type: application/json' \
  -d '{"question": "Explain the authentication flow"}'

Response format (Content-Type: text/event-stream):

data: {"token":"The"}

data: {"token":" authentication"}

data: {"token":" flow"}

data: {"token":" works by..."}

data: [DONE]

Each data: line contains a JSON object with a token field. The stream ends with data: [DONE]. If an error occurs mid-stream, the server sends data: {"error": "..."} before [DONE].

JavaScript example:

const response = await fetch('/api/indexes/my-code/ask', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ question: 'How does auth work?', stream: true })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true }); // stream:true keeps multi-byte chars split across chunks intact
  const lines = buffer.split('\n');
  buffer = lines.pop(); // keep any partial line for the next chunk so JSON.parse never sees a fragment
  for (const line of lines) {
    if (line.startsWith('data: ') && line !== 'data: [DONE]') {
      const { token } = JSON.parse(line.slice(6));
      process.stdout.write(token); // or append to UI
    }
  }
}

Graph Query — Callers

curl -X POST http://localhost:8080/api/graph/my-code/query \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "callers",
    "symbol": "github.com/foo/bar.HandleLogin"
  }'

Graph Query — Impact Analysis

curl -X POST http://localhost:8080/api/graph/my-code/query \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "impact",
    "symbol": "github.com/foo/bar.UserService",
    "max_depth": 3
  }'

Memory Engine — Inject Knowledge

Atomically upsert Entity nodes and RELATES_TO edges. The operation is idempotent (re-submitting the same payload creates no duplicates):

curl -X POST http://localhost:8080/api/memory/project/inject \
  -H 'Content-Type: application/json' \
  -d '{
    "nodes": [
      {
        "id": "req-001",
        "type": "requirement",
        "content": "User must be able to log in with email and password",
        "attributes": {"priority": "high", "sprint": 3}
      },
      {
        "id": "feat-jwt",
        "type": "code_symbol",
        "content": "JWT authentication handler"
      }
    ],
    "edges": [
      {
        "from": "req-001",
        "to": "feat-jwt",
        "relation_type": "IMPLEMENTED_BY",
        "weight": 1.0
      }
    ]
  }'

Response:

{"ok": true, "nodes_sent": 2, "edges_sent": 1}

Memory Engine — Traverse Sub-graph

Walk the knowledge graph from a starting node up to depth hops. Returns all reachable nodes and the intra-subgraph edges:

curl -X POST http://localhost:8080/api/memory/project/traverse \
  -H 'Content-Type: application/json' \
  -d '{"start_id": "req-001", "depth": 2}'

Response:

{
  "nodes": [
    {"id": "req-001", "type": "requirement", "content": "User must be able to log in..."},
    {"id": "feat-jwt",  "type": "code_symbol", "content": "JWT authentication handler"}
  ],
  "edges": [
    {"from": "req-001", "to": "feat-jwt", "relation_type": "IMPLEMENTED_BY", "weight": 1}
  ],
  "count": 2
}

Memory Engine — Delete a Node

curl -X DELETE http://localhost:8080/api/memory/project/nodes/req-001

All RELATES_TO edges incident to the deleted node are removed automatically (DETACH DELETE semantics).

Memory Engine — Delete a Specific Edge

curl -X DELETE http://localhost:8080/api/memory/project/edges \
  -H 'Content-Type: application/json' \
  -d '{
    "from": "req-001",
    "to": "feat-jwt",
    "relation_type": "IMPLEMENTED_BY"
  }'

Only the matching RELATES_TO relationship is removed; both nodes survive and any other relation types between them remain intact.

Build Index

curl -X POST http://localhost:8080/api/indexes/test-index/build \
  -H 'Content-Type: application/json' \
  -d '{
    "texts": ["Hello world", "foo bar baz"],
    "metadata": {"source": "test"}
  }'

Multi-Index Search

Search across multiple indexes simultaneously. Results are merged by score:

# Search specific indexes
curl -X POST http://localhost:8080/api/search \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "authentication flow",
    "indexes": ["backend-code", "frontend-code", "docs"],
    "top_k": 10
  }'

# Search ALL available indexes
curl -X POST http://localhost:8080/api/search \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "authentication flow",
    "top_k": 10
  }'

Response:

{
  "results": [
    {
      "index": "backend-code",
      "text": "func HandleLogin...",
      "score": 0.92,
      "metadata": {"source": "auth/handler.go"},
      "graph_context": {
        "symbols": [{"name": "HandleLogin", "kind": "function"}]
      }
    },
    {
      "index": "docs",
      "text": "Authentication uses JWT tokens...",
      "score": 0.88,
      "metadata": {"source": "auth.md"},
      "document_context": {
        "vpath": "auth.md",
        "name": "Authentication Overview",
        "summary": "This document explains the JWT flow..."
      }
    }
  ],
  "count": 2,
  "query_ms": 45
}
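
The score-based merge the server performs can be sketched client-side like this (illustrative only; the /api/search response above is already merged):

```python
from heapq import merge
from operator import itemgetter

def merge_by_score(*result_lists):
    """Merge per-index result lists (each already sorted by descending
    score) into one globally score-ordered list."""
    return list(merge(*result_lists, key=itemgetter("score"), reverse=True))
```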

CLI multi-search:

# Search specific indexes (comma-separated)
gleann search backend-code,frontend-code "authentication"

# Search all indexes
gleann search --all dummy "authentication"

CLI multi-index ask (conversations work across multiple indexes):

# Ask across multiple indexes
gleann ask docs,backend-code "How does authentication work?"

# Pipe input with multi-index
cat auth.go | gleann ask backend,frontend "Review this auth handler"

# Continue a multi-index conversation
gleann ask docs,code --continue-last "What about the error handling?"

# Use a role
gleann ask my-code "Explain this module" --role explain --format markdown

Webhooks

Register a webhook to receive POST notifications for events:

# Register a webhook
curl -X POST http://localhost:8080/api/webhooks \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://your-server.com/gleann-hook",
    "events": ["build_complete", "index_deleted"],
    "secret": "optional-hmac-secret"
  }'

# List registered webhooks
curl http://localhost:8080/api/webhooks

# Delete a webhook
curl -X DELETE http://localhost:8080/api/webhooks \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://your-server.com/gleann-hook"}'

Webhook payload (POST to your URL):

{
  "event": "build_complete",
  "index": "my-code",
  "count": 1250,
  "buildMs": 3200,
  "timestamp": "2026-03-10T14:30:00Z"
}

If a secret is configured, payloads include an X-Gleann-Signature header with HMAC-SHA256 signature: sha256=<hex>.

Supported events: build_complete, index_deleted, * (all events).
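
With a secret configured, receivers should verify the signature before trusting a payload. A Python sketch, assuming the signature is HMAC-SHA256 over the raw request body (as the sha256=<hex> header format suggests):

```python
import hashlib
import hmac

def verify_signature(secret: str, body: bytes, header: str) -> bool:
    """Check an X-Gleann-Signature header ("sha256=<hex>") against the raw body."""
    expected = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the match position via timing
    return hmac.compare_digest(expected, header)
```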

Metrics

Prometheus-compatible metrics endpoint:

curl http://localhost:8080/metrics

Response (text/plain, Prometheus exposition format):

# HELP gleann_up Whether the gleann server is running.
# TYPE gleann_up gauge
gleann_up 1

# HELP gleann_search_requests_total Total search requests.
# TYPE gleann_search_requests_total counter
gleann_search_requests_total 42

# HELP gleann_search_latency_avg_ms Average search latency in milliseconds.
# TYPE gleann_search_latency_avg_ms gauge
gleann_search_latency_avg_ms 23.50

# HELP gleann_multi_search_requests_total Total multi-index search requests.
# TYPE gleann_multi_search_requests_total counter
gleann_multi_search_requests_total 5

# HELP gleann_cached_searchers Number of cached searcher instances.
# TYPE gleann_cached_searchers gauge
gleann_cached_searchers 3

Available metrics: gleann_up, gleann_uptime_seconds, gleann_search_requests_total, gleann_search_errors_total, gleann_search_latency_avg_ms, gleann_multi_search_requests_total, gleann_build_requests_total, gleann_build_errors_total, gleann_build_latency_avg_ms, gleann_ask_requests_total, gleann_delete_requests_total, gleann_webhooks_fired_total, gleann_cached_searchers.

Grafana / Prometheus integration: Point your Prometheus scraper at http://<host>:8080/metrics.
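
For ad-hoc scripting without a Prometheus server, the exposition text above can be read with a few lines of Python (a minimal sketch; it ignores HELP/TYPE comments and label sets, which these gauges and counters don't use):

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse plain `name value` sample lines from Prometheus exposition text."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip comments and blank lines
            continue
        name, _, value = line.partition(" ")
        samples[name] = float(value)
    return samples
```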

Conversations

Manage saved conversation history:

# List all conversations
curl http://localhost:8080/api/conversations

# Get a specific conversation by ID (full or prefix)
curl http://localhost:8080/api/conversations/a1b2c3d4

# Delete a conversation
curl -X DELETE http://localhost:8080/api/conversations/a1b2c3d4

# Ask with a role
curl -X POST http://localhost:8080/api/indexes/my-code/ask \
  -H 'Content-Type: application/json' \
  -d '{"question": "Review this code", "role": "code"}'

# Continue an existing conversation
curl -X POST http://localhost:8080/api/indexes/my-code/ask \
  -H 'Content-Type: application/json' \
  -d '{"question": "What about error handling?", "conversation_id": "a1b2c3d4..."}'

CORS

All endpoints include CORS headers for cross-origin access:

Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET, POST, DELETE, OPTIONS
Access-Control-Allow-Headers: Content-Type, Authorization

Error Responses

All errors return JSON with a single error field:

{
  "error": "index \"foo\" not found: open .../foo/meta.json: no such file or directory"
}

Common HTTP status codes:

Code Meaning
200 Success
400 Bad request (missing required fields)
404 Index or graph not found
429 Rate limit exceeded (per-IP token bucket; see GLEANN_RATE_LIMIT) — includes Retry-After: 1 header
500 Internal server error
503 Feature unavailable (e.g., graph without treesitter build tag)
504 Gateway timeout — request exceeded its deadline (see GLEANN_TIMEOUT_*_S env vars)

Rate Limiting

The server applies per-IP token-bucket rate limiting (default: 60 req/s sustained, 120 burst). The /health and /metrics endpoints are exempt. Configure via GLEANN_RATE_LIMIT and GLEANN_RATE_BURST environment variables.
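
The limiter's behavior can be illustrated with a minimal token bucket (a generic sketch, not gleann's implementation): the bucket refills at the sustained rate and allows bursts up to its capacity.

```python
class TokenBucket:
    def __init__(self, rate: float, burst: float):
        self.rate = rate          # tokens added per second (sustained req/s)
        self.capacity = burst     # maximum bucket size (burst allowance)
        self.tokens = burst
        self.last = 0.0

    def allow(self, now: float) -> bool:
        """Spend one token if available; refill based on elapsed time."""
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller would respond 429 with a Retry-After header
```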

Request Timeouts

Each endpoint has a context deadline based on its path:

Endpoint pattern Default timeout Env var
*/ask, /v1/chat/completions 5 minutes GLEANN_TIMEOUT_ASK_S
*/search 30 seconds GLEANN_TIMEOUT_SEARCH_S
*/build 10 minutes GLEANN_TIMEOUT_BUILD_S
All others 60 seconds GLEANN_TIMEOUT_DEFAULT_S

SSE streaming endpoints (?stream=true or Accept: text/event-stream) bypass the timeout middleware; they rely on client disconnect detection instead.

Examples: Memory Blocks

Store a fact

curl -X POST http://localhost:8080/api/blocks \
  -H 'Content-Type: application/json' \
  -d '{
    "content": "Project uses hexagonal architecture",
    "tier": "long",
    "tags": ["convention", "architecture"],
    "label": "project_fact"
  }'

Store a scoped block (conversation-isolated)

curl -X POST http://localhost:8080/api/blocks \
  -H 'Content-Type: application/json' \
  -d '{
    "content": "User asked about deployment strategies",
    "tier": "medium",
    "scope": "conv-abc123",
    "label": "conversation_note"
  }'

Store with character limit

curl -X POST http://localhost:8080/api/blocks \
  -H 'Content-Type: application/json' \
  -d '{
    "content": "Running notes that may grow over time...",
    "tier": "long",
    "char_limit": 2000,
    "label": "rolling_notes"
  }'

List all blocks

# All tiers
curl http://localhost:8080/api/blocks

# Only long-term memories
curl http://localhost:8080/api/blocks?tier=long

# Scoped to a conversation (global + conversation-specific)
curl 'http://localhost:8080/api/blocks?scope=conv-abc123'

Search blocks

# All scopes
curl 'http://localhost:8080/api/blocks/search?q=architecture'

# Scoped search
curl 'http://localhost:8080/api/blocks/search?q=architecture&scope=conv-abc123'

Show compiled context (what the LLM receives)

# Global context
curl http://localhost:8080/api/blocks/context

# Conversation-scoped context
curl 'http://localhost:8080/api/blocks/context?scope=conv-abc123'

# XML format
curl 'http://localhost:8080/api/blocks/context?format=xml'

Delete a specific block

curl -X DELETE http://localhost:8080/api/blocks/abc123

Examples: OpenAI-Compatible Proxy

Use any OpenAI-compatible client with gleann as the backend:

# RAG-augmented: model = "gleann/<index-name>"
curl -X POST http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gleann/my-code",
    "messages": [{"role": "user", "content": "How does auth work?"}],
    "stream": false
  }'

# Custom RAG parameters via headers
curl -X POST http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'X-Gleann-Top-K: 15' \
  -d '{
    "model": "gleann/my-docs",
    "messages": [{"role": "user", "content": "Summarize the architecture"}]
  }'

# List available indexes as OpenAI models
curl http://localhost:8080/v1/models

Python (OpenAI SDK):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")
response = client.chat.completions.create(
    model="gleann/my-code",
    messages=[{"role": "user", "content": "Explain the auth flow"}],
)
print(response.choices[0].message.content)