examples: add RLM (Recursive Language Model) demo #1778

Draft
hyprh wants to merge 3 commits into main from hyprh/examples-rlm-recursive-language-model

hyprh (Contributor) commented on May 11, 2026

Summary

Implement the RLM paper (arXiv:2512.24601) as a Go example demonstrating how LLMs can process very large documents via recursive code-driven decomposition.

Architecture

Root Agent (depth=0, ~2.2M chars loaded as REPL context)
├── Explore context structure with Starlark
├── Decompose → rlm_query_batched() spawns child agents
│   ├── Child Agent (depth=1, <=60K char context)
│   │   └── Analyze directly via llm_query / llm_query_batched()
│   └── Child Agent (depth=1, larger logical scope)
│       └── Further decompose into smaller child contexts
└── Aggregate child results → final_answer
  • ReAct Agent Loop: each node is an autonomous agent with execute_code + final_answer tools
  • HTTP Service: centralized LLM proxy and recursive RLM orchestrator with rate limiting
  • Starlark REPL: embedded Python-subset interpreter for LLM-generated code; the full document stays outside the prompt and is exposed to the code as the context builtin
  • Symbolic Recursion: rlm_query() spawns child RLM instances via HTTP
  • Concurrent Batching: rlm_query_batched / llm_query_batched for parallel execution
  • Rate Limiting: token-bucket limiter (default 20 QPM) with exponential backoff retry
  • Runtime Guardrails: 8 KiB tool-output truncation with notice, 30K direct LLM prompt limit, 60K child-context limit, and max 10 child agents per RLM batch
  • Model Call Timeout: each model request is bounded to 2 minutes to avoid hanging forever on an unreachable gateway
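
The runtime guardrails listed above might look roughly like the following. This is a minimal sketch, not the code in simple/tools.go: the constant and function names are hypothetical, the notice wording is invented, and it assumes the 30K and 60K limits count characters (runes) while the 8 KiB cap counts bytes.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// Limits quoted from the PR description; the names are hypothetical.
const (
	maxToolOutputBytes   = 8 * 1024 // 8 KiB tool-output cap
	maxDirectPromptChars = 30_000   // direct LLM prompt limit (assumed to be chars)
	maxChildContextChars = 60_000   // child-context limit (chars)
	maxChildrenPerBatch  = 10       // child agents per RLM batch
)

// truncateOutput caps tool output and appends a visible notice when it cuts.
func truncateOutput(out string) string {
	if len(out) <= maxToolOutputBytes {
		return out
	}
	// Back up to a rune boundary so the cut never splits a UTF-8 sequence.
	cut := maxToolOutputBytes
	for cut > 0 && !utf8.RuneStart(out[cut]) {
		cut--
	}
	return out[:cut] + fmt.Sprintf("\n[output truncated: showing %d of %d bytes]", cut, len(out))
}

// checkPrompt rejects an oversized direct LLM prompt instead of sending it.
func checkPrompt(prompt string) error {
	if n := utf8.RuneCountInString(prompt); n > maxDirectPromptChars {
		return fmt.Errorf("prompt has %d chars, limit is %d", n, maxDirectPromptChars)
	}
	return nil
}

// checkBatch rejects a batch with too many children or an oversized child context.
func checkBatch(contexts []string) error {
	if len(contexts) > maxChildrenPerBatch {
		return fmt.Errorf("batch has %d child contexts, limit is %d", len(contexts), maxChildrenPerBatch)
	}
	for i, c := range contexts {
		if n := utf8.RuneCountInString(c); n > maxChildContextChars {
			return fmt.Errorf("child context %d has %d chars, limit is %d", i, n, maxChildContextChars)
		}
	}
	return nil
}

func main() {
	out := truncateOutput(strings.Repeat("x", 10_000))
	fmt.Println(strings.HasSuffix(out, "bytes]")) // true: the notice was appended
	fmt.Println(checkBatch(make([]string, 11)))   // error: 11 exceeds the batch limit
	fmt.Println(checkPrompt("short prompt"))      // <nil>
}
```

Rejecting oversized inputs with an error, rather than silently trimming them, gives the model a message it can react to by slicing the context further.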

File Structure

| File | Responsibility |
| --- | --- |
| simple/main.go | CLI entry point, auto-clones demo repo, starts service |
| simple/service.go | HTTP server: LLM proxy + recursive RLM orchestration |
| simple/rlm.go | ReAct agent loop: thought → action → observation |
| simple/repl.go | Starlark REPL with builtins (context, llm_query, rlm_query, ...) |
| simple/prompt.go | Dynamic system prompt and runtime guidance |
| simple/tools.go | Tool definitions: execute_code, final_answer, output limiting |
| simple/ratelimit.go | Token-bucket rate limiter |
| README.md | Usage docs, architecture overview, and guardrails |

Demo

The example auto-clones EbookFoundation/free-programming-books, currently loading about 2.2M chars across 200+ markdown files, and identifies outdated/deprecated content:

cd examples/rlm
export OPENAI_API_KEY="..." OPENAI_BASE_URL="..." MODEL_NAME="..."
go run ./simple/ 2>rlm-run.log

Key Design Decisions

  1. ReAct pattern with tool calling instead of a hardcoded processing loop
  2. LLM decides decomposition strategy based on context size and content
  3. Full document context stays external in the REPL, not in every model prompt
  4. rlm_query is called from Starlark so child contexts are explicit slices produced by code
  5. Root query is propagated to all sub-agents for global task awareness
  6. Guardrails reject oversized direct prompts/child contexts and truncate excessive tool output with a visible notice
  7. HTTP recursion keeps a long timeout, while each model request has a shorter timeout for bad gateway/network cases
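
Decision 7 can be illustrated with nested contexts: a long deadline for the whole recursive run and a shorter one per model request. The sketch below is an assumption about the shape of the idea, not the PR's code; callModel, callWithTimeout, and demo are hypothetical names, and the durations are scaled down so it finishes quickly.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// callModel stands in for one upstream model request (hypothetical).
// It returns early if the context is cancelled or its deadline passes.
func callModel(ctx context.Context, latency time.Duration) (string, error) {
	select {
	case <-time.After(latency):
		return "ok", nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

// callWithTimeout bounds a single model request without shortening the parent deadline.
func callWithTimeout(parent context.Context, perCall, latency time.Duration) (string, error) {
	ctx, cancel := context.WithTimeout(parent, perCall)
	defer cancel()
	return callModel(ctx, latency)
}

// demo runs one slow call under a long-lived parent and reports what happened.
func demo() (timedOut, parentAlive bool) {
	// Stands in for the long recursion timeout (the PR mentions 30 minutes).
	run, cancel := context.WithTimeout(context.Background(), time.Minute)
	defer cancel()
	// Stands in for the 2-minute model timeout hitting an unreachable gateway.
	_, err := callWithTimeout(run, 20*time.Millisecond, time.Second)
	return errors.Is(err, context.DeadlineExceeded), run.Err() == nil
}

func main() {
	timedOut, parentAlive := demo()
	fmt.Println(timedOut, parentAlive) // true true
}
```

Only the slow request fails; the enclosing run keeps its own budget, so the agent can retry or continue with other children.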

Validation

  • cd examples/rlm && go test ./...
  • A manual run against an OpenAI-compatible Venus endpoint verified that the demo progresses past the first LLM call and exercises child-agent fan-out under the new context limits.

Implement the RLM paper (arXiv:2512.24601) in Go using:
- HTTP service architecture for centralized LLM proxying and recursive orchestration
- Starlark (Python subset) REPL for LLM-generated code execution
- Symbolic recursion via rlm_query spawning child RLM instances
- Concurrent batch operations (llm_query_batched, rlm_query_batched)
- Depth/iteration budget awareness in prompts for LLM self-planning
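
As a rough illustration of the concurrent batch operations listed above, a batched query can fan out over goroutines with a bounded number in flight and return results in input order. This is a sketch under assumptions: queryFunc and queryBatched are hypothetical names, and the concurrency limit is an invented parameter, not something the commit message specifies.

```go
package main

import (
	"fmt"
	"sync"
)

// queryFunc stands in for a single llm_query-style call (hypothetical).
type queryFunc func(prompt string) (string, error)

// queryBatched runs fn over all prompts with at most limit calls in flight.
// Results and errors are returned in the same order as the prompts.
func queryBatched(prompts []string, limit int, fn queryFunc) ([]string, []error) {
	if limit < 1 {
		limit = 1 // a zero-capacity semaphore would block forever
	}
	results := make([]string, len(prompts))
	errs := make([]error, len(prompts))
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	for i, p := range prompts {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot before starting the goroutine
		go func(i int, p string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			// Each goroutine writes only its own index, so no mutex is needed.
			results[i], errs[i] = fn(p)
		}(i, p)
	}
	wg.Wait()
	return results, errs
}

func main() {
	prompts := []string{"chunk-0", "chunk-1", "chunk-2"}
	results, errs := queryBatched(prompts, 2, func(p string) (string, error) {
		return "summary of " + p, nil
	})
	fmt.Println(results, errs)
}
```

Keeping per-item errors lets the caller aggregate partial results instead of failing the whole batch when one child call fails.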

WIP: needs integration testing with real LLM endpoints.
Co-authored-by: Cursor <[email protected]>

coderabbitai Bot commented May 11, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


hyprh changed the title from "[WIP] examples/rlm: Recursive Language Model with Starlark REPL" to "[WIP] examples: add RLM (Recursive Language Model) demo" on May 11, 2026
// Detach from HTTP request lifecycle — recursive RLM calls can be long-running.
ctx := context.WithoutCancel(r.Context())

if req.Depth >= s.maxDepth {
Collaborator

maxDepth=1 never runs a child RLM because Depth reaches 1 before this check. Use req.Depth > s.maxDepth here so one configured recursive level executes.
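
The off-by-one can be checked with a small model of the guard. The sketch assumes, as the comment describes, that the root runs at depth 0 and each child request arrives with Depth already incremented; the function names are hypothetical and this is not the handler code itself.

```go
package main

import "fmt"

// rejectsGE mirrors the guard under review: refuse when depth >= maxDepth.
func rejectsGE(depth, maxDepth int) bool { return depth >= maxDepth }

// rejectsGT is the suggested guard: refuse only when depth > maxDepth.
func rejectsGT(depth, maxDepth int) bool { return depth > maxDepth }

// deepestLevel returns the deepest level that actually runs when the root
// starts at depth 0 and every agent tries to spawn a child at depth+1.
func deepestLevel(maxDepth int, rejects func(depth, maxDepth int) bool) int {
	depth := 0
	for !rejects(depth+1, maxDepth) {
		depth++
	}
	return depth
}

func main() {
	fmt.Println(deepestLevel(1, rejectsGE)) // 0: no child ever runs with maxDepth=1
	fmt.Println(deepestLevel(1, rejectsGT)) // 1: exactly one recursive level runs
}
```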


hyprh changed the title from "[WIP] examples: add RLM (Recursive Language Model) demo" to "[WIP] examples: add RLM demo" on May 11, 2026

codecov Bot commented May 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.77838%. Comparing base (2343bf5) to head (9c3b508).
⚠️ Report is 14 commits behind head on main.

Additional details and impacted files
@@                 Coverage Diff                 @@
##                main       #1778         +/-   ##
===================================================
+ Coverage   89.77574%   89.77838%   +0.00263%     
===================================================
  Files            936         936                 
  Lines         151610      151610                 
===================================================
+ Hits          136109      136113          +4     
+ Misses          9770        9769          -1     
+ Partials        5731        5728          -3     
| Flag | Coverage Δ |
| --- | --- |
| unittests | 89.77838% <ø> (+0.00263%) ⬆️ |


Major refactoring of the RLM example:
- Restructure into examples/rlm/simple/ subdirectory
- Refactor iterative loop to ReAct agent pattern (tool calling)
- Add rate limiter (token-bucket, default 20 QPM)
- Add exponential backoff retry for 429 errors
- Remove direct rlm_query tool; enforce Starlark-based delegation
- Enrich sub-agent context with RootQuery and structured metadata
- Auto-clone EbookFoundation/free-programming-books as demo data
- Add README with architecture docs and usage instructions
- Fix HTTP client timeout (5min -> 30min) for long-running sub-agents
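
For readers unfamiliar with the rate-limiting pieces named in this list, here is a minimal token bucket plus capped exponential backoff. It is a sketch, not the code in simple/ratelimit.go: the type and function names are hypothetical, the backoff base and cap are invented values, and the bucket is not safe for concurrent use without a mutex.

```go
package main

import (
	"fmt"
	"time"
)

// bucket is a minimal token bucket (hypothetical; add a mutex for concurrent use).
type bucket struct {
	capacity float64
	tokens   float64
	perSec   float64 // refill rate in tokens per second
	last     time.Time
}

// newBucket starts full, sized for qpm queries per minute.
func newBucket(qpm int, now time.Time) *bucket {
	return &bucket{capacity: float64(qpm), tokens: float64(qpm), perSec: float64(qpm) / 60, last: now}
}

// allow refills by elapsed time, then takes one token if one is available.
func (b *bucket) allow(now time.Time) bool {
	b.tokens += now.Sub(b.last).Seconds() * b.perSec
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.last = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// backoff returns the wait before retry number attempt (0-based): base * 2^attempt, capped at max.
func backoff(base, max time.Duration, attempt int) time.Duration {
	d := base
	for i := 0; i < attempt && d < max; i++ {
		d *= 2
	}
	if d > max {
		d = max
	}
	return d
}

// admitted counts how many of n back-to-back calls a full bucket admits,
// then reports whether one more call succeeds after waiting.
func admitted(qpm, n int, wait time.Duration) (int, bool) {
	now := time.Unix(0, 0)
	b := newBucket(qpm, now)
	count := 0
	for i := 0; i < n; i++ {
		if b.allow(now) {
			count++
		}
	}
	return count, b.allow(now.Add(wait))
}

func main() {
	n, again := admitted(20, 25, 6*time.Second)
	fmt.Println(n, again) // 20 true: burst of 20, then refilled after 6s
	for attempt := 0; attempt < 4; attempt++ {
		fmt.Println(backoff(time.Second, time.Minute, attempt)) // 1s, 2s, 4s, 8s
	}
}
```

In a retry loop, a 429 response would trigger a sleep of backoff(base, max, attempt) before the next try, while the bucket spaces out requests so that 429s stay rare.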

Addresses review feedback:
- maxDepth boundary check is now handled via canRecurse in prompt/tools
  (depth < maxDepth), not as a hard guard in service handler

Co-authored-by: Cursor <[email protected]>
hyprh changed the title from "[WIP] examples: add RLM demo" to "examples: add RLM (Recursive Language Model) demo" on May 13, 2026