Plugin development for LiteLLM stats - token usage and context window info #1382

@asb-42

Description

We tried to build a plugin for A0 v1.3 that injects information from LiteLLM into the header: the usual metrics, i.e. number of tokens processed, context window size, and usage percentage. This LiteLLM info should update at least after every prompt or response (not necessarily in real time). However, we hit a brick wall with every approach we tried.

Summary

Two approaches were attempted to create a plugin that displays token usage and context window information in the A0 header bar. Neither approach succeeded. This report documents the attempts, failures, and missing developer information that would be needed to build this or similar plugins.


Approach 1: Core Modifications [rejected]

Goal

Extract response.usage (prompt_tokens, completion_tokens, total_tokens) from LiteLLM API responses and display them in the header.

What was done

  1. Modified models.py:
    • Added _extract_usage() function to parse LiteLLM response objects
    • Added usage attribute to ChatGenerationResult
    • Changed unified_call() return type from Tuple[str, str] to Tuple[str, str, dict | None]
    • Tracked usage in both streaming and non-streaming paths
  2. Modified agent.py:
    • Changed call_chat_model and call_utility_model to unpack 3 return values
    • Passed usage to chat_model_call_after extension
  3. Updated all callers of unified_call() across the codebase (16 changes in 5 files)

Result

  • Token data was correctly extracted and displayed in the header
  • A0 broke: ValueError: Tool request must be a dictionary — the agent could no longer process any messages
  • Root cause: Changing unified_call() return signature from 2 to 3 values broke A0's response processing in ways that were not immediately visible in the code

Conclusion

Core modifications to unified_call() are too invasive. Changing a fundamental return signature affects the entire framework and is fragile against future A0 updates.


Approach 2: Isolated Plugin [no core modifications]

Goal

Use A0's existing agent.get_data("ctx_window") (which stores approximate token count from prepare_prompt()) and get_chat_model_config() (which provides ctx_length) to display token usage without modifying any core files.

What was done

  1. Created API handler (api/token_info.py) that reads from agent.get_data(Agent.DATA_NAME_CTX_WINDOW)
  2. Fixed agent lookup bug: API now checks both context.agent0 and context.streaming_agent
  3. Created Alpine store (webui/token-store.js) that polls the API every 2 seconds
  4. Created WebUI widget (webui/token-widget.html) with badge display
  5. Created WebUI extension (extensions/webui/chat-top-start/token-display.html)
  6. Fixed path issues: User plugins must use /usr/plugins/... paths (not /plugins/...)
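For reference, the core computation in api/token_info.py can be sketched as below. The handler class and route wiring are omitted because A0's plugin API surface is undocumented, and the shape of the ctx_window data is an assumption (both a plain count and a dict with a tokens field are handled).

```python
# Sketch of the payload computation in api/token_info.py. The ctx_window shape
# is assumed; ctx_length comes from get_chat_model_config() per the report.

def build_token_info(agent, ctx_length: int) -> dict:
    """Compute the payload the header widget polls every 2 seconds."""
    raw = agent.get_data("ctx_window") if agent else None
    if isinstance(raw, dict):
        tokens = int(raw.get("tokens", 0))
    else:
        tokens = int(raw or 0)
    percent = round(100.0 * tokens / ctx_length, 1) if ctx_length else 0.0
    return {"tokens": tokens, "ctx_length": ctx_length, "percent": percent}
```

With the values from the Result section below, 53,859 tokens against a 262,144-token window yields the observed 20.5%.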

Result

  • The plugin loads and displays data in the header
  • The API returns a 200 response with token data
  • Token data never updates: it always shows the same value (53,859 tokens / 262,144 ctx_length / 20.5%)
  • Debug logging in prepare_prompt() never produces output — the function is not called during normal chat operation

Investigation

  • Added print() with flush=True to communicate(), monologue(), and prepare_prompt() in agent.py
  • None of these debug statements produced output in docker logs
  • This means A0 uses a different code path for chat processing that does not go through communicate → _process_chain → monologue → prepare_prompt
  • Without understanding which code path A0 uses, it's impossible to know when/where token data is updated
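One technique that could narrow this down without guessing where to place prints: log a stack trace from a hook that demonstrably is reached (e.g. an extension point that fires), so the real entry point appears in the output. A minimal sketch:

```python
import sys
import traceback

def log_call_chain(tag: str, file=None) -> None:
    """Print the current call stack so the real entry point becomes visible."""
    file = file or sys.stderr
    file.write(f"--- call chain at {tag} ---\n")
    traceback.print_stack(file=file)
    file.flush()
```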

Conclusion

The isolated plugin approach works structurally (API, store, widget all function), but cannot display useful data because:

  1. The data source (ctx_window) is only updated by prepare_prompt()
  2. prepare_prompt() is not called during normal chat operation (for unknown reasons)
  3. Without access to the actual chat processing code path, there's no way to get live token data

Missing Developer Information

The following information would be needed to develop this plugin (and similar plugins) for Agent Zero:

1. Chat Processing Code Path

Question: What is the exact call chain when a user sends a message and A0 responds?

  • communicate() → _process_chain() → monologue() → prepare_prompt() was assumed, but debug output never appeared
  • Is there a different entry point for chat messages?
  • Does the WebUI use a different mechanism (WebSocket, direct API call) that bypasses communicate()?

2. Token Data Availability

Question: When and where are token counts calculated and stored?

  • prepare_prompt() stores data in agent.get_data("ctx_window") — but this function is apparently not called
  • Is there a different place where token data is available?
  • Does LiteLLM provide response.usage data that could be accessed without core modifications?

3. Extension Point Documentation

Question: What data is available at each extension point?

  • The extension framework lists points like monologue_start, response_stream, message_loop_start etc.
  • But there's no documentation on what parameters/data each extension receives
  • Without knowing the available data, it's trial-and-error to build useful extensions

4. Plugin API Route Resolution

Question: How are plugin API routes resolved for user plugins vs. core plugins?

  • User plugins in usr/plugins/ use route /usr/plugins/<name>/... for static assets
  • But API routes use /api/plugins/<name>/... (not /api/usr/plugins/<name>/...)
  • The resolution mechanism is not documented

5. WebUI Extension Loading

Question: How do WebUI extensions get loaded?

  • x-component path= requires absolute paths starting with / for user plugins
  • The components.js loader prepends components/ if the path doesn't start with / or components/
  • This is not documented and caused a 404 error that took significant debugging to find

6. Logging in Docker

Question: How to output debug information that appears in docker logs?

  • PrintStyle.debug() output doesn't appear in docker logs
  • print() with flush=True also didn't appear (for code running in agent threads)
  • There's no documented way to write log output that's visible in the container logs
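One workaround we did not get to verify: configure the stdlib logging module with an explicit StreamHandler on stderr, bypassing PrintStyle entirely. Docker captures the stderr of PID 1, so this should surface in docker logs if A0's agent threads inherit normal stream handling; that remains an assumption.

```python
# Untested workaround: a dedicated plugin logger writing straight to stderr.
import logging
import sys

def get_plugin_logger(name: str = "litellm_info") -> logging.Logger:
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on re-import
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(logging.Formatter("%(asctime)s %(name)s: %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.DEBUG)
        logger.propagate = False
    return logger
```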

7. Agent Instance Lifecycle

Question: When are agent0 and streaming_agent created/destroyed?

  • API calls return different agent instances than the one processing the chat
  • agent0 is recreated in reset_context() — when is this called?
  • Understanding the agent lifecycle is essential for plugins that need to access agent state

Recommendations for Agent Zero Maintainers

  1. Document the chat processing flow: A clear diagram of what happens when a user sends a message, including all code paths and entry points

  2. Document extension point parameters: Each extension point should document exactly what kwargs are available

  3. Add a token usage API: A built-in API endpoint that returns current token usage would eliminate the need for plugins to reverse-engineer the data flow

  4. Document plugin routing: How API routes and static assets are resolved for plugins in different directories

  5. Fix logging in Docker: Provide a documented way to write log output visible in container logs

  6. Add a plugin development guide: A step-by-step guide for creating plugins, including common patterns and pitfalls


Files

Plugin files (in usr/plugins/litellm_info/)

  • plugin.yaml — Manifest
  • api/token_info.py — API handler (reads ctx_window data)
  • extensions/webui/chat-top-start/token-display.html — Header extension
  • webui/token-store.js — Alpine store (polls API)
  • webui/token-widget.html — Header widget component
  • README.md — Plugin documentation

Core files (NOT modified — verified clean)

  • agent.py
  • models.py
  • helpers/tokens.py
  • All other core files
