Interesting repo. The system prompt alone eats ~14K tokens per prompt. On top of that, Claude Code reads dozens of project files for context.
I tracked the full breakdown on a real project: ~14K system prompt + ~180K codebase reads + conversation history = the agent is processing 200K+ tokens before it starts reasoning about your actual question.
The system prompt is a fixed cost. The conversation history grows linearly with each turn. The codebase reads are the one variable you can actually control.
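To make the breakdown concrete, here's a back-of-envelope sketch using the rough numbers above; the per-turn history figure is an assumed illustrative value, not something I measured:

```python
# Rough context budget for one Claude Code prompt.
# SYSTEM_PROMPT and CODEBASE_READS are the approximate figures from my
# measurement above; HISTORY_PER_TURN is an assumed average for illustration.
SYSTEM_PROMPT = 14_000      # fixed cost, paid on every prompt
CODEBASE_READS = 180_000    # variable: project files pulled into context
HISTORY_PER_TURN = 2_000    # assumed average tokens added per turn

def context_tokens(turn: int) -> int:
    """Tokens processed before the model reasons about your actual question."""
    return SYSTEM_PROMPT + CODEBASE_READS + HISTORY_PER_TURN * turn

print(context_tokens(3))  # -> 200000
```

By turn three you're already past 200K tokens, and only the middle term is something you can shrink.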
Has anyone mapped the full token breakdown per prompt including the system prompt overhead documented here? Would be useful data for the community.
I have benchmark data on reducing the codebase read portion: vexp.dev/benchmark