
[agentic-token-optimizer] Optimization: Package Specification Librarian — Phase 2 sandbox mismatch and template verbosity #39607

@github-actions


Target Workflow

Package Specification Librarian (spec-librarian.md), selected as the highest-AIC eligible workflow after excluding workflows optimized in the past 14 days. It has no prior entry in optimization-log.json.

Analysis Period

| Period | Runs | Total AIC | Avg AIC/run | Raw tokens | Avg turns/run | Cache efficiency | Action minutes |
|---|---|---|---|---|---|---|---|
| 2026-06-16 (7-day window) | 1 | 956.17 | 956.17 | 2,769,395 | 107 | 96.5% cache read | 19 min |

Analysis is based on a single observed run (§27626196453). Cache efficiency (2,638,651 of 2,730,900 input tokens read from cache) is strong, but the turn count (107) is abnormally high for a workflow of this scope.

Cost Profile

| Metric | Value |
|---|---|
| Total AIC | 956.17 |
| Avg turns/run | 107 |
| Avg input tokens/turn (first 10 turns) | 22,607 |
| Avg input tokens/turn (last 10 turns) | 44,962 |
| Turns with >30k input tokens | 25 (23%) |
| Turns with >40k input tokens | 10 (9%) |
| Total output tokens | 38,495 |
| Input:output ratio | 71:1 |

Per-turn input growing from ~22.6k to ~45k tokens indicates significant context accumulation: the agent builds up intermediate results in the conversation instead of computing them in a few batched operations.
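As a cross-check, the headline ratios follow directly from the raw counts. This Python sketch (the `cost_profile` helper is illustrative, not part of the optimizer) shows that raw tokens are input plus output (2,730,900 + 38,495 = 2,769,395) and that the input:output ratio rounds to 71:1. The cache share computes to about 96.6%, marginally above the 96.5% in the Analysis Period table; the report does not state its rounding rule. The whole-run mean of ~25.5k input tokens per turn sits between the first-10 and last-10 averages.

```python
# Raw counts from the single analysed run (§27626196453).
INPUT_TOKENS = 2_730_900
CACHE_READ_TOKENS = 2_638_651
OUTPUT_TOKENS = 38_495
TURNS = 107


def cost_profile(input_tokens: int, cache_read: int, output_tokens: int, turns: int) -> dict:
    """Derive the headline ratios of the cost profile from raw token counts."""
    return {
        "raw_tokens": input_tokens + output_tokens,          # input plus output
        "cache_efficiency": cache_read / input_tokens,       # share of input read from cache
        "input_output_ratio": input_tokens / output_tokens,  # the 71:1 figure
        "avg_input_per_turn": input_tokens / turns,          # whole-run mean
    }


profile = cost_profile(INPUT_TOKENS, CACHE_READ_TOKENS, OUTPUT_TOKENS, TURNS)
for name, value in profile.items():
    print(f"{name}: {value:,.3f}")
```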


Ranked Recommendations

1. Fix Phase 2 bash-loop sandbox mismatch (Estimated savings: ~220–340 AIC/run)

Evidence: The workflow-logs show the agent repeatedly hitting "Permission denied and could not request permission from user" when it tried to run `for pkg in pkg/*/; do git log ... done` loops and `python3 -c "..."` scripts. The sandbox allows only the specific shell commands in the tool whitelist; compound constructs (loops, heredocs, multi-command pipes) are blocked. This caused ~25–35 turns of retry/workaround cycles, visible at turns 8, 14, 17, 31, 33, 37, 39, 42, and 46. When the loop-based approaches failed, the agent also violated the prompt's "Do not use background sub-agents" instruction (turns 17–25), adding ~8 more wasted turns.

Root cause: Phase 2 instructs: "Run direct shell commands for each package in has_spec to detect stale specifications" — but a per-package loop is not in the allowed tool list. The allowed git commands are:

  • git log --oneline --since="30 days ago" -- pkg/*
  • git log --oneline --since="7 days ago" -- pkg/*/README.md
  • git log -1 --format=%H -- pkg/*

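None of these emits commit dates alongside the paths each commit touched (`--oneline` prints only abbreviated hashes and subjects, and `-1 --format=%H` prints a single hash), so the agent cannot derive per-package dates from them without a loop.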

Action: Add a single batch command to the tools list and rewrite Phase 2 to use it:

```yaml
# Add to tools.bash:
- "git log --format='%as %H' --name-only --since='90 days ago' -- pkg/"
```

Rewrite Phase 2:

Use git log --format='%as %H' --name-only --since='90 days ago' -- pkg/ in a single call to get all change dates across packages. Parse the output (each commit block: date, hash, then filenames) to derive spec_date and src_date per package without iterating package-by-package.
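The single-pass parse described above could look roughly like this Python sketch. Since the sandbox blocks `python3 -c` in this workflow, it illustrates the parsing logic (for example, for checking the new prompt against a captured log) rather than something the agent would run. It assumes, as noted in the comments, that `pkg/<name>/README.md` is the package's spec and every other file under `pkg/<name>/` is source; the function names are hypothetical. Header lines are recognized by pattern rather than by position, so the parse does not depend on exactly where git places blank lines between commit blocks.

```python
import re
from collections import defaultdict

# Header line produced by --format='%as %H': "YYYY-MM-DD <40-hex commit hash>".
HEADER = re.compile(r"^(\d{4}-\d{2}-\d{2}) ([0-9a-f]{40})$")


def latest_dates(log_text: str) -> dict:
    """Map each package to its newest spec_date and src_date from one batched git log.

    Assumption (hypothetical layout): pkg/<name>/README.md is the spec and any other
    file under pkg/<name>/ counts as source. git log lists newest commits first, so
    the first date recorded for a package/kind pair is the most recent one.
    """
    dates = defaultdict(dict)
    current_date = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank separator lines carry no information
        header = HEADER.match(line)
        if header:
            current_date = header.group(1)
            continue
        parts = line.split("/")
        if current_date is None or len(parts) < 3 or parts[0] != "pkg":
            continue  # skip anything that is not pkg/<name>/<file>
        kind = "spec_date" if parts[2:] == ["README.md"] else "src_date"
        dates[parts[1]].setdefault(kind, current_date)
    return dict(dates)


def stale_packages(dates: dict) -> list:
    """Packages whose source changed after their spec (ISO dates compare as strings)."""
    return sorted(
        pkg
        for pkg, d in dates.items()
        if "spec_date" in d and "src_date" in d and d["src_date"] > d["spec_date"]
    )


# Synthetic newest-first log mirroring the example table rows quoted under Rec. 2.
sample = "\n".join([
    f"2026-04-12 {'a' * 40}", "", "pkg/parser/lexer.go", "",
    f"2026-04-10 {'b' * 40}", "", "pkg/console/README.md", "",
    f"2026-04-08 {'c' * 40}", "", "pkg/console/render.go", "",
    f"2026-03-01 {'d' * 40}", "", "pkg/parser/README.md",
])

dates = latest_dates(sample)
print(stale_packages(dates))  # prints ['parser']
```

Packages with no commits inside the 90-day window simply do not appear in the result, so the prompt would still need to treat them as "unchanged" rather than as missing.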

This replaces 28–56 per-package git calls with a single command. Estimated turn reduction: 25–38 turns.
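The ~220–340 AIC range in this recommendation's heading is consistent with a simple linear model: the turn reduction multiplied by the run's mean AIC per turn (956.17 / 107 ≈ 8.94). A Python sketch of that arithmetic; treating every turn as equally expensive is an assumption, since later turns carry roughly twice the input tokens of early ones.

```python
TOTAL_AIC = 956.17  # Total AIC for the analysed run
TURNS = 107         # turns in that run


def aic_savings(turns_saved_low: int, turns_saved_high: int,
                total_aic: float = TOTAL_AIC, turns: int = TURNS) -> tuple:
    """Scale a turn-reduction range by the run's mean AIC per turn (linear model)."""
    per_turn = total_aic / turns  # about 8.94 AIC per turn
    return turns_saved_low * per_turn, turns_saved_high * per_turn


low, high = aic_savings(25, 38)
print(f"~{low:.0f}-{high:.0f} AIC/run")  # prints "~223-340 AIC/run"
```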


2. Trim Phase 5 issue body template (Estimated savings: ~40–60 AIC/run)

Evidence: Phase 5 occupies 111 lines of the prompt, the largest single section. It includes a full example issue body with sample table rows, sample dates, and sample package names, and this example data is re-sent to the model on every one of the 107 turns. At ~2,000 extra tokens per turn × 107 turns = 214,000 tokens ($0.64 at $3/M), which maps to roughly 50 AIC of overhead across the run.
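The dollar figure is straightforward arithmetic, sketched here in Python with the constants taken from the estimate above. It prices every repeated token at the uncached $3/M rate, so it is likely an upper bound: with ~96.5% of input served from cache, most of those prompt tokens would typically be billed at a lower cache-read rate.

```python
EXTRA_TOKENS_PER_TURN = 2_000  # estimated size of the Phase 5 example data
TURNS = 107                    # turns in the analysed run
PRICE_PER_MILLION_USD = 3.00   # uncached input price used in the estimate

overhead_tokens = EXTRA_TOKENS_PER_TURN * TURNS
overhead_usd = overhead_tokens * PRICE_PER_MILLION_USD / 1_000_000
print(f"{overhead_tokens:,} tokens -> ${overhead_usd:.2f}")  # prints "214,000 tokens -> $0.64"
```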

Action: Remove the example table rows from the issue body template. Keep the structure (headers, column names, formatting instructions) but replace example rows like:

```markdown
| | `console` | 2026-04-10 | 2026-04-08 |
| ⚠️ | `parser` | 2026-03-01 | 2026-04-12 |
```

with a single comment: <!-- one row per package — fill with actual data -->. The sample "cli" and "workflow" entries in the Missing Specifications and Stale Specifications sections should similarly be replaced with brief schema notes. Target: reduce Phase 5 from ~111 lines to ~35 lines.


3. Consolidate Phase 3 grep passes (Estimated savings: ~25–35 AIC/run)

Evidence: The tool list allows 5 separate grep invocations:

  • grep -rn "func [A-Z]" pkg --include="*.go"
  • grep -rn "type [A-Z]" pkg --include="*.go"
  • grep -rn "const [A-Z]" pkg --include="*.go"
  • grep -rn "import " pkg --include="*.go"
  • grep -rn "package " pkg --include="*.go"

Each produces output the agent must reason over in a separate turn. These 5 passes scan the same files five times.

Action: Replace with a combined command using -e patterns:

```yaml
# Replace the 5 separate grep entries with:
- "grep -rn -e '^func [A-Z]' -e '^type [A-Z]' -e '^const [A-Z]' -e '^import ' -e '^package ' pkg --include='*.go'"
```

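Then update the prompt to reference the single command. This reduces five grep turns to one (estimated turn reduction: 3–4 turns). Note that the `^` anchors make the combined patterns stricter than the originals: they match only lines that start with the keyword, so mentions inside comments or indented code no longer match.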
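For reference, the combined command amounts to one alternation evaluated once per line. This Python sketch of the same single-pass scan is illustrative only (function names are hypothetical; the agent itself would still use the grep entry, since the sandbox runs only whitelisted commands). The sample input shows the effect of the `^` anchors: the comment line and the indented `const` are skipped.

```python
import re
from pathlib import Path

# One alternation equivalent to the five anchored -e patterns of the combined grep.
DECL = re.compile(r"^(?:func [A-Z]|type [A-Z]|const [A-Z]|import |package )")


def scan_go_lines(lines):
    """Yield (line_no, text) for every line matching any pattern, in a single pass."""
    for line_no, text in enumerate(lines, start=1):
        if DECL.match(text):
            yield line_no, text


def scan_tree(root: str = "pkg"):
    """Walk *.go files under root once and emit grep -rn style 'path:line:text'."""
    for path in sorted(Path(root).rglob("*.go")):
        with path.open(encoding="utf-8", errors="replace") as fh:
            for line_no, text in scan_go_lines(fh):
                yield f"{path}:{line_no}:{text.rstrip()}"


sample = [
    "package parser",
    "",
    'import "fmt"',
    "// func Helper is mentioned in a comment",
    "func Parse(s string) error {",
    "\tconst Inner = 1",
    "type Token struct{}",
]
print([line_no for line_no, _ in scan_go_lines(sample)])  # prints [1, 3, 5, 7]
```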


4. Remove redundant unconstrained shell tools (Estimated savings: ~10–15 AIC/run)

Evidence: The tool whitelist includes unconstrained generic commands alongside constrained variants:

  • shell(cat) — alongside shell(cat pkg/*/README.md), shell(cat pkg/*/*.go)
  • shell(grep) — alongside 5 specific grep patterns
  • shell(head) — alongside shell(head -n * pkg/*/*.go)
  • shell(wc) — alongside shell(wc -l pkg/*/README.md)

The unconstrained variants are strictly broader than the constrained ones, so they defeat the intended sandbox gating, and every entry still adds tool-description tokens to each turn. The 35+ --allow-tool flags each emit tool description text to the model; removing ~8–10 redundant ones reduces per-turn overhead.

Action: Remove shell(cat), shell(grep), shell(head), shell(wc), shell(sort), shell(uniq), shell(tail), shell(printf) from the unconstrained tools list. Keep only the constrained variants plus shell(echo), shell(date), shell(ls), shell(pwd).
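The pruning rule is mechanical enough to sketch in Python. The list below contains only the entries named in this recommendation (the real whitelist has 35+ entries, including the constrained grep and git commands), and it treats tool entries as plain strings; how they map onto the actual `--allow-tool` flags is not shown in this issue.

```python
# Tool entries named in Rec. 4 (a partial, illustrative subset of the whitelist).
CURRENT_TOOLS = [
    "shell(cat)", "shell(cat pkg/*/README.md)", "shell(cat pkg/*/*.go)",
    "shell(grep)", "shell(head)", "shell(head -n * pkg/*/*.go)",
    "shell(wc)", "shell(wc -l pkg/*/README.md)",
    "shell(sort)", "shell(uniq)", "shell(tail)", "shell(printf)",
    "shell(echo)", "shell(date)", "shell(ls)", "shell(pwd)",
]

# Bare, unconstrained commands that Rec. 4 proposes to drop.
REMOVE_BARE = {"cat", "grep", "head", "wc", "sort", "uniq", "tail", "printf"}


def prune(tools):
    """Drop bare shell(<cmd>) entries for commands in REMOVE_BARE; keep everything else."""
    kept = []
    for tool in tools:
        is_shell = tool.startswith("shell(") and tool.endswith(")")
        inner = tool[len("shell("):-1] if is_shell else None
        if inner in REMOVE_BARE:
            continue  # unconstrained entry that makes its constrained variants moot
        kept.append(tool)
    return kept


pruned = prune(CURRENT_TOOLS)
print(pruned)
```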

Supporting evidence: turn cost breakdown and log excerpts

Turn cost growth (from token_usage.jsonl):

| Turn range | Avg input tokens |
|---|---|
| Turns 1–10 | 22,607 |
| Turns 40–60 | 20,337 (plateau) |
| Turns 97–107 | 44,962 |

The spike in the final 25 turns is consistent with accumulated Phase 3/4 grep output being retained in context.
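The windowed averages in the table can be reproduced from per-turn usage data. A Python sketch, assuming token_usage.jsonl holds one JSON object per turn in turn order with the count under a field called `input_tokens`; that key name is a guess, not the file's documented schema. The usage lines run on synthetic, linearly growing data, not on the real run.

```python
import json
from statistics import mean


def window_averages(jsonl_text: str, windows, field: str = "input_tokens") -> dict:
    """Average a per-turn token count over 1-based, inclusive turn ranges.

    Assumes one JSON object per line, one line per turn, in turn order, with the
    count stored under `field` (a hypothetical key name).
    """
    per_turn = [json.loads(line)[field] for line in jsonl_text.splitlines() if line.strip()]
    return {(lo, hi): mean(per_turn[lo - 1:hi]) for lo, hi in windows}


# Synthetic, linearly growing usage for 107 turns (not the real run's data).
synthetic = "\n".join(json.dumps({"input_tokens": 20_000 + 250 * t}) for t in range(1, 108))
avgs = window_averages(synthetic, [(1, 10), (40, 60), (97, 107)])
print(avgs)
```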

Permission failure pattern (from workflow-logs/3_agent.txt):

  • Turn 8: "The bash commands are not working due to permission issues."
  • Turn 14: "The bash commands with loops seem to fail with 'Permission denied and could not request permission from user'."
  • Turn 17: "The bash tool is failing... Let me try using the task tool to delegate this work to a sub-agent."
  • Turn 19: "The agent is running in the background. Let me wait for it to complete." (violates "no sub-agents" instruction)
  • Turn 39: "the 'permission denied' seems to be coming from the bash tool itself (the sandbox), not from git"

The agent diagnosed the issue correctly at turn 39 but had already spent ~30 turns on retries.

Git log call count: 32 git log appearances in the agent log across the single run.

References: §27626196453

Caveats

  • Based on a single observed run. The permission failure pattern may vary if the runner sandbox configuration changes.
  • The batch git log approach (Rec. 1) needs history covering the 90-day window; a depth-1 shallow clone is not enough, so either fetch deeper history or fall back to the GitHub API. The workflow already uses cli-proxy: true with github tools, so API fallback is available.
  • Cache efficiency (96.5%) is already good — improvements will come from reducing turns, not cache tuning.
  • Estimated AIC savings are conservative (lower bound). Upper bound may be 350–430 AIC/run if all recommendations are applied.

Generated by Agentic Workflow AIC Usage Optimizer · 1.2K AIC · ⊞ 24.6K ·

  • expires on Jun 23, 2026, 8:34 AM UTC-08:00
