[agentic-token-optimizer] Optimization: Package Specification Librarian — Phase 2 sandbox mismatch and template verbosity

### Target Workflow

**Package Specification Librarian** (`spec-librarian.md`) — selected as highest-AIC eligible workflow after excluding workflows optimized in the past 14 days. No prior optimization entry in `optimization-log.json`.

### Analysis Period

| Period | Runs | Total AIC | Avg AIC/run | Raw tokens | Avg turns/run | Cache efficiency | Action minutes |
|--------|------|-----------|-------------|------------|---------------|-----------------|----------------|
| 2026-06-16 (7-day window) | 1 | 956.17 | 956.17 | 2,769,395 | 107 | 96.5% cache read | 19 min |

The analysis is based on one observed run ([§27626196453](https://github.com/github/gh-aw/actions/runs/27626196453)). Cache efficiency is strong (2,638,651 of 2,730,900 input tokens were cache reads), but the turn count (107) is abnormally high for a workflow of this scope.

### Cost Profile

| Metric | Value |
|--------|-------|
| Total AIC | 956.17 |
| Avg turns/run | 107 |
| Avg input tokens/turn (first 10 turns) | 22,607 |
| Avg input tokens/turn (last 10 turns) | 44,962 |
| Turns with >30k input tokens | 25 (23%) |
| Turns with >40k input tokens | 10 (9%) |
| Total output tokens | 38,495 |
| Input:output ratio | 71:1 |

Average input tokens per turn roughly double, from 22k in the first ten turns to 45k in the last ten. This indicates significant context accumulation: the agent builds up intermediate results in the conversation instead of computing them in a few batched operations.

---

### Ranked Recommendations

#### 1. Fix Phase 2 bash-loop sandbox mismatch (Estimated savings: ~220–340 AIC/run)

**Evidence**: The workflow logs show that the agent repeatedly hit "Permission denied and could not request permission from user" when trying to run `for pkg in pkg/*/; do git log ... done` loops and `python3 -c "..."` scripts. The sandbox only allows the specific shell commands listed in the tool whitelist; complex constructs (loops, heredocs, multi-command pipes) are blocked. This caused an estimated 25–35 turns of retry and workaround cycles, with failures observed at turns 8, 14, 17, 31, 33, 37, 39, 42, and 46. When the loop-based approaches failed, the agent also violated its own "Do not use background sub-agents" instruction (turns 17–25), adding ~8 more wasted turns.

**Root cause**: Phase 2 instructs: *"Run direct shell commands for each package in `has_spec` to detect stale specifications"* — but a per-package loop is not in the allowed tool list. The allowed git commands are:
- `git log --oneline --since="30 days ago" -- pkg/*`
- `git log --oneline --since="7 days ago" -- pkg/*/README.md`
- `git log -1 --format=%H -- pkg/*`

None of these produce per-package date output in a single pass that the agent can parse without a loop.

**Action**: Add a single batch command to the tools list and rewrite Phase 2 to use it:

```yaml
# Add to tools.bash:
- "git log --format='%as %H' --name-only --since='90 days ago' -- pkg/"
```

Rewrite Phase 2:
> Use `git log --format='%as %H' --name-only --since='90 days ago' -- pkg/` in a single call to get all change dates across packages. Parse the output (each commit block: date, hash, then filenames) to derive `spec_date` and `src_date` per package without iterating package-by-package.

This replaces 28–56 per-package git calls with a single command. Estimated turn reduction: 25–38 turns.
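
The parsing step described above can be sketched as follows. This is a minimal illustration, not the workflow's actual implementation. It assumes that the spec for a package is `pkg/<name>/README.md`, that "source" means `.go` files under the same package directory, and that a spec is stale when `src_date` is later than `spec_date`. The function names `latest_dates` and `stale_packages` are hypothetical.

```python
import re
from collections import defaultdict

# Header lines come from --format='%as %H': "YYYY-MM-DD <commit hash>".
HEADER = re.compile(r"^(\d{4}-\d{2}-\d{2}) [0-9a-f]{40,64}$")


def latest_dates(log_text):
    """Map each package to its latest spec_date and src_date seen in the log."""
    dates = defaultdict(lambda: {"spec_date": None, "src_date": None})
    current = None  # date of the commit block being read
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank separator between header and file list
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            continue
        parts = line.split("/")
        if current is None or len(parts) < 3 or parts[0] != "pkg":
            continue  # not a file inside a package directory
        pkg = parts[1]
        if parts[2:] == ["README.md"]:
            key = "spec_date"  # assumption: README.md is the spec
        elif line.endswith(".go"):
            key = "src_date"  # assumption: Go files are the source
        else:
            continue
        previous = dates[pkg][key]
        # ISO dates compare correctly as strings.
        if previous is None or current > previous:
            dates[pkg][key] = current
    return dict(dates)


def stale_packages(dates):
    """Packages whose source changed after the spec (assumed staleness rule)."""
    return sorted(
        pkg
        for pkg, d in dates.items()
        if d["spec_date"] and d["src_date"] and d["src_date"] > d["spec_date"]
    )
```

A `spec_date` of `None` only means the spec did not change inside the 90-day window; it does not mean the spec is missing, so such packages need separate handling.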

---

#### 2. Trim Phase 5 issue body template (Estimated savings: ~40–60 AIC/run)

**Evidence**: Phase 5 occupies 111 lines of the prompt, making it the largest single section. It includes a full example issue body with sample table rows, sample dates, and sample package names, and each of the 107 turns re-reads this example data. At ~2,000 extra tokens per turn, that is 2,000 × 107 = 214,000 tokens ($0.64 at $3/M), or roughly 50 AIC of overhead across the run.
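
The token and dollar figures in this estimate can be reproduced with a few lines of arithmetic. The constants below are the approximate values quoted in the evidence; the variable names are illustrative only, and the dollar-to-AIC mapping is not reproduced because the report does not state it.

```python
# Approximate inputs taken from the evidence paragraph.
EXTRA_TOKENS_PER_TURN = 2_000
TURNS = 107
PRICE_PER_MILLION_USD = 3.00

# Example rows are re-read on every turn, so the overhead scales with turns.
extra_tokens = EXTRA_TOKENS_PER_TURN * TURNS
extra_cost_usd = extra_tokens / 1_000_000 * PRICE_PER_MILLION_USD

print(f"{extra_tokens:,} tokens, about ${extra_cost_usd:.2f}")
```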

**Action**: Remove the example table rows from the issue body template. Keep the structure (headers, column names, formatting instructions) but replace example rows like:

```markdown
| ✅ | `console` | 2026-04-10 | 2026-04-08 |
| ⚠️ | `parser` | 2026-03-01 | 2026-04-12 |
```

with a single comment: ``. The sample "cli" and "workflow" entries in the Missing Specifications and Stale Specifications sections should similarly be replaced with brief schema notes. Target: reduce Phase 5 from ~111 lines to ~35 lines.

---

#### 3. Consolidate Phase 3 grep passes (Estimated savings: ~25–35 AIC/run)

**Evidence**: The tool list allows 5 separate grep invocations:
- `grep -rn "func [A-Z]" pkg --include="*.go"`
- `grep -rn "type [A-Z]" pkg --include="*.go"`
- `grep -rn "const [A-Z]" pkg --include="*.go"`
- `grep -rn "import " pkg --include="*.go"`
- `grep -rn "package " pkg --include="*.go"`

Each produces output the agent must reason over in a separate turn. These 5 passes scan the same files five times.

**Action**: Replace with a combined command using `-e` patterns:

```yaml
# Replace the 5 separate grep entries with:
- "grep -rn -e '^func [A-Z]' -e '^type [A-Z]' -e '^const [A-Z]' -e '^import ' -e '^package ' pkg --include='*.go'"
```

Update the prompt to reference the single combined command. This reduces 5 grep turns to 1 (estimated turn reduction: 3–4 turns).
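
Because the combined command returns all five kinds of match in one stream, the result has to be split by declaration kind afterwards. A minimal sketch of that split, assuming the standard `path:line:text` layout that `grep -rn` prints; `bucket_matches` is a hypothetical helper name:

```python
from collections import defaultdict

# The five keywords targeted by the combined grep command.
KINDS = ("func", "type", "const", "import", "package")


def bucket_matches(grep_output):
    """Group `grep -rn` output lines by the leading Go keyword."""
    buckets = defaultdict(list)
    for line in grep_output.splitlines():
        # Split only twice so colons inside the matched text are preserved.
        parts = line.split(":", 2)
        if len(parts) != 3 or not parts[1].isdigit():
            continue  # skip anything that is not path:line:text
        path, lineno, text = parts
        keyword = text.split(" ", 1)[0]
        if keyword in KINDS:
            buckets[keyword].append((path, int(lineno), text))
    return dict(buckets)
```

Grouping once after a single scan keeps the per-kind views the prompt needs without paying for five separate tool calls.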

---

#### 4. Remove redundant unconstrained shell tools (Estimated savings: ~10–15 AIC/run)

**Evidence**: The tool whitelist includes unconstrained generic commands alongside constrained variants:
- `shell(cat)` — alongside `shell(cat pkg/*/README.md)`, `shell(cat pkg/*/*.go)`
- `shell(grep)` — alongside 5 specific grep patterns
- `shell(head)` — alongside `shell(head -n * pkg/*/*.go)`
- `shell(wc)` — alongside `shell(wc -l pkg/*/README.md)`

The unconstrained variants are broader than the constrained ones, so the constrained entries do not gate the sandbox as intended, and every entry still adds tool description tokens to each turn. Each of the 35+ `--allow-tool` flags emits tool description text to the model; removing the ~8–10 redundant ones reduces per-turn overhead.

**Action**: Remove `shell(cat)`, `shell(grep)`, `shell(head)`, `shell(wc)`, `shell(sort)`, `shell(uniq)`, `shell(tail)`, `shell(printf)` from the unconstrained tools list. Keep only the constrained variants plus `shell(echo)`, `shell(date)`, `shell(ls)`, `shell(pwd)`.
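
One way to find the overlapping entries mechanically is sketched below. It assumes allow-list entries are written in the `shell(<command> <args>)` form shown in the evidence, and it only flags a bare `shell(<command>)` when a constrained variant of the same command also exists. The helper name `shadowing_entries` is hypothetical.

```python
import re

# Matches "shell(cmd)" and "shell(cmd args...)"; args are optional.
ENTRY = re.compile(r"^shell\(([^\s)]+)(?:\s+(.*))?\)$")


def shadowing_entries(allow_list):
    """Return bare shell(cmd) entries that shadow a constrained variant."""
    bare, constrained = set(), set()
    for entry in allow_list:
        match = ENTRY.match(entry)
        if not match:
            continue  # not a shell(...) entry
        command, args = match.group(1), match.group(2)
        (constrained if args else bare).add(command)
    return [f"shell({command})" for command in sorted(bare & constrained)]
```

Entries such as `shell(echo)` that have no constrained counterpart are not reported, which matches the recommendation to keep them.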

<details>
<summary><b>Supporting evidence: turn cost breakdown and log excerpts</b></summary>

**Turn cost growth** (from `token_usage.jsonl`):

| Turn range | Avg input tokens |
|------------|-----------------|
| Turns 1–10 | 22,607 |
| Turns 40–60 | 20,337 (plateau) |
| Turns 98–107 | 44,962 |

The spike in the final 25 turns is consistent with accumulated Phase 3/4 grep output being retained in context.
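
Per-range averages like those in the table can be recomputed from the usage log with a short script. The sketch below assumes one JSON object per turn, in turn order, with a field named `input_tokens`; the actual schema of `token_usage.jsonl` is not shown in this report, so the field name is a guess and is exposed as a parameter.

```python
import json


def avg_input_tokens(jsonl_text, first_turn, last_turn, field="input_tokens"):
    """Average `field` over turns first_turn..last_turn (1-indexed, inclusive)."""
    rows = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    window = rows[first_turn - 1:last_turn]
    if not window:
        return 0.0  # range lies outside the recorded turns
    return sum(row[field] for row in window) / len(window)
```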

**Permission failure pattern** (from `workflow-logs/3_agent.txt`):
- Turn 8: "The bash commands are not working due to permission issues."
- Turn 14: "The bash commands with loops seem to fail with 'Permission denied and could not request permission from user'."
- Turn 17: "The bash tool is failing... Let me try using the task tool to delegate this work to a sub-agent."
- Turn 19: "The agent is running in the background. Let me wait for it to complete." *(violates "no sub-agents" instruction)*
- Turn 39: "the 'permission denied' seems to be coming from the bash tool itself (the sandbox), not from git"

The agent diagnosed the issue correctly at turn 39 but had already spent ~30 turns on retries.

**Git log call count**: 32 git log appearances in the agent log across the single run.

**References**: [§27626196453](https://github.com/github/gh-aw/actions/runs/27626196453)

</details>

### Caveats

- Based on a single observed run. The permission failure pattern may vary if the runner sandbox configuration changes.
- The batch git log approach (Rec. 1) requires a full (non-shallow) clone; on a shallow clone the agent must fall back to the GitHub API. The workflow already uses `cli-proxy: true` with github tools, so API fallback is available.
- Cache efficiency (96.5%) is already good — improvements will come from reducing turns, not cache tuning.
- Estimated AIC savings are conservative (lower bound). Upper bound may be 350–430 AIC/run if all recommendations are applied.
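
The shallow-clone caveat can be checked before relying on the batch git log output. A minimal sketch, assuming the check runs somewhere `git rev-parse` is permitted (for example a setup step outside the agent sandbox) and a Git version that supports `--is-shallow-repository`; both function names are hypothetical.

```python
import subprocess


def parse_shallow_flag(stdout):
    """Interpret the output of `git rev-parse --is-shallow-repository`."""
    return stdout.strip() == "true"


def is_shallow_clone(repo_dir="."):
    """Return True when the repository at repo_dir is a shallow clone."""
    result = subprocess.run(
        ["git", "rev-parse", "--is-shallow-repository"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise if git fails, e.g. outside a repository
    )
    return parse_shallow_flag(result.stdout)
```

If the check returns `True`, history older than the fetch depth is absent, so dates derived from the log would be incomplete.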

> Generated by [Agentic Workflow AIC Usage Optimizer](https://github.com/github/gh-aw/actions/runs/27631922625) · 1.2K AIC · ⊞ 24.6K · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fagentic-token-optimizer%22&type=issues)
> - [x] expires on Jun 23, 2026, 8:34 AM UTC-08:00
