SP Agent Team Token Report — Week of 2026-05-17

---
title: "Weekly Token Optimization Report: Managing the Opus Spike"
date: "2026-05-18"
category: "Engineering"
tags: ["LLM-Ops", "Token-Optimization", "Claude", "Hermes"]
---

## This Week's Numbers

Our token distribution has hit a **Red Light (🔴)** this week. After a period of stability, average Claude Opus usage rose from 44% to 54%, pushing us past our 50% efficiency threshold.

| Metric | Current Week | Previous Week | Delta | Status |
| :--- | :--- | :--- | :--- | :--- |
| **Avg Opus %** | 54% | 44% | +10 pp ⚠️ | 🔴 Over Target |
| **Total New Turns** | 418 | - | - | - |
| **Hermes Sessions** | 78 | - | - | 🟢 Stable |
| **Claude Sessions** | 25 | - | - | - |

Our target is to keep Opus usage **below 50%**. At 54%, we are currently over-indexing on the most expensive model for tasks that may be solvable by Sonnet or our secondary agents.

## What Changed

The data shows two sharp spikes in Opus share: **May 13th (58% Opus)** and **May 15th (73% Opus)**.

While the early part of the week (May 11-12) showed 0% Opus usage, the mid-week surge suggests that a series of complex architectural hurdles or "hard" implementation tasks forced the team to rely on Opus's superior reasoning capabilities. The high number of new files (28) and turns added (418) indicates a period of heavy feature development.

## Wins

**Hermes Agent Stability:** Our secondary agent, Hermes (running `qwen3-235b-a22b-2507`), has become a reliable workhorse. Its 78 sessions, all triggered via `cron` jobs, handled background maintenance and routine checks on a perfect 8-day streak. The steady rate of ~11 sessions per day (78 sessions over the 7-day week) shows that our automation pipeline is healthy and decoupled from the primary Claude project costs.

## Challenges

The "Opus Trap" is our primary challenge. When developers encounter a complex bug, there is a tendency to switch to Opus and stay there for the remainder of the session. This is evident in the May 15th data, where Opus turns (136) dwarfed Sonnet turns (49). We need to implement a stricter "Sonnet-first" workflow to prevent cost leakage.
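The May 15th figure can be reproduced from the turn counts. The sketch below assumes Opus% is defined as Opus turns divided by Opus plus Sonnet turns, which matches the reported 73% but is not stated explicitly in this report. The function names are illustrative, not part of any existing tooling.

```python
def opus_share(opus_turns: int, sonnet_turns: int) -> float:
    """Return Opus turns as a percentage of all Claude turns (assumed definition)."""
    total = opus_turns + sonnet_turns
    if total == 0:
        return 0.0  # no Claude activity, so no Opus share to report
    return 100.0 * opus_turns / total


def status_light(opus_pct: float, target_pct: float = 50.0) -> str:
    """Red when the share is at or above the target, green when below it."""
    return "red" if opus_pct >= target_pct else "green"


# Turn counts for May 15th, taken from this report.
may_15 = opus_share(opus_turns=136, sonnet_turns=49)
print(f"May 15: {may_15:.1f}% Opus -> {status_light(may_15)}")
```

Running the same check per session rather than per day would show whether a session switched to Opus and never switched back.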

## Dispatch Optimization

To bring the Opus% back under 50%, we must shift specific workloads from Claude to Hermes:
1. **Routine Log Analysis**: Move all daily `cron` analysis and health checks entirely to Hermes.
2. **Initial Research Passes**: Use Hermes to summarize documentation or scan repositories before passing the refined context to Sonnet/Opus.
3. **Boilerplate Generation**: Shift repetitive scaffolding tasks to the Hermes/Qwen pipeline.
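
One way to make this split explicit is a small routing table. This is a minimal sketch, not our actual dispatcher: the task labels (`log_analysis`, `health_check`, `research_pass`, `boilerplate`) and the `dispatch` function are hypothetical names chosen to mirror the three workloads above.

```python
from enum import Enum


class Agent(Enum):
    HERMES = "hermes"
    SONNET = "sonnet"
    OPUS = "opus"


# Hypothetical task labels mirroring the workloads listed above.
HERMES_TASKS = {"log_analysis", "health_check", "research_pass", "boilerplate"}


def dispatch(task_kind: str, escalated: bool = False) -> Agent:
    """Pick an agent: routine work goes to Hermes, everything else starts on Sonnet."""
    if task_kind in HERMES_TASKS:
        return Agent.HERMES
    # Opus is used only when a task has been explicitly escalated.
    return Agent.OPUS if escalated else Agent.SONNET


for kind in ("log_analysis", "feature_work"):
    print(kind, "->", dispatch(kind).value)
```

Keeping the Hermes task set in one place makes it easy to add a new routine workload without touching the escalation logic.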

## Cost Savings

By routing 78 routine tasks to Hermes instead of Claude Sonnet, we cut the estimated cost per query from about $0.01 to $0.003.

- **Claude Sonnet Est. Cost**: ~$0.01 per routine query
- **Hermes (DeepSeek-equiv) Cost**: $0.003 per query
- **Savings per Query**: $0.007
- **Total Weekly Savings**: $78 \times \$0.007 = \$0.546 \approx \mathbf{\$0.55}$

While the absolute dollar amount is small at this scale, this represents a **70% reduction in cost** for these specific tasks. If we scale from 78 to 7,800 sessions per week, the same per-query saving comes to about $54.60 per week, at which point this optimization starts to matter.
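
The arithmetic above can be checked with a few lines of Python. The per-query costs are the estimates quoted in this report, not measured billing data, and the constant and function names are illustrative.

```python
# Estimated per-query costs in USD, as quoted in this report.
SONNET_COST = 0.01
HERMES_COST = 0.003


def weekly_savings(sessions: int) -> float:
    """Savings in USD from running `sessions` routine queries on Hermes instead of Sonnet."""
    return sessions * (SONNET_COST - HERMES_COST)


# Fraction of the Sonnet cost that is avoided per query.
reduction = (SONNET_COST - HERMES_COST) / SONNET_COST

print(f"Cost reduction per query: {reduction:.0%}")
for sessions in (78, 7_800):
    print(f"{sessions} sessions: ${weekly_savings(sessions):.2f} saved per week")
```

The saving scales linearly with session count, so the percentage stays fixed while the dollar amount grows with volume.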

## Next Week's Target

**Goal: Opus% < 45%**

Based on this week's performance, here are the mandatory action items:

1. **Route more implementation to Sonnet**: With Opus% at 54%, above the 50% target, developers must attempt a solution with Sonnet 3.5 before escalating to Opus.
2. **Audit "Opus Spikes"**: Review the turns from May 15th to determine whether those tasks could have been decomposed into smaller, Sonnet-compatible prompts.
3. **Expand Hermes Triggering**: Identify one more manual "research" task to automate via Hermes `cron` to further lower the primary token load.
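
The Sonnet-first rule in item 1 could be enforced per task rather than per session, so that one hard problem does not keep the whole session on Opus. The following is a minimal sketch under stated assumptions: `Solver`, `solve_sonnet_first`, and the stub functions are hypothetical, and a solver is assumed to return `None` when it cannot produce a usable answer. Real model calls would replace the stubs.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interface: a solver takes a prompt and returns an answer,
# or None when it fails to produce a usable one.
Solver = Callable[[str], Optional[str]]


def solve_sonnet_first(
    prompt: str,
    sonnet: Solver,
    opus: Solver,
    max_sonnet_attempts: int = 2,
) -> Tuple[str, Optional[str]]:
    """Try Sonnet up to `max_sonnet_attempts` times, then escalate this one task to Opus."""
    for _ in range(max_sonnet_attempts):
        answer = sonnet(prompt)
        if answer is not None:
            return "sonnet", answer
    # Escalation applies to this task only; the next task starts on Sonnet again.
    return "opus", opus(prompt)


# Stub solvers standing in for real model calls.
def failing_sonnet(prompt: str) -> Optional[str]:
    return None


def stub_opus(prompt: str) -> Optional[str]:
    return f"opus answer to: {prompt}"


model, answer = solve_sonnet_first("refactor the dispatcher", failing_sonnet, stub_opus)
print(model, "|", answer)
```

The attempt limit of 2 is an arbitrary default; the May 15th audit in item 2 would be a reasonable source for a better value.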