SP Agent Team Token Report — Week of 2026-06-07

---
title: "Weekly Token Optimization Report: The Opus Drift"
date: "2026-06-08"
category: "Engineering"
tags: ["LLMOps", "CostOptimization", "AgenticSystems", "Claude"]
---

## This Week's Numbers

Our token efficiency took a significant hit this week. While we have successfully scaled our secondary agent (Hermes), our primary Claude dispatching has drifted back toward the high-cost model.

- **Opus Usage Average:** 70% (Last week: 46%)
- **Trend:** $\uparrow 24$ percentage points (worsening)
- **Target:** $< 50\%$
- **Status:** 🔴 **Red Light**
- **Total Claude Turns:** 1,247 turns across 5 sessions
- **Hermes Activity:** 89 sessions / 11M+ tokens processed
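
As a quick check on the trend and status lines above, here is a minimal sketch of how the week-over-week change and the traffic-light status could be derived. The function name and the two-state red/green rule are assumptions for illustration; the report only states the 50% target.

```python
def opus_share_status(current_pct: float, previous_pct: float,
                      target_pct: float = 50.0) -> tuple[float, str]:
    """Return (change in percentage points, status label).

    Assumption: status is "red" whenever the Opus share is at or above
    the target, and "green" otherwise. The report defines no other states.
    """
    delta_pp = current_pct - previous_pct
    status = "red" if current_pct >= target_pct else "green"
    return delta_pp, status


delta_pp, status = opus_share_status(current_pct=70.0, previous_pct=46.0)
print(f"{delta_pp:+.0f} pp, status: {status}")  # +24 pp, status: red
```

Note that the change is expressed in percentage points (70 minus 46), not as a relative percentage.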

## What Changed

The most alarming data point is the spike on **June 6th**, where we recorded **491 Opus turns** against only 99 Sonnet turns (about 83% Opus). This suggests a "complexity trap": the agent defaulted to Opus for a large-scale implementation task rather than decomposing the problem for Sonnet. By June 7th, the Opus share hit 100%, indicating that the current prompt routing is failing to trigger Sonnet for routine updates.
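
The June 6th figure can be turned into a daily Opus share with a few lines of arithmetic. This is a sketch that assumes Opus and Sonnet were the only Claude models used that day; `opus_share` is a hypothetical helper, not part of our tooling.

```python
def opus_share(opus_turns: int, sonnet_turns: int) -> float:
    """Percentage of Claude turns that went to Opus.

    Assumption: Opus and Sonnet are the only two models in the mix.
    """
    total = opus_turns + sonnet_turns
    if total == 0:
        raise ValueError("no turns recorded for this day")
    return 100.0 * opus_turns / total


june_6 = opus_share(opus_turns=491, sonnet_turns=99)
print(f"June 6th Opus share: {june_6:.1f}%")  # June 6th Opus share: 83.2%
```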

## Wins

Despite the Claude drift, **Hermes Agent** is performing exceptionally well as our "utility layer." 
- **High Volume:** Handled 591 messages and 11 million tokens, primarily via `cron` platforms.
- **Tool Proficiency:** Hermes successfully executed 241 tool calls, with `terminal` (34.9%) and `read_file` (27.4%) being the primary drivers.
- **Model Diversity:** Gemini-3.5-flash handled the bulk of the load (11M tokens), proving that our "free engine" tier is capable of maintaining agentic loops for monitoring and reporting.
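
For readers who prefer counts to percentages, the tool shares above can be converted back into approximate call counts. This is a rough sketch: the rounded counts are derived from the reported percentages, not read from the raw logs.

```python
TOTAL_TOOL_CALLS = 241  # from the report

# Shares as reported above; all other tools make up the remainder.
tool_shares = {"terminal": 0.349, "read_file": 0.274}

approx_calls = {tool: round(TOTAL_TOOL_CALLS * share)
                for tool, share in tool_shares.items()}
approx_calls["other"] = TOTAL_TOOL_CALLS - sum(approx_calls.values())

for tool, calls in approx_calls.items():
    print(f"{tool}: ~{calls} calls")
```

Under this rounding, `terminal` and `read_file` together account for roughly 150 of the 241 calls.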

## Challenges

The core challenge is **Dispatch Regression**. We are seeing a trend where "Implementation" tasks are defaulting to Opus. When the Opus% hits 70%+, our cost-per-feature rises sharply, because every turn that shifts from Sonnet to Opus is billed at the higher rate. We need to enforce stricter boundaries on when an agent is allowed to escalate from Sonnet to Opus.
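
To give a feel for how the Opus share drives cost, here is a minimal sketch of a blended per-turn cost model. The 5x Opus-to-Sonnet price multiplier and the equal-tokens-per-turn simplification are assumptions for illustration only; substitute the actual rates for the models in use.

```python
def blended_cost_per_turn(opus_share: float,
                          sonnet_cost: float = 1.0,
                          opus_multiplier: float = 5.0) -> float:
    """Average cost per turn, in units of one Sonnet turn.

    Assumptions (hypothetical): every turn uses the same number of tokens,
    and an Opus turn costs `opus_multiplier` times a Sonnet turn.
    """
    if not 0.0 <= opus_share <= 1.0:
        raise ValueError("opus_share must be between 0 and 1")
    return sonnet_cost * ((1.0 - opus_share) + opus_share * opus_multiplier)


last_week = blended_cost_per_turn(0.46)
this_week = blended_cost_per_turn(0.70)
increase = this_week / last_week - 1.0
print(f"Blended cost per turn rose by {increase:.0%}")
```

Under these assumed numbers, moving from 46% to 70% Opus raises the blended per-turn cost by roughly a third. The model is linear in the Opus share, so the cost grows steadily with each point of drift.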

## Next Week's Target

- **Opus% Target:** $< 45\%$
- **Objective:** Implement a "Sonnet-First" enforcement hook for all implementation tasks under 500 lines of code.
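
One possible shape for the "Sonnet-First" hook is sketched below. The `Task` fields, the task-kind labels, and the function name are all hypothetical; only the 500-line threshold comes from the objective above.

```python
from dataclasses import dataclass

SONNET_FIRST_LOC_LIMIT = 500  # threshold from the objective above


@dataclass(frozen=True)
class Task:
    kind: str             # hypothetical label, e.g. "implementation"
    estimated_loc: int    # hypothetical estimate of lines of code touched
    requested_model: str  # model the dispatcher asked for: "sonnet" or "opus"


def enforce_sonnet_first(task: Task) -> str:
    """Return the model the task should actually run on.

    Implementation tasks under the line limit are forced onto Sonnet,
    regardless of what was requested. Everything else keeps its request.
    """
    if task.kind == "implementation" and task.estimated_loc < SONNET_FIRST_LOC_LIMIT:
        return "sonnet"
    return task.requested_model


small_fix = Task(kind="implementation", estimated_loc=120, requested_model="opus")
print(enforce_sonnet_first(small_fix))  # sonnet
```

The hook only overrides the request. It does not decide how the line estimate is produced, which would need its own rule.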

## Dispatch Optimization

Based on the Hermes data, we are seeing a high volume of `read_file` and `search_files` calls. These are low-reasoning, high-context tasks. 

**Proposed Shift:**
Next week, we will move all **Repository Indexing** and **Log Analysis** tasks entirely from Claude to Hermes. Currently, some of these are still triggering Opus turns for "analysis" that Gemini-3.5-flash can handle with 95% accuracy.
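
The proposed shift could be expressed as a simple category-based routing rule. This is a sketch under assumptions: the category strings and the `route_agent` function are hypothetical and do not reflect an existing dispatcher API.

```python
# Task categories the proposal moves entirely to Hermes.
HERMES_ONLY_CATEGORIES = {"repository_indexing", "log_analysis"}


def route_agent(task_category: str) -> str:
    """Return "hermes" for categories moved off Claude, otherwise "claude".

    The category is normalized so "Log Analysis" and "log_analysis" match.
    """
    normalized = task_category.strip().lower().replace(" ", "_")
    return "hermes" if normalized in HERMES_ONLY_CATEGORIES else "claude"


print(route_agent("Repository Indexing"))  # hermes
print(route_agent("implementation"))       # claude
```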

## Cost Savings

By routing the Hermes workload (591 messages) through our optimized DeepSeek/Gemini pipeline instead of Claude Sonnet, the savings are substantial:

- **Hermes Cost (DeepSeek rate):** $591 \text{ queries} \times \$0.003 \approx \mathbf{\$1.77}$
- **Estimated Sonnet Cost:** For the same 1.67M input tokens and 15K output tokens, Sonnet would have cost approximately $\mathbf{\$5.20 - \$7.00}$ (depending on cache hits).
- **Net Savings:** $\approx \$3.40 - \$5.20$ per single-agent utility cycle (the Sonnet estimate minus the \$1.77 Hermes cost). While small in isolation, scaling this to 100+ agents would put the savings in the hundreds of dollars per cycle.
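
The arithmetic behind these figures can be reproduced with a short script. The Sonnet per-million-token rates below (\$3 input, \$15 output) are assumptions chosen because they reproduce the lower end of the Sonnet estimate; check them against current pricing before relying on the result. Caching effects are not modeled.

```python
# Figures from the report.
QUERIES = 591
HERMES_COST_PER_QUERY = 0.003  # USD, DeepSeek rate
INPUT_TOKENS = 1_670_000
OUTPUT_TOKENS = 15_000

# Assumed Sonnet list prices in USD per million tokens (not from the report).
SONNET_INPUT_PER_MTOK = 3.00
SONNET_OUTPUT_PER_MTOK = 15.00


def hermes_cost() -> float:
    """Cost of the Hermes workload at the flat per-query rate."""
    return QUERIES * HERMES_COST_PER_QUERY


def sonnet_cost() -> float:
    """Uncached cost of the same token volume on Sonnet."""
    return (INPUT_TOKENS / 1e6 * SONNET_INPUT_PER_MTOK
            + OUTPUT_TOKENS / 1e6 * SONNET_OUTPUT_PER_MTOK)


savings = sonnet_cost() - hermes_cost()
print(f"Hermes: ${hermes_cost():.3f}")
print(f"Sonnet (uncached, assumed rates): ${sonnet_cost():.3f}")
print(f"Savings per cycle: ${savings:.3f}")
```

Under these assumed rates the uncached Sonnet cost comes to about \$5.2 and the saving to about \$3.5 per cycle, which corresponds to the lower end of the estimated range.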

## Recommendations

Based on this week's metrics, the following action items are mandated:

1. **Route more implementation to Sonnet:** Because Opus% (70%) is well above the 50% threshold, we must audit the routing logic to prevent automatic Opus escalation.
2. **Audit "Complex" Tags:** Review the 491 Opus turns from June 6th to determine if those tasks could have been decomposed into smaller Sonnet-sized chunks.
3. **Expand Hermes Toolset:** Since `terminal` and `read_file` are the most used tools, we will move all "read-only" repo explorations to Hermes to protect the Claude context window.