Routing fix: new sessions placed by cumulative token load (greedy bin
packing) with cache-hit tiebreak. Session affinity for turn 2+.
Replayer now sends X-Session-Id header for proper session tracking.
Agentic workload core patterns (GLM-5.1 trace):
- 91% of reusable KV is intra-session (not cross-session)
- Session-sticky routing is THE critical optimization
- 36% warm requests (1.3k new tokens), 64% cold (17k+)
- After cache: effective prefill/decode ratio drops from 61.5x to 28.7x
- Cross-session sharing (system prompt) is only 4.8% of tokens
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>