Gahow Wang 795edc6c66 Overnight work report: routing optimization achieves +4.7pp APC
Summary of overnight autonomous session:
- Analyzed agentic workload patterns (91% KV reuse is intra-session)
- Simulated cache policies (LRU near-optimal, routing is the bottleneck)
- Implemented hybrid routing (session-sticky + load-aware override)
- Result: APC 44.7% -> 49.4% with zero latency regression

Key insight: routing quality > cache policy > PD separation for
single-machine agentic workloads.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-22 02:54:48 +08:00
Description
No description provided
48 MiB
Languages
Python 82.9%
Shell 17.1%