Gahow Wang
012d73f596
Hybrid routing: session-sticky + load-aware override achieves best results
Use session affinity for KV reuse, with a load-aware override when the
pinned instance has ongoing_tokens > 2x the average across instances.
This combines the APC of sticky routing with the latency of load-based
routing.
Results (1000 req, TP=1 DP=8 combined):
                              TTFT50   TPOT90   E2E50    APC
  Old cache-aware             0.731    0.073    4.480    44.7%
  Balanced session-sticky     0.953    0.079    5.520    48.7%
  Hybrid (sticky+load-aware)  0.737    0.072    4.487    49.4%  <- BEST
Hybrid achieves a +4.7pp APC improvement over old cache-aware routing
with no meaningful latency regression (TTFT50 +0.006, TPOT90 -0.001,
E2E50 +0.007).
Session-sticky provides KV reuse; load-aware override prevents hotspots.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-22 02:53:44 +08:00