agentic-kvc/scripts
commit 012d73f596
Author: Gahow Wang

Hybrid routing: session-sticky + load-aware override achieves best results
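Pin each session to one instance for KV-cache reuse, but override the
pin with load-aware routing when the pinned instance's ongoing_tokens
exceeds 2x the average across instances. This combines the APC
(automatic prefix caching) hit rate of sticky routing with the latency
of load-based routing.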
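A minimal Python sketch of the routing rule described above. It rests on three assumptions. First, the average is taken over all instances, including the pinned one. Second, load is checked before the new request is admitted. Third, an overridden session is re-pinned to the least-loaded instance. The `HybridRouter` class and its methods are hypothetical and are not the code in this commit.

```python
class HybridRouter:
    """Hypothetical sketch: session-sticky routing with a load-aware override."""

    def __init__(self, num_instances: int, overload_factor: float = 2.0) -> None:
        self.ongoing_tokens = [0] * num_instances  # in-flight tokens per instance
        self.overload_factor = overload_factor      # override threshold vs. average
        self.session_pin: dict[str, int] = {}       # session_id -> pinned instance

    def _least_loaded(self) -> int:
        # Ties go to the lowest instance index.
        return min(range(len(self.ongoing_tokens)), key=self.ongoing_tokens.__getitem__)

    def route(self, session_id: str, num_tokens: int) -> int:
        # Average load across all instances, measured before admitting this request.
        avg = sum(self.ongoing_tokens) / len(self.ongoing_tokens)
        pinned = self.session_pin.get(session_id)
        if pinned is not None and self.ongoing_tokens[pinned] <= self.overload_factor * avg:
            target = pinned  # sticky: reuse the session's cached KV
        else:
            # New session, or pinned instance above 2x average: pick the
            # least-loaded instance and re-pin the session there (assumption).
            target = self._least_loaded()
            self.session_pin[session_id] = target
        self.ongoing_tokens[target] += num_tokens
        return target

    def finish(self, instance: int, num_tokens: int) -> None:
        """Release a completed request's tokens from its instance."""
        self.ongoing_tokens[instance] -= num_tokens


router = HybridRouter(num_instances=4)
for s in ["a", "b", "c", "d"]:
    router.route(s, 100)       # one new session lands on each instance
router.route("a", 1000)        # load 100 <= 2 * 100, so "a" stays on instance 0
print(router.route("a", 10))   # instance 0 holds 1100 > 2 * 350, overridden -> 1
```

Re-pinning after an override trades away the old instance's cache for that session. Keeping the old pin instead would be an equally plausible reading of the commit message.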

Results (1000 requests, TP=1, DP=8, combined; suffix 50/90 = p50/p90):
                              TTFT50  TPOT90  E2E50   APC
  Old cache-aware              0.731   0.073   4.480  44.7%
  Balanced session-sticky      0.953   0.079   5.520  48.7%
  Hybrid (sticky+load-aware)   0.737   0.072   4.487  49.4%  <- BEST

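Hybrid gains +4.7pp APC over old cache-aware routing. TTFT50 and E2E50
regress by under 1%, and TPOT90 slightly improves. Session stickiness
provides KV reuse, while the load-aware override prevents hotspots.
Compared with balanced session-sticky routing, TTFT50 drops from 0.953
to 0.737.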
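The headline deltas can be recomputed from the results table. This small Python check uses table values copied by hand. The `results` dict and `delta` helper are illustrative only and are not part of the repository.

```python
# Values copied from the results table (1000 requests, TP=1, DP=8).
# APC is a hit rate in percent; latency units are as reported by the benchmark.
results = {
    "old_cache_aware": {"ttft50": 0.731, "tpot90": 0.073, "e2e50": 4.480, "apc": 44.7},
    "balanced_session_sticky": {"ttft50": 0.953, "tpot90": 0.079, "e2e50": 5.520, "apc": 48.7},
    "hybrid": {"ttft50": 0.737, "tpot90": 0.072, "e2e50": 4.487, "apc": 49.4},
}


def delta(base: str, other: str) -> dict[str, float]:
    """Per-metric difference other - base, rounded to 3 decimals.

    Negative latency deltas and positive APC deltas (percentage points) are improvements.
    """
    return {k: round(results[other][k] - results[base][k], 3) for k in results[base]}


print(delta("old_cache_aware", "hybrid"))
# {'ttft50': 0.006, 'tpot90': -0.001, 'e2e50': 0.007, 'apc': 4.7}
```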

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-22 02:53:44 +08:00