Gahow Wang 5892739159 Add session affinity as soft preference in unified routing
Without affinity, all cached requests route to the same instance
(cache source always has lowest prefill cost), causing 149s queue.

Fix: if the session's last instance has cost <= 2x the global best,
use it (preserves cache locality). Only re-route when the affinity
instance is significantly more expensive (overloaded).

The 2x threshold is intentionally loose — it's not a hardcoded magic
number but a "prefer locality unless clearly worse" heuristic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-24 02:37:58 +08:00
Description
No description provided
48 MiB
Languages
Python 82.9%
Shell 17.1%