Gahow Wang 3594f7dce0 Fix LMetric routing: remove session affinity, align with OSDI'26 spec
LMetric was incorrectly sharing session-sticky logic with Linear policy.
Fixed to pure per-request routing: score = P_tokens × BS where
P = pending_prefill + (input - cache_hit), BS = num_requests.

Experiment result (200 req, fresh restart): Linear vs corrected LMetric
show <2% difference on all metrics — LMetric's cache-hit estimation
provides implicit soft affinity that preserves locality without explicit
session stickiness.

Also fix bench.sh missing cd (replayer module not found from non-project
cwd) and rewrite run_lmetric_ab.sh as thin wrapper around bench.sh to
eliminate duplicated launch/cleanup logic that broke under set -euo.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-23 11:56:58 +08:00
Description
No description provided
48 MiB
Languages
Python 82.9%
Shell 17.1%