3594f7dce0dc735e77ea2f66cf8ee7adc7fcb855
LMetric was incorrectly sharing session-sticky logic with the Linear policy. It now does pure per-request routing: score = P_tokens × BS, where P_tokens = pending_prefill + (input - cache_hit) and BS = num_requests.

Experiment result (200 requests, fresh restart): Linear and the corrected LMetric differ by less than 2% on all metrics. LMetric's cache-hit estimation provides implicit soft affinity that preserves locality without explicit session stickiness.

Also fix the missing cd in bench.sh (the replayer module was not found when run from a non-project cwd), and rewrite run_lmetric_ab.sh as a thin wrapper around bench.sh to eliminate duplicated launch/cleanup logic that broke under set -euo.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
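
The scoring rule in the commit message can be illustrated with a small sketch. This is not the repository's code: `WorkerState`, `lmetric_score`, and `pick_worker` are hypothetical names, the choice of the lowest score as the winner is an assumption, and the clamp that keeps the uncached token count non-negative is an added safeguard. The message also does not say whether BS counts the incoming request; here it is taken as the worker's current request count.

```python
from dataclasses import dataclass


@dataclass
class WorkerState:
    """Hypothetical per-worker load snapshot (names are illustrative)."""
    name: str
    pending_prefill: int  # tokens already queued for prefill
    num_requests: int     # BS in the commit message


def lmetric_score(worker: WorkerState, input_tokens: int, cache_hit: int) -> int:
    """score = P_tokens * BS, with P_tokens = pending_prefill + (input - cache_hit)."""
    # Clamp is an assumption: a cache-hit estimate should not exceed the input length.
    uncached = max(input_tokens - cache_hit, 0)
    p_tokens = worker.pending_prefill + uncached
    return p_tokens * worker.num_requests


def pick_worker(workers, input_tokens, cache_hits):
    """Per-request routing: assumed here to pick the lowest score, with no session state."""
    return min(
        workers,
        key=lambda w: lmetric_score(w, input_tokens, cache_hits.get(w.name, 0)),
    )


workers = [
    WorkerState("a", pending_prefill=1000, num_requests=4),
    WorkerState("b", pending_prefill=1200, num_requests=4),
]
# Worker "b" is slightly busier but is estimated to hold 600 of the 800 input tokens in cache.
chosen = pick_worker(workers, input_tokens=800, cache_hits={"b": 600})
print(chosen.name)
```

In this example worker "a" scores (1000 + 800) × 4 = 7200 and worker "b" scores (1200 + 200) × 4 = 5600, so the request goes to "b" even though nothing pins the session to it. That is the implicit soft affinity the message describes: the cache-hit term lowers the score of whichever worker already holds the prefix.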
Languages
Python: 82.9%
Shell: 17.1%