Least-Prefill-Work-Left: score = pending_prefill_tokens + max(0, input -
cache_hit_here), pure argmin with (num_requests, round-robin) tie-break.
Zero hyperparameters — derived from the agentic pattern: decode is cheap
(I/O ~217x) so outstanding prefill-token-work is the only load worth
modelling. Dropping LMetric's x num_requests factor (a) un-swallows the
cache signal so affinity emerges with no gate, and (b) makes an idle-but-
decoding host score `input` (its true marginal cost) instead of 0,
removing the empty-batch degeneracy. Stick-vs-spill crossover is computed
from real token-work, replacing overload_factor + cache_ratio gate.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>