5b26c345f4b6678ee1abfd2222ffe854f3880fb4
InstanceState.eff_{num_requests,pending_prefill,ongoing_decode,ongoing_tokens}
= max(shadow, real) when feed fresh (fixes 30s-stale under-count, keeps
in-flight RaceFix), plus real-only r_max_prefill_remaining / r_kv_used_frac.
Wired into load_only, lmetric, sticky, unified(_kv_both), unified_v3, and
snapshot logging. Feed off => identical to before. run_v3_trace.sh gains ES=1
toggle (always deploys enhanced proxy); run_ablation_es.sh runs each config
ES0-vs-ES1 to test whether real state changes policy performance/ranking.
All unit-tested without GPU.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Description
No description provided
Languages
Python
82.9%
Shell
17.1%