- Delete unreachable best_needs_push block in _handle_combined and the
four orphaned helpers (_handle_cached_prefill_offload,
_handle_direct_read_offload, _query_bootstrap_hit,
_get_bootstrap_client). Their only caller was the retired PUSH gate;
see REPORT §3.9 errata for the rejected experiments (cc6e562, 4c583f2).
- Extract pick_instance_unified_hybrid as a pure function returning
(chosen, idx, decision_dict). The decision dict carries the review #7
breakdown fields (decision, affinity_idx/chosen_idx, cache_hit/ratio,
avg_num_requests, fallback_score, tie_break_used).
- Add LMetric-fallback tie-breaker (primary score, then new_uncached,
num_requests, round-robin) so new sessions don't all pin to inst 0
when BS=0 across the board.
- Drop the lmetric-policy affinity write so --policy lmetric stays
affinity-free per review #3.
- Mark --max-offload-inflight / --offload-mode / --cache-gate-ratio /
--decode-iteration-s as [DEPRECATED] in --help; flags remain accepted
so scripts/bench.sh and legacy launchers don't break.
- Revert uncommitted overload_factor 2.0->1.5 default; H7 sweep already
rejected this knob (within noise). Future sweeps should go via CLI.
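
The tie-break order from the LMetric-fallback bullet above can be sketched as below. This is a minimal illustration, not the proxy's actual code: the Candidate type, the pick_with_tie_break name, the rr_counter argument, and the "lower score wins" convention are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Candidate:
    """Hypothetical per-instance snapshot; field names are illustrative."""
    idx: int
    score: float        # primary fallback score (assumed: lower is better)
    new_uncached: int   # tokens this request would add uncached
    num_requests: int   # requests currently on the instance


def pick_with_tie_break(
    cands: Sequence[Candidate], rr_counter: int
) -> Tuple[int, bool]:
    """Return (chosen_idx, round_robin_used).

    Compares (score, new_uncached, num_requests) lexicographically and
    falls back to round-robin only among candidates tied on all three.
    """
    def key(c: Candidate) -> Tuple[float, int, int]:
        return (c.score, c.new_uncached, c.num_requests)

    best = min(key(c) for c in cands)
    tied = [c for c in cands if key(c) == best]
    if len(tied) == 1:
        return tied[0].idx, False
    # Rotate through the fully tied candidates so idx 0 is not always chosen.
    return tied[rr_counter % len(tied)].idx, True


# With every instance idle, successive calls rotate instead of pinning to 0.
idle = [Candidate(i, 0.0, 0, 0) for i in range(3)]
picks = [pick_with_tie_break(idle, n)[0] for n in range(4)]
```

The caller is assumed to own and increment rr_counter once per new session, so the function itself stays pure.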

Tests: add 6 hybrid-policy tests in tests/test_proxy_pick.py covering
affinity-hit, overload break, low-cache fallback, tie-break rotation,
lmetric purity, and breakdown field shape. 19/19 pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>