unified_v2.1: relax gates + add unified_kv_both isolation control
v2.0 ran on B3 and triggered PD-sep only 2 / 1214 times (0.2%). The
gates were too conservative; the v2-vs-v1 latency gap (TTFT p90
7.35 -> 8.96 s) is therefore probably attributable to kv_both
always-on overhead, not to the PD-sep mechanism itself. v2.1 has two
fixes plus an isolation control.
Bug fix:
- The "chosen has live decodes worth protecting" gate combined
num_requests and ongoing_decode_tokens with AND, falling through
when EITHER was small. Under agentic workloads each worker rarely
stacks more than 1-2 concurrent requests, so the gate killed 84%
of v2.0 candidates that reached it. Replace with a pure
ongoing_decode_tokens == 0 check ("chosen_no_active_decode") —
same semantic, much higher recall.
Threshold relaxation (B2 microbench is the calibration source):
- pd_sep_min_new_tokens: 16000 -> 8000 (B2 TPOT idx 1.9x already
at 8k, TTFT idx 12x — strictly worth migrating)
- pd_sep_min_decodes_protected: 2 -> 1
- pd_sep_min_src_cache_tokens: 8000 -> 4000
- pd_sep_min_extra_cache_tokens: 4000 -> 2000
Isolation control:
- New --policy unified_kv_both option. Uses the exact same picker as
--policy unified but the vLLMs are launched in kv_role=kv_both
(the same launch mode unified_v2 requires). PD-sep never fires.
Compares against unified_v2 to attribute any v2 effect to the
PD-sep branch alone, not the kv_both always-on overhead.
- Both unified_kv_both and unified_v2 auto-enable kv_both launch in
b3_isolated_policy.sh.
Tests:
- Updated the existing "chosen has no decodes" test for the new
gate name and semantic.
- All 24 proxy tests pass.
Refs: window_1_results/v2_breakdown analysis (88.7% of candidates
caught by old new_local_below_threshold; 84% of the remainder
caught by the old few_decodes gate).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
@@ -410,12 +410,12 @@ def test_unified_v2_triggers_when_src_has_meaningful_cache_and_chosen_has_decode
|
||||
|
||||
|
||||
def test_unified_v2_falls_through_when_chosen_has_no_decodes(proxy):
|
||||
"""No decodes on chosen → no benefit from PD-sep."""
|
||||
"""No decoding work on chosen → no benefit from PD-sep."""
|
||||
insts, prefix = _setup_v2_scene(proxy, chosen_decodes=0, src_cache_blocks=128)
|
||||
chosen, idx, decision, pd_sep = proxy.pick_instance_unified_v2(
|
||||
insts, prefix, None, len(prefix), {})
|
||||
assert pd_sep is None
|
||||
assert "few_decodes" in decision["v2_reason"]
|
||||
assert "no_active_decode" in decision["v2_reason"]
|
||||
|
||||
|
||||
def test_estimate_transfer_cost_is_calibrated_function(proxy):
|
||||
|
||||
Reference in New Issue
Block a user