Follow-up to the LMetric sweep: rerun with --policy linear (cache-aware
load + sticky session affinity, the cache_aware_proxy default) and cap
each PD-disagg arm at 2x the colo bench wall. Once the cap is exceeded,
bench.sh is SIGTERMed and its cleanup trap tears down vLLM and the
proxy. Capped runs never write metrics.summary.json, so the analysis
script computes their numbers from the raw metrics.jsonl.
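A minimal sketch of the cap-and-fallback flow described above. It assumes nothing about the real tooling beyond that description: `run_with_wall_cap` and `summarize_run` are hypothetical helpers, the cap is passed in seconds (e.g. 2x the colo wall), and the per-request `success` field in metrics.jsonl is an assumed schema, not the actual one.

```python
import json
import signal
import subprocess
import time
from pathlib import Path


def run_with_wall_cap(cmd, cap_s, poll_s=1.0):
    """Run cmd (hypothetically ["bash", "bench.sh", ...]); SIGTERM it once wall time exceeds cap_s.

    Returns (returncode, elapsed_s, capped).
    """
    start = time.monotonic()
    proc = subprocess.Popen(cmd)
    capped = False
    while proc.poll() is None:
        if time.monotonic() - start > cap_s:
            # SIGTERM (not SIGKILL) so a trap in the child script gets a chance to clean up.
            proc.send_signal(signal.SIGTERM)
            capped = True
            break
        time.sleep(poll_s)
    returncode = proc.wait()  # block until the child (and its cleanup) has exited
    return returncode, time.monotonic() - start, capped


def summarize_run(run_dir, n_requests):
    """Prefer metrics.summary.json; fall back to counting raw metrics.jsonl records.

    ASSUMPTION: each metrics.jsonl line is a JSON object with a boolean "success" field.
    """
    run_dir = Path(run_dir)
    summary_path = run_dir / "metrics.summary.json"
    if summary_path.exists():
        return json.loads(summary_path.read_text())
    completed = 0
    with open(run_dir / "metrics.jsonl") as f:
        for line in f:
            if line.strip():
                completed += bool(json.loads(line).get("success"))
    # Divide by the planned request count: requests still in flight at SIGTERM
    # may never have been logged, so dividing by the record count would overstate the rate.
    return {"completed": completed, "total": n_requests,
            "success_rate": completed / n_requests}
```

The fallback deliberately uses the planned request count (807 here) as the denominator rather than the number of logged records, which keeps capped and uncapped arms on the same scale.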
Headline: the success-rate ceiling is policy-invariant.
arm        linear (capped at 2x)    lmetric (uncapped)
colo       807/807 = 100%,   964s   807/807 = 100%,  1021s
pd6 (6:2)  472/807 =  58%,  2280s ⊗ 474/807 =  59%,  3325s
pd4 (4:4)  349/807 =  43%,  2281s ⊗ 348/807 =  43%,  6850s
pd2 (2:6)  176/807 =  22%,  2280s ⊗ 180/807 =  22%, 19275s
(cells: succeeded/total = success rate, bench wall; ⊗ = SIGTERMed at the wall cap)
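The policy-invariance claim reduces to simple arithmetic on the table. A small sketch that recomputes the rounded success rates and the linear-vs-LMetric gap per arm; the counts are copied from the table, and `ceiling_table` is a hypothetical helper, not part of the analysis script.

```python
N_REQUESTS = 807

# (arm, linear succeeded, lmetric succeeded), copied from the table above.
RESULTS = [("colo", 807, 807), ("pd6", 472, 474), ("pd4", 349, 348), ("pd2", 176, 180)]


def ceiling_table(results, n=N_REQUESTS):
    """Return (arm, linear %, lmetric %, |count gap|) per arm, percentages rounded."""
    return [
        (arm, round(100 * lin / n), round(100 * lm / n), abs(lin - lm))
        for arm, lin, lm in results
    ]


for arm, lin_pct, lm_pct, gap in ceiling_table(RESULTS):
    print(f"{arm:5s} linear {lin_pct:3d}%  lmetric {lm_pct:3d}%  gap {gap} req")
```

The largest gap is 4 requests (pd2), about 0.5% of 807.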
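Routing only changes how much wall is burned timing out unreachable
requests (600s each). Linear reaches the same ceiling in ~2280s that
LMetric reaches in 3325-19275s, and since the linear PD arms were
capped, their counts are lower bounds that already land within 4
requests of the uncapped LMetric counts. This *strengthens* the §5
D-pool capacity-ceiling thesis -- the ceiling is structural, not a
routing artifact.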
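The wall-time side of the comparison, as a sketch: how much longer each uncapped LMetric PD arm ran than its capped linear counterpart, next to the number of requests that never succeeded. Values are copied from the table; `wall_overhead` is a hypothetical helper.

```python
N_REQUESTS = 807

# (arm, linear wall s [capped], lmetric wall s [uncapped], lmetric succeeded), from the table above.
PD_ARMS = [("pd6", 2280, 3325, 474), ("pd4", 2281, 6850, 348), ("pd2", 2280, 19275, 180)]


def wall_overhead(arms, n=N_REQUESTS):
    """Per arm: extra LMetric wall over linear, the wall ratio, and the failed-request count."""
    out = {}
    for arm, linear_s, lmetric_s, succeeded in arms:
        out[arm] = {
            "extra_wall_s": lmetric_s - linear_s,
            "wall_ratio": round(lmetric_s / linear_s, 2),
            "failed": n - succeeded,
        }
    return out


for arm, row in wall_overhead(PD_ARMS).items():
    print(arm, row)
```

Relative to the capped linear wall, LMetric runs about 1.46x, 3.0x, and 8.45x as long on pd6, pd4, and pd2, while its failed-request count goes from 333 to 459 to 627.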
Artifacts:
analysis/v2/fig4r_linear.json                         -- 4-arm linear summary
analysis/v2/PD_DISAGG_LMETRIC.md                      -- extended with wall-cap section
figs/v2/fig4_linear_vs_lmetric.png                    -- 3-panel side-by-side comparison
microbench/fresh_setup/plot_fig4_linear_vs_lmetric.py -- plot script for that figure