v2: linear (default cache-aware) baseline + 2x wall-cap on first600s
Follow-up to the LMetric sweep: rerun with --policy linear (cache-aware load + sticky session affinity, the cache_aware_proxy default) and cap each PD-disagg arm at 2x the colo bench wall (SIGTERM bench.sh once cap is exceeded; the cleanup trap clears vLLM and proxy; capped runs lack metrics.summary.json so the analysis script computes from raw metrics.jsonl). Headline: the success-rate ceiling is policy-invariant. arm linear (capped at 2x) lmetric (uncapped) colo 807/807 = 100%, 964s 807/807 = 100%, 1021s pd6 (6:2) 472/807 = 58%, 2280s ⊗ 474/807 = 59%, 3325s pd4 (4:4) 349/807 = 43%, 2281s ⊗ 348/807 = 43%, 6850s pd2 (2:6) 176/807 = 22%, 2280s ⊗ 180/807 = 22%, 19275s Routing affects only how much wall is wasted timing out unreachable requests at 600s each. Linear hits the same ceiling in 2280s as LMetric does in 3300-19000s. This *strengthens* the §5 D-pool capacity-ceiling thesis -- the cap is structural, not a routing artifact. Artifacts: analysis/v2/fig4r_linear.json -- 4-arm linear summary analysis/v2/PD_DISAGG_LMETRIC.md -- extended with wall-cap section figs/v2/fig4_linear_vs_lmetric.png -- 3-panel side-by-side comparison microbench/fresh_setup/plot_fig4_linear_vs_lmetric.py
This commit is contained in:
BIN
figs/v2/fig4_linear_vs_lmetric.png
Normal file
BIN
figs/v2/fig4_linear_vs_lmetric.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 109 KiB |
Reference in New Issue
Block a user