
2 Commits

Author SHA1 Message Date
0b180c191e v2 exp(d): expand figure to 6 panels (TTFT/E2E mean+p90, TPS, per-worker GPU util)
Per-request panels: TTFT mean+p90, E2E mean+p90 and decode TPS (output goodput;
total/prefill TPS omitted because cache misses inflate it), plus per-worker
GPU-util boxplots (8 workers/arm, tracets vs thinktime) showing both utilization
level and balance.
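
A minimal sketch of how the six panels' statistics could be computed for one arm.
It assumes a hypothetical `RequestRecord` holding TTFT, E2E and output-token count.
It defines per-request decode TPS as tokens after the first divided by (E2E - TTFT),
which is an assumption. It uses linearly interpolated percentiles. For GPU util, it
takes one mean value per worker. None of this is taken from the bench_report code.

```python
import statistics
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """Hypothetical per-request record; field names are illustrative only."""
    ttft_s: float       # time to first token (s)
    e2e_s: float        # end-to-end latency (s)
    output_tokens: int  # tokens generated for this request


def p90(values):
    """90th percentile, linearly interpolated (same as numpy's default)."""
    # n=10 yields 9 cut points; the last one is the 90th percentile.
    return statistics.quantiles(values, n=10, method="inclusive")[-1]


def decode_tps(r):
    """Assumed decode TPS: tokens after the first over the decode phase."""
    decode_time = r.e2e_s - r.ttft_s
    if r.output_tokens <= 1 or decode_time <= 0:
        return 0.0
    return (r.output_tokens - 1) / decode_time


def summarize(records):
    """The five latency/throughput panels for one arm (needs >= 2 requests)."""
    ttft = [r.ttft_s for r in records]
    e2e = [r.e2e_s for r in records]
    return {
        "ttft_mean": statistics.fmean(ttft),
        "ttft_p90": p90(ttft),
        "e2e_mean": statistics.fmean(e2e),
        "e2e_p90": p90(e2e),
        # Only output (decode) throughput; prefill/total TPS deliberately skipped.
        "decode_tps_mean": statistics.fmean(decode_tps(r) for r in records),
    }


def gpu_util_box(per_worker_util):
    """Boxplot stats over one mean GPU-util value per worker.

    The median shows the utilization level; the IQR and min-max spread show balance.
    """
    q1, med, q3 = statistics.quantiles(per_worker_util, n=4, method="inclusive")
    return {"min": min(per_worker_util), "q1": q1, "median": med,
            "q3": q3, "max": max(per_worker_util)}
```

A narrow box means the workers are evenly loaded. A box that sits high but is wide
means high utilization with some workers idling.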

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 21:10:27 +08:00
9b6091fe6e v2 exp(d): 5-policy routing under tracets vs thinktime — ranking flip
Extends exp(c) (dispatch ablation, 1 round-robin policy) to the full 5-policy
routing comparison, running both modes on the SAME ttp trace (807 reqs, fresh
vLLM instance per arm, dash0 8xH20). Confirms exp(c)'s prediction and finds
something stronger: the dispatch mode FLIPS which policy wins.
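
The commit does not define the two dispatch modes. The sketch below assumes that
`tracets` replays each request at its recorded trace timestamp (open loop). It
assumes that `thinktime` sends a session's next request only after the previous
response completes, plus the recorded think gap (closed loop). Under that reading,
tracets can fire a turn before its predecessor has finished, which is one way
bursts arise. `Turn` and both functions are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One request of a hypothetical agentic trace session (illustrative fields)."""
    ts_s: float     # send timestamp recorded in the trace (seconds)
    think_s: float  # recorded gap between the previous response and this request


def tracets_send_times(session):
    """Open loop (assumed tracets): replay each turn at its recorded timestamp.

    Send times ignore how fast this server answers, so a turn can be sent
    before the previous turn's response exists, and requests pile up.
    """
    return [turn.ts_s for turn in session]


def thinktime_send_times(session, latency_s):
    """Closed loop (assumed thinktime): send turn k only after turn k-1
    completes, plus the recorded think time. latency_s(k) is turn k's E2E.
    """
    if not session:
        return []
    times = []
    clock = session[0].ts_s
    for k, turn in enumerate(session):
        if k > 0:
            clock += turn.think_s  # agent "thinks" before the next request
        times.append(clock)
        clock += latency_s(k)      # wait for turn k's response
    return times
```

Consider a toy session recorded against a 0.1 s backend, with timestamps 0.0, 0.6
and 1.2 and think gaps of 0.5 s. If the server under test takes 2.0 s per turn, the
open-loop reading sends at 0.0/0.6/1.2. The closed-loop reading sends at
0.0/2.5/5.0.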

- thinktime helps every policy but helps LPWL most (TTFT p90 -40%, E2E mean -31%,
  vs -3% to -16% for the other policies): tracets' bursts punish prefill-spreading.
- Ranking flip: tracets -> LPWL only ties unified_ab on TTFT p90 and is 3rd on
  E2E mean; thinktime -> LPWL is 1st on both, beating the tuned unified+A+B
  (TTFT p90 -31%) with the best TPOT/balance and zero knobs.
- => benchmark agentic routing with thinktime; tracets' burst artifact erases
  LPWL's advantage. Caveat n=1: the tracets ranking is run-sensitive (it does not
  reproduce dash1 lpwl_5policy_600s.md); the thinktime advantage is the robust
  signal (it appears in both environments).
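
The percentages in the bullets are relative changes, and the "flip" is a change in
the top-ranked policy. Below is a minimal sketch of both computations. It assumes a
lower-is-better metric such as TTFT p90 or E2E mean, stored per dispatch mode as a
dict from policy name to value. The function names are hypothetical. Ties are
broken by dict order rather than reported, so a tie like LPWL vs unified_ab on
TTFT p90 would need separate handling.

```python
def pct_change(before, after):
    """Relative change in percent; negative means the metric went down."""
    return 100.0 * (after - before) / before


def deltas(tracets, thinktime):
    """Per-policy % change of one metric from tracets to thinktime."""
    return {p: pct_change(tracets[p], thinktime[p]) for p in tracets}


def ranking(metric_by_policy):
    """Policy names best-first for a lower-is-better metric (TTFT p90, E2E mean).

    Ties keep dict insertion order; they are not reported as ties.
    """
    return sorted(metric_by_policy, key=metric_by_policy.get)


def winner_flips(tracets, thinktime):
    """True if the top-ranked policy differs between the two dispatch modes."""
    return ranking(tracets)[0] != ranking(thinktime)[0]
```

For example, a TTFT p90 that drops from 10 s to 6 s gives
`pct_change(10.0, 6.0) == -40.0`.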

README + grouped-bar fig (figs/exp_d_policy_dispatch.png) + bench_report
summaries in results/.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 20:59:18 +08:00