agentic-kvc

gahow/agentic-kvc


Commit Graph


Author

SHA1

Message

Date

Gahow Wang

0b180c191e

v2 exp(d): expand figure to 6 panels (TTFT/E2E mean+p90, TPS, per-worker GPU util)

Per request: TTFT mean+p90, E2E mean+p90, decode TPS (output goodput; total/
prefill TPS omitted as cache-miss-inflated), and per-worker GPU-util boxplots
(8 workers/arm, tracets vs thinktime) showing utilization level + balance.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-05-30 21:10:27 +08:00

Gahow Wang

9b6091fe6e

v2 exp(d): 5-policy routing under tracets vs thinktime — ranking flip

Extends exp(c) (dispatch ablation, 1 round-robin policy) to the full 5-policy
routing comparison, both modes on the SAME ttp trace (807 reqs, fresh vLLM/arm,
dash0 8xH20). Confirms exp(c)'s prediction and finds something stronger: the
dispatch mode FLIPS which policy wins.

- thinktime helps every policy but helps LPWL most (TTFT p90 -40%, E2E mean -31%
  vs -3..-16% for the rest): tracets bursts punish prefill-spreading.
- Ranking flip: tracets -> LPWL only ties unified_ab on TTFT p90 and is 3rd on
  E2E mean; thinktime -> LPWL is 1st on both (TTFT p90 -31%, best TPOT/balance,
  zero knobs) vs the tuned unified+A+B.
- => benchmark agentic routing with thinktime; tracets' burst artifact erases
  LPWL's advantage. Caveat (n=1): the tracets ranking is run-sensitive (it does
  not reproduce dash1 lpwl_5policy_600s.md); the thinktime advantage is the
  robust signal (it appears in both environments).

README + grouped-bar fig (figs/exp_d_policy_dispatch.png) + bench_report
summaries in results/.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-05-30 20:59:18 +08:00
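
The relative deltas and the "ranking flip" described in commit 9b6091fe6e can be illustrated with a small sketch. This is not the repository's bench_report code. The helper names (`p90`, `pct_change`, `rank_policies`), the nearest-rank percentile definition, the placeholder policy names, and every latency value below are assumptions made up for illustration. None of the numbers come from the experiment.

```python
def p90(values):
    """Nearest-rank 90th percentile (assumed definition; the repo may differ)."""
    ordered = sorted(values)
    rank = (9 * len(ordered) + 9) // 10  # ceil(0.9 * n) in integer arithmetic
    return ordered[rank - 1]


def pct_change(baseline, candidate):
    """Relative change in percent; negative means the candidate is lower."""
    return 100.0 * (candidate - baseline) / baseline


def rank_policies(samples_by_policy, stat):
    """Policy names ordered from best (lowest stat) to worst."""
    return sorted(samples_by_policy, key=lambda name: stat(samples_by_policy[name]))


# Synthetic per-request TTFT samples in seconds: illustrative only.
tracets = {
    "policy_a": [1.0, 1.2, 1.1, 3.0, 1.3, 1.0, 1.1, 1.2, 1.0, 4.0],
    "policy_b": [1.5, 1.6, 1.4, 1.7, 1.5, 1.6, 1.5, 1.4, 1.6, 2.0],
}
thinktime = {
    "policy_a": [0.9, 1.0, 1.0, 1.1, 1.0, 0.9, 1.1, 1.2, 1.0, 1.5],
    "policy_b": [1.4, 1.5, 1.4, 1.6, 1.5, 1.5, 1.4, 1.4, 1.5, 1.8],
}

rank_tracets = rank_policies(tracets, p90)
rank_thinktime = rank_policies(thinktime, p90)

# A ranking flip means the best policy differs between dispatch modes.
flipped = rank_tracets[0] != rank_thinktime[0]

# Per-policy p90 change when switching from tracets to thinktime dispatch.
delta_a = pct_change(p90(tracets["policy_a"]), p90(thinktime["policy_a"]))
delta_b = pct_change(p90(tracets["policy_b"]), p90(thinktime["policy_b"]))
```

In this toy data the bursty tail hurts `policy_a` under one mode only, so the p90 winner changes between modes while both policies improve. With a single run per arm (the n=1 caveat in the commit message), such a ranking can still move between runs.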

2 Commits