Adds `--dispatch-mode {tracets,thinktime}` to the replayer and documents that
agentic serving should be benchmarked with `thinktime` (the faithful load).
- `tracets` (old default): turn-k is dispatched at its absolute trace timestamp,
  but never before the previous turn finishes, i.e. max(prev_finished,
  trace_ts). When the system is behind, this collapses inter-turn think-time to
  ~0 and manufactures request bursts.
- `thinktime`: turn-1 is dispatched at trace arrival; turn-k at prev_finished +
  time_to_parent_chat (the real production gap). scripts/add_time_to_parent.py
  annotates a trace with that gap, derived from the raw trace's
  request_ready/end_ms.
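The two rules above can be sketched as one function. This is a minimal illustration, not the replayer's actual code: the function name `dispatch_time`, its signature, and the use of seconds on a shared replay clock are assumptions made for the example.

```python
from typing import Optional


def dispatch_time(
    mode: str,
    trace_ts: float,
    prev_finished: Optional[float] = None,
    time_to_parent_chat: float = 0.0,
) -> float:
    """Return when a turn should be sent (hypothetical helper, seconds)."""
    if prev_finished is None:
        # Turn 1 has no parent turn: both modes send it at trace arrival.
        return trace_ts
    if mode == "tracets":
        # Absolute trace timestamp, but never before the parent finished.
        return max(prev_finished, trace_ts)
    if mode == "thinktime":
        # Preserve the recorded gap between parent completion and this turn.
        return prev_finished + time_to_parent_chat
    raise ValueError(f"unknown dispatch mode: {mode!r}")


# A system that is behind: the parent finished at t=50s, although the trace
# recorded turn 2 at t=20s with a 12s think-time after its parent.
behind_tracets = dispatch_time("tracets", 20.0, 50.0, 12.0)
behind_thinktime = dispatch_time("thinktime", 20.0, 50.0, 12.0)
```

In this example `tracets` sends turn 2 the moment its parent finishes (no think-time left), while `thinktime` waits the recorded 12s first, which is the difference the ablation measures.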
exp(c) ablation (v2/exp_c_dispatch_ablation/): at N=8 (capacity slack) thinktime
beats tracets: E2E p90 -28% (73.5 vs 102.8s), TTFT p90 -29% (39.7 vs 56.1s),
TPS +7% (119.3 vs 111.8), because tracets' bursts spike concurrency -> KV
pressure -> preemption. At N=6 (saturated) E2E p90 converges (119.7 vs 118.0s).
So when there is capacity slack, tracets overstates tail latency: p90s under
realistic agent pacing are ~30% lower. Root README.md carries the headline
guidance. Raw per-request metrics are gitignored; perf_summary.json is kept.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
87 lines
1.4 KiB
JSON
{
  "setup": "w600 first-300s window (366 req, 223 multi-turn), round-robin x N H20, Qwen3-Coder-30B-A3B",
  "N6_tracets": {
    "n": 366,
    "ok": 366,
    "wall_s": 974.8,
    "tps": 110.9,
    "ttft": {
      "p50": 4.414,
      "p90": 61.791,
      "p99": 135.243
    },
    "tpot": {
      "p50": 0.039,
      "p90": 0.242,
      "p99": 0.958
    },
    "e2e": {
      "p50": 17.074,
      "p90": 118.02,
      "p99": 297.572
    }
  },
  "N6_thinktime": {
    "n": 366,
    "ok": 366,
    "wall_s": 1125.1,
    "tps": 96.1,
    "ttft": {
      "p50": 4.52,
      "p90": 83.662,
      "p99": 130.373
    },
    "tpot": {
      "p50": 0.037,
      "p90": 0.264,
      "p99": 0.694
    },
    "e2e": {
      "p50": 15.029,
      "p90": 119.68,
      "p99": 338.466
    }
  },
  "N8_tracets": {
    "n": 366,
    "ok": 366,
    "wall_s": 967.2,
    "tps": 111.8,
    "ttft": {
      "p50": 2.869,
      "p90": 56.128,
      "p99": 115.189
    },
    "tpot": {
      "p50": 0.037,
      "p90": 0.174,
      "p99": 0.89
    },
    "e2e": {
      "p50": 11.879,
      "p90": 102.849,
      "p99": 245.492
    }
  },
  "N8_thinktime": {
    "n": 365,
    "ok": 365,
    "wall_s": 787.0,
    "tps": 119.3,
    "ttft": {
      "p50": 3.099,
      "p90": 39.663,
      "p99": 83.524
    },
    "tpot": {
      "p50": 0.037,
      "p90": 0.188,
      "p99": 0.853
    },
    "e2e": {
      "p50": 12.256,
      "p90": 73.525,
      "p99": 227.295
    }
  }
}
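
The headline N=8 percentages can be recomputed from the summary above. The sketch below hard-codes the relevant values rather than reading a file, and the helper `pct_change` and the flattened key names are assumptions made for the example, not part of the repo.

```python
def pct_change(new: float, old: float) -> float:
    """Relative change of `new` versus `old`, in percent."""
    return (new - old) / old * 100.0


# Values copied from the N8_tracets and N8_thinktime entries of the summary.
n8_tracets = {"tps": 111.8, "ttft_p90": 56.128, "e2e_p90": 102.849}
n8_thinktime = {"tps": 119.3, "ttft_p90": 39.663, "e2e_p90": 73.525}

# thinktime relative to tracets, rounded to one decimal place.
deltas = {
    key: round(pct_change(n8_thinktime[key], n8_tracets[key]), 1)
    for key in n8_tracets
}

for key, value in deltas.items():
    print(f"{key}: {value:+.1f}%")
```

This gives roughly +6.7% for TPS, -29.3% for TTFT p90 and -28.5% for E2E p90, in line with the rounded figures in the commit message.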