agentic-kvc

Files

Gahow Wang 8a6b22c11c Replayer think-time dispatch mode + benchmarking guidance

Adds `--dispatch-mode {tracets,thinktime}` to the replayer and documents that
agentic serving should be benchmarked with `thinktime` (the faithful load).

- `tracets` (old default): turn-k fires at max(prev_finished, trace_ts), i.e. at
  its absolute trace timestamp unless the previous turn is still running. When
  the system is behind, this collapses inter-turn think-time to ~0 and
  manufactures request bursts.
- `thinktime`: turn-1 at trace arrival; turn-k at prev_finished +
  time_to_parent_chat (real production gap). scripts/add_time_to_parent.py
  annotates a trace with that gap from the raw trace's request_ready/end_ms.
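
The two dispatch rules above can be sketched in a few lines. This is an illustrative sketch only, not the code in replay.py: the function name `dispatch_time`, its parameters, and the use of seconds as the unit are assumptions made for the example.

```python
from typing import Optional


def dispatch_time(
    mode: str,
    turn_index: int,
    trace_ts: float,
    prev_finished: Optional[float] = None,
    time_to_parent_chat: float = 0.0,
) -> float:
    """Return when a turn should be sent (hypothetical helper, times in seconds).

    mode                -- "tracets" or "thinktime"
    turn_index          -- 1-based index of the turn within its session
    trace_ts            -- absolute arrival timestamp recorded in the trace
    prev_finished       -- when the previous turn of the session completed
    time_to_parent_chat -- recorded gap between the parent turn ending and
                           this turn becoming ready
    """
    # Turn 1 has no parent, so both modes dispatch it at its trace arrival.
    if turn_index == 1 or prev_finished is None:
        return trace_ts
    if mode == "tracets":
        # Absolute timestamp, but never before the previous turn has finished.
        return max(prev_finished, trace_ts)
    if mode == "thinktime":
        # Preserve the recorded gap relative to the actual finish time.
        return prev_finished + time_to_parent_chat
    raise ValueError(f"unknown dispatch mode: {mode!r}")


# System running behind: turn 2 was due at t=10 s in the trace, but turn 1
# only finished at t=25 s. The recorded think-time gap was 8 s.
late_tracets = dispatch_time("tracets", 2, trace_ts=10.0, prev_finished=25.0,
                             time_to_parent_chat=8.0)
late_thinktime = dispatch_time("thinktime", 2, trace_ts=10.0, prev_finished=25.0,
                               time_to_parent_chat=8.0)
print(late_tracets, late_thinktime)
```

In this made-up scenario `tracets` sends turn 2 the instant turn 1 completes (zero think-time), while `thinktime` still waits the recorded 8 s, which is the pacing difference the ablation below measures.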

exp(c) ablation (v2/exp_c_dispatch_ablation/): at N=8 (capacity slack) thinktime
beats tracets -- E2E p90 -28% (73.5 vs 102.8s), TTFT p90 -29%, TPS +7%, because
tracets' bursts spike concurrency -> KV pressure -> preemption. At N=6
(saturated) they converge. So tracets makes the system look ~30% worse on tail
latency than realistic agent pacing. The root README.md carries the headline
guidance; raw per-request metrics are gitignored (perf_summary.json is kept).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-05-30 16:28:36 +08:00

__init__.py

Agentic workload PD separation analysis with trace-driven benchmarks

2026-05-21 21:21:57 +08:00

__main__.py

Replayer think-time dispatch mode + benchmarking guidance

2026-05-30 16:28:36 +08:00

metrics.py

A1: replayer instrumentation for cross-process join

2026-05-25 16:18:52 +08:00

replay.py

Replayer think-time dispatch mode + benchmarking guidance

2026-05-30 16:28:36 +08:00

srr.py

A4: open-loop session-causal SRR loadgen

2026-05-25 16:19:20 +08:00

trace.py

Replayer think-time dispatch mode + benchmarking guidance

2026-05-30 16:28:36 +08:00