Gahow Wang
cf812b6264
Workload characterization C1-C3 on full production trace
Joint/temporal characterizations of the full 051315 cluster trace (2.11M
req / 1.31M sessions / 2h), beyond the existing single-variable marginals:
- C1 mixture: 90.3% of sessions are single-turn, but the multi-turn 9.7% carry
44% of reqs / 67% of prefill mass; continuation hazard rises 10%->94% with turn
depth (Lindy); heaviness is unpredictable at turn 1 (corr 0.04-0.15) => reactive
routing justified.
- C2 resident/delta: resident context 11k->56k while new-prefill 2.7k->~200;
per-turn reuse ->99.6%; resident/delta ("PD tax") ->~250-450x.
- C3 prefill/decode: token mass 98.7% input / 1.3% output, BUT decode ~70% of
TIME (robust 68-71%); "decode negligible" is wrong (tokens != time). Correct
colo argument = roofline complementarity, not "no decode".
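A minimal Python sketch of how the C1-C3 quantities could be computed from per-request records. The `Req` record, its field names, and the toy `TRACE` are hypothetical and are not the schema or data used by compute_chars.py. The sketch also makes three assumptions. Input tokens are counted as resident + new prefill. Reuse is resident / (resident + new prefill). Time share uses summed per-request prefill/decode durations.

```python
from dataclasses import dataclass


@dataclass
class Req:
    # Hypothetical per-request record; not the compute_chars.py schema.
    session: str
    turn: int           # 1-based turn index within the session
    new_prefill: int    # tokens prefilled fresh this turn (the "delta")
    resident: int       # context tokens reused from earlier turns
    output: int         # decoded tokens
    prefill_s: float    # seconds spent in prefill
    decode_s: float     # seconds spent in decode


def continuation_hazard(reqs):
    """C1: P(session reaches turn k+1 | it reached turn k), for each k."""
    depth = {}
    for r in reqs:
        depth[r.session] = max(depth.get(r.session, 0), r.turn)
    hazard = {}
    for k in range(1, max(depth.values())):
        reached = sum(d >= k for d in depth.values())
        continued = sum(d >= k + 1 for d in depth.values())
        hazard[k] = continued / reached
    return hazard


def reuse_and_pd_tax(r):
    """C2: per-turn reuse fraction and resident/delta ratio (assumes new_prefill > 0)."""
    reuse = r.resident / (r.resident + r.new_prefill)
    return reuse, r.resident / r.new_prefill


def decode_shares(reqs):
    """C3: decode share of token mass vs. decode share of summed phase time."""
    in_tok = sum(r.resident + r.new_prefill for r in reqs)
    out_tok = sum(r.output for r in reqs)
    t_pre = sum(r.prefill_s for r in reqs)
    t_dec = sum(r.decode_s for r in reqs)
    return out_tok / (in_tok + out_tok), t_dec / (t_pre + t_dec)


# Toy trace (made-up numbers): session "a" has 1 turn, "b" 3 turns, "c" 2 turns.
# Resident context of turn k = full prompt + output of turn k-1.
TRACE = [
    Req("a", 1, 2000, 0, 100, 0.20, 1.0),
    Req("b", 1, 3000, 0, 150, 0.30, 1.5),
    Req("b", 2, 500, 3150, 120, 0.10, 1.2),
    Req("b", 3, 200, 3770, 80, 0.05, 0.8),
    Req("c", 1, 1000, 0, 50, 0.10, 0.5),
    Req("c", 2, 300, 1050, 60, 0.05, 0.6),
]

if __name__ == "__main__":
    print(continuation_hazard(TRACE))   # hazard per turn depth
    print(reuse_and_pd_tax(TRACE[3]))   # deepest turn of session "b"
    print(decode_shares(TRACE))         # (token share, time share)
```

On this toy trace decode is under 4% of token mass but 87.5% of summed phase time, which is the same tokens-vs-time gap C3 describes at trace scale.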
Maps each to (1) PD-colocation and (2) routing. compute_chars.py + chars.json
+ figs/workload_chars/. Raw-file exact validation (cached_tokens, real
timings) pending.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 18:19:39 +08:00