Gahow Wang
cf812b6264
Workload characterization C1-C3 on full production trace
Joint/temporal characterizations of the full 051315 cluster trace (2.11M
req / 1.31M sessions / 2h), beyond the existing single-variable marginals:
- C1 mixture: 90.3% sessions single-turn, but multi-turn (9.7%) = 44% reqs /
67% prefill mass; continuation hazard rises 10%->94% (Lindy); heaviness
unpredictable at turn 1 (corr 0.04-0.15) => reactive routing justified.
- C2 resident/delta: resident context 11k->56k while new-prefill 2.7k->~200;
per-turn reuse ->99.6%; resident/delta ("PD tax") ->~250-450x.
- C3 prefill/decode: token mass 98.7% input / 1.3% output, but decode takes
~70% of time (robust 68-71%); "decode negligible" is wrong (tokens != time).
The correct colocation argument is roofline complementarity, not "no decode".
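The C1 continuation hazard and the C2 resident/delta ratio can be sketched as follows. This is a minimal illustration on toy data, not the contents of compute_chars.py: the function names, the toy turn counts, and the exact definitions (hazard as P(turn k+1 | reached turn k), ratio as resident tokens over newly prefilled tokens) are assumptions made for the example.

```python
from collections import Counter


def continuation_hazard(turn_counts):
    """Return {k: P(session reaches turn k+1 | it reached turn k)}.

    turn_counts holds one integer per session: its total number of turns.
    """
    hist = Counter(turn_counts)
    max_turns = max(hist)
    # reached[k] = number of sessions with at least k turns
    reached = {}
    running = 0
    for k in range(max_turns, 0, -1):
        running += hist.get(k, 0)
        reached[k] = running
    return {k: reached[k + 1] / reached[k] for k in range(1, max_turns)}


def resident_delta_ratio(resident_tokens, new_prefill_tokens):
    """Resident context divided by the tokens newly prefilled this turn."""
    if new_prefill_tokens <= 0:
        return float("inf")
    return resident_tokens / new_prefill_tokens


def reuse_fraction(resident_tokens, new_prefill_tokens):
    """Share of the turn's context that was already resident (assumed definition)."""
    return 1.0 - new_prefill_tokens / resident_tokens


# Toy session lengths (hypothetical, NOT taken from the trace).
toy_turn_counts = [1, 1, 1, 1, 1, 1, 2, 3, 5, 5]
hazard = continuation_hazard(toy_turn_counts)

# Late-turn figures quoted in the message: ~56k resident, ~200 new-prefill.
late_ratio = resident_delta_ratio(56_000, 200)
late_reuse = reuse_fraction(56_000, 200)
```

With the quoted late-turn figures the ratio lands inside the ~250-450x range and the reuse fraction rounds to 99.6%, which is a consistency check on the numbers rather than a reproduction of the analysis.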
Maps each to (1) PD-colocation and (2) routing. compute_chars.py + chars.json
+ figs/workload_chars/. Raw-file exact validation (cached_tokens, real
timings) pending.
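The C3 point (tokens != time) can be illustrated with a small arithmetic sketch. The throughput figures below are hypothetical placeholders chosen only to show how a 1.3% token share can become roughly 70% of time; they are not measured values, and the function name is an assumption.

```python
def decode_time_share(input_tokens, output_tokens,
                      prefill_tok_per_s, decode_tok_per_s):
    """Fraction of total processing time spent in decode.

    Time per phase is modeled as tokens / throughput, a deliberate
    simplification that ignores batching, queueing, and cache reuse.
    """
    prefill_time = input_tokens / prefill_tok_per_s
    decode_time = output_tokens / decode_tok_per_s
    return decode_time / (prefill_time + decode_time)


# Token mass from the message: 98.7% input / 1.3% output.
# Throughputs are HYPOTHETICAL: prefill is assumed ~177x faster per token
# than decode, which is the ratio that yields a ~70% decode time share.
share = decode_time_share(
    input_tokens=98.7,
    output_tokens=1.3,
    prefill_tok_per_s=17_700.0,
    decode_tok_per_s=100.0,
)
```

Under this simplified model the decode share depends only on the per-token throughput ratio between the two phases, so the real-timing validation noted above is what would confirm or adjust the 68-71% range.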
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 18:19:39 +08:00