Commit Graph

1 Commits

Author SHA1 Message Date
cf812b6264 Workload characterization C1-C3 on full production trace
Joint/temporal characterizations of the full 051315 cluster trace (2.11M
req / 1.31M sessions / 2h), beyond the existing single-variable marginals:

- C1 mixture: 90.3% sessions single-turn, but multi-turn (9.7%) = 44% reqs /
  67% prefill mass; continuation hazard rises 10%->94% (Lindy); heaviness
  unpredictable at turn 1 (corr 0.04-0.15) => reactive routing justified.
- C2 resident/delta: resident context 11k->56k while new-prefill 2.7k->~200;
  per-turn reuse ->99.6%; resident/delta ("PD tax") ->~250-450x.
- C3 prefill/decode: token mass 98.7% input / 1.3% output, BUT decode ~70% of
  TIME (robust 68-71%); "decode negligible" is wrong (tokens != time). Correct
  colo argument = roofline complementarity, not "no decode".

Maps each to (1) PD-colocation and (2) routing. compute_chars.py + chars.json
+ figs/workload_chars/. Raw-file exact validation (cached_tokens, real
timings) pending.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 18:19:39 +08:00