Workload characterization C1-C3 on full production trace
Joint/temporal characterizations of the full 051315 cluster trace (2.11M
req / 1.31M sessions / 2h), beyond the existing single-variable marginals:
- C1 mixture: 90.3% sessions single-turn, but multi-turn (9.7%) = 44% reqs /
67% prefill mass; continuation hazard rises 10%->94% (Lindy); heaviness
unpredictable at turn 1 (corr 0.04-0.15) => reactive routing justified.
- C2 resident/delta: resident context 11k->56k while new-prefill 2.7k->~200;
per-turn reuse ->99.6%; resident/delta ("PD tax") ->~250-450x.
- C3 prefill/decode: token mass 98.7% input / 1.3% output, BUT decode ~70% of
TIME (robust 68-71%); "decode negligible" is wrong (tokens != time). Correct
colo argument = roofline complementarity, not "no decode".
Maps each to (1) PD-colocation and (2) routing. compute_chars.py + chars.json
+ figs/workload_chars/. Raw-file exact validation (cached_tokens, real
timings) pending.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
81
analysis/workload_chars/README.md
Normal file
@@ -0,0 +1,81 @@
# Agentic workload characterization C1–C3 (full 051315 production trace)

Date 2026-05-29. Source: `trace-glm5.1-formatted/051315-051317.jsonl` on dash1
(release file, 2,114,220 requests / 1,307,276 sessions / 2h, type=100% `coder`).
This release file **is the full cluster-level production trace**: session skew
reproduces 46.5/66.5/74.6/87.5/96.0 exactly. Compute: `compute_chars.py`
(2-pass, ~65s, `~/ali-trace/.venv` python). Numbers: `chars.json`.

> ⚠️ **Cluster-level, not per-instance.** This is one cluster's aggregate stream.
> Concurrent-session counts have NO denominator of "8 instances"; do not compare
> them to a single deployment's instance count.

These three are NOT in the existing 13 analyzer figures (which are single-variable
marginals on the older 041x traces). C1–C3 are joint/temporal and argument-bearing.

## C1: the workload is a MIXTURE, not "multi-turn agentic" (`c1_session_mixture.png`)

- **90.3%** of sessions are single-turn; mean 1.62 turns, p99=18, max=3091.
- But multi-turn sessions (9.7%) = **44.2% of requests** and **66.9% of input
  (prefill) mass**. Single-turn = **60.2% of output (decode) mass**.
- Continuation hazard P(reach k+1 | reached k): turn1→2 only **10.2%**, but
  turn2→3 50.6%, turn5→6 87%, turn12→13 **94.3%** (Lindy / Pareto).
- Predictability of heaviness at cold-start is near-zero:
  corr(turn1_input, session_mass)=0.15, corr(turn1_input, n_turns)=**0.04**.
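
The continuation hazard in the list above is a ratio of per-turn request counts. A minimal sketch of that calculation on a toy input follows; `continuation_hazard` and `toy_counts` are illustrative names, and the toy numbers are made up rather than taken from the trace.

```python
from collections import Counter

def continuation_hazard(turn_counts, max_turn):
    """P(reach k+1 | reached k) for k = 1..max_turn-1, from requests-per-turn counts."""
    hazard = {}
    for k in range(1, max_turn):
        reached_k = turn_counts.get(k, 0)
        # Empty buckets report 0.0 instead of raising ZeroDivisionError.
        hazard[k] = turn_counts.get(k + 1, 0) / reached_k if reached_k else 0.0
    return hazard

# Toy data (made up): 1000 sessions start, 100 reach turn 2, 50 reach turn 3, 45 reach turn 4.
toy_counts = Counter({1: 1000, 2: 100, 3: 50, 4: 45})
toy_hazard = continuation_hazard(toy_counts, max_turn=4)
print(toy_hazard)
```

The same shape appears in the toy data as in the trace: most sessions stop after turn 1, but the conditional probability of continuing rises with depth.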

**Routing:** heaviness is unpredictable at session start → proactive placement
cannot pre-empt hot-pinning → a REACTIVE mechanism (observable-load routing /
migration) is required. But once a session has shown depth, it almost surely
continues → "observed accumulated load" is the signal that works (not turn-1
features, not cost-model prediction). The optimal strategies for single-turn and
multi-turn sessions are opposite (load-balance the 90% one-shot sea vs
affinity-pin the deep tail), and you can't tell the two apart at turn 1 → the
only viable policy starts everyone load-balanced and becomes sticky as turns
accrue. This is exactly LPWL's emergent behavior (`new_uncached≈input`→by-load;
`new_uncached≈0`→sticks), so C1 explains *why* a cache-aware-load score is the
right shape: it auto-segments the mixture with no classifier.
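
To make the "load-balanced first, sticky later" behavior concrete, here is a minimal sketch of a cache-aware load score. It is not LPWL's actual implementation: the `Instance` class, the `queued_tokens` load signal, and the additive score are assumptions chosen only to illustrate the shape described above.

```python
from dataclasses import dataclass, field

@dataclass
class Instance:
    name: str
    queued_tokens: int = 0                              # assumed load signal: pending uncached prefill
    cached_prefix: dict = field(default_factory=dict)   # session_id -> prefix tokens cached here

def new_uncached(inst, session_id, input_tokens):
    """Tokens this instance would have to prefill: input minus its cache hit."""
    return max(0, input_tokens - inst.cached_prefix.get(session_id, 0))

def route(instances, session_id, input_tokens):
    """Pick the instance with the smallest (queued load + uncached delta)."""
    return min(instances, key=lambda i: i.queued_tokens + new_uncached(i, session_id, input_tokens))

a = Instance("a", queued_tokens=5_000)
b = Instance("b", queued_tokens=30_000)

# Turn 1: nobody holds the prefix, so new_uncached == input everywhere and load alone decides.
first = route([a, b], "s1", input_tokens=11_000)

# Deep turn: a now caches 56,000 of a 56,200-token prompt and has become the busier instance.
a.cached_prefix["s1"] = 56_000
a.queued_tokens, b.queued_tokens = 40_000, 5_000
deep = route([a, b], "s1", input_tokens=56_200)
print(first.name, deep.name)
```

In this toy run the deep turn stays on `a` (score 40,200) rather than moving to the idle `b` (score 61,200), while a router keyed on load alone would have moved it and paid the full prefill again.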

## C2: marginal work collapses while resident state explodes (`c2_work_amortization.png`)

Per turn (medians in `chars.json`): resident context grows 11k→56k+ tokens over
turns 1→12, while new prefill collapses from 2.9k at turn 2 to ~200 by turn 18;
per-turn reuse climbs 85%→**99.6%**; resident/new ratio ("the PD tax") grows to
~220× by turn 12, ~440× by turn 30.
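
The per-turn quantities above follow from three lengths per request. The sketch below mirrors the delta rule used in `compute_chars.py` (`max(0, input − parent_input − parent_output)`); the helper names are illustrative, the first example turn is hypothetical, and the second uses the turn-12 medians recorded in `chars.json`.

```python
def turn_delta(input_len, parent_input_len, parent_output_len):
    """New prefill for a turn: tokens not covered by the parent's prompt plus its answer."""
    return max(0, input_len - parent_input_len - parent_output_len)

def reuse_and_ratio(resident_tokens, new_prefill_tokens):
    """Return (reuse %, resident/new ratio) for one turn."""
    new = max(new_prefill_tokens, 1)   # floor at 1 token so pure cache hits stay finite
    return (1 - new / resident_tokens) * 100, resident_tokens / new

# Hypothetical turn: the parent had a 20,000-token prompt and a 150-token answer,
# and the child sends 20,400 tokens, so only 250 tokens are new.
delta = turn_delta(20_400, 20_000, 150)

# Turn-12 medians from chars.json: 56,414 resident tokens, 258 new-prefill tokens.
reuse_pct, pd_tax = reuse_and_ratio(56_414, 258)
print(delta, round(reuse_pct, 2), round(pd_tax, 1))
```

Note that the ratio here is a ratio of medians, as in the script, not a median of per-request ratios.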

**PD-colocation:** the dominant cost is keeping ~50k+ resident KV available for
the next turn's tiny delta. Disaggregation physically splits a turn's prefill-KV
(P) and decode-KV (D), and the next turn's prefix = [prevPrompt + prevAnswer]
spans both → must be gathered/transferred; colocation keeps it local for free.
**Routing:** route on delta (`input − cache_hit`), never total input. C2 is the
trace-level justification for LPWL's score function.

## C3: prefill/decode BALANCE (honest reframe) (`c3_prefill_decode_balance.png`)

- Token mass: 98.7% input / **1.3% output**; of input, 60% reused-prefix, 40%
  new-prefill (28.6B new-prefill tokens vs 0.94B decode tokens).
- **But tokens ≠ time.** Under a per-request latency model (prefill@7k tok/s,
  TPOT 10ms), aggregate decode-time share ≈ **70% (robust 68–71% across the
  three constant pairs swept in `compute_chars.py`)**: at these constants each
  decode token costs ~70× a prefill token (10 ms vs 1/7000 s ≈ 0.14 ms). So this
  is NOT a "decode is negligible" workload.
- Per-request the bottleneck FLIPS within a session: turn-1 (and the 90%
  single-turn) is prefill-bound; turns ≥3 are strongly decode-bound.
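
The decode-time share quoted above can be reproduced from the two token totals in `chars.json` and the three constant pairs that `compute_chars.py` sweeps. This is a sketch of that arithmetic only; the function name and labels are illustrative, and the model itself remains the per-request estimate described in the text.

```python
def decode_time_share(new_prefill_tokens, decode_tokens, prefill_tok_per_s, tpot_s):
    """Fraction of modelled time spent decoding under a linear per-token cost model."""
    prefill_s = new_prefill_tokens / prefill_tok_per_s
    decode_s = decode_tokens * tpot_s
    return decode_s / (prefill_s + decode_s)

NEW_PREFILL = 28_616_906_067   # token_mass.new_prefill in chars.json
DECODE = 940_765_734           # token_mass.total_output in chars.json

shares = {}
for label, rate, tpot in [("prefill-optimistic", 13000, 0.005),
                          ("mid", 7000, 0.010),
                          ("pessimistic", 3000, 0.025)]:
    shares[label] = decode_time_share(NEW_PREFILL, DECODE, rate, tpot)
    cost_ratio = tpot * rate   # seconds per decode token divided by seconds per prefill token
    print(f"{label}: decode share {shares[label]:.1%}, decode/prefill token cost {cost_ratio:.0f}x")
```

The share moves so little because the sweep changes both constants together, which keeps the per-token cost ratio between 65× and 75×.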

**PD-colocation (correct argument):** the workload has *substantial* work on both
sides of the roofline: compute-bound prefill (~30% of time) and memory-bound
decode (~70%). Colocation interleaves them on one GPU (chunked prefill +
continuous batching) so compute and HBM bandwidth are both used; static
disaggregation strands P-instances bandwidth-idle and D-instances compute-idle.
The earlier "decode is 1.3% so nothing to isolate" instinct was WRONG (token vs
time confusion); C3b is the correction.

**Caveat:** C3b's 70% is a per-request-latency-weighted estimate; batched decode
throughput will shift it. Ground truth needs `-raw.jsonl` (`usage.cached_tokens`
for exact reuse; `backend_first_response_time_ms` / `total_cost_time_ms` for real
prefill vs decode wall time). Sampling that 522GB file is the next step.
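
One way the sampling step could look is a single-pass reservoir sample over the raw lines, followed by a wall-time split. This is a sketch under assumptions: the two timing field names come from the caveat above, but treating time-to-first-response as prefill time and the flat record layout are guesses, and the demo runs on synthetic records rather than the raw file.

```python
import json
import random

def reservoir_sample(lines, k, seed=0):
    """Uniform sample of k items from a stream of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, line in enumerate(lines):
        if i < k:
            sample.append(line)
        else:
            j = rng.randint(0, i)   # randint is inclusive on both ends
            if j < k:
                sample[j] = line
    return sample

def decode_wall_share(records):
    """Aggregate decode share, counting time-to-first-response as prefill (an assumption)."""
    prefill_ms = sum(r["backend_first_response_time_ms"] for r in records)
    total_ms = sum(r["total_cost_time_ms"] for r in records)
    return (total_ms - prefill_ms) / total_ms

# Synthetic stand-in for raw-file lines (values are made up).
fake_lines = [json.dumps({"backend_first_response_time_ms": 300, "total_cost_time_ms": 1000})
              for _ in range(1000)]
picked = [json.loads(line) for line in reservoir_sample(fake_lines, k=50)]
share = decode_wall_share(picked)
print(len(picked), share)
```

Passing an open file handle as `lines` keeps memory bounded by `k`, although it still streams the whole file once.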

## Goal mapping

| characterization | argue PD-colocation | guide routing |
|---|---|---|
| C1 mixture + hazard | both segments favor colocation (for different reasons) | reactive + auto-segment ⇒ LPWL shape |
| C2 resident/delta | the PD tax (transfer/split resident KV) | route on delta, not total |
| C3 prefill/decode | roofline complementarity (interleave) | per-request bottleneck flips within session |
964
analysis/workload_chars/chars.json
Normal file
@@ -0,0 +1,964 @@
{
  "mixture": {
    "single_sessions": 1179990,
    "multi_sessions": 127286,
    "req_single_pct": 55.81207253738968,
    "req_multi_pct": 44.187927462610325,
    "in_single_pct": 33.12487590117447,
    "in_multi_pct": 66.87512409882554,
    "out_single_pct": 60.24502960903973,
    "out_multi_pct": 39.75497039096027
  },
  "turns": {
    "mean": 1.6172713336739908,
    "p99": 18.0,
    "max": 3091,
    "single_turn_pct": 90.26326498765371
  },
  "hazard": {
    "1": 0.102101621998721,
    "2": 0.5062146469376287,
    "3": 0.7351961756478754,
    "4": 0.8113739305485657,
    "5": 0.8723731546954472,
    "6": 0.8669264241631353,
    "7": 0.9093235352011023,
    "8": 0.9240204920989971,
    "9": 0.901725753553022,
    "10": 0.9346178826585841,
    "11": 0.9260597637248089,
    "12": 0.9427685226874781,
    "13": 0.91950119395065,
    "14": 0.936865189289012,
    "15": 0.9382160896883085,
    "16": 0.9308646838684262,
    "17": 0.9371561574269995,
    "18": 0.9312862196131557,
    "19": 0.9333279456925813,
    "20": 0.9351459000779289,
    "21": 0.9399074074074074,
    "22": 0.9404984730568416,
    "23": 0.9473132921336546,
    "24": 0.9193940734188413,
    "25": 0.9497294046903187,
    "26": 0.9323793845764214,
    "27": 0.9483906016569333,
    "28": 0.9368466275239868,
    "29": 0.9472638336900031
  },
  "token_mass": {
    "total_input": 71116829368,
    "total_output": 940765734,
    "out_in_ratio_pct": 1.3228454394837104,
    "new_prefill": 28616906067,
    "reused_prefix": 42499923301,
    "new_prefill_pct_of_input": 40.23928839532401
  },
  "decode_time_fraction": {
    "optimistic_for_prefill": 0.6812079219496285,
    "mid": 0.6970810590484581,
    "pessimistic": 0.711448473592609
  },
  "per_turn": {
    "turn": [
      1,
      2,
      3,
      4,
      5,
      6,
      7,
      8,
      9,
      10,
      11,
      12,
      13,
      14,
      15,
      16,
      17,
      18,
      19,
      20,
      21,
      22,
      23,
      24,
      25,
      26,
      27,
      28,
      29,
      30,
      31,
      32,
      33,
      34,
      35,
      36,
      37,
      38,
      39,
      40,
      41,
      42,
      43,
      44,
      45,
      46,
      47,
      48,
      49,
      50,
      51,
      52,
      53,
      54,
      55,
      56,
      57,
      58,
      59,
      60,
      61,
      62,
      63,
      64,
      65,
      66,
      67,
      68,
      69,
      70,
      71,
      72,
      73,
      74,
      75,
      76,
      77,
      78,
      79,
      80,
      81,
      82,
      83,
      84,
      85,
      86,
      87,
      88,
      89,
      90,
      91,
      92,
      93,
      94,
      95,
      96,
      97,
      98,
      99,
      100,
      101,
      102,
      103,
      104,
      105,
      106,
      107,
      108,
      109,
      110,
      111,
      112,
      113,
      114,
      115,
      116,
      117,
      118,
      119,
      120,
      121,
      122,
      123,
      124,
      125,
      126,
      127,
      128,
      129,
      130,
      131,
      132,
      133,
      134,
      135,
      136,
      137,
      138,
      139,
      140,
      141,
      142,
      143,
      144,
      145,
      146,
      147,
      148
    ],
    "med_resident_input": [
      11035.0,
      19505.0,
      28059.0,
      35089.0,
      41215.0,
      44750.0,
      47419.5,
      49874.0,
      51905.0,
      53068.0,
      54782.0,
      56414.0,
      58229.0,
      59123.5,
      60434.5,
      61320.0,
      62243.0,
      63411.0,
      64510.5,
      65423.0,
      66942.5,
      67965.0,
      68826.0,
      70165.5,
      70052.0,
      70936.0,
      71547.0,
      72648.0,
      73406.0,
      73844.0,
      73604.0,
      74937.5,
      74778.0,
      75460.0,
      75029.0,
      74978.0,
      75933.0,
      76590.0,
      74695.0,
      76813.0,
      77079.5,
      78310.0,
      77848.0,
      77549.0,
      78203.0,
      79102.0,
      79202.0,
      78821.0,
      79868.0,
      80229.5,
      80912.0,
      81620.0,
      81612.5,
      81836.5,
      82506.0,
      82948.0,
      82633.0,
      84107.5,
      84176.0,
      84441.0,
      84101.0,
      85192.0,
      84127.0,
      84783.5,
      85087.0,
      85771.5,
      86110.0,
      85374.5,
      87137.0,
      87677.0,
      88587.0,
      88656.0,
      88882.0,
      89284.0,
      91512.0,
      89850.0,
      90596.0,
      91244.0,
      92102.0,
      93431.0,
      92333.5,
      96682.0,
      94999.0,
      95226.5,
      95173.0,
      95910.0,
      96528.0,
      96508.0,
      97270.0,
      97301.0,
      97076.5,
      97105.0,
      98032.0,
      97962.5,
      97968.5,
      98310.0,
      97061.0,
      97631.0,
      100126.0,
      97765.0,
      101076.0,
      98198.5,
      98678.0,
      98307.0,
      99174.0,
      99882.0,
      99974.0,
      99757.0,
      100065.5,
      99943.0,
      100612.0,
      101138.0,
      106738.0,
      99621.0,
      101980.0,
      102252.0,
      103018.0,
      101238.0,
      102005.0,
      101897.0,
      103576.0,
      102159.5,
      102695.5,
      100590.5,
      103236.0,
      101812.0,
      103074.0,
      99966.0,
      102183.5,
      101882.0,
      102572.5,
      105622.5,
      106066.0,
      103974.0,
      105443.5,
      104716.0,
      105041.0,
      106628.0,
      108320.0,
      108022.5,
      107621.5,
      107664.0,
      107913.0,
      108630.0,
      108382.0,
      107216.5,
      105731.0,
      103986.0
    ],
    "med_new_prefill": [
      11035.0,
      2920.0,
      1249.0,
      767.0,
      628.0,
      485.0,
      400.0,
      359.0,
      314.0,
      274.0,
      263.0,
      258.0,
      244.0,
      231.0,
      227.0,
      222.0,
      201.0,
      200.0,
      198.0,
      189.0,
      182.5,
      184.0,
      179.0,
      188.0,
      173.0,
      180.0,
      164.0,
      167.0,
      159.5,
      168.0,
      156.0,
      174.0,
      156.0,
      159.0,
      166.0,
      165.0,
      153.0,
      158.0,
      182.0,
      149.0,
      184.0,
      172.0,
      149.0,
      167.0,
      163.0,
      152.0,
      153.0,
      171.0,
      151.0,
      146.0,
      162.0,
      153.0,
      156.0,
      164.0,
      148.0,
      143.0,
      143.0,
      149.0,
      170.5,
      159.0,
      144.0,
      168.0,
      148.0,
      144.5,
      142.5,
      146.5,
      147.0,
      157.0,
      168.0,
      153.0,
      155.0,
      127.5,
      145.0,
      143.0,
      146.0,
      123.0,
      139.0,
      137.0,
      115.0,
      139.5,
      117.0,
      154.0,
      111.0,
      124.0,
      118.0,
      90.0,
      104.0,
      116.0,
      112.0,
      76.5,
      110.0,
      101.0,
      123.0,
      114.0,
      86.0,
      92.0,
      108.0,
      85.0,
      146.0,
      77.5,
      101.0,
      102.0,
      85.0,
      77.0,
      114.0,
      66.0,
      105.0,
      90.0,
      89.0,
      100.0,
      108.5,
      100.0,
      169.0,
      89.0,
      106.5,
      78.0,
      75.0,
      90.0,
      77.0,
      88.0,
      102.0,
      83.5,
      123.5,
      116.5,
      108.0,
      119.0,
      82.0,
      80.0,
      105.0,
      90.0,
      91.0,
      113.0,
      122.0,
      102.0,
      101.5,
      64.0,
      78.0,
      52.5,
      98.5,
      72.0,
      87.0,
      102.0,
      97.0,
      123.0,
      80.0,
      132.5,
      86.5,
      111.0
    ],
    "med_output": [
      63.0,
      67.0,
      111.0,
      142.0,
      158.0,
      162.0,
      164.0,
      164.0,
      159.0,
      160.0,
      159.0,
      161.0,
      160.0,
      158.0,
      154.0,
      154.0,
      154.0,
      149.0,
      146.0,
      147.0,
      142.0,
      144.0,
      143.0,
      142.0,
      140.0,
      136.0,
      137.0,
      139.0,
      136.0,
      133.0,
      130.0,
      131.0,
      125.0,
      123.0,
      122.0,
      122.0,
      118.0,
      122.0,
      114.0,
      112.0,
      115.0,
      111.0,
      109.0,
      112.0,
      109.0,
      107.0,
      111.0,
      105.0,
      108.0,
      107.0,
      100.0,
      100.0,
      95.0,
      105.0,
      103.0,
      102.0,
      100.0,
      100.0,
      98.0,
      98.0,
      101.0,
      99.0,
      101.0,
      102.0,
      97.0,
      91.0,
      100.0,
      97.0,
      94.0,
      98.5,
      92.5,
      97.0,
      102.0,
      92.0,
      95.0,
      91.0,
      91.0,
      92.0,
      85.0,
      98.0,
      96.0,
      99.0,
      94.0,
      96.0,
      90.0,
      85.0,
      99.0,
      86.0,
      99.0,
      93.0,
      92.0,
      93.0,
      87.0,
      83.0,
      87.5,
      82.0,
      80.0,
      90.0,
      92.0,
      80.0,
      77.0,
      82.0,
      87.0,
      74.0,
      83.0,
      79.0,
      84.0,
      80.5,
      79.0,
      76.0,
      78.5,
      71.5,
      81.0,
      87.0,
      82.0,
      85.0,
      87.0,
      75.0,
      75.0,
      82.0,
      86.0,
      76.5,
      77.5,
      70.0,
      78.0,
      85.0,
      77.0,
      67.0,
      76.5,
      107.0,
      92.0,
      80.5,
      85.0,
      83.0,
      77.0,
      70.0,
      84.0,
      69.0,
      97.0,
      72.0,
      81.0,
      87.0,
      89.0,
      102.0,
      83.0,
      82.5,
      91.0,
      79.5
    ],
    "resident_over_new": [
      1.0,
      6.679794520547945,
      22.46517213771017,
      45.748370273794,
      65.62898089171975,
      92.26804123711341,
      118.54875,
      138.92479108635098,
      165.30254777070064,
      193.67883211678833,
      208.29657794676805,
      218.65891472868216,
      238.64344262295083,
      255.94588744588745,
      266.23127753303964,
      276.2162162162162,
      309.6666666666667,
      317.055,
      325.81060606060606,
      346.15343915343914,
      366.8082191780822,
      369.375,
      384.5027932960894,
      373.22074468085106,
      404.9248554913295,
      394.0888888888889,
      436.2621951219512,
      435.0179640718563,
      460.2257053291536,
      439.54761904761904,
      471.8205128205128,
      430.67528735632186,
      479.34615384615387,
      474.59119496855345,
      451.98192771084337,
      454.41212121212124,
      496.29411764705884,
      484.746835443038,
      410.4120879120879,
      515.5234899328859,
      418.9103260869565,
      455.2906976744186,
      522.4697986577181,
      464.36526946107784,
      479.7730061349693,
      520.4078947368421,
      517.6601307189543,
      460.94152046783626,
      528.9271523178808,
      549.5171232876712,
      499.4567901234568,
      533.4640522875817,
      523.1570512820513,
      499.0030487804878,
      557.472972972973,
      580.0559440559441,
      577.8531468531469,
      564.4798657718121,
      493.7008797653959,
      531.0754716981132,
      584.0347222222222,
      507.0952380952381,
      568.4256756756756,
      586.7370242214533,
      597.1017543859649,
      585.4709897610921,
      585.7823129251701,
      543.7866242038217,
      518.672619047619,
      573.0522875816994,
      571.5290322580645,
      695.3411764705883,
      612.9793103448276,
      624.3636363636364,
      626.7945205479452,
      730.4878048780488,
      651.7697841726618,
      666.014598540146,
      800.8869565217391,
      669.7562724014336,
      789.1752136752136,
      627.8051948051948,
      855.8468468468468,
      767.9556451612904,
      806.5508474576271,
      1065.6666666666667,
      928.1538461538462,
      831.9655172413793,
      868.4821428571429,
      1271.9084967320262,
      882.5136363636364,
      961.4356435643564,
      797.0081300813008,
      859.3201754385965,
      1139.1686046511627,
      1068.5869565217392,
      898.7129629629629,
      1148.6,
      685.7945205479452,
      1261.483870967742,
      1000.7524752475248,
      962.7303921568628,
      1160.9176470588236,
      1276.7142857142858,
      869.9473684210526,
      1513.3636363636363,
      952.1333333333333,
      1108.411111111111,
      1124.3314606741574,
      999.43,
      927.2995391705069,
      1011.38,
      631.5857988165681,
      1119.3370786516855,
      957.5586854460093,
      1310.923076923077,
      1373.5733333333333,
      1124.8666666666666,
      1324.7402597402597,
      1157.9204545454545,
      1015.4509803921569,
      1223.4670658682635,
      831.5425101214574,
      863.4377682403433,
      955.8888888888889,
      855.563025210084,
      1257.0,
      1249.575,
      973.1761904761905,
      1132.0222222222221,
      1127.1703296703297,
      934.712389380531,
      869.3934426229508,
      1019.3529411764706,
      1038.8522167487686,
      1636.1875,
      1346.679487179487,
      2031.009523809524,
      1099.6954314720813,
      1500.3125,
      1237.028735632184,
      1055.5294117647059,
      1112.5051546391753,
      883.170731707317,
      1354.775,
      809.1811320754717,
      1222.3236994219653,
      936.8108108108108
    ],
    "reuse_pct": [
      0.0,
      85.02947962061009,
      95.5486653123775,
      97.81412978426287,
      98.47628290670872,
      98.91620111731844,
      99.1564651672835,
      99.28018606889361,
      99.39504864656584,
      99.48368131453984,
      99.5199153006462,
      99.54266671393626,
      99.5809648113483,
      99.60929241333818,
      99.62438673274372,
      99.63796477495107,
      99.67707212055974,
      99.68459730961506,
      99.69307322063851,
      99.71111077144124,
      99.72737797363409,
      99.72927241962775,
      99.73992386598088,
      99.73206205328829,
      99.75304059841261,
      99.74625014097215,
      99.7707800466826,
      99.77012443563484,
      99.78271530937526,
      99.77249336438979,
      99.78805499701103,
      99.76780650542119,
      99.79138249217684,
      99.78929234031276,
      99.77875221580989,
      99.77993544773133,
      99.7985065781676,
      99.7937067502285,
      99.75634245933462,
      99.80602241808027,
      99.76128542608606,
      99.78036010726599,
      99.80860137704244,
      99.78465228436214,
      99.79156809841055,
      99.8078430381027,
      99.80682306002375,
      99.7830527397521,
      99.81093804777883,
      99.81802204924622,
      99.79978247973106,
      99.8125459446214,
      99.8088528105376,
      99.79960042279423,
      99.82061910648923,
      99.82760283551141,
      99.82694565125313,
      99.822845762863,
      99.79744820376354,
      99.81170284577398,
      99.82877730348034,
      99.80279838482487,
      99.82407550489143,
      99.82956589430727,
      99.8325243574224,
      99.82919734410615,
      99.82928811984671,
      99.81610434028896,
      99.80720015607606,
      99.82549585410084,
      99.8250307607211,
      99.85618570655116,
      99.83686235683265,
      99.83983692486895,
      99.84045808200017,
      99.86310517529216,
      99.84657159256479,
      99.84985314102846,
      99.87513843347593,
      99.85069195449047,
      99.87328542728262,
      99.84071492108149,
      99.88315666480699,
      99.8697841462198,
      99.87601525642776,
      99.90616202690022,
      99.89225924084204,
      99.87980271065611,
      99.88485658476407,
      99.9213779920042,
      99.88668730331234,
      99.89598887801864,
      99.87453076546434,
      99.88362893964528,
      99.91221668189266,
      99.90641847217984,
      99.88872976787793,
      99.91293748911718,
      99.8541837285021,
      99.92072827699074,
      99.90007519094543,
      99.89612875960428,
      99.91386124566772,
      99.92167393980083,
      99.88505051727267,
      99.93392202799302,
      99.89497269290015,
      99.90978076726445,
      99.91105825684177,
      99.89994296749147,
      99.89215998091679,
      99.90112519527774,
      99.84166838426802,
      99.91066140673152,
      99.89556775838399,
      99.92371787348903,
      99.9271971888408,
      99.91110057488295,
      99.92451350423998,
      99.91363828179436,
      99.90152158801267,
      99.91826506590185,
      99.87974156608614,
      99.8841838941053,
      99.89538533069859,
      99.883117903587,
      99.92044550517105,
      99.91997279074886,
      99.89724368415645,
      99.91166251153295,
      99.91128226376466,
      99.89301521929514,
      99.88497727829841,
      99.90189855156096,
      99.9037399175862,
      99.93888231024867,
      99.92574328119497,
      99.95076340173313,
      99.90906573116692,
      99.9333472193293,
      99.91916113415999,
      99.90526081141329,
      99.91011277603255,
      99.88677161005248,
      99.92618700522226,
      99.8764182751722,
      99.91818861071965,
      99.89325486123131
    ]
  }
}
180
analysis/workload_chars/compute_chars.py
Normal file
@@ -0,0 +1,180 @@
import json, os, sys, statistics as st
from collections import defaultdict, Counter
import matplotlib; matplotlib.use("Agg")  # headless backend: figures are only written to disk
import matplotlib.pyplot as plt
import numpy as np

PATH="/home/admin/cpfs/wjh/ali-trace/trace-glm5.1-formatted/051315-051317.jsonl"
OUT="/tmp/wlc_out"; os.makedirs(OUT, exist_ok=True)
BLOCK=512
# --- transparent cost model for C3 (clearly-labeled estimate; raw-timing validation pending) ---
PREFILL_TOK_S=7000.0   # MB1: 32k->4.5s ~7100 tok/s effective on H20 / 30B-A3B
TPOT_S=0.010           # ~10ms/token decode (crossover unloaded ~5ms, loaded ~25ms)

def pct(v,p):
    # p-th quantile (p in [0,1]) of v with linear interpolation; NaN for empty input
    if not v: return float('nan')
    s=sorted(v);k=(len(s)-1)*p;f=int(k)
    return s[f] if f+1>=len(s) else s[f]+(s[f+1]-s[f])*(k-f)

# ---------- Pass A: structure (scalars only) ----------
# parents: chat_id -> parent_chat_id
# recs: chat_id -> (timestamp, input_length, output_length, turn)
# childcount: chat_id -> number of children (counted down again in Pass B)
parents={}; recs={}; childcount=Counter()
with open(PATH) as f:
    for line in f:
        if not line.strip(): continue
        d=json.loads(line); cid=d["chat_id"]; pid=d["parent_chat_id"]
        parents[cid]=pid
        recs[cid]=(float(d["timestamp"]),int(d["input_length"]),int(d["output_length"]),int(d["turn"]))
        if pid!="-1": childcount[pid]+=1
print(f"[A] records={len(recs)}", file=sys.stderr)

root_of={}
def root(cid):
    # Walk parent links up to the session root, memoizing the result for every chat on the path.
    path=[];c=cid
    while True:
        if c in root_of:r=root_of[c];break
        p=parents.get(c,"-1")
        if p=="-1" or p not in recs:r=c;break
        path.append(c);c=p
    for x in path:root_of[x]=r
    root_of[cid]=r;return r
sessions=defaultdict(list)
for cid in recs: sessions[root(cid)].append(cid)
# order each session's chats by (turn, timestamp)
seq={r:sorted(m,key=lambda c:(recs[c][3],recs[c][0])) for r,m in sessions.items()}
print(f"[A] sessions={len(seq)}", file=sys.stderr)

# ---------- C1: mixture + turn tail + hazard ----------
# sr/mr: requests, sm/mm: input tokens, so/mo: output tokens (s = single-turn, m = multi-turn sessions)
sr=mr=sm=mm=so=mo=0
turns_per=[]
for r,s in seq.items():
    multi=len(s)>1; turns_per.append(len(s))
    for c in s:
        _,inl,outl,_=recs[c]
        if multi: mr+=1;mm+=inl;mo+=outl
        else: sr+=1;sm+=inl;so+=outl
tot_r=sr+mr; tot_in=sm+mm; tot_out=so+mo
cnt_turn=Counter()
for r,s in seq.items():
    for c in s: cnt_turn[recs[c][3]]+=1
# hazard[k] = P(reach turn k+1 | reached turn k), from request counts per turn index
hazard={k: (cnt_turn[k+1]/cnt_turn[k] if cnt_turn[k] else 0) for k in range(1,30)}

# ---------- C2/C3: per-turn resident vs new-prefill (scalar) + hash_ids reuse ----------
by_in=defaultdict(list); by_new=defaultdict(list); by_out=defaultdict(list)
by_reuse_hash=defaultdict(list)   # hash-block prefix stability: reused/parent_blocks
store={}   # cid -> (blockset, in, out) for chats with pending children
tot_new_prefill=0; tot_reused=0
with open(PATH) as f:
    for line in f:
        if not line.strip(): continue
        d=json.loads(line); cid=d["chat_id"]; pid=d["parent_chat_id"]
        inl=int(d["input_length"]); outl=int(d["output_length"]); turn=int(d["turn"])
        blocks=set(d["hash_ids"])
        if pid in store:
            pblk,pin,pout=store[pid]
            new_prefill=max(0, inl - pin - pout)   # actual recompute (accounts for cached answer)
            reused_blk=len(blocks & pblk)
            by_reuse_hash[turn].append(reused_blk/len(pblk) if pblk else 0)
            childcount[pid]-=1
            if childcount[pid]<=0: del store[pid]   # last child seen: free the parent's block set
            tot_reused += (inl-new_prefill)
        else:
            new_prefill=inl   # session start, or parent not seen earlier: all new (intra-session)
        tot_new_prefill+=new_prefill
        by_in[turn].append(inl); by_new[turn].append(new_prefill); by_out[turn].append(outl)
        if childcount[cid]>0: store[cid]=(blocks,inl,outl)
print(f"[B] done; store residual={len(store)}", file=sys.stderr)

TURNS=[t for t in sorted(by_in) if len(by_in[t])>=50]   # keep turn indices with at least 50 requests
med_in=[pct(by_in[t],.5) for t in TURNS]
med_new=[max(pct(by_new[t],.5),1) for t in TURNS]   # floor at 1 token so the ratio below stays finite
med_out=[pct(by_out[t],.5) for t in TURNS]
ratio=[med_in[i]/med_new[i] for i in range(len(TURNS))]
reuse_pct=[(1-med_new[i]/med_in[i])*100 for i in range(len(TURNS))]
# C3 time per turn (cost model)
t_pref=[med_new[i]/PREFILL_TOK_S for i in range(len(TURNS))]
t_dec=[med_out[i]*TPOT_S for i in range(len(TURNS))]

# aggregate decode/prefill time fraction over a RANGE of constants
def agg_time(prate,tpot):
    tp=tot_new_prefill/prate; td=tot_out*tpot; return td/(tp+td)
frac_lo=agg_time(13000,0.005); frac_mid=agg_time(7000,0.010); frac_hi=agg_time(3000,0.025)

chars={
    "mixture":{"single_sessions":sum(1 for s in seq.values() if len(s)==1),
               "multi_sessions":sum(1 for s in seq.values() if len(s)>1),
               "req_single_pct":sr/tot_r*100,"req_multi_pct":mr/tot_r*100,
               "in_single_pct":sm/tot_in*100,"in_multi_pct":mm/tot_in*100,
               "out_single_pct":so/tot_out*100,"out_multi_pct":mo/tot_out*100},
    "turns":{"mean":st.mean(turns_per),"p99":pct(turns_per,.99),"max":max(turns_per),
             "single_turn_pct":sum(1 for x in turns_per if x==1)/len(turns_per)*100},
    "hazard":hazard,
    "token_mass":{"total_input":tot_in,"total_output":tot_out,"out_in_ratio_pct":tot_out/tot_in*100,
                  "new_prefill":tot_new_prefill,"reused_prefix":tot_reused,
                  "new_prefill_pct_of_input":tot_new_prefill/tot_in*100},
    "decode_time_fraction":{"optimistic_for_prefill":frac_lo,"mid":frac_mid,"pessimistic":frac_hi},
    "per_turn":{"turn":TURNS,"med_resident_input":med_in,"med_new_prefill":med_new,
                "med_output":med_out,"resident_over_new":ratio,"reuse_pct":reuse_pct},
}
with open(f"{OUT}/chars.json","w") as f:
    json.dump(chars, f, indent=2)

# ================= FIGURES =================
plt.rcParams.update({"figure.dpi":140,"font.size":10,"axes.grid":True,"grid.alpha":.3})

# ---- C1 ----
fig,ax=plt.subplots(1,3,figsize=(15,4.2))
cats=["% sessions","% requests","% input\ntokens","% output\ntokens"]
singv=[chars["mixture"]["single_sessions"]/len(seq)*100, chars["mixture"]["req_single_pct"],
       chars["mixture"]["in_single_pct"], chars["mixture"]["out_single_pct"]]
multv=[100-x for x in singv]
x=np.arange(len(cats))
ax[0].bar(x,singv,label="single-turn",color="#7fb3d5")
ax[0].bar(x,multv,bottom=singv,label="multi-turn",color="#e74c3c")
for i in range(len(cats)):
    ax[0].text(i,singv[i]/2,f"{singv[i]:.0f}",ha="center",va="center",fontsize=9)
    ax[0].text(i,singv[i]+multv[i]/2,f"{multv[i]:.0f}",ha="center",va="center",color="white",fontsize=9)
ax[0].set_xticks(x);ax[0].set_xticklabels(cats);ax[0].set_ylabel("%");ax[0].set_ylim(0,100)
ax[0].set_title("C1a Mixture: 90% sessions single-turn,\nbut multi-turn carries 2/3 prefill mass");ax[0].legend(loc="center right")
# turn CCDF log-log
tc=sorted(turns_per); n=len(tc); xs=sorted(set(tc))
ccdf=[sum(1 for v in tc if v>=xx)/n for xx in xs]
ax[1].loglog(xs,ccdf,marker=".",ms=3,color="#34495e")
ax[1].set_xlabel("turns per session (k)");ax[1].set_ylabel("P(turns >= k)")
ax[1].set_title(f"C1b Heavy-tailed session length\n(p99={chars['turns']['p99']:.0f}, max={chars['turns']['max']})")
# hazard
hk=list(range(1,20)); hv=[hazard[k]*100 for k in hk]
ax[2].plot(hk,hv,marker="o",color="#16a085")
ax[2].set_xlabel("reached turn k");ax[2].set_ylabel("P(continue to k+1) %");ax[2].set_ylim(0,100)
ax[2].set_title("C1c Continuation hazard rises 10%->94%\n(unpredictable at start, Lindy after)")
fig.tight_layout(); fig.savefig(f"{OUT}/c1_session_mixture.png"); plt.close(fig)

# ---- C2 ----
fig,ax=plt.subplots(1,3,figsize=(15,4.2))
ax[0].semilogy(TURNS,med_in,marker="o",label="resident context (input)",color="#e74c3c")
ax[0].semilogy(TURNS,med_new,marker="s",label="new prefill this turn",color="#2980b9")
ax[0].set_xlabel("turn");ax[0].set_ylabel("tokens (median, log)");ax[0].legend()
ax[0].set_xlim(1,30)
ax[0].set_title("C2a Resident state explodes,\nmarginal work collapses")
ax[1].plot(TURNS,ratio,marker="o",color="#8e44ad")
ax[1].set_xlabel("turn");ax[1].set_ylabel("resident / new-prefill");ax[1].set_xlim(1,30)
ax[1].set_title("C2b The PD tax = resident/delta\n(grows to ~250x by deep turns)")
ax[2].plot(TURNS,reuse_pct,marker="o",color="#27ae60")
ax[2].set_xlabel("turn");ax[2].set_ylabel("per-turn reuse %");ax[2].set_ylim(50,100);ax[2].set_xlim(1,30)
ax[2].set_title("C2c Per-turn reuse climbs to 99.6%\n(deep turns are near-pure cache hits)")
fig.tight_layout(); fig.savefig(f"{OUT}/c2_work_amortization.png"); plt.close(fig)

# ---- C3 ----
fig,ax=plt.subplots(1,2,figsize=(11,4.4))
# token mass decomposition
vals=[tot_reused/1e9, tot_new_prefill/1e9, tot_out/1e9]
labs=[f"reused prefix\n{tot_reused/tot_in*100:.0f}% of input",
      f"new prefill\n{tot_new_prefill/tot_in*100:.0f}% of input",
      f"decode output\n{tot_out/tot_in*100:.1f}% of input"]
ax[0].bar(range(3),vals,color=["#95a5a6","#2980b9","#e67e22"])
ax[0].set_xticks(range(3));ax[0].set_xticklabels(labs,fontsize=8.5)
ax[0].set_ylabel("tokens (billions)")
ax[0].set_title("C3a Token mass: prefill-dominated\n(but tokens != time, see C3b)")
# per-turn prefill vs decode TIME (cost model)
ax[1].semilogy(TURNS,t_pref,marker="o",label="prefill time (new tok / 7k·s⁻¹)",color="#2980b9")
ax[1].semilogy(TURNS,t_dec,marker="s",label="decode time (out·10ms)",color="#e67e22")
ax[1].set_xlabel("turn");ax[1].set_ylabel("seconds (median, log)");ax[1].legend(fontsize=8);ax[1].set_xlim(1,30)
ax[1].set_title(f"C3b Prefill→decode bottleneck flips within a session\n(agg decode-time share ≈ {frac_mid*100:.0f}%, range {frac_lo*100:.0f}–{frac_hi*100:.0f}%)")
fig.tight_layout(); fig.savefig(f"{OUT}/c3_prefill_decode_balance.png"); plt.close(fig)
print("FIGURES + chars.json written to", OUT)
print(json.dumps({k:chars[k] for k in ["mixture","turns","token_mass","decode_time_fraction"]}, indent=2))
BIN
figs/workload_chars/c1_session_mixture.png
Normal file
Binary file not shown. Size: 111 KiB
BIN
figs/workload_chars/c2_work_amortization.png
Normal file
Binary file not shown. Size: 108 KiB
BIN
figs/workload_chars/c3_prefill_decode_balance.png
Normal file
Binary file not shown. Size: 89 KiB