Workload characterization C1-C3 on full production trace

Joint/temporal characterizations of the full 051315 cluster trace (2.11M
req / 1.31M sessions / 2h), beyond the existing single-variable marginals:

- C1 mixture: 90.3% sessions single-turn, but multi-turn (9.7%) = 44% reqs /
  67% prefill mass; continuation hazard rises 10%->94% (Lindy); heaviness
  unpredictable at turn 1 (corr 0.04-0.15) => reactive routing justified.
- C2 resident/delta: resident context 11k->56k while new-prefill 2.7k->~200;
  per-turn reuse ->99.6%; resident/delta ("PD tax") ->~250-450x.
- C3 prefill/decode: token mass 98.7% input / 1.3% output, BUT decode ~70% of
  TIME (robust 68-71%); "decode negligible" is wrong (tokens != time). Correct
  colo argument = roofline complementarity, not "no decode".

Maps each to (1) PD-colocation and (2) routing. compute_chars.py + chars.json
+ figs/workload_chars/. Raw-file exact validation (cached_tokens, real
timings) pending.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 18:19:39 +08:00
parent 847f52f03b
commit cf812b6264
6 changed files with 1225 additions and 0 deletions


@@ -0,0 +1,81 @@
# Agentic workload characterization C1-C3 (full 051315 production trace)
Date 2026-05-29. Source: `trace-glm5.1-formatted/051315-051317.jsonl` on dash1
(release file, 2,114,220 requests / 1,307,276 sessions / 2h, type=100% `coder`).
This release file **is the full cluster-level production trace** — session skew
reproduces 46.5/66.5/74.6/87.5/96.0 exactly. Compute: `compute_chars.py`
(2-pass, ~65s, `~/ali-trace/.venv` python). Numbers: `chars.json`.

> ⚠️ **Cluster-level, not per-instance.** This is one cluster's aggregate stream.
> Concurrent-session counts have NO denominator of "8 instances"; do not compare
> them to a single deployment's instance count.

These three are NOT in the existing 13 analyzer figures (which are single-variable
marginals on the older 041x traces). C1-C3 are joint/temporal and argument-bearing.
## C1 — the workload is a MIXTURE, not "multi-turn agentic" (`c1_session_mixture.png`)
- **90.3%** of sessions are single-turn; mean 1.62 turns, p99=18, max=3091.
- But multi-turn sessions (9.7%) = **44.2% of requests** and **66.9% of input
(prefill) mass**. Single-turn = **60.2% of output (decode) mass**.
- Continuation hazard P(reach k+1 | reached k): turn1→2 only **10.2%**, but
turn2→3 50.6%, turn5→6 87%, turn12→13 **94.3%** (Lindy / Pareto).
- Predictability of heaviness at cold-start is near-zero:
corr(turn1_input, session_mass)=0.15, corr(turn1_input, n_turns)=**0.04**.
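
The continuation hazard in the bullets above is a ratio of per-turn request counts. A minimal sketch, assuming `turn_counts[k]` holds the number of requests observed at turn `k`; the helper name and the toy counts are illustrative and not taken from the trace:

```python
from collections import Counter

def continuation_hazard(turn_counts, max_k):
    """P(reach k+1 | reached k) = count[k+1] / count[k], for k = 1..max_k."""
    return {
        k: (turn_counts[k + 1] / turn_counts[k]) if turn_counts[k] else 0.0
        for k in range(1, max_k + 1)
    }

# Toy counts (hypothetical): 1000 sessions start, 100 reach turn 2, and so on.
toy = Counter({1: 1000, 2: 100, 3: 50, 4: 40})
hazard = continuation_hazard(toy, 3)
```

With these toy counts the hazard rises from turn to turn, the same qualitative shape as the trace: most sessions stop after turn 1, but those that survive tend to keep going.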
**Routing:** heaviness is unpredictable at session start → proactive placement
cannot pre-empt hot-pin → a REACTIVE mechanism (observable-load routing /
migration) is required. But once a session has shown depth, it almost surely
continues → "observed accumulated load" is the signal that works (not turn-1
features, not cost-model prediction). The single/multi optimal strategies are
opposite (load-balance the 90% one-shot sea vs affinity-pin the deep tail) and
you can't tell them apart at turn 1 → the only viable policy starts everyone
load-balanced and becomes sticky as turns accrue. This is exactly LPWL's
emergent behavior (`new_uncached≈input`→by-load; `new_uncached≈0`→sticks), so
C1 explains *why* a cache-aware-load score is the right shape — it auto-segments
the mixture with no classifier.
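
To illustrate how a cache-aware load score can segment the mixture without a classifier, here is a hedged sketch. It is not LPWL's actual score function, which this note does not spell out; the additive form `load + new_uncached`, the function name, and the instance names are all assumptions made for illustration:

```python
def pick_instance(input_len, cached_len, load):
    """Return the instance minimizing pending load plus uncached prefill tokens.

    cached_len[i]: tokens of this request's prefix already resident on instance i.
    load[i]: pending work on instance i, assumed to be in comparable token units.
    """
    def score(i):
        new_uncached = max(0, input_len - cached_len.get(i, 0))
        return load[i] + new_uncached
    return min(load, key=score)

load = {"A": 30_000, "B": 10_000}
# Turn 1: nothing is cached anywhere, so the choice reduces to least load.
first = pick_instance(11_000, {}, load)
# Deep turn: ~56k tokens resident on A and ~200 new ones; A wins despite its load.
deep = pick_instance(56_200, {"A": 56_000}, load)
```

Under these assumptions the same rule load-balances a cold request and stays sticky for a deep session, with no per-session label.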
## C2 — marginal work collapses while resident state explodes (`c2_work_amortization.png`)
Per turn: resident context grows 11k→56k+ tokens while new prefill collapses
2.7k→~200 tokens; per-turn reuse climbs 83%→**99.6%**; resident/new ratio
("the PD tax") grows to ~250× by turn 12, ~450× by turn 30.
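
The per-turn quantities can be checked by hand. The sketch below mirrors the delta definition used in `compute_chars.py` (input minus the parent's prompt and answer, floored at zero) and plugs in the turn-14 medians from `chars.json`; the function names are illustrative:

```python
def turn_delta(input_len, parent_input, parent_output):
    """New prefill for a turn: input minus the parent's prompt and answer."""
    return max(0, input_len - parent_input - parent_output)

def amortization(resident, new_prefill):
    """Return (resident/new ratio, per-turn reuse %), flooring new_prefill at 1."""
    new = max(new_prefill, 1)
    return resident / new, (1 - new / resident) * 100

# Turn-14 medians from chars.json: 59,123.5 resident tokens, 231 new-prefill tokens.
ratio, reuse = amortization(59123.5, 231.0)
```

Note that the ratio divides a median by a median, so it describes the typical turn rather than any individual request.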
**PD-colocation:** the dominant cost is keeping ~50k+ resident KV available for
the next turn's tiny delta. Disaggregation physically splits a turn's prefill-KV
(P) and decode-KV (D), and the next turn's prefix = [prevPrompt + prevAnswer]
spans both → must be gathered/transferred; colocation keeps it local for free.
**Routing:** route on delta (`input - cache_hit`), never total input; C2 is the
trace-level justification for LPWL's score function.
## C3 — prefill/decode BALANCE (honest reframe) (`c3_prefill_decode_balance.png`)
- Token mass: 98.7% input / **1.3% output**; of input, 60% reused-prefix, 40%
new-prefill (28.6B new-prefill tokens vs 0.94B decode tokens).
- **But tokens ≠ time.** Under a per-request latency model (prefill@7k tok/s,
TPOT 10ms), aggregate decode-time share ≈ **70% (robust 68-71% across
constants)**: each decode token costs ~70-140× a prefill token. So this is
NOT a "decode is negligible" workload.
- Per-request the bottleneck FLIPS within a session: turn-1 (and the 90%
single-turn) is prefill-bound; turns ≥3 are strongly decode-bound.
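
The aggregate decode-time share follows directly from the token totals and the two cost-model constants. A short sketch using the totals in `chars.json` and the three constant pairs from `compute_chars.py`; it is a per-request latency estimate, not a measurement:

```python
def decode_time_share(new_prefill_tokens, output_tokens, prefill_tok_s, tpot_s):
    """Fraction of modeled time spent in decode under a per-request latency model."""
    t_prefill = new_prefill_tokens / prefill_tok_s   # seconds of prefill
    t_decode = output_tokens * tpot_s                # seconds of decode
    return t_decode / (t_prefill + t_decode)

NEW_PREFILL = 28_616_906_067   # chars.json: token_mass.new_prefill
OUTPUT = 940_765_734           # chars.json: token_mass.total_output

# (prefill tok/s, TPOT s): optimistic-for-prefill, mid, pessimistic
constants = [(13000, 0.005), (7000, 0.010), (3000, 0.025)]
shares = [decode_time_share(NEW_PREFILL, OUTPUT, r, t) for r, t in constants]
```

The share moves little across the three pairs because the prefill rate and TPOT are varied together, so their product (the cost of a decode token relative to a prefill token) changes only modestly.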
**PD-colocation (correct argument):** the workload has *substantial* work on both
sides of the roofline — compute-bound prefill (~30% of time) and memory-bound
decode (~70%). Colocation interleaves them on one GPU (chunked prefill +
continuous batching) so compute and HBM bandwidth are both used; static
disaggregation strands P-instances bandwidth-idle and D-instances compute-idle.
The earlier "decode is 1.3% so nothing to isolate" instinct was WRONG (token vs
time confusion) — C3b is the correction.
**Caveat:** C3b's 70% is a per-request-latency-weighted estimate; batched decode
throughput will shift it. Ground-truth needs `-raw.jsonl` (`usage.cached_tokens`
for exact reuse; `backend_first_response_time_ms` / `total_cost_time_ms` for real
prefill vs decode wall time). Sampling that 522GB file is the next step.
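
One possible way to sample a file too large to hold in memory is single-pass reservoir sampling. This is a suggestion rather than the planned method, and the function name and seed are illustrative:

```python
import random

def reservoir_sample(lines, k, seed=0):
    """Uniformly sample up to k items from an iterable of unknown length in one pass."""
    rng = random.Random(seed)
    sample = []
    for i, line in enumerate(lines):
        if i < k:
            sample.append(line)        # fill the reservoir first
        else:
            j = rng.randint(0, i)      # inclusive on both ends
            if j < k:
                sample[j] = line       # replace with probability k / (i + 1)
    return sample

# Stand-in for an open file handle; any iterable of lines works.
demo = reservoir_sample((f"line-{n}" for n in range(1000)), 10)
```

Passing an open file handle instead of the generator keeps memory bounded by `k` lines, though the pass still reads the whole file once.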
## Goal mapping
| | argue PD-colocation | guide routing |
|---|---|---|
| C1 mixture + hazard | both segments favor colo (diff reasons) | reactive + auto-segment ⇒ LPWL shape |
| C2 resident/delta | the PD tax (transfer/split resident KV) | route on delta, not total |
| C3 prefill/decode | roofline complementarity (interleave) | per-req bottleneck flips within session |


@@ -0,0 +1,964 @@
{
"mixture": {
"single_sessions": 1179990,
"multi_sessions": 127286,
"req_single_pct": 55.81207253738968,
"req_multi_pct": 44.187927462610325,
"in_single_pct": 33.12487590117447,
"in_multi_pct": 66.87512409882554,
"out_single_pct": 60.24502960903973,
"out_multi_pct": 39.75497039096027
},
"turns": {
"mean": 1.6172713336739908,
"p99": 18.0,
"max": 3091,
"single_turn_pct": 90.26326498765371
},
"hazard": {
"1": 0.102101621998721,
"2": 0.5062146469376287,
"3": 0.7351961756478754,
"4": 0.8113739305485657,
"5": 0.8723731546954472,
"6": 0.8669264241631353,
"7": 0.9093235352011023,
"8": 0.9240204920989971,
"9": 0.901725753553022,
"10": 0.9346178826585841,
"11": 0.9260597637248089,
"12": 0.9427685226874781,
"13": 0.91950119395065,
"14": 0.936865189289012,
"15": 0.9382160896883085,
"16": 0.9308646838684262,
"17": 0.9371561574269995,
"18": 0.9312862196131557,
"19": 0.9333279456925813,
"20": 0.9351459000779289,
"21": 0.9399074074074074,
"22": 0.9404984730568416,
"23": 0.9473132921336546,
"24": 0.9193940734188413,
"25": 0.9497294046903187,
"26": 0.9323793845764214,
"27": 0.9483906016569333,
"28": 0.9368466275239868,
"29": 0.9472638336900031
},
"token_mass": {
"total_input": 71116829368,
"total_output": 940765734,
"out_in_ratio_pct": 1.3228454394837104,
"new_prefill": 28616906067,
"reused_prefix": 42499923301,
"new_prefill_pct_of_input": 40.23928839532401
},
"decode_time_fraction": {
"optimistic_for_prefill": 0.6812079219496285,
"mid": 0.6970810590484581,
"pessimistic": 0.711448473592609
},
"per_turn": {
"turn": [
1,
2,
3,
4,
5,
6,
7,
8,
9,
10,
11,
12,
13,
14,
15,
16,
17,
18,
19,
20,
21,
22,
23,
24,
25,
26,
27,
28,
29,
30,
31,
32,
33,
34,
35,
36,
37,
38,
39,
40,
41,
42,
43,
44,
45,
46,
47,
48,
49,
50,
51,
52,
53,
54,
55,
56,
57,
58,
59,
60,
61,
62,
63,
64,
65,
66,
67,
68,
69,
70,
71,
72,
73,
74,
75,
76,
77,
78,
79,
80,
81,
82,
83,
84,
85,
86,
87,
88,
89,
90,
91,
92,
93,
94,
95,
96,
97,
98,
99,
100,
101,
102,
103,
104,
105,
106,
107,
108,
109,
110,
111,
112,
113,
114,
115,
116,
117,
118,
119,
120,
121,
122,
123,
124,
125,
126,
127,
128,
129,
130,
131,
132,
133,
134,
135,
136,
137,
138,
139,
140,
141,
142,
143,
144,
145,
146,
147,
148
],
"med_resident_input": [
11035.0,
19505.0,
28059.0,
35089.0,
41215.0,
44750.0,
47419.5,
49874.0,
51905.0,
53068.0,
54782.0,
56414.0,
58229.0,
59123.5,
60434.5,
61320.0,
62243.0,
63411.0,
64510.5,
65423.0,
66942.5,
67965.0,
68826.0,
70165.5,
70052.0,
70936.0,
71547.0,
72648.0,
73406.0,
73844.0,
73604.0,
74937.5,
74778.0,
75460.0,
75029.0,
74978.0,
75933.0,
76590.0,
74695.0,
76813.0,
77079.5,
78310.0,
77848.0,
77549.0,
78203.0,
79102.0,
79202.0,
78821.0,
79868.0,
80229.5,
80912.0,
81620.0,
81612.5,
81836.5,
82506.0,
82948.0,
82633.0,
84107.5,
84176.0,
84441.0,
84101.0,
85192.0,
84127.0,
84783.5,
85087.0,
85771.5,
86110.0,
85374.5,
87137.0,
87677.0,
88587.0,
88656.0,
88882.0,
89284.0,
91512.0,
89850.0,
90596.0,
91244.0,
92102.0,
93431.0,
92333.5,
96682.0,
94999.0,
95226.5,
95173.0,
95910.0,
96528.0,
96508.0,
97270.0,
97301.0,
97076.5,
97105.0,
98032.0,
97962.5,
97968.5,
98310.0,
97061.0,
97631.0,
100126.0,
97765.0,
101076.0,
98198.5,
98678.0,
98307.0,
99174.0,
99882.0,
99974.0,
99757.0,
100065.5,
99943.0,
100612.0,
101138.0,
106738.0,
99621.0,
101980.0,
102252.0,
103018.0,
101238.0,
102005.0,
101897.0,
103576.0,
102159.5,
102695.5,
100590.5,
103236.0,
101812.0,
103074.0,
99966.0,
102183.5,
101882.0,
102572.5,
105622.5,
106066.0,
103974.0,
105443.5,
104716.0,
105041.0,
106628.0,
108320.0,
108022.5,
107621.5,
107664.0,
107913.0,
108630.0,
108382.0,
107216.5,
105731.0,
103986.0
],
"med_new_prefill": [
11035.0,
2920.0,
1249.0,
767.0,
628.0,
485.0,
400.0,
359.0,
314.0,
274.0,
263.0,
258.0,
244.0,
231.0,
227.0,
222.0,
201.0,
200.0,
198.0,
189.0,
182.5,
184.0,
179.0,
188.0,
173.0,
180.0,
164.0,
167.0,
159.5,
168.0,
156.0,
174.0,
156.0,
159.0,
166.0,
165.0,
153.0,
158.0,
182.0,
149.0,
184.0,
172.0,
149.0,
167.0,
163.0,
152.0,
153.0,
171.0,
151.0,
146.0,
162.0,
153.0,
156.0,
164.0,
148.0,
143.0,
143.0,
149.0,
170.5,
159.0,
144.0,
168.0,
148.0,
144.5,
142.5,
146.5,
147.0,
157.0,
168.0,
153.0,
155.0,
127.5,
145.0,
143.0,
146.0,
123.0,
139.0,
137.0,
115.0,
139.5,
117.0,
154.0,
111.0,
124.0,
118.0,
90.0,
104.0,
116.0,
112.0,
76.5,
110.0,
101.0,
123.0,
114.0,
86.0,
92.0,
108.0,
85.0,
146.0,
77.5,
101.0,
102.0,
85.0,
77.0,
114.0,
66.0,
105.0,
90.0,
89.0,
100.0,
108.5,
100.0,
169.0,
89.0,
106.5,
78.0,
75.0,
90.0,
77.0,
88.0,
102.0,
83.5,
123.5,
116.5,
108.0,
119.0,
82.0,
80.0,
105.0,
90.0,
91.0,
113.0,
122.0,
102.0,
101.5,
64.0,
78.0,
52.5,
98.5,
72.0,
87.0,
102.0,
97.0,
123.0,
80.0,
132.5,
86.5,
111.0
],
"med_output": [
63.0,
67.0,
111.0,
142.0,
158.0,
162.0,
164.0,
164.0,
159.0,
160.0,
159.0,
161.0,
160.0,
158.0,
154.0,
154.0,
154.0,
149.0,
146.0,
147.0,
142.0,
144.0,
143.0,
142.0,
140.0,
136.0,
137.0,
139.0,
136.0,
133.0,
130.0,
131.0,
125.0,
123.0,
122.0,
122.0,
118.0,
122.0,
114.0,
112.0,
115.0,
111.0,
109.0,
112.0,
109.0,
107.0,
111.0,
105.0,
108.0,
107.0,
100.0,
100.0,
95.0,
105.0,
103.0,
102.0,
100.0,
100.0,
98.0,
98.0,
101.0,
99.0,
101.0,
102.0,
97.0,
91.0,
100.0,
97.0,
94.0,
98.5,
92.5,
97.0,
102.0,
92.0,
95.0,
91.0,
91.0,
92.0,
85.0,
98.0,
96.0,
99.0,
94.0,
96.0,
90.0,
85.0,
99.0,
86.0,
99.0,
93.0,
92.0,
93.0,
87.0,
83.0,
87.5,
82.0,
80.0,
90.0,
92.0,
80.0,
77.0,
82.0,
87.0,
74.0,
83.0,
79.0,
84.0,
80.5,
79.0,
76.0,
78.5,
71.5,
81.0,
87.0,
82.0,
85.0,
87.0,
75.0,
75.0,
82.0,
86.0,
76.5,
77.5,
70.0,
78.0,
85.0,
77.0,
67.0,
76.5,
107.0,
92.0,
80.5,
85.0,
83.0,
77.0,
70.0,
84.0,
69.0,
97.0,
72.0,
81.0,
87.0,
89.0,
102.0,
83.0,
82.5,
91.0,
79.5
],
"resident_over_new": [
1.0,
6.679794520547945,
22.46517213771017,
45.748370273794,
65.62898089171975,
92.26804123711341,
118.54875,
138.92479108635098,
165.30254777070064,
193.67883211678833,
208.29657794676805,
218.65891472868216,
238.64344262295083,
255.94588744588745,
266.23127753303964,
276.2162162162162,
309.6666666666667,
317.055,
325.81060606060606,
346.15343915343914,
366.8082191780822,
369.375,
384.5027932960894,
373.22074468085106,
404.9248554913295,
394.0888888888889,
436.2621951219512,
435.0179640718563,
460.2257053291536,
439.54761904761904,
471.8205128205128,
430.67528735632186,
479.34615384615387,
474.59119496855345,
451.98192771084337,
454.41212121212124,
496.29411764705884,
484.746835443038,
410.4120879120879,
515.5234899328859,
418.9103260869565,
455.2906976744186,
522.4697986577181,
464.36526946107784,
479.7730061349693,
520.4078947368421,
517.6601307189543,
460.94152046783626,
528.9271523178808,
549.5171232876712,
499.4567901234568,
533.4640522875817,
523.1570512820513,
499.0030487804878,
557.472972972973,
580.0559440559441,
577.8531468531469,
564.4798657718121,
493.7008797653959,
531.0754716981132,
584.0347222222222,
507.0952380952381,
568.4256756756756,
586.7370242214533,
597.1017543859649,
585.4709897610921,
585.7823129251701,
543.7866242038217,
518.672619047619,
573.0522875816994,
571.5290322580645,
695.3411764705883,
612.9793103448276,
624.3636363636364,
626.7945205479452,
730.4878048780488,
651.7697841726618,
666.014598540146,
800.8869565217391,
669.7562724014336,
789.1752136752136,
627.8051948051948,
855.8468468468468,
767.9556451612904,
806.5508474576271,
1065.6666666666667,
928.1538461538462,
831.9655172413793,
868.4821428571429,
1271.9084967320262,
882.5136363636364,
961.4356435643564,
797.0081300813008,
859.3201754385965,
1139.1686046511627,
1068.5869565217392,
898.7129629629629,
1148.6,
685.7945205479452,
1261.483870967742,
1000.7524752475248,
962.7303921568628,
1160.9176470588236,
1276.7142857142858,
869.9473684210526,
1513.3636363636363,
952.1333333333333,
1108.411111111111,
1124.3314606741574,
999.43,
927.2995391705069,
1011.38,
631.5857988165681,
1119.3370786516855,
957.5586854460093,
1310.923076923077,
1373.5733333333333,
1124.8666666666666,
1324.7402597402597,
1157.9204545454545,
1015.4509803921569,
1223.4670658682635,
831.5425101214574,
863.4377682403433,
955.8888888888889,
855.563025210084,
1257.0,
1249.575,
973.1761904761905,
1132.0222222222221,
1127.1703296703297,
934.712389380531,
869.3934426229508,
1019.3529411764706,
1038.8522167487686,
1636.1875,
1346.679487179487,
2031.009523809524,
1099.6954314720813,
1500.3125,
1237.028735632184,
1055.5294117647059,
1112.5051546391753,
883.170731707317,
1354.775,
809.1811320754717,
1222.3236994219653,
936.8108108108108
],
"reuse_pct": [
0.0,
85.02947962061009,
95.5486653123775,
97.81412978426287,
98.47628290670872,
98.91620111731844,
99.1564651672835,
99.28018606889361,
99.39504864656584,
99.48368131453984,
99.5199153006462,
99.54266671393626,
99.5809648113483,
99.60929241333818,
99.62438673274372,
99.63796477495107,
99.67707212055974,
99.68459730961506,
99.69307322063851,
99.71111077144124,
99.72737797363409,
99.72927241962775,
99.73992386598088,
99.73206205328829,
99.75304059841261,
99.74625014097215,
99.7707800466826,
99.77012443563484,
99.78271530937526,
99.77249336438979,
99.78805499701103,
99.76780650542119,
99.79138249217684,
99.78929234031276,
99.77875221580989,
99.77993544773133,
99.7985065781676,
99.7937067502285,
99.75634245933462,
99.80602241808027,
99.76128542608606,
99.78036010726599,
99.80860137704244,
99.78465228436214,
99.79156809841055,
99.8078430381027,
99.80682306002375,
99.7830527397521,
99.81093804777883,
99.81802204924622,
99.79978247973106,
99.8125459446214,
99.8088528105376,
99.79960042279423,
99.82061910648923,
99.82760283551141,
99.82694565125313,
99.822845762863,
99.79744820376354,
99.81170284577398,
99.82877730348034,
99.80279838482487,
99.82407550489143,
99.82956589430727,
99.8325243574224,
99.82919734410615,
99.82928811984671,
99.81610434028896,
99.80720015607606,
99.82549585410084,
99.8250307607211,
99.85618570655116,
99.83686235683265,
99.83983692486895,
99.84045808200017,
99.86310517529216,
99.84657159256479,
99.84985314102846,
99.87513843347593,
99.85069195449047,
99.87328542728262,
99.84071492108149,
99.88315666480699,
99.8697841462198,
99.87601525642776,
99.90616202690022,
99.89225924084204,
99.87980271065611,
99.88485658476407,
99.9213779920042,
99.88668730331234,
99.89598887801864,
99.87453076546434,
99.88362893964528,
99.91221668189266,
99.90641847217984,
99.88872976787793,
99.91293748911718,
99.8541837285021,
99.92072827699074,
99.90007519094543,
99.89612875960428,
99.91386124566772,
99.92167393980083,
99.88505051727267,
99.93392202799302,
99.89497269290015,
99.90978076726445,
99.91105825684177,
99.89994296749147,
99.89215998091679,
99.90112519527774,
99.84166838426802,
99.91066140673152,
99.89556775838399,
99.92371787348903,
99.9271971888408,
99.91110057488295,
99.92451350423998,
99.91363828179436,
99.90152158801267,
99.91826506590185,
99.87974156608614,
99.8841838941053,
99.89538533069859,
99.883117903587,
99.92044550517105,
99.91997279074886,
99.89724368415645,
99.91166251153295,
99.91128226376466,
99.89301521929514,
99.88497727829841,
99.90189855156096,
99.9037399175862,
99.93888231024867,
99.92574328119497,
99.95076340173313,
99.90906573116692,
99.9333472193293,
99.91916113415999,
99.90526081141329,
99.91011277603255,
99.88677161005248,
99.92618700522226,
99.8764182751722,
99.91818861071965,
99.89325486123131
]
}
}


@@ -0,0 +1,180 @@
import json, sys, math, statistics as st
from collections import defaultdict, Counter
import matplotlib; matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
PATH="/home/admin/cpfs/wjh/ali-trace/trace-glm5.1-formatted/051315-051317.jsonl"
OUT="/tmp/wlc_out"; import os; os.makedirs(OUT, exist_ok=True)
BLOCK=512
# --- transparent cost model for C3 (clearly-labeled estimate; raw-timing validation pending) ---
PREFILL_TOK_S=7000.0 # MB1: 32k->4.5s ~7100 tok/s effective on H20 / 30B-A3B
TPOT_S=0.010 # ~10ms/token decode (crossover unloaded ~5ms, loaded ~25ms)
def pct(v,p):
    # linear-interpolated percentile of v at fraction p in [0,1]; NaN if v is empty
    if not v: return float('nan')
    s=sorted(v);k=(len(s)-1)*p;f=int(k)
    return s[f] if f+1>=len(s) else s[f]+(s[f+1]-s[f])*(k-f)
# ---------- Pass A: structure (scalars only) ----------
parents={}; recs={}; childcount=Counter()
for line in open(PATH):
    if not line.strip(): continue
    d=json.loads(line); cid=d["chat_id"]; pid=d["parent_chat_id"]
    parents[cid]=pid
    recs[cid]=(float(d["timestamp"]),int(d["input_length"]),int(d["output_length"]),int(d["turn"]))
    if pid!="-1": childcount[pid]+=1
print(f"[A] records={len(recs)}", file=sys.stderr)
root_of={}
def root(cid):
    # walk parent links to the session root, memoizing every node on the path
    path=[];c=cid
    while True:
        if c in root_of:r=root_of[c];break
        p=parents.get(c,"-1")
        if p=="-1" or p not in recs:r=c;break
        path.append(c);c=p
    for x in path:root_of[x]=r
    root_of[cid]=r;return r
sessions=defaultdict(list)
for cid in recs: sessions[root(cid)].append(cid)
seq={r:sorted(m,key=lambda c:(recs[c][3],recs[c][0])) for r,m in sessions.items()}
print(f"[A] sessions={len(seq)}", file=sys.stderr)
# ---------- C1: mixture + turn tail + hazard ----------
sr=mr=sm=mm=so=mo=0
turns_per=[]
for r,s in seq.items():
    multi=len(s)>1; turns_per.append(len(s))
    for c in s:
        _,inl,outl,_=recs[c]
        if multi: mr+=1;mm+=inl;mo+=outl
        else: sr+=1;sm+=inl;so+=outl
tot_r=sr+mr; tot_in=sm+mm; tot_out=so+mo
cnt_turn=Counter()
for r,s in seq.items():
    for c in s: cnt_turn[recs[c][3]]+=1
hazard={k: (cnt_turn[k+1]/cnt_turn[k] if cnt_turn[k] else 0) for k in range(1,30)}
# ---------- C2/C3: per-turn resident vs new-prefill (scalar) + hash_ids reuse ----------
by_in=defaultdict(list); by_new=defaultdict(list); by_out=defaultdict(list)
by_reuse_hash=defaultdict(list) # hash-block prefix stability: reused/parent_blocks
store={} # cid -> (blockset, in, out) for chats with pending children
tot_new_prefill=0; tot_reused=0
# NOTE: a child is matched only if its parent appeared earlier in the file;
# otherwise it falls into the else-branch and counts as a session start.
for line in open(PATH):
    if not line.strip(): continue
    d=json.loads(line); cid=d["chat_id"]; pid=d["parent_chat_id"]
    inl=int(d["input_length"]); outl=int(d["output_length"]); turn=int(d["turn"])
    blocks=set(d["hash_ids"])
    if pid in store:
        pblk,pin,pout=store[pid]
        new_prefill=max(0, inl - pin - pout) # actual recompute (accounts for cached answer)
        reused_blk=len(blocks & pblk)
        by_reuse_hash[turn].append(reused_blk/len(pblk) if pblk else 0)
        childcount[pid]-=1
        if childcount[pid]<=0: del store[pid]
        tot_reused += (inl-new_prefill)
    else:
        new_prefill=inl # session start: all new (intra-session)
    tot_new_prefill+=new_prefill
    by_in[turn].append(inl); by_new[turn].append(new_prefill); by_out[turn].append(outl)
    if childcount[cid]>0: store[cid]=(blocks,inl,outl)
print(f"[B] done; store residual={len(store)}", file=sys.stderr)
TURNS=[t for t in sorted(by_in) if len(by_in[t])>=50]
med_in=[pct(by_in[t],.5) for t in TURNS]
med_new=[max(pct(by_new[t],.5),1) for t in TURNS]
med_out=[pct(by_out[t],.5) for t in TURNS]
ratio=[med_in[i]/med_new[i] for i in range(len(TURNS))]
reuse_pct=[(1-med_new[i]/med_in[i])*100 for i in range(len(TURNS))]
# C3 time per turn (cost model)
t_pref=[med_new[i]/PREFILL_TOK_S for i in range(len(TURNS))]
t_dec=[med_out[i]*TPOT_S for i in range(len(TURNS))]
# aggregate decode/prefill time fraction over a RANGE of constants
def agg_time(prate,tpot):
    tp=tot_new_prefill/prate; td=tot_out*tpot; return td/(tp+td)
frac_lo=agg_time(13000,0.005); frac_mid=agg_time(7000,0.010); frac_hi=agg_time(3000,0.025)
chars={
    "mixture":{"single_sessions":sum(1 for s in seq.values() if len(s)==1),
               "multi_sessions":sum(1 for s in seq.values() if len(s)>1),
               "req_single_pct":sr/tot_r*100,"req_multi_pct":mr/tot_r*100,
               "in_single_pct":sm/tot_in*100,"in_multi_pct":mm/tot_in*100,
               "out_single_pct":so/tot_out*100,"out_multi_pct":mo/tot_out*100},
    "turns":{"mean":st.mean(turns_per),"p99":pct(turns_per,.99),"max":max(turns_per),
             "single_turn_pct":sum(1 for x in turns_per if x==1)/len(turns_per)*100},
    "hazard":hazard,
    "token_mass":{"total_input":tot_in,"total_output":tot_out,"out_in_ratio_pct":tot_out/tot_in*100,
                  "new_prefill":tot_new_prefill,"reused_prefix":tot_reused,
                  "new_prefill_pct_of_input":tot_new_prefill/tot_in*100},
    "decode_time_fraction":{"optimistic_for_prefill":frac_lo,"mid":frac_mid,"pessimistic":frac_hi},
    "per_turn":{"turn":TURNS,"med_resident_input":med_in,"med_new_prefill":med_new,
                "med_output":med_out,"resident_over_new":ratio,"reuse_pct":reuse_pct},
}
json.dump(chars, open(f"{OUT}/chars.json","w"), indent=2)
# ================= FIGURES =================
plt.rcParams.update({"figure.dpi":140,"font.size":10,"axes.grid":True,"grid.alpha":.3})
# ---- C1 ----
fig,ax=plt.subplots(1,3,figsize=(15,4.2))
cats=["% sessions","% requests","% input\ntokens","% output\ntokens"]
singv=[chars["mixture"]["single_sessions"]/len(seq)*100, chars["mixture"]["req_single_pct"],
       chars["mixture"]["in_single_pct"], chars["mixture"]["out_single_pct"]]
multv=[100-x for x in singv]
x=np.arange(len(cats))
ax[0].bar(x,singv,label="single-turn",color="#7fb3d5")
ax[0].bar(x,multv,bottom=singv,label="multi-turn",color="#e74c3c")
for i in range(len(cats)):
    ax[0].text(i,singv[i]/2,f"{singv[i]:.0f}",ha="center",va="center",fontsize=9)
    ax[0].text(i,singv[i]+multv[i]/2,f"{multv[i]:.0f}",ha="center",va="center",color="white",fontsize=9)
ax[0].set_xticks(x);ax[0].set_xticklabels(cats);ax[0].set_ylabel("%");ax[0].set_ylim(0,100)
ax[0].set_title("C1a Mixture: 90% sessions single-turn,\nbut multi-turn carries 2/3 prefill mass");ax[0].legend(loc="center right")
# turn CCDF log-log
tc=sorted(turns_per); n=len(tc); xs=sorted(set(tc))
ccdf=[sum(1 for v in tc if v>=xx)/n for xx in xs]
ax[1].loglog(xs,ccdf,marker=".",ms=3,color="#34495e")
ax[1].set_xlabel("turns per session (k)");ax[1].set_ylabel("P(turns >= k)")
ax[1].set_title(f"C1b Heavy-tailed session length\n(p99={chars['turns']['p99']:.0f}, max={chars['turns']['max']})")
# hazard
hk=list(range(1,20)); hv=[hazard[k]*100 for k in hk]
ax[2].plot(hk,hv,marker="o",color="#16a085")
ax[2].set_xlabel("reached turn k");ax[2].set_ylabel("P(continue to k+1) %");ax[2].set_ylim(0,100)
ax[2].set_title("C1c Continuation hazard rises 10%->94%\n(unpredictable at start, Lindy after)")
fig.tight_layout(); fig.savefig(f"{OUT}/c1_session_mixture.png"); plt.close(fig)
# ---- C2 ----
fig,ax=plt.subplots(1,3,figsize=(15,4.2))
ax[0].semilogy(TURNS,med_in,marker="o",label="resident context (input)",color="#e74c3c")
ax[0].semilogy(TURNS,med_new,marker="s",label="new prefill this turn",color="#2980b9")
ax[0].set_xlabel("turn");ax[0].set_ylabel("tokens (median, log)");ax[0].legend()
ax[0].set_xlim(1,30)
ax[0].set_title("C2a Resident state explodes,\nmarginal work collapses")
ax[1].plot(TURNS,ratio,marker="o",color="#8e44ad")
ax[1].set_xlabel("turn");ax[1].set_ylabel("resident / new-prefill");ax[1].set_xlim(1,30)
ax[1].set_title("C2b The PD tax = resident/delta\n(grows to ~250x by deep turns)")
ax[2].plot(TURNS,reuse_pct,marker="o",color="#27ae60")
ax[2].set_xlabel("turn");ax[2].set_ylabel("per-turn reuse %");ax[2].set_ylim(50,100);ax[2].set_xlim(1,30)
ax[2].set_title("C2c Per-turn reuse climbs to 99.6%\n(deep turns are near-pure cache hits)")
fig.tight_layout(); fig.savefig(f"{OUT}/c2_work_amortization.png"); plt.close(fig)
# ---- C3 ----
fig,ax=plt.subplots(1,2,figsize=(11,4.4))
# token mass decomposition
vals=[tot_reused/1e9, tot_new_prefill/1e9, tot_out/1e9]
labs=[f"reused prefix\n{tot_reused/tot_in*100:.0f}% of input",
      f"new prefill\n{tot_new_prefill/tot_in*100:.0f}% of input",
      f"decode output\n{tot_out/tot_in*100:.1f}% of input"]
ax[0].bar(range(3),vals,color=["#95a5a6","#2980b9","#e67e22"])
ax[0].set_xticks(range(3));ax[0].set_xticklabels(labs,fontsize=8.5)
ax[0].set_ylabel("tokens (billions)")
ax[0].set_title("C3a Token mass: prefill-dominated\n(but tokens != time, see C3b)")
# per-turn prefill vs decode TIME (cost model)
ax[1].semilogy(TURNS,t_pref,marker="o",label="prefill time (new tok / 7k·s⁻¹)",color="#2980b9")
ax[1].semilogy(TURNS,t_dec,marker="s",label="decode time (out·10ms)",color="#e67e22")
ax[1].set_xlabel("turn");ax[1].set_ylabel("seconds (median, log)");ax[1].legend(fontsize=8);ax[1].set_xlim(1,30)
ax[1].set_title(f"C3b Prefill→decode bottleneck flips within a session\n(agg decode-time share ≈ {frac_mid*100:.0f}%, range {frac_lo*100:.0f}-{frac_hi*100:.0f}%)")
fig.tight_layout(); fig.savefig(f"{OUT}/c3_prefill_decode_balance.png"); plt.close(fig)
print("FIGURES + chars.json written to", OUT)
print(json.dumps({k:chars[k] for k in ["mixture","turns","token_mass","decode_time_fraction"]}, indent=2))

Binary files not shown (3 figure images: 111 KiB, 108 KiB, 89 KiB).