Workload characterization C1-C3 on full production trace

Joint/temporal characterizations of the full 051315 cluster trace (2.11M req / 1.31M sessions / 2h), beyond the existing single-variable marginals:

- C1 mixture: 90.3% of sessions are single-turn, but the multi-turn 9.7% carry 44% of requests and 67% of prefill mass; continuation hazard rises 10% -> 94% (Lindy); heaviness is unpredictable at turn 1 (corr 0.04-0.15) => reactive routing justified.
- C2 resident/delta: resident context 11k -> 56k while new prefill 2.9k -> ~200; per-turn reuse -> 99.6%; resident/delta ("PD tax") -> ~220x by turn 12, ~440x by turn 30.
- C3 prefill/decode: token mass is 98.7% input / 1.3% output, BUT decode is ~70% of TIME (68-71% across the three tested constant pairs); "decode negligible" is wrong (tokens != time). The correct colo argument is roofline complementarity, not "no decode".

Maps each to (1) PD-colocation and (2) routing. Adds compute_chars.py + chars.json + figs/workload_chars/. Raw-file exact validation (cached_tokens, real timings) pending.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 18:19:39 +08:00
parent 847f52f03b
commit cf812b6264
6 changed files with 1225 additions and 0 deletions
--- a/analysis/workload_chars/README.md
+++ b/analysis/workload_chars/README.md
@@ -0,0 +1,81 @@
+# Agentic workload characterization C1–C3 (full 051315 production trace)
+
+Date 2026-05-29. Source: `trace-glm5.1-formatted/051315-051317.jsonl` on dash1
+(release file, 2,114,220 requests / 1,307,276 sessions / 2h, type=100% `coder`).
+This release file **is the full cluster-level production trace** — session skew
+reproduces 46.5/66.5/74.6/87.5/96.0 exactly. Compute: `compute_chars.py`
+(2-pass, ~65s, `~/ali-trace/.venv` python). Numbers: `chars.json`.
+
+> ⚠️ **Cluster-level, not per-instance.** This is one cluster's aggregate stream.
+> Concurrent-session counts are cluster-wide; there is no "8 instances" denominator,
+> so do not compare them to a single deployment's instance count.
+
+These three are NOT in the existing 13 analyzer figures (which are single-variable
+marginals on the older 041x traces). C1–C3 are joint/temporal and argument-bearing.
+
+## C1 — the workload is a MIXTURE, not "multi-turn agentic" (`c1_session_mixture.png`)
+
+- **90.3%** of sessions are single-turn; mean 1.62 turns, p99=18, max=3091.
+- But multi-turn sessions (9.7%) = **44.2% of requests** and **66.9% of input
+  (prefill) mass**. Single-turn = **60.2% of output (decode) mass**.
+- Continuation hazard P(reach k+1 | reached k): turn1→2 only **10.2%**, but
+  turn2→3 50.6%, turn5→6 87%, turn12→13 **94.3%** (Lindy / Pareto).
+- Predictability of heaviness at cold-start is near-zero:
+  corr(turn1_input, session_mass)=0.15, corr(turn1_input, n_turns)=**0.04**.
+
+**Routing:** heaviness is unpredictable at session start → proactive placement
+cannot pre-empt hot-pin → a REACTIVE mechanism (observable-load routing /
+migration) is required. But once a session has shown depth, it very likely
+continues (hazard ~87–95% per turn from turn 5 on) → "observed accumulated load" is the signal that works (not turn-1
+features, not cost-model prediction). The single/multi optimal strategies are
+opposite (load-balance the 90% one-shot sea vs affinity-pin the deep tail) and
+you can't tell them apart at turn 1 → the only viable policy starts everyone
+load-balanced and becomes sticky as turns accrue. This is exactly LPWL's
+emergent behavior (`new_uncached≈input`→by-load; `new_uncached≈0`→sticks), so
+C1 explains *why* a cache-aware-load score is the right shape — it auto-segments
+the mixture with no classifier.
+
+## C2 — marginal work collapses while resident state explodes (`c2_work_amortization.png`)
+
+Per turn (medians): resident context grows 11k→56k+ tokens (turn 1→12) while new prefill collapses
+2.9k→~200 tokens (turn 2→18); per-turn reuse climbs 85% (turn 2)→**99.6%** (turn 14); resident/new ratio
+("the PD tax") grows to ~220× by turn 12 and ~440× by turn 30.
+
+**PD-colocation:** the dominant cost is keeping ~50k+ resident KV available for
+the next turn's tiny delta. Disaggregation physically splits a turn's prefill-KV
+(P) and decode-KV (D), and the next turn's prefix = [prevPrompt + prevAnswer]
+spans both → must be gathered/transferred; colocation keeps it local, with no transfer.
+**Routing:** route on delta (`input − cache_hit`), never total input — C2 is the
+trace-level justification for LPWL's score function.
+
+## C3 — prefill/decode BALANCE (honest reframe) (`c3_prefill_decode_balance.png`)
+
+- Token mass: 98.7% input / **1.3% output**; of input, 60% reused-prefix, 40%
+  new-prefill (28.6B new-prefill tokens vs 0.94B decode tokens).
+- **But tokens ≠ time.** Under a per-request latency model (prefill@7k tok/s,
+  TPOT 10ms), aggregate decode-time share ≈ **70% (68–71% across the three tested
+  constant pairs)**. Each decode token costs ~65–75× a prefill token in all three pairs, so
+  the band is narrow by construction; still, this is NOT a "decode is negligible" workload.
+- Per request, the bottleneck FLIPS within a session: at the median, turn 1 (and the 90%
+  single-turn sessions) is prefill-bound; turn 2 is already decode-bound, turns ≥3 strongly so.
+
+**PD-colocation (correct argument):** the workload has *substantial* work on both
+sides of the roofline — compute-bound prefill (~30% of time) and memory-bound
+decode (~70%). Colocation interleaves them on one GPU (chunked prefill +
+continuous batching) so compute and HBM bandwidth are both used; static
+disaggregation strands P-instances bandwidth-idle and D-instances compute-idle.
+The earlier "decode is 1.3% so nothing to isolate" instinct was WRONG (token vs
+time confusion) — C3b is the correction.
+
+**Caveat:** C3b's 70% is a per-request-latency-weighted estimate; batched decode
+throughput will shift it. Ground-truth needs `-raw.jsonl` (`usage.cached_tokens`
+for exact reuse; `backend_first_response_time_ms` / `total_cost_time_ms` for real
+prefill vs decode wall time). Sampling that 522GB file is the next step.
+
+## Goal mapping
+
+| Characterization | argue PD-colocation | guide routing |
+|---|---|---|
+| C1 mixture + hazard | both segments favor colo (for different reasons) | reactive + auto-segment ⇒ LPWL shape |
+| C2 resident/delta | the PD tax (transfer/split resident KV) | route on delta, not total |
+| C3 prefill/decode | roofline complementarity (interleave) | per-req bottleneck flips within session |
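This is a minimal sketch of the cache-aware-load score shape argued for in C1 and C2 above. Everything in it is a hypothetical illustration, not LPWL's implementation. The `Instance` record, the `score`/`pick` helpers, and the additive formula `queued_tokens + (input - cache_hit)` are assumptions. They are chosen only to show how routing on the delta starts load-balanced and becomes sticky without a classifier.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical serving instance: queued work plus cached prefix length per session."""
    name: str
    queued_tokens: int = 0
    # session_id -> prompt tokens already resident in this instance's KV cache (assumed bookkeeping)
    cached_prefix: dict = field(default_factory=dict)


def score(inst: Instance, session_id: str, input_len: int) -> int:
    """Assumed score: pending load plus the delta this request would have to prefill here."""
    cache_hit = min(inst.cached_prefix.get(session_id, 0), input_len)
    new_uncached = input_len - cache_hit  # route on delta, not on total input
    return inst.queued_tokens + new_uncached


def pick(instances, session_id: str, input_len: int) -> Instance:
    """Lowest score wins; ties go to the first instance in the list."""
    return min(instances, key=lambda inst: score(inst, session_id, input_len))


if __name__ == "__main__":
    a = Instance("a", queued_tokens=20_000)
    b = Instance("b", queued_tokens=5_000)
    # Turn 1 of a new session: no cache anywhere, so new_uncached == input and load decides.
    print(pick([a, b], "s1", 11_000).name)
    # Deep turn: "a" holds a 56k-token prefix, so it wins despite the heavier queue.
    a.cached_prefix["s1"] = 56_000
    print(pick([a, b], "s1", 56_250).name)
```

With these toy numbers, a fresh 11k-token session goes to the lighter instance `b`. A turn-12-sized request (56k resident, ~250 new tokens) sticks to `a`, even though `a` has 4x the queue. A real router would also weigh decode load and memory pressure, which this sketch ignores.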
--- a/analysis/workload_chars/chars.json
+++ b/analysis/workload_chars/chars.json
@@ -0,0 +1,964 @@
+{
+  "mixture": {
+    "single_sessions": 1179990,
+    "multi_sessions": 127286,
+    "req_single_pct": 55.81207253738968,
+    "req_multi_pct": 44.187927462610325,
+    "in_single_pct": 33.12487590117447,
+    "in_multi_pct": 66.87512409882554,
+    "out_single_pct": 60.24502960903973,
+    "out_multi_pct": 39.75497039096027
+  },
+  "turns": {
+    "mean": 1.6172713336739908,
+    "p99": 18.0,
+    "max": 3091,
+    "single_turn_pct": 90.26326498765371
+  },
+  "hazard": {
+    "1": 0.102101621998721,
+    "2": 0.5062146469376287,
+    "3": 0.7351961756478754,
+    "4": 0.8113739305485657,
+    "5": 0.8723731546954472,
+    "6": 0.8669264241631353,
+    "7": 0.9093235352011023,
+    "8": 0.9240204920989971,
+    "9": 0.901725753553022,
+    "10": 0.9346178826585841,
+    "11": 0.9260597637248089,
+    "12": 0.9427685226874781,
+    "13": 0.91950119395065,
+    "14": 0.936865189289012,
+    "15": 0.9382160896883085,
+    "16": 0.9308646838684262,
+    "17": 0.9371561574269995,
+    "18": 0.9312862196131557,
+    "19": 0.9333279456925813,
+    "20": 0.9351459000779289,
+    "21": 0.9399074074074074,
+    "22": 0.9404984730568416,
+    "23": 0.9473132921336546,
+    "24": 0.9193940734188413,
+    "25": 0.9497294046903187,
+    "26": 0.9323793845764214,
+    "27": 0.9483906016569333,
+    "28": 0.9368466275239868,
+    "29": 0.9472638336900031
+  },
+  "token_mass": {
+    "total_input": 71116829368,
+    "total_output": 940765734,
+    "out_in_ratio_pct": 1.3228454394837104,
+    "new_prefill": 28616906067,
+    "reused_prefix": 42499923301,
+    "new_prefill_pct_of_input": 40.23928839532401
+  },
+  "decode_time_fraction": {
+    "optimistic_for_prefill": 0.6812079219496285,
+    "mid": 0.6970810590484581,
+    "pessimistic": 0.711448473592609
+  },
+  "per_turn": {
+    "turn": [
+      1,
+      2,
+      3,
+      4,
+      5,
+      6,
+      7,
+      8,
+      9,
+      10,
+      11,
+      12,
+      13,
+      14,
+      15,
+      16,
+      17,
+      18,
+      19,
+      20,
+      21,
+      22,
+      23,
+      24,
+      25,
+      26,
+      27,
+      28,
+      29,
+      30,
+      31,
+      32,
+      33,
+      34,
+      35,
+      36,
+      37,
+      38,
+      39,
+      40,
+      41,
+      42,
+      43,
+      44,
+      45,
+      46,
+      47,
+      48,
+      49,
+      50,
+      51,
+      52,
+      53,
+      54,
+      55,
+      56,
+      57,
+      58,
+      59,
+      60,
+      61,
+      62,
+      63,
+      64,
+      65,
+      66,
+      67,
+      68,
+      69,
+      70,
+      71,
+      72,
+      73,
+      74,
+      75,
+      76,
+      77,
+      78,
+      79,
+      80,
+      81,
+      82,
+      83,
+      84,
+      85,
+      86,
+      87,
+      88,
+      89,
+      90,
+      91,
+      92,
+      93,
+      94,
+      95,
+      96,
+      97,
+      98,
+      99,
+      100,
+      101,
+      102,
+      103,
+      104,
+      105,
+      106,
+      107,
+      108,
+      109,
+      110,
+      111,
+      112,
+      113,
+      114,
+      115,
+      116,
+      117,
+      118,
+      119,
+      120,
+      121,
+      122,
+      123,
+      124,
+      125,
+      126,
+      127,
+      128,
+      129,
+      130,
+      131,
+      132,
+      133,
+      134,
+      135,
+      136,
+      137,
+      138,
+      139,
+      140,
+      141,
+      142,
+      143,
+      144,
+      145,
+      146,
+      147,
+      148
+    ],
+    "med_resident_input": [
+      11035.0,
+      19505.0,
+      28059.0,
+      35089.0,
+      41215.0,
+      44750.0,
+      47419.5,
+      49874.0,
+      51905.0,
+      53068.0,
+      54782.0,
+      56414.0,
+      58229.0,
+      59123.5,
+      60434.5,
+      61320.0,
+      62243.0,
+      63411.0,
+      64510.5,
+      65423.0,
+      66942.5,
+      67965.0,
+      68826.0,
+      70165.5,
+      70052.0,
+      70936.0,
+      71547.0,
+      72648.0,
+      73406.0,
+      73844.0,
+      73604.0,
+      74937.5,
+      74778.0,
+      75460.0,
+      75029.0,
+      74978.0,
+      75933.0,
+      76590.0,
+      74695.0,
+      76813.0,
+      77079.5,
+      78310.0,
+      77848.0,
+      77549.0,
+      78203.0,
+      79102.0,
+      79202.0,
+      78821.0,
+      79868.0,
+      80229.5,
+      80912.0,
+      81620.0,
+      81612.5,
+      81836.5,
+      82506.0,
+      82948.0,
+      82633.0,
+      84107.5,
+      84176.0,
+      84441.0,
+      84101.0,
+      85192.0,
+      84127.0,
+      84783.5,
+      85087.0,
+      85771.5,
+      86110.0,
+      85374.5,
+      87137.0,
+      87677.0,
+      88587.0,
+      88656.0,
+      88882.0,
+      89284.0,
+      91512.0,
+      89850.0,
+      90596.0,
+      91244.0,
+      92102.0,
+      93431.0,
+      92333.5,
+      96682.0,
+      94999.0,
+      95226.5,
+      95173.0,
+      95910.0,
+      96528.0,
+      96508.0,
+      97270.0,
+      97301.0,
+      97076.5,
+      97105.0,
+      98032.0,
+      97962.5,
+      97968.5,
+      98310.0,
+      97061.0,
+      97631.0,
+      100126.0,
+      97765.0,
+      101076.0,
+      98198.5,
+      98678.0,
+      98307.0,
+      99174.0,
+      99882.0,
+      99974.0,
+      99757.0,
+      100065.5,
+      99943.0,
+      100612.0,
+      101138.0,
+      106738.0,
+      99621.0,
+      101980.0,
+      102252.0,
+      103018.0,
+      101238.0,
+      102005.0,
+      101897.0,
+      103576.0,
+      102159.5,
+      102695.5,
+      100590.5,
+      103236.0,
+      101812.0,
+      103074.0,
+      99966.0,
+      102183.5,
+      101882.0,
+      102572.5,
+      105622.5,
+      106066.0,
+      103974.0,
+      105443.5,
+      104716.0,
+      105041.0,
+      106628.0,
+      108320.0,
+      108022.5,
+      107621.5,
+      107664.0,
+      107913.0,
+      108630.0,
+      108382.0,
+      107216.5,
+      105731.0,
+      103986.0
+    ],
+    "med_new_prefill": [
+      11035.0,
+      2920.0,
+      1249.0,
+      767.0,
+      628.0,
+      485.0,
+      400.0,
+      359.0,
+      314.0,
+      274.0,
+      263.0,
+      258.0,
+      244.0,
+      231.0,
+      227.0,
+      222.0,
+      201.0,
+      200.0,
+      198.0,
+      189.0,
+      182.5,
+      184.0,
+      179.0,
+      188.0,
+      173.0,
+      180.0,
+      164.0,
+      167.0,
+      159.5,
+      168.0,
+      156.0,
+      174.0,
+      156.0,
+      159.0,
+      166.0,
+      165.0,
+      153.0,
+      158.0,
+      182.0,
+      149.0,
+      184.0,
+      172.0,
+      149.0,
+      167.0,
+      163.0,
+      152.0,
+      153.0,
+      171.0,
+      151.0,
+      146.0,
+      162.0,
+      153.0,
+      156.0,
+      164.0,
+      148.0,
+      143.0,
+      143.0,
+      149.0,
+      170.5,
+      159.0,
+      144.0,
+      168.0,
+      148.0,
+      144.5,
+      142.5,
+      146.5,
+      147.0,
+      157.0,
+      168.0,
+      153.0,
+      155.0,
+      127.5,
+      145.0,
+      143.0,
+      146.0,
+      123.0,
+      139.0,
+      137.0,
+      115.0,
+      139.5,
+      117.0,
+      154.0,
+      111.0,
+      124.0,
+      118.0,
+      90.0,
+      104.0,
+      116.0,
+      112.0,
+      76.5,
+      110.0,
+      101.0,
+      123.0,
+      114.0,
+      86.0,
+      92.0,
+      108.0,
+      85.0,
+      146.0,
+      77.5,
+      101.0,
+      102.0,
+      85.0,
+      77.0,
+      114.0,
+      66.0,
+      105.0,
+      90.0,
+      89.0,
+      100.0,
+      108.5,
+      100.0,
+      169.0,
+      89.0,
+      106.5,
+      78.0,
+      75.0,
+      90.0,
+      77.0,
+      88.0,
+      102.0,
+      83.5,
+      123.5,
+      116.5,
+      108.0,
+      119.0,
+      82.0,
+      80.0,
+      105.0,
+      90.0,
+      91.0,
+      113.0,
+      122.0,
+      102.0,
+      101.5,
+      64.0,
+      78.0,
+      52.5,
+      98.5,
+      72.0,
+      87.0,
+      102.0,
+      97.0,
+      123.0,
+      80.0,
+      132.5,
+      86.5,
+      111.0
+    ],
+    "med_output": [
+      63.0,
+      67.0,
+      111.0,
+      142.0,
+      158.0,
+      162.0,
+      164.0,
+      164.0,
+      159.0,
+      160.0,
+      159.0,
+      161.0,
+      160.0,
+      158.0,
+      154.0,
+      154.0,
+      154.0,
+      149.0,
+      146.0,
+      147.0,
+      142.0,
+      144.0,
+      143.0,
+      142.0,
+      140.0,
+      136.0,
+      137.0,
+      139.0,
+      136.0,
+      133.0,
+      130.0,
+      131.0,
+      125.0,
+      123.0,
+      122.0,
+      122.0,
+      118.0,
+      122.0,
+      114.0,
+      112.0,
+      115.0,
+      111.0,
+      109.0,
+      112.0,
+      109.0,
+      107.0,
+      111.0,
+      105.0,
+      108.0,
+      107.0,
+      100.0,
+      100.0,
+      95.0,
+      105.0,
+      103.0,
+      102.0,
+      100.0,
+      100.0,
+      98.0,
+      98.0,
+      101.0,
+      99.0,
+      101.0,
+      102.0,
+      97.0,
+      91.0,
+      100.0,
+      97.0,
+      94.0,
+      98.5,
+      92.5,
+      97.0,
+      102.0,
+      92.0,
+      95.0,
+      91.0,
+      91.0,
+      92.0,
+      85.0,
+      98.0,
+      96.0,
+      99.0,
+      94.0,
+      96.0,
+      90.0,
+      85.0,
+      99.0,
+      86.0,
+      99.0,
+      93.0,
+      92.0,
+      93.0,
+      87.0,
+      83.0,
+      87.5,
+      82.0,
+      80.0,
+      90.0,
+      92.0,
+      80.0,
+      77.0,
+      82.0,
+      87.0,
+      74.0,
+      83.0,
+      79.0,
+      84.0,
+      80.5,
+      79.0,
+      76.0,
+      78.5,
+      71.5,
+      81.0,
+      87.0,
+      82.0,
+      85.0,
+      87.0,
+      75.0,
+      75.0,
+      82.0,
+      86.0,
+      76.5,
+      77.5,
+      70.0,
+      78.0,
+      85.0,
+      77.0,
+      67.0,
+      76.5,
+      107.0,
+      92.0,
+      80.5,
+      85.0,
+      83.0,
+      77.0,
+      70.0,
+      84.0,
+      69.0,
+      97.0,
+      72.0,
+      81.0,
+      87.0,
+      89.0,
+      102.0,
+      83.0,
+      82.5,
+      91.0,
+      79.5
+    ],
+    "resident_over_new": [
+      1.0,
+      6.679794520547945,
+      22.46517213771017,
+      45.748370273794,
+      65.62898089171975,
+      92.26804123711341,
+      118.54875,
+      138.92479108635098,
+      165.30254777070064,
+      193.67883211678833,
+      208.29657794676805,
+      218.65891472868216,
+      238.64344262295083,
+      255.94588744588745,
+      266.23127753303964,
+      276.2162162162162,
+      309.6666666666667,
+      317.055,
+      325.81060606060606,
+      346.15343915343914,
+      366.8082191780822,
+      369.375,
+      384.5027932960894,
+      373.22074468085106,
+      404.9248554913295,
+      394.0888888888889,
+      436.2621951219512,
+      435.0179640718563,
+      460.2257053291536,
+      439.54761904761904,
+      471.8205128205128,
+      430.67528735632186,
+      479.34615384615387,
+      474.59119496855345,
+      451.98192771084337,
+      454.41212121212124,
+      496.29411764705884,
+      484.746835443038,
+      410.4120879120879,
+      515.5234899328859,
+      418.9103260869565,
+      455.2906976744186,
+      522.4697986577181,
+      464.36526946107784,
+      479.7730061349693,
+      520.4078947368421,
+      517.6601307189543,
+      460.94152046783626,
+      528.9271523178808,
+      549.5171232876712,
+      499.4567901234568,
+      533.4640522875817,
+      523.1570512820513,
+      499.0030487804878,
+      557.472972972973,
+      580.0559440559441,
+      577.8531468531469,
+      564.4798657718121,
+      493.7008797653959,
+      531.0754716981132,
+      584.0347222222222,
+      507.0952380952381,
+      568.4256756756756,
+      586.7370242214533,
+      597.1017543859649,
+      585.4709897610921,
+      585.7823129251701,
+      543.7866242038217,
+      518.672619047619,
+      573.0522875816994,
+      571.5290322580645,
+      695.3411764705883,
+      612.9793103448276,
+      624.3636363636364,
+      626.7945205479452,
+      730.4878048780488,
+      651.7697841726618,
+      666.014598540146,
+      800.8869565217391,
+      669.7562724014336,
+      789.1752136752136,
+      627.8051948051948,
+      855.8468468468468,
+      767.9556451612904,
+      806.5508474576271,
+      1065.6666666666667,
+      928.1538461538462,
+      831.9655172413793,
+      868.4821428571429,
+      1271.9084967320262,
+      882.5136363636364,
+      961.4356435643564,
+      797.0081300813008,
+      859.3201754385965,
+      1139.1686046511627,
+      1068.5869565217392,
+      898.7129629629629,
+      1148.6,
+      685.7945205479452,
+      1261.483870967742,
+      1000.7524752475248,
+      962.7303921568628,
+      1160.9176470588236,
+      1276.7142857142858,
+      869.9473684210526,
+      1513.3636363636363,
+      952.1333333333333,
+      1108.411111111111,
+      1124.3314606741574,
+      999.43,
+      927.2995391705069,
+      1011.38,
+      631.5857988165681,
+      1119.3370786516855,
+      957.5586854460093,
+      1310.923076923077,
+      1373.5733333333333,
+      1124.8666666666666,
+      1324.7402597402597,
+      1157.9204545454545,
+      1015.4509803921569,
+      1223.4670658682635,
+      831.5425101214574,
+      863.4377682403433,
+      955.8888888888889,
+      855.563025210084,
+      1257.0,
+      1249.575,
+      973.1761904761905,
+      1132.0222222222221,
+      1127.1703296703297,
+      934.712389380531,
+      869.3934426229508,
+      1019.3529411764706,
+      1038.8522167487686,
+      1636.1875,
+      1346.679487179487,
+      2031.009523809524,
+      1099.6954314720813,
+      1500.3125,
+      1237.028735632184,
+      1055.5294117647059,
+      1112.5051546391753,
+      883.170731707317,
+      1354.775,
+      809.1811320754717,
+      1222.3236994219653,
+      936.8108108108108
+    ],
+    "reuse_pct": [
+      0.0,
+      85.02947962061009,
+      95.5486653123775,
+      97.81412978426287,
+      98.47628290670872,
+      98.91620111731844,
+      99.1564651672835,
+      99.28018606889361,
+      99.39504864656584,
+      99.48368131453984,
+      99.5199153006462,
+      99.54266671393626,
+      99.5809648113483,
+      99.60929241333818,
+      99.62438673274372,
+      99.63796477495107,
+      99.67707212055974,
+      99.68459730961506,
+      99.69307322063851,
+      99.71111077144124,
+      99.72737797363409,
+      99.72927241962775,
+      99.73992386598088,
+      99.73206205328829,
+      99.75304059841261,
+      99.74625014097215,
+      99.7707800466826,
+      99.77012443563484,
+      99.78271530937526,
+      99.77249336438979,
+      99.78805499701103,
+      99.76780650542119,
+      99.79138249217684,
+      99.78929234031276,
+      99.77875221580989,
+      99.77993544773133,
+      99.7985065781676,
+      99.7937067502285,
+      99.75634245933462,
+      99.80602241808027,
+      99.76128542608606,
+      99.78036010726599,
+      99.80860137704244,
+      99.78465228436214,
+      99.79156809841055,
+      99.8078430381027,
+      99.80682306002375,
+      99.7830527397521,
+      99.81093804777883,
+      99.81802204924622,
+      99.79978247973106,
+      99.8125459446214,
+      99.8088528105376,
+      99.79960042279423,
+      99.82061910648923,
+      99.82760283551141,
+      99.82694565125313,
+      99.822845762863,
+      99.79744820376354,
+      99.81170284577398,
+      99.82877730348034,
+      99.80279838482487,
+      99.82407550489143,
+      99.82956589430727,
+      99.8325243574224,
+      99.82919734410615,
+      99.82928811984671,
+      99.81610434028896,
+      99.80720015607606,
+      99.82549585410084,
+      99.8250307607211,
+      99.85618570655116,
+      99.83686235683265,
+      99.83983692486895,
+      99.84045808200017,
+      99.86310517529216,
+      99.84657159256479,
+      99.84985314102846,
+      99.87513843347593,
+      99.85069195449047,
+      99.87328542728262,
+      99.84071492108149,
+      99.88315666480699,
+      99.8697841462198,
+      99.87601525642776,
+      99.90616202690022,
+      99.89225924084204,
+      99.87980271065611,
+      99.88485658476407,
+      99.9213779920042,
+      99.88668730331234,
+      99.89598887801864,
+      99.87453076546434,
+      99.88362893964528,
+      99.91221668189266,
+      99.90641847217984,
+      99.88872976787793,
+      99.91293748911718,
+      99.8541837285021,
+      99.92072827699074,
+      99.90007519094543,
+      99.89612875960428,
+      99.91386124566772,
+      99.92167393980083,
+      99.88505051727267,
+      99.93392202799302,
+      99.89497269290015,
+      99.90978076726445,
+      99.91105825684177,
+      99.89994296749147,
+      99.89215998091679,
+      99.90112519527774,
+      99.84166838426802,
+      99.91066140673152,
+      99.89556775838399,
+      99.92371787348903,
+      99.9271971888408,
+      99.91110057488295,
+      99.92451350423998,
+      99.91363828179436,
+      99.90152158801267,
+      99.91826506590185,
+      99.87974156608614,
+      99.8841838941053,
+      99.89538533069859,
+      99.883117903587,
+      99.92044550517105,
+      99.91997279074886,
+      99.89724368415645,
+      99.91166251153295,
+      99.91128226376466,
+      99.89301521929514,
+      99.88497727829841,
+      99.90189855156096,
+      99.9037399175862,
+      99.93888231024867,
+      99.92574328119497,
+      99.95076340173313,
+      99.90906573116692,
+      99.9333472193293,
+      99.91916113415999,
+      99.90526081141329,
+      99.91011277603255,
+      99.88677161005248,
+      99.92618700522226,
+      99.8764182751722,
+      99.91818861071965,
+      99.89325486123131
+    ]
+  }
+}
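The per-turn C2/C3 values in `chars.json` can be re-derived by hand. As in `compute_chars.py`, the resident/new ratio and the reuse are ratios of medians, not medians of per-request ratios. This sketch copies five turns from `per_turn` into a literal; it is an excerpt, not a loader for the full file. It then applies the script's mid cost model (7,000 prefill tok/s, 10 ms TPOT) to show where the median request flips from prefill-bound to decode-bound.

```python
# Excerpt of chars.json["per_turn"]: turn -> (median resident input, median new prefill, median output)
PER_TURN = {
    1: (11035.0, 11035.0, 63.0),
    2: (19505.0, 2920.0, 67.0),
    3: (28059.0, 1249.0, 111.0),
    12: (56414.0, 258.0, 161.0),
    30: (73844.0, 168.0, 133.0),
}
PREFILL_TOK_S = 7000.0  # mid constants from compute_chars.py
TPOT_S = 0.010


def turn_stats(resident, new, out):
    ratio = resident / max(new, 1)           # "PD tax": resident context per new-prefill token
    reuse_pct = (1 - new / resident) * 100   # share of this turn's input that is reused prefix
    t_prefill = new / PREFILL_TOK_S          # seconds of prefill under the cost model
    t_decode = out * TPOT_S                  # seconds of decode under the cost model
    bound = "decode" if t_decode > t_prefill else "prefill"
    return ratio, reuse_pct, t_prefill, t_decode, bound


for turn, (resident, new, out) in PER_TURN.items():
    ratio, reuse, tp, td, bound = turn_stats(resident, new, out)
    print(f"turn {turn:>2}: resident/new={ratio:6.1f}x reuse={reuse:5.1f}% "
          f"prefill={tp:.3f}s decode={td:.3f}s -> {bound}-bound")
```

On these medians, turn 12 gives ~219× and turn 30 ~440×. The median request is already decode-bound at turn 2 (0.42 s prefill vs 0.67 s decode).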
--- a/analysis/workload_chars/compute_chars.py
+++ b/analysis/workload_chars/compute_chars.py
@@ -0,0 +1,180 @@
+import json, os, sys, statistics as st
+from collections import defaultdict, Counter
+import matplotlib; matplotlib.use("Agg")
+import matplotlib.pyplot as plt
+import numpy as np
+
+PATH="/home/admin/cpfs/wjh/ali-trace/trace-glm5.1-formatted/051315-051317.jsonl"
+OUT="/tmp/wlc_out"; os.makedirs(OUT, exist_ok=True)
+BLOCK=512
+# --- transparent cost model for C3 (clearly-labeled estimate; raw-timing validation pending) ---
+PREFILL_TOK_S=7000.0     # MB1: 32k->4.5s ~7100 tok/s effective on H20 / 30B-A3B
+TPOT_S=0.010             # ~10ms/token decode (crossover unloaded ~5ms, loaded ~25ms)
+
+def pct(v,p):
+    if not v: return float('nan')
+    s=sorted(v);k=(len(s)-1)*p;f=int(k)
+    return s[f] if f+1>=len(s) else s[f]+(s[f+1]-s[f])*(k-f)
+
+# ---------- Pass A: structure (scalars only) ----------
+parents={}; recs={}; childcount=Counter()
+for line in open(PATH):
+    if not line.strip(): continue
+    d=json.loads(line); cid=d["chat_id"]; pid=d["parent_chat_id"]
+    parents[cid]=pid
+    recs[cid]=(float(d["timestamp"]),int(d["input_length"]),int(d["output_length"]),int(d["turn"]))
+    if pid!="-1": childcount[pid]+=1
+print(f"[A] records={len(recs)}", file=sys.stderr)
+
+root_of={}
+def root(cid):
+    path=[];c=cid
+    while True:
+        if c in root_of:r=root_of[c];break
+        p=parents.get(c,"-1")
+        if p=="-1" or p not in recs:r=c;break
+        path.append(c);c=p
+    for x in path:root_of[x]=r
+    root_of[cid]=r;return r
+sessions=defaultdict(list)
+for cid in recs: sessions[root(cid)].append(cid)
+seq={r:sorted(m,key=lambda c:(recs[c][3],recs[c][0])) for r,m in sessions.items()}
+print(f"[A] sessions={len(seq)}", file=sys.stderr)
+
+# ---------- C1: mixture + turn tail + hazard ----------
+sr=mr=sm=mm=so=mo=0
+turns_per=[]
+for r,s in seq.items():
+    multi=len(s)>1; turns_per.append(len(s))
+    for c in s:
+        _,inl,outl,_=recs[c]
+        if multi: mr+=1;mm+=inl;mo+=outl
+        else: sr+=1;sm+=inl;so+=outl
+tot_r=sr+mr; tot_in=sm+mm; tot_out=so+mo
+cnt_turn=Counter()
+for r,s in seq.items():
+    for c in s: cnt_turn[recs[c][3]]+=1
+hazard={k: (cnt_turn[k+1]/cnt_turn[k] if cnt_turn[k] else 0) for k in range(1,30)}
+
+# ---------- C2/C3: per-turn resident vs new-prefill (scalar) + hash_ids reuse ----------
+by_in=defaultdict(list); by_new=defaultdict(list); by_out=defaultdict(list)
+by_reuse_hash=defaultdict(list)  # hash-block prefix stability: reused/parent_blocks (collected, not written to chars.json)
+store={}  # cid -> (blockset, in, out) for chats with pending children
+tot_new_prefill=0; tot_reused=0
+for line in open(PATH):
+    if not line.strip(): continue
+    d=json.loads(line); cid=d["chat_id"]; pid=d["parent_chat_id"]
+    inl=int(d["input_length"]); outl=int(d["output_length"]); turn=int(d["turn"])
+    blocks=set(d["hash_ids"])
+    if pid in store:
+        pblk,pin,pout=store[pid]
+        new_prefill=max(0, inl - pin - pout)              # actual recompute (accounts for cached answer)
+        reused_blk=len(blocks & pblk)
+        by_reuse_hash[turn].append(reused_blk/len(pblk) if pblk else 0)
+        childcount[pid]-=1
+        if childcount[pid]<=0: del store[pid]
+        tot_reused += (inl-new_prefill)
+    else:
+        new_prefill=inl                                    # session start: all new (intra-session)
+    tot_new_prefill+=new_prefill
+    by_in[turn].append(inl); by_new[turn].append(new_prefill); by_out[turn].append(outl)
+    if childcount[cid]>0: store[cid]=(blocks,inl,outl)
+print(f"[B] done; store residual={len(store)}", file=sys.stderr)
+
+TURNS=[t for t in sorted(by_in) if len(by_in[t])>=50]
+med_in=[pct(by_in[t],.5) for t in TURNS]
+med_new=[max(pct(by_new[t],.5),1) for t in TURNS]
+med_out=[pct(by_out[t],.5) for t in TURNS]
+ratio=[med_in[i]/med_new[i] for i in range(len(TURNS))]
+reuse_pct=[(1-med_new[i]/med_in[i])*100 for i in range(len(TURNS))]
+# C3 time per turn (cost model)
+t_pref=[med_new[i]/PREFILL_TOK_S for i in range(len(TURNS))]
+t_dec=[med_out[i]*TPOT_S for i in range(len(TURNS))]
+
+# aggregate decode-time fraction for three (prefill tok/s, TPOT) pairs; note rate*TPOT = 65, 70, 75 in all three
+def agg_time(prate,tpot):
+    tp=tot_new_prefill/prate; td=tot_out*tpot; return td/(tp+td)
+frac_lo=agg_time(13000,0.005); frac_mid=agg_time(7000,0.010); frac_hi=agg_time(3000,0.025)
+
+chars={
+ "mixture":{"single_sessions":sum(1 for s in seq.values() if len(s)==1),
+   "multi_sessions":sum(1 for s in seq.values() if len(s)>1),
+   "req_single_pct":sr/tot_r*100,"req_multi_pct":mr/tot_r*100,
+   "in_single_pct":sm/tot_in*100,"in_multi_pct":mm/tot_in*100,
+   "out_single_pct":so/tot_out*100,"out_multi_pct":mo/tot_out*100},
+ "turns":{"mean":st.mean(turns_per),"p99":pct(turns_per,.99),"max":max(turns_per),
+   "single_turn_pct":sum(1 for x in turns_per if x==1)/len(turns_per)*100},
+ "hazard":hazard,
+ "token_mass":{"total_input":tot_in,"total_output":tot_out,"out_in_ratio_pct":tot_out/tot_in*100,
+   "new_prefill":tot_new_prefill,"reused_prefix":tot_reused,
+   "new_prefill_pct_of_input":tot_new_prefill/tot_in*100},
+ "decode_time_fraction":{"optimistic_for_prefill":frac_lo,"mid":frac_mid,"pessimistic":frac_hi},
+ "per_turn":{"turn":TURNS,"med_resident_input":med_in,"med_new_prefill":med_new,
+   "med_output":med_out,"resident_over_new":ratio,"reuse_pct":reuse_pct},
+}
+json.dump(chars, open(f"{OUT}/chars.json","w"), indent=2)
+
+# ================= FIGURES =================
+plt.rcParams.update({"figure.dpi":140,"font.size":10,"axes.grid":True,"grid.alpha":.3})
+
+# ---- C1 ----
+fig,ax=plt.subplots(1,3,figsize=(15,4.2))
+cats=["% sessions","% requests","% input\ntokens","% output\ntokens"]
+singv=[chars["mixture"]["single_sessions"]/len(seq)*100, chars["mixture"]["req_single_pct"],
+       chars["mixture"]["in_single_pct"], chars["mixture"]["out_single_pct"]]
+multv=[100-x for x in singv]
+x=np.arange(len(cats))
+ax[0].bar(x,singv,label="single-turn",color="#7fb3d5")
+ax[0].bar(x,multv,bottom=singv,label="multi-turn",color="#e74c3c")
+for i in range(len(cats)):
+    ax[0].text(i,singv[i]/2,f"{singv[i]:.0f}",ha="center",va="center",fontsize=9)
+    ax[0].text(i,singv[i]+multv[i]/2,f"{multv[i]:.0f}",ha="center",va="center",color="white",fontsize=9)
+ax[0].set_xticks(x);ax[0].set_xticklabels(cats);ax[0].set_ylabel("%");ax[0].set_ylim(0,100)
+ax[0].set_title("C1a Mixture: 90% sessions single-turn,\nbut multi-turn carries 2/3 prefill mass");ax[0].legend(loc="center right")
+# turn CCDF log-log
+tc=np.sort(np.asarray(turns_per)); n=len(tc); xs=np.unique(tc)
+ccdf=(n-np.searchsorted(tc,xs,side="left"))/n  # P(turns >= k) via binary search on the sorted array
+ax[1].loglog(xs,ccdf,marker=".",ms=3,color="#34495e")
+ax[1].set_xlabel("turns per session (k)");ax[1].set_ylabel("P(turns >= k)")
+ax[1].set_title(f"C1b Heavy-tailed session length\n(p99={chars['turns']['p99']:.0f}, max={chars['turns']['max']})")
+# hazard
+hk=list(range(1,20)); hv=[hazard[k]*100 for k in hk]
+ax[2].plot(hk,hv,marker="o",color="#16a085")
+ax[2].set_xlabel("reached turn k");ax[2].set_ylabel("P(continue to k+1) %");ax[2].set_ylim(0,100)
+ax[2].set_title("C1c Continuation hazard rises 10%->94%\n(unpredictable at start, Lindy after)")
+fig.tight_layout(); fig.savefig(f"{OUT}/c1_session_mixture.png"); plt.close(fig)
+
+# ---- C2 ----
+fig,ax=plt.subplots(1,3,figsize=(15,4.2))
+ax[0].semilogy(TURNS,med_in,marker="o",label="resident context (input)",color="#e74c3c")
+ax[0].semilogy(TURNS,med_new,marker="s",label="new prefill this turn",color="#2980b9")
+ax[0].set_xlabel("turn");ax[0].set_ylabel("tokens (median, log)");ax[0].legend()
+ax[0].set_xlim(1,30)
+ax[0].set_title("C2a Resident state explodes,\nmarginal work collapses")
+ax[1].plot(TURNS,ratio,marker="o",color="#8e44ad")
+ax[1].set_xlabel("turn");ax[1].set_ylabel("resident / new-prefill");ax[1].set_xlim(1,30)
+ax[1].set_title("C2b The PD tax = resident/delta\n(~220x at turn 12, ~440x at turn 30)")
+ax[2].plot(TURNS,reuse_pct,marker="o",color="#27ae60")
+ax[2].set_xlabel("turn");ax[2].set_ylabel("per-turn reuse %");ax[2].set_ylim(50,100);ax[2].set_xlim(1,30)
+ax[2].set_title("C2c Per-turn reuse climbs to 99.6%\n(deep turns are near-pure cache hits)")
+fig.tight_layout(); fig.savefig(f"{OUT}/c2_work_amortization.png"); plt.close(fig)
+
+# ---- C3 ----
+fig,ax=plt.subplots(1,2,figsize=(11,4.4))
+# token mass decomposition
+vals=[tot_reused/1e9, tot_new_prefill/1e9, tot_out/1e9]
+labs=[f"reused prefix\n{tot_reused/tot_in*100:.0f}% of input",
+      f"new prefill\n{tot_new_prefill/tot_in*100:.0f}% of input",
+      f"decode output\n{tot_out/tot_in*100:.1f}% of input"]
+ax[0].bar(range(3),vals,color=["#95a5a6","#2980b9","#e67e22"])
+ax[0].set_xticks(range(3));ax[0].set_xticklabels(labs,fontsize=8.5)
+ax[0].set_ylabel("tokens (billions)")
+ax[0].set_title("C3a Token mass: prefill-dominated\n(but tokens != time, see C3b)")
+# per-turn prefill vs decode TIME (cost model)
+ax[1].semilogy(TURNS,t_pref,marker="o",label="prefill time (new tok / 7k·s⁻¹)",color="#2980b9")
+ax[1].semilogy(TURNS,t_dec,marker="s",label="decode time (out·10ms)",color="#e67e22")
+ax[1].set_xlabel("turn");ax[1].set_ylabel("seconds (median, log)");ax[1].legend(fontsize=8);ax[1].set_xlim(1,30)
+ax[1].set_title(f"C3b Prefill→decode bottleneck flips within a session\n(agg decode-time share ≈ {frac_mid*100:.0f}%, range {frac_lo*100:.0f}–{frac_hi*100:.0f}%)")
+fig.tight_layout(); fig.savefig(f"{OUT}/c3_prefill_decode_balance.png"); plt.close(fig)
+print("FIGURES + chars.json written to", OUT)
+print(json.dumps({k:chars[k] for k in ["mixture","turns","token_mass","decode_time_fraction"]}, indent=2))
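`agg_time` reduces to a single per-token cost ratio. Let N_p be new-prefill tokens, N_o output tokens, r the prefill rate, and τ the TPOT. Then the decode share is `N_o·c / (N_p + N_o·c)`, where `c = r·τ` is how many prefill tokens one decode token costs in time. The three pairs in the script give c = 65, 70, 75, which is why the 68–71% band is narrow. The sketch below uses the token totals from `chars.json` and also sweeps c more widely. The extra c values are illustrative, not measured.

```python
NEW_PREFILL = 28_616_906_067   # chars.json token_mass.new_prefill
OUTPUT = 940_765_734           # chars.json token_mass.total_output


def decode_share(prefill_tok_s: float, tpot_s: float) -> float:
    """Same arithmetic as agg_time() in compute_chars.py."""
    t_prefill = NEW_PREFILL / prefill_tok_s
    t_decode = OUTPUT * tpot_s
    return t_decode / (t_prefill + t_decode)


def decode_share_from_ratio(c: float) -> float:
    """Equivalent form with c = prefill_tok_s * tpot_s (prefill tokens per decode token of time)."""
    return OUTPUT * c / (NEW_PREFILL + OUTPUT * c)


for rate, tpot in [(13000, 0.005), (7000, 0.010), (3000, 0.025)]:
    print(f"rate={rate:>5} tpot={tpot:.3f} c={rate * tpot:5.1f} share={decode_share(rate, tpot):.3f}")

# Illustrative wider sweep (e.g. heavily batched decode lowers the effective c).
for c in (10, 20, 30, 70, 150):
    print(f"c={c:>3} share={decode_share_from_ratio(c):.3f}")
```

Under this model, the ~70% figure holds only while one decode token costs on the order of 65–75 prefill tokens. At c ≈ 30 the split is roughly even, which is the kind of shift the README's batching caveat anticipates.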
--- a/figs/workload_chars/c1_session_mixture.png
+++ b/figs/workload_chars/c1_session_mixture.png
--- a/figs/workload_chars/c2_work_amortization.png
+++ b/figs/workload_chars/c2_work_amortization.png
--- a/figs/workload_chars/c3_prefill_decode_balance.png
+++ b/figs/workload_chars/c3_prefill_decode_balance.png
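The C1 hazard in `compute_chars.py` divides the request count at turn k+1 by the count at turn k (`cnt_turn[k+1]/cnt_turn[k]`). A per-session alternative is the survival ratio S(k+1)/S(k), where S(k) is the number of sessions with at least k turns. The two agree when every session has at most one request per turn. With branching, where a parent has several children (which the script's `childcount` allows), they can differ. The sketch below computes the per-session form from turn counts. The toy list of 100 sessions is made up.

```python
from collections import Counter


def continuation_hazard(turns_per_session, max_k):
    """P(reach turn k+1 | reached turn k) from per-session turn counts (survival ratio)."""
    length_counts = Counter(turns_per_session)
    longest = max(turns_per_session)
    # reached[k] = number of sessions with at least k turns, accumulated from the longest down
    reached = {}
    running = 0
    for k in range(longest, 0, -1):
        running += length_counts.get(k, 0)
        reached[k] = running
    return {
        k: (reached.get(k + 1, 0) / reached[k]) if reached.get(k) else 0.0
        for k in range(1, max_k + 1)
    }


toy = [1] * 90 + [2] * 5 + [3] * 2 + [6] * 3   # hypothetical 100 sessions
print(continuation_hazard(toy, 5))
```

Even this toy mix reproduces the C1 shape: a low first-step hazard (0.1) followed by a rising one once sessions have shown depth.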