f2b: regenerate CDF from production trace (1.3M sessions on dash0)

Pulls 456 (rank%, cum%) sample points from the raw production trace at
dash0:/home/admin/cpfs/wjh/ali-trace/trace-glm5.1-formatted/051315-051317.jsonl,
cached locally so the figure is reproducible without ssh access.

Sampled anchors match the precomputed summary exactly:
  top 1% = 46.5%, top 5% = 66.5%, top 10% = 74.6%
plus newly readable points:
  top 25% = 87.5%, top 50% = 96.0%

Workload characterization is now consistent with the production
distribution rather than the small replay subset. Replay window CDF
kept as an overlay to show the same hockey-stick shape on the data §5
actually uses.

- analysis/characterization/data/production_session_skew_cdf.json:
  cached sample points (29 KB), so the figure rebuilds locally
- scripts/plot_session_skew_cdf.py: now plots from the cache + replay raw
- MEETING.md / PAPER_OUTLINE.md: revert numbers to production trace,
  add top-25%/50% data points

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
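The commit message does not show how the 456 points were drawn on dash0, so the following is only a minimal sketch of one way such a cache could be produced. The field names `session_id` and `input_length` and the cache keys `samples`, `n_sessions` and `anchors_check` come from the plotting script in this commit; the log-spaced sampling grid, the `top_{p}pct` anchor key format, and the helper names `build_sampled_cdf` and `write_cache` are assumptions made for illustration.

```python
import json
from collections import defaultdict
from pathlib import Path

import numpy as np


def build_sampled_cdf(rows, n_points=456, anchors=(1, 5, 10, 25, 50)):
    """Downsample the session-skew CDF to at most n_points (rank_pct, cum_pct) pairs."""
    # Aggregate input tokens per session.
    totals = defaultdict(int)
    for row in rows:
        totals[row["session_id"]] += int(row["input_length"])

    # Sort sessions by total mass, heaviest first, and build the full CDF.
    vals = np.sort(np.fromiter(totals.values(), dtype=np.int64))[::-1]
    n = len(vals)
    cum_pct = np.cumsum(vals) / vals.sum() * 100
    rank_pct = np.arange(1, n + 1) / n * 100

    # Assumption: log-spaced ranks, denser at the head where the curve bends.
    ranks = np.geomspace(1, n, num=min(n_points, n)).round().astype(int)
    idx = np.unique(ranks) - 1  # unique, sorted, 0-based
    samples = [
        {"rank_pct": float(rank_pct[i]), "cum_pct": float(cum_pct[i])} for i in idx
    ]

    # Anchors come from the full CDF so the sampled curve can be checked later.
    anchors_check = {
        f"top_{p}pct": float(cum_pct[int(np.ceil(n * p / 100)) - 1]) for p in anchors
    }
    return {"n_sessions": n, "samples": samples, "anchors_check": anchors_check}


def write_cache(rows, cache_path):
    """Serialize the sampled CDF so the figure can be rebuilt without the raw trace."""
    Path(cache_path).write_text(json.dumps(build_sampled_cdf(rows)))
```

Because duplicate ranks are dropped, a log-spaced grid can return fewer than `n_points` samples on small traces; a different grid (for example, uniform in rank percentile plus the exact anchor ranks) would work equally well with the loader.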
@@ -26,7 +26,7 @@ L = Λ · N · W_turn(L) # agentic, T_human≈0
 | | Data | Figure |
 |---|---|---|
 | KV reuse is almost entirely intra-session | intra 93.2% / cross 5.7% / shared 1.1% |  |
-| Sessions are extremely skewed | on replay, top 1% / 5% / 10% = 24% / 62% / 76% of input mass (the full production trace is steeper, top 1% = 46.5%) |  |
+| Sessions are extremely skewed | on the production trace, top 1% / 5% / 10% / 25% / 50% = **46.5% / 66.5% / 74.6% / 87.5% / 96.0%** of input mass |  |
 | Per-request KV footprint is already large | p99 = 11.8 GiB ≈ 12% of an H20 |  |
 
 Theoretical APC upper bound = intra-session 79.6% / any-session 80.3%, a gap of <1pp. **Any scheduler without affinity loses the vast majority of the reuse.**
@@ -58,7 +58,7 @@ the average agentic request (33.6k tokens) needs 3.3GB of KV; 4P+4D / 6P+2D in the agentic regime
 | sticky | **20.3s** | 55.4s | **34.6s** |
 | unified | **10.3s** | 37.7s | **18.0s** |
 
-Mechanism: the top 5% of sessions account for ~62% of input volume, and hot sessions far outnumber instances (8), so sticky's hash binding makes **every worker host a hot session of its own**, and even the median worker is slowed down. Unified uses the LMetric fallback to reroute cold/new sessions to non-hot workers, preserving the speed of 7/8 workers. System p90 is determined by the majority of requests, so unified is almost 2x faster.
+Mechanism: on the production trace the top 1% of sessions account for 46.5% of input volume, and hot sessions far outnumber instances (8), so sticky's hash binding makes **every worker host a hot session of its own**, and even the median worker is slowed down. Unified uses the LMetric fallback to reroute cold/new sessions to non-hot workers, preserving the speed of 7/8 workers. System p90 is determined by the majority of requests, so unified is almost 2x faster.
 
 ---
 
@@ -48,7 +48,7 @@ Three essential differences between agentic workloads and chatbots:
 
 - **Multi-turn, programmatic continuation**: each turn is triggered by the tool-call result of the previous turn, with no human think-time
 - **Prefill-dominated**: input/output token ratio of **75x**, with 98% of compute in the prefill phase (chatbots are 1-10x)
-- **Skewed sessions**: on the replay trace the top 1% of sessions contribute **24.3%** of input tokens, top 5% **61.9%**, top 10% **75.8%** (vs uniform 1/5/10%); the full production trace (1.3M sessions) is even more skewed, with the top 1% reaching 46.5%
+- **Skewed sessions** (from the Qwen3 production trace, n=1.3M sessions / 2.1M req / 7200s): the top 1% contribute **46.5%** of input tokens, top 5% **66.5%**, top 10% **74.6%**, top 25% **87.5%**, top 50% **96.0%**, so half of the sessions account for almost all of the input mass
 
 Average session length TBD turns, TBD input tokens; p99 per-request KV footprint **11.49 GiB** (12% of the H20's 96GB HBM).
 
@@ -68,7 +68,7 @@ Breakdown of KV reuse on the trace:
 
 
 
-
+
 
 
 
@@ -137,7 +137,7 @@ Round-robin and load-aware routing (e.g. LMetric, OSDI'26) maximize instance
 | `unified` (affinity + LMetric fallback) | **10.3 s** | 37.7 s | **18.0 s** |
 | `lmetric` | 14.0 s | 31.3 s | 24.8 s |
 
-Mechanism: the top 5% of sessions account for ~62% of input mass, and hot sessions far outnumber instances (8); sticky's hash binding makes **every worker host a hot session of its own**, dragging even the median worker into the 20s range. unified uses the LMetric fallback to reroute cold/new sessions to non-hot workers, preserving the speed of 7/8 workers. System p90 is determined by the majority of requests, so unified is ~2x faster than sticky on e2e p90.
+Mechanism: on the production trace the top 1% of sessions account for 46.5% of input mass and the top 5% for 66.5%, and hot sessions far outnumber instances (8); sticky's hash binding makes **every worker host a hot session of its own**, dragging even the median worker into the 20s range. unified uses the LMetric fallback to reroute cold/new sessions to non-hot workers, preserving the speed of 7/8 workers. System p90 is determined by the majority of requests, so unified is ~2x faster than sticky on e2e p90.
 
 **Note**: the hotspot ratio (max/median) is misleading on its own: sticky's 2.73 is *lower* than unified's 3.67, but because sticky's median is also high (20.3s vs unified's 10.3s), the system as a whole is slower. A useful §3.3 sub-finding: **hot-pin failure must be measured by per-worker absolute latency, not by a normalized ratio**.
 
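The hotspot-ratio caveat in the hunk above can be checked with a few lines of arithmetic. This is a sketch that assumes the first two latency columns of the table are the median and max per-worker latency; the table header is not visible in this diff, but that reading reproduces the quoted 2.73 for sticky. The names `workers` and `hotspot_ratio` are illustrative only.

```python
# Per-worker latency in seconds, copied from the table rows in the diff.
# Assumption: column 1 is the median worker, column 2 is the max worker.
workers = {
    "sticky": {"median": 20.3, "max": 55.4},
    "unified": {"median": 10.3, "max": 37.7},
}


def hotspot_ratio(stats):
    """Normalized hotspot metric: slowest worker relative to the median worker."""
    return stats["max"] / stats["median"]


for name, stats in workers.items():
    print(
        f"{name}: ratio={hotspot_ratio(stats):.2f} "
        f"median={stats['median']}s max={stats['max']}s"
    )
```

With these rounded table values, sticky comes out at about 2.73 and unified at about 3.66 (the text quotes 3.67, presumably from unrounded latencies). Sticky wins on the ratio while losing on both absolute numbers, which is why the ratio alone ranks the two policies the wrong way round.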
File diff suppressed because one or more lines are too long
Binary file not shown.
|
Before Width: | Height: | Size: 94 KiB After Width: | Height: | Size: 119 KiB |
@@ -1,12 +1,17 @@
 #!/usr/bin/env python3
 """Plot a CDF of cumulative input-token mass by session rank.
 
-Reads a JSONL trace (chat_id, session_id, input_length, ...), aggregates
-per-session input_length, sorts sessions descending by total, and plots
-cumulative fraction of input-token mass vs session-rank percentile.
+Primary curve is the *production* trace
+(``/home/admin/cpfs/wjh/ali-trace/trace-glm5.1-formatted/051315-051317.jsonl``
+on dash0), which has 1.3 M sessions across 2.1 M records over a 7200 s
+window. Because the full raw trace is not co-located with this repo, we
+sample 456 (rank_pct, cum_pct) points on dash0 and cache the result in
+``analysis/characterization/data/production_session_skew_cdf.json``. Any
+top-K%% mass figure can be read off the resulting curve.
 
-The figure replaces the previous discrete top-1%/5%/10% bars with a
-continuous curve so any percentile can be read off directly.
+The replay-trace CDF (``traces/w600_r0.0015_st30.jsonl``, n=274) is
+overlaid for sanity — the replay window samples a thin slice of the head
+so its top-1%% is lower, but the shape is preserved.
 """
 from __future__ import annotations
 
@@ -19,66 +24,85 @@ import matplotlib.pyplot as plt
 import numpy as np
 
 
-def load_session_input_tokens(trace_path: Path) -> dict[str, int]:
+def load_replay_cdf(trace_path: Path) -> tuple[np.ndarray, np.ndarray, int]:
     totals: dict[str, int] = defaultdict(int)
     with trace_path.open() as f:
         for line in f:
             row = json.loads(line)
             totals[row["session_id"]] += int(row["input_length"])
-    return dict(totals)
+    n = len(totals)
+    sorted_vals = np.sort(np.array(list(totals.values())))[::-1]
+    cum = np.cumsum(sorted_vals) / sorted_vals.sum()
+    rank_pct = np.arange(1, n + 1) / n * 100
+    return rank_pct, cum * 100, n
+
+
+def load_production_cdf(
+    cache_path: Path,
+) -> tuple[np.ndarray, np.ndarray, int, dict[str, float]]:
+    d = json.loads(cache_path.read_text())
+    samples = d["samples"]
+    xs = np.array([s["rank_pct"] for s in samples])
+    ys = np.array([s["cum_pct"] for s in samples])
+    return xs, ys, d["n_sessions"], d["anchors_check"]
 
 
 def main() -> None:
     parser = argparse.ArgumentParser()
     parser.add_argument(
-        "--trace",
+        "--replay-trace",
         default="traces/w600_r0.0015_st30.jsonl",
-        help="JSONL trace path",
     )
     parser.add_argument(
-        "--out",
-        default="figs/f2b_session_skew.png",
-        help="Output figure path",
+        "--prod-cache",
+        default="analysis/characterization/data/production_session_skew_cdf.json",
     )
+    parser.add_argument("--out", default="figs/f2b_session_skew.png")
     args = parser.parse_args()
 
-    session_totals = load_session_input_tokens(Path(args.trace))
-    n_sessions = len(session_totals)
-    sorted_vals = np.sort(np.array(list(session_totals.values())))[::-1]
-    cum = np.cumsum(sorted_vals) / sorted_vals.sum()
-    rank_pct = np.arange(1, n_sessions + 1) / n_sessions * 100
+    prod_x, prod_y, prod_n, prod_anchors = load_production_cdf(Path(args.prod_cache))
+    replay_rank_pct, replay_cum_pct, replay_n = load_replay_cdf(Path(args.replay_trace))
 
-    marks = [1, 5, 10, 25, 50]
-    mark_idx = [int(np.ceil(n_sessions * p / 100)) - 1 for p in marks]
+    fig, ax = plt.subplots(figsize=(9, 5.5))
 
-    fig, ax = plt.subplots(figsize=(8, 5))
-    ax.plot(rank_pct, cum * 100, color="#2f6fab", lw=2.2,
-            label="cumulative input-token mass")
-    ax.plot([0, 100], [0, 100], color="#999", ls="--", lw=1,
-            label="uniform reference (y = x)")
+    ax.plot(
+        prod_x, prod_y,
+        color="#c44e52", lw=2.4,
+        label=f"production trace (n={prod_n:,} sessions, 456-pt sampled)",
+    )
 
-    for p, i in zip(marks, mark_idx):
-        y = cum[i] * 100
-        ax.scatter([p], [y], color="#c44e52", zorder=5, s=40)
+    annotate_pts = [1.0, 5.0, 10.0, 25.0, 50.0]
+    for p in annotate_pts:
+        y = float(np.interp(p, prod_x, prod_y))
+        ax.scatter([p], [y], color="#c44e52", s=55, zorder=5)
         ax.annotate(
-            f"top {p}% → {y:.1f}%",
+            f"top {p:g}% → {y:.1f}%",
             xy=(p, y),
-            xytext=(p + 2, y - 5),
-            fontsize=9,
-            color="#333",
+            xytext=(p + 2.5, y - 6),
+            fontsize=10,
+            color="#7a1d1d",
         )
 
+    ax.plot(
+        replay_rank_pct, replay_cum_pct,
+        color="#2f6fab", lw=1.6,
+        alpha=0.85,
+        label=f"replay window (n={replay_n} sessions, raw CDF)",
+    )
+
+    ax.plot(
+        [0, 100], [0, 100],
+        color="#888", ls="--", lw=1,
+        label="uniform reference (y = x)",
+    )
+
     ax.set_xlim(0, 100)
     ax.set_ylim(0, 102)
     ax.set_xlabel("Session rank percentile (top → bottom by input-token mass)")
     ax.set_ylabel("Cumulative % of input-token mass")
-    ax.set_title(
-        f"Session input-token mass CDF "
-        f"(n={n_sessions} sessions, "
-        f"total={sorted_vals.sum() / 1e6:.1f} M tokens)"
-    )
+    ax.set_title("Session input-token mass CDF — Qwen3 production trace")
     ax.grid(True, alpha=0.3)
-    ax.legend(loc="lower right", framealpha=0.9)
+    ax.legend(loc="lower right", framealpha=0.92, fontsize=9)
 
     out_path = Path(args.out)
     out_path.parent.mkdir(parents=True, exist_ok=True)
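The script reads top-K% values off the sampled curve with `np.interp`, and it loads `anchors_check` without this hunk showing where it is consumed. The sketch below illustrates both steps on a toy curve built from the five values in the commit message. The helper names `top_k_mass` and `check_anchors`, the tolerance, and the anchor format (rank percentile mapped to expected cumulative %) are assumptions; the real cache may store anchors differently.

```python
import numpy as np


def top_k_mass(k_pct, rank_pct, cum_pct):
    """Linearly interpolate cumulative mass (%) at session-rank percentile k_pct."""
    # np.interp expects rank_pct to be increasing and clamps outside its range.
    return float(np.interp(k_pct, rank_pct, cum_pct))


def check_anchors(rank_pct, cum_pct, anchors, tol=0.1):
    """Return anchors whose interpolated value differs from the expected one by > tol."""
    mismatches = {}
    for k_pct, expected in anchors.items():
        got = top_k_mass(k_pct, rank_pct, cum_pct)
        if abs(got - expected) > tol:
            mismatches[k_pct] = (got, expected)
    return mismatches


# Toy curve: the five production anchors plus the (100, 100) end point.
xs = np.array([1.0, 5.0, 10.0, 25.0, 50.0, 100.0])
ys = np.array([46.5, 66.5, 74.6, 87.5, 96.0, 100.0])

print(top_k_mass(7.5, xs, ys))  # halfway between the top-5% and top-10% values
print(check_anchors(xs, ys, {1.0: 46.5, 50.0: 96.0}))  # empty dict when all match
```

One consequence of the clamping behaviour: a query below the first sampled rank returns the first sample's value rather than extrapolating toward zero, so the head of the curve is only as precise as the densest cached samples.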