-
54e1f5266a
MB5 PD ablation v2 results: concurrency axis + reuse 3-way writeup
main
Gahow Wang
2026-06-01 09:35:25 +08:00
-
3f997fda14
MB5 PD ablation v2 tooling: conc completion-panel plot + gpu_monitor dep
Gahow Wang
2026-06-01 09:35:05 +08:00
-
19c443e3bc
paper f2a: reuse-topology decomposition + mixture-sensitivity sweep
Gahow Wang
2026-06-01 01:03:40 +08:00
-
9c105cf05a
MB5 PD ablation: controlled-variable reuse/conc redo + campaign tooling
Gahow Wang
2026-06-01 01:03:27 +08:00
-
32f7f55990
v2: linear (default cache-aware) baseline + 2x wall-cap on first600s
Gahow Wang
2026-06-01 00:55:40 +08:00
-
7529284cee
v2: LMetric PD-colo vs PD-disagg on the real agentic trace
Gahow Wang
2026-05-31 20:15:10 +08:00
-
fafc44da79
MB5 PD reuse-centric ablation: tooling, data, Fig 1-3
Gahow Wang
2026-05-31 20:14:46 +08:00
-
a2111b6e18
PD-disagg docs: annotated corrections for e13391e contamination
Gahow Wang
2026-05-31 20:14:14 +08:00
-
0b180c191e
v2 exp(d): expand figure to 6 panels (TTFT/E2E mean+p90, TPS, per-worker GPU util)
Gahow Wang
2026-05-30 21:10:27 +08:00
-
9b6091fe6e
v2 exp(d): 5-policy routing under tracets vs thinktime — ranking flip
Gahow Wang
2026-05-30 20:59:18 +08:00
-
68f21bef23
bench harness: env-tunable vLLM health timeout + both-modes 5-policy driver
Gahow Wang
2026-05-30 20:59:02 +08:00
-
075f5bbc22
trace: time_to_parent_chat annotation + thinktime trace variants
Gahow Wang
2026-05-30 20:58:49 +08:00
-
8a6b22c11c
Replayer think-time dispatch mode + benchmarking guidance
Gahow Wang
2026-05-30 16:25:33 +08:00
-
f0d085ceda
Merge remote-tracking branch 'origin/main'
Gahow Wang
2026-05-30 15:39:25 +08:00
-
8d422c4301
Migration trigger validation: unified_v4 fires at 2x QPS, not at 1x
Gahow Wang
2026-05-30 15:36:58 +08:00
-
d9cf3126c6
docs: reframe PAPER_OUTLINE to GPU-hit-first + embed v2 figures
Gahow Wang
2026-05-30 13:34:19 +08:00
-
dc8e6dd5a8
v2 exp(a): add remote KV-store (RDMA) tier
Gahow Wang
2026-05-30 12:48:37 +08:00
-
ad754cfe0b
v2 exp(b): GPU KV-capacity APC/latency knee + writeup
Gahow Wang
2026-05-30 11:23:31 +08:00
-
837df6bc9e
v2 exp(a): three-tier KV-hit latency microbench (GPU >> CPU >> miss)
Gahow Wang
2026-05-30 11:23:04 +08:00
-
cf812b6264
Workload characterization C1-C3 on full production trace
Gahow Wang
2026-05-29 18:19:39 +08:00
-
847f52f03b
PD-disagg crossover: regular synthetic trace + goodput sweep + figure
Gahow Wang
2026-05-29 18:19:23 +08:00
-
48ae72467a
Replayer: closed-loop inter-turn think-time mode
Gahow Wang
2026-05-29 18:19:12 +08:00
-
657cd36f3d
Gate evict_sent_blocks behind VLLM_EVICT_SENT_BLOCKS
Gahow Wang
2026-05-29 18:18:59 +08:00
-
a0db3cbe77
Add leastwork_kappa decode-aware ablation (net-negative, documented)
Gahow Wang
2026-05-29 17:07:23 +08:00
-
71b0747b3b
600s-truncated trace + LPWL 5-policy results
Gahow Wang
2026-05-29 16:08:35 +08:00
-
160c29133d
Unified bench report: mean+TPS+per-worker GPU util, auto-captured
Gahow Wang
2026-05-29 16:08:22 +08:00
-
d9046322c6
Add parameter-free LPWL routing policy (--policy leastwork)
Gahow Wang
2026-05-29 16:08:10 +08:00
-
8a876e90d1
traces/README: clarify w600 is the session-start window, not span
Gahow Wang
2026-05-29 12:04:14 +08:00
-
e532e83d3e
mb5_run: scrape per-instance prefix-cache counters before teardown
Gahow Wang
2026-05-29 11:56:43 +08:00
-
d376d91fe1
Engine-state ablation: full sweep harness + results
Gahow Wang
2026-05-29 11:55:49 +08:00
-
08c3cf48aa
Ship anonymized benchmark trace w600_r0.0015_st30 + provenance
Gahow Wang
2026-05-29 11:54:43 +08:00
-
8708b75520
Merge layerwise KV transfer + engine-state ablation onto main
Gahow Wang
2026-05-29 11:53:40 +08:00
-
ee5db0b321
MB5 driver updates: PD-proxy + snapshot instrument + launcher tweaks
Gahow Wang
2026-05-29 11:53:27 +08:00
-
bad512d3c5
PD-disagg crossover: synthetic-trace generator + morpher + plotter
Gahow Wang
2026-05-29 11:53:21 +08:00
-
41a0c1c48f
Migration correctness smoke tests: direct-read, partial-transfer, NIXL
Gahow Wang
2026-05-29 11:53:13 +08:00
-
1262c9c22e
Migration transfer-cost study: KV transfer is slow on busy GPUs
Gahow Wang
2026-05-29 11:53:01 +08:00
-
67fcec7933
Unified-routing A+B ablation: decode-aware LMetric + v3 anti-hotspot
Gahow Wang
2026-05-29 11:52:44 +08:00
-
a2f2645fda
PD_DISAGG_RESULTS §6.3: producer hot-pinning figure
Gahow Wang
2026-05-29 00:38:20 +08:00
-
7947831e0f
run_v3_trace.sh: stage LAYERWISE conn + enhanced proxy from shared cpfs (dash1-ready)
Gahow Wang
2026-05-29 00:29:56 +08:00
-
6243b78bba
PD_DISAGG_RESULTS §6: session-affinity routing does not rescue PD
Gahow Wang
2026-05-29 00:25:10 +08:00
-
5b26c345f4
P2: all routing policies read real state via eff_ accessors + ablation harness
Gahow Wang
2026-05-28 20:21:12 +08:00
-
be948d32b8
P2: real engine-state feed replaces stale shadow counters for migration targeting
Gahow Wang
2026-05-28 20:01:26 +08:00
-
19191940e6
A/B x migration matrix runner (parameterized run_v3_trace.sh + wrapper)
Gahow Wang
2026-05-28 19:23:16 +08:00
-
63387f614d
Full v3 trace re-profile with layer-wise: matched migrations improve
Gahow Wang
2026-05-28 19:16:37 +08:00
-
21db2affb4
Trace runner (run_v3_trace.sh) + concurrent mb7 correctness test
Gahow Wang
2026-05-28 17:28:48 +08:00
-
e705bb33b6
Proxy write-mode: concurrent prefill+decode dispatch for v3 (EAR_WRITE_MODE=1)
Gahow Wang
2026-05-28 17:22:18 +08:00
-
4242bba034
Chunk-safe + concurrent layer-wise connector (per-step incremental shipping)
Gahow Wang
2026-05-28 17:15:54 +08:00
-
4cd71b6631
Working-set figure: extend left panel to ~50 nodes
Gahow Wang
2026-05-28 17:11:12 +08:00
-
2247d1de08
Working-set figure: right panel = W(t) time series
Gahow Wang
2026-05-28 16:31:26 +08:00
-
e77bdcac5a
Layerwise under load: overlap benefit survives (bg=16)
Gahow Wang
2026-05-28 16:30:14 +08:00
-
c94b2e237a
Working-set figure: linear node axes + benefit/cost split
Gahow Wang
2026-05-28 16:24:15 +08:00
-
3b8be5bb61
Working-set figure: express footprint in node count, not GB
Gahow Wang
2026-05-28 16:16:00 +08:00
-
dae98c6472
Working-set sizing tool + GLM-5.1-FP8/B300 result
Gahow Wang
2026-05-28 16:03:25 +08:00
-
fec50fa45d
Layerwise KV transfer on Mooncake: PoC + microbench (worktree exploration)
Gahow Wang
2026-05-28 15:34:43 +08:00
-
2e6a369046
PD_DISAGG_RESULTS §5.1: D-pool pressure crashes consumers
Gahow Wang
2026-05-28 13:02:21 +08:00
-
3957c2df86
MB5 patch: clamp PD-consumer metrics counter underflow
Gahow Wang
2026-05-28 13:01:23 +08:00
-
8596135680
MB5 analysis: per-role KV split proves static-partition mismatch
Gahow Wang
2026-05-28 12:05:17 +08:00
-
e8980ce957
MB5 proxy: session-affinity P routing (MB5_P_ROUTING=session)
Gahow Wang
2026-05-28 11:05:25 +08:00
-
b13ca10d19
PD_DISAGG_INVESTIGATION: snapshot Phase 0 done + sweep in flight
Gahow Wang
2026-05-28 00:51:28 +08:00
-
a66f24d242
MB5 aggregate: cross-config KV-pool + latency comparison
Gahow Wang
2026-05-28 00:49:21 +08:00
-
a9c7310f4a
MB5 PD-disagg pipeline: working end-to-end
Gahow Wang
2026-05-28 00:14:22 +08:00
-
e0d3b5150a
MB5 driver fixes: bash env-prefix + replayer flag names + python date math
Gahow Wang
2026-05-27 23:23:23 +08:00
-
e9abd70c8d
MB5 driver: launcher, orchestrator, KV-pool timeline plotter
Gahow Wang
2026-05-27 23:02:57 +08:00
-
a4f5dd56aa
MB5 instrumentation: per-request KV-block snapshot from vLLM V1 scheduler
Gahow Wang
2026-05-27 22:30:53 +08:00
-
4a93096c1e
Add PD_DISAGG_INVESTIGATION.md — living TODO for proving H1–H4
Gahow Wang
2026-05-27 22:24:31 +08:00
-
f739f7d461
Proxy/runner support for Nixl connector + unified_v3 (offload-decode) policy
Gahow Wang
2026-05-27 22:05:19 +08:00
-
da39ab6804
Correct PD-disagg cost/benefit framing across repo
Gahow Wang
2026-05-27 22:04:49 +08:00
-
abde010b64
Add RESULTS_SUMMARY.md — concise Chinese summary of current findings
Gahow Wang
2026-05-27 21:38:28 +08:00
-
029821c1b6
MB1: prefill-decode interference under chunked-prefill default; §3.2 headline
Gahow Wang
2026-05-27 21:25:09 +08:00
-
90127c3389
MB2 inter-node: dash1↔dash2 transfer cost is identical to intra-node
Gahow Wang
2026-05-27 20:56:08 +08:00
-
50f72d8875
MB2 inter-node scaffolding: per-host single-instance launcher + client host args
Gahow Wang
2026-05-27 20:26:54 +08:00
-
3f791ee074
MB2 doc: analysis/mb2/README.md as persistent record
Gahow Wang
2026-05-27 20:23:50 +08:00
-
de164e5a64
MB2: pure KV-transfer cost on dash1 intra-node — Mooncake ~9.7 GB/s steady
Gahow Wang
2026-05-27 19:04:03 +08:00
-
91673f1fb8
MB2: working end-to-end intra-node KV transfer microbench
Gahow Wang
2026-05-27 18:53:25 +08:00
-
622e0bc04c
MB2: parameterize vLLM roles (kv_producer + kv_consumer default)
Gahow Wang
2026-05-27 18:17:42 +08:00
-
efdcf3c555
MB2: per-stage instrumentation patch + launcher integration
Gahow Wang
2026-05-27 18:12:44 +08:00
-
7437422618
MB2 scaffolding: launch script for vLLM pair + KV-transfer-time client
Gahow Wang
2026-05-27 17:47:04 +08:00
-
0a63de5bcf
Phase 0: fresh vllm 0.18.1 + mooncake-transfer-engine on dash1/dash2
Gahow Wang
2026-05-27 17:42:36 +08:00
-
b11dc30945
§2.3 reframe: dispatch coupling is regime-dependent, not binary chatbot/agentic
Gahow Wang
2026-05-27 16:51:38 +08:00
-
876d09db83
Add chatbot T_external CDF; overlay on f3a vs agentic
Gahow Wang
2026-05-27 14:49:44 +08:00
-
cef914ecd4
§3.1: add LMetric vs load_only design analysis (cache signal diluted by ×score)
Gahow Wang
2026-05-27 14:04:14 +08:00
-
c33c825256
figs/v2: drop unified_v2 (buggy variant); re-render 4-policy panels
Gahow Wang
2026-05-27 13:55:10 +08:00
-
03d8c5d0d1
Render 4 per-policy figures on b3_replay_20260527_0114 into figs/v2/
Gahow Wang
2026-05-27 13:52:17 +08:00
-
41232f49d3
Measure inter-turn T_external on the raw production trace; add f3a CDF
Gahow Wang
2026-05-27 12:37:32 +08:00
-
555cabcf1f
f2c: switch to per-instance decode-concurrency view; correct KV pool ceiling
Gahow Wang
2026-05-27 11:28:47 +08:00
-
922d79ac95
Add full latency grid (mean/p50/p90/p99 × TTFT/TPOT/E2E) as f6 companion
Gahow Wang
2026-05-27 11:15:18 +08:00
-
5e6e98aee7
Replace max/median hotspot index with (median, max) absolute pair
Gahow Wang
2026-05-27 11:07:12 +08:00
-
9ddabee6ae
Remove 'capped' references from MEETING.md and PAPER_OUTLINE.md prose
Gahow Wang
2026-05-27 11:02:29 +08:00
-
09ff1069c3
Drop 'capped' from per-policy figures (f4a, f4c×2, f6)
Gahow Wang
2026-05-27 10:57:43 +08:00
-
74e0c2157a
Add solo production-trace CDF figure (f2b_session_skew_prod.png)
Gahow Wang
2026-05-27 10:53:30 +08:00
-
1220da249c
f2b: regenerate CDF from production trace (1.3M sessions on dash0)
Gahow Wang
2026-05-27 10:41:53 +08:00
-
22c4aa58e4
f2b: replace top-1/5/10% bars with full CDF; align all docs to replay-trace numbers
Gahow Wang
2026-05-27 10:37:22 +08:00
-
020a5c79a7
§3.3 reframe: hot pin failure is uniformly-slow workers, not max/median ratio
Gahow Wang
2026-05-27 10:10:23 +08:00
-
18f1bd4240
Update MEETING.md + PAPER_OUTLINE.md with connector_tax substrate validation
Gahow Wang
2026-05-27 09:17:31 +08:00
-
ef9e0102ec
Connector tax: trace-replay confirms +45% kv_both penalty is gone; DR-fix adds 22% more
Gahow Wang
2026-05-27 09:13:50 +08:00
-
df0ee5a02b
Use PNG for KV memory wall figure; switch outline to inline image embeds
Gahow Wang
2026-05-27 09:13:26 +08:00
-
0bb97c9dca
Add EAR meeting pitch doc
Gahow Wang
2026-05-27 01:48:53 +08:00
-
52cdb80367
EAR outline: copy reusable figures, mark migration sections deferred
Gahow Wang
2026-05-27 01:44:13 +08:00
-
e2f94495a1
EAR paper outline: anchor + dispatch coupling motivation
Gahow Wang
2026-05-27 01:24:02 +08:00
-
31cf8c9b11
DR-fix A/B: env-gate hash sync drops slope from +81 to -0.7 μs/1k blocks
Gahow Wang
2026-05-27 00:03:23 +08:00