Commit Graph

  • 54e1f5266a MB5 PD ablation v2 results: concurrency axis + reuse 3-way writeup [main] Gahow Wang 2026-06-01 09:35:25 +08:00
  • 3f997fda14 MB5 PD ablation v2 tooling: conc completion-panel plot + gpu_monitor dep Gahow Wang 2026-06-01 09:35:05 +08:00
  • 19c443e3bc paper f2a: reuse-topology decomposition + mixture-sensitivity sweep Gahow Wang 2026-06-01 01:03:40 +08:00
  • 9c105cf05a MB5 PD ablation: controlled-variable reuse/conc redo + campaign tooling Gahow Wang 2026-06-01 01:03:27 +08:00
  • 32f7f55990 v2: linear (default cache-aware) baseline + 2x wall-cap on first600s Gahow Wang 2026-06-01 00:55:40 +08:00
  • 7529284cee v2: LMetric PD-colo vs PD-disagg on the real agentic trace Gahow Wang 2026-05-31 20:15:10 +08:00
  • fafc44da79 MB5 PD reuse-centric ablation: tooling, data, Fig 1-3 Gahow Wang 2026-05-31 20:14:46 +08:00
  • a2111b6e18 PD-disagg docs: annotated corrections for e13391e contamination Gahow Wang 2026-05-31 20:14:14 +08:00
  • 0b180c191e v2 exp(d): expand figure to 6 panels (TTFT/E2E mean+p90, TPS, per-worker GPU util) Gahow Wang 2026-05-30 21:10:27 +08:00
  • 9b6091fe6e v2 exp(d): 5-policy routing under tracets vs thinktime — ranking flip Gahow Wang 2026-05-30 20:59:18 +08:00
  • 68f21bef23 bench harness: env-tunable vLLM health timeout + both-modes 5-policy driver Gahow Wang 2026-05-30 20:59:02 +08:00
  • 075f5bbc22 trace: time_to_parent_chat annotation + thinktime trace variants Gahow Wang 2026-05-30 20:58:49 +08:00
  • 8a6b22c11c Replayer think-time dispatch mode + benchmarking guidance Gahow Wang 2026-05-30 16:25:33 +08:00
  • f0d085ceda Merge remote-tracking branch 'origin/main' Gahow Wang 2026-05-30 15:39:25 +08:00
  • 8d422c4301 Migration trigger validation: unified_v4 fires at 2x QPS, not at 1x Gahow Wang 2026-05-30 15:36:58 +08:00
  • d9cf3126c6 docs: reframe PAPER_OUTLINE to GPU-hit-first + embed v2 figures Gahow Wang 2026-05-30 13:34:19 +08:00
  • dc8e6dd5a8 v2 exp(a): add remote KV-store (RDMA) tier Gahow Wang 2026-05-30 12:48:37 +08:00
  • ad754cfe0b v2 exp(b): GPU KV-capacity APC/latency knee + writeup Gahow Wang 2026-05-30 11:23:31 +08:00
  • 837df6bc9e v2 exp(a): three-tier KV-hit latency microbench (GPU >> CPU >> miss) Gahow Wang 2026-05-30 11:23:04 +08:00
  • cf812b6264 Workload characterization C1-C3 on full production trace Gahow Wang 2026-05-29 18:19:39 +08:00
  • 847f52f03b PD-disagg crossover: regular synthetic trace + goodput sweep + figure Gahow Wang 2026-05-29 18:19:23 +08:00
  • 48ae72467a Replayer: closed-loop inter-turn think-time mode Gahow Wang 2026-05-29 18:19:12 +08:00
  • 657cd36f3d Gate evict_sent_blocks behind VLLM_EVICT_SENT_BLOCKS Gahow Wang 2026-05-29 18:18:59 +08:00
  • a0db3cbe77 Add leastwork_kappa decode-aware ablation (net-negative, documented) Gahow Wang 2026-05-29 17:07:23 +08:00
  • 71b0747b3b 600s-truncated trace + LPWL 5-policy results Gahow Wang 2026-05-29 16:08:35 +08:00
  • 160c29133d Unified bench report: mean+TPS+per-worker GPU util, auto-captured Gahow Wang 2026-05-29 16:08:22 +08:00
  • d9046322c6 Add parameter-free LPWL routing policy (--policy leastwork) Gahow Wang 2026-05-29 16:08:10 +08:00
  • 8a876e90d1 traces/README: clarify w600 is the session-start window, not span Gahow Wang 2026-05-29 12:04:14 +08:00
  • e532e83d3e mb5_run: scrape per-instance prefix-cache counters before teardown Gahow Wang 2026-05-29 11:56:43 +08:00
  • d376d91fe1 Engine-state ablation: full sweep harness + results Gahow Wang 2026-05-29 11:55:49 +08:00
  • 08c3cf48aa Ship anonymized benchmark trace w600_r0.0015_st30 + provenance Gahow Wang 2026-05-29 11:54:43 +08:00
  • 8708b75520 Merge layerwise KV transfer + engine-state ablation onto main Gahow Wang 2026-05-29 11:53:40 +08:00
  • ee5db0b321 MB5 driver updates: PD-proxy + snapshot instrument + launcher tweaks Gahow Wang 2026-05-29 11:53:27 +08:00
  • bad512d3c5 PD-disagg crossover: synthetic-trace generator + morpher + plotter Gahow Wang 2026-05-29 11:53:21 +08:00
  • 41a0c1c48f Migration correctness smoke tests: direct-read, partial-transfer, NIXL Gahow Wang 2026-05-29 11:53:13 +08:00
  • 1262c9c22e Migration transfer-cost study: KV transfer is slow on busy GPUs Gahow Wang 2026-05-29 11:53:01 +08:00
  • 67fcec7933 Unified-routing A+B ablation: decode-aware LMetric + v3 anti-hotspot Gahow Wang 2026-05-29 11:52:44 +08:00
  • a2f2645fda PD_DISAGG_RESULTS §6.3: producer hot-pinning figure Gahow Wang 2026-05-29 00:38:20 +08:00
  • 7947831e0f run_v3_trace.sh: stage LAYERWISE conn + enhanced proxy from shared cpfs (dash1-ready) Gahow Wang 2026-05-29 00:29:56 +08:00
  • 6243b78bba PD_DISAGG_RESULTS §6: session-affinity routing does not rescue PD Gahow Wang 2026-05-29 00:25:10 +08:00
  • 5b26c345f4 P2: all routing policies read real state via eff_ accessors + ablation harness Gahow Wang 2026-05-28 20:21:12 +08:00
  • be948d32b8 P2: real engine-state feed replaces stale shadow counters for migration targeting Gahow Wang 2026-05-28 20:01:26 +08:00
  • 19191940e6 A/B x migration matrix runner (parameterized run_v3_trace.sh + wrapper) Gahow Wang 2026-05-28 19:23:16 +08:00
  • 63387f614d Full v3 trace re-profile with layer-wise: matched migrations improve Gahow Wang 2026-05-28 19:16:37 +08:00
  • 21db2affb4 Trace runner (run_v3_trace.sh) + concurrent mb7 correctness test Gahow Wang 2026-05-28 17:28:48 +08:00
  • e705bb33b6 Proxy write-mode: concurrent prefill+decode dispatch for v3 (EAR_WRITE_MODE=1) Gahow Wang 2026-05-28 17:22:18 +08:00
  • 4242bba034 Chunk-safe + concurrent layer-wise connector (per-step incremental shipping) Gahow Wang 2026-05-28 17:15:54 +08:00
  • 4cd71b6631 Working-set figure: extend left panel to ~50 nodes Gahow Wang 2026-05-28 17:11:12 +08:00
  • 2247d1de08 Working-set figure: right panel = W(t) time series Gahow Wang 2026-05-28 16:31:26 +08:00
  • e77bdcac5a Layerwise under load: overlap benefit survives (bg=16) Gahow Wang 2026-05-28 16:30:14 +08:00
  • c94b2e237a Working-set figure: linear node axes + benefit/cost split Gahow Wang 2026-05-28 16:24:15 +08:00
  • 3b8be5bb61 Working-set figure: express footprint in node count, not GB Gahow Wang 2026-05-28 16:16:00 +08:00
  • dae98c6472 Working-set sizing tool + GLM-5.1-FP8/B300 result Gahow Wang 2026-05-28 16:03:25 +08:00
  • fec50fa45d Layerwise KV transfer on Mooncake: PoC + microbench (worktree exploration) Gahow Wang 2026-05-28 15:34:43 +08:00
  • 2e6a369046 PD_DISAGG_RESULTS §5.1: D-pool pressure crashes consumers Gahow Wang 2026-05-28 13:02:21 +08:00
  • 3957c2df86 MB5 patch: clamp PD-consumer metrics counter underflow Gahow Wang 2026-05-28 13:01:23 +08:00
  • 8596135680 MB5 analysis: per-role KV split proves static-partition mismatch Gahow Wang 2026-05-28 12:05:17 +08:00
  • e8980ce957 MB5 proxy: session-affinity P routing (MB5_P_ROUTING=session) Gahow Wang 2026-05-28 11:05:25 +08:00
  • b13ca10d19 PD_DISAGG_INVESTIGATION: snapshot Phase 0 done + sweep in flight Gahow Wang 2026-05-28 00:51:28 +08:00
  • a66f24d242 MB5 aggregate: cross-config KV-pool + latency comparison Gahow Wang 2026-05-28 00:49:21 +08:00
  • a9c7310f4a MB5 PD-disagg pipeline: working end-to-end Gahow Wang 2026-05-28 00:14:22 +08:00
  • e0d3b5150a MB5 driver fixes: bash env-prefix + replayer flag names + python date math Gahow Wang 2026-05-27 23:23:23 +08:00
  • e9abd70c8d MB5 driver: launcher, orchestrator, KV-pool timeline plotter Gahow Wang 2026-05-27 23:02:57 +08:00
  • a4f5dd56aa MB5 instrumentation: per-request KV-block snapshot from vLLM V1 scheduler Gahow Wang 2026-05-27 22:30:53 +08:00
  • 4a93096c1e Add PD_DISAGG_INVESTIGATION.md — living TODO for proving H1–H4 Gahow Wang 2026-05-27 22:24:31 +08:00
  • f739f7d461 Proxy/runner support for Nixl connector + unified_v3 (offload-decode) policy Gahow Wang 2026-05-27 22:05:19 +08:00
  • da39ab6804 Correct PD-disagg cost/benefit framing across repo Gahow Wang 2026-05-27 22:04:49 +08:00
  • abde010b64 Add RESULTS_SUMMARY.md — concise Chinese summary of current findings Gahow Wang 2026-05-27 21:38:28 +08:00
  • 029821c1b6 MB1: prefill-decode interference under chunked-prefill default; §3.2 headline Gahow Wang 2026-05-27 21:25:09 +08:00
  • 90127c3389 MB2 inter-node: dash1↔dash2 transfer cost is identical to intra-node Gahow Wang 2026-05-27 20:56:08 +08:00
  • 50f72d8875 MB2 inter-node scaffolding: per-host single-instance launcher + client host args Gahow Wang 2026-05-27 20:26:54 +08:00
  • 3f791ee074 MB2 doc: analysis/mb2/README.md as persistent record Gahow Wang 2026-05-27 20:23:50 +08:00
  • de164e5a64 MB2: pure KV-transfer cost on dash1 intra-node — Mooncake ~9.7 GB/s steady Gahow Wang 2026-05-27 19:04:03 +08:00
  • 91673f1fb8 MB2: working end-to-end intra-node KV transfer microbench Gahow Wang 2026-05-27 18:53:25 +08:00
  • 622e0bc04c MB2: parameterize vLLM roles (kv_producer + kv_consumer default) Gahow Wang 2026-05-27 18:17:42 +08:00
  • efdcf3c555 MB2: per-stage instrumentation patch + launcher integration Gahow Wang 2026-05-27 18:12:44 +08:00
  • 7437422618 MB2 scaffolding: launch script for vLLM pair + KV-transfer-time client Gahow Wang 2026-05-27 17:47:04 +08:00
  • 0a63de5bcf Phase 0: fresh vllm 0.18.1 + mooncake-transfer-engine on dash1/dash2 Gahow Wang 2026-05-27 17:42:36 +08:00
  • b11dc30945 §2.3 reframe: dispatch coupling is regime-dependent, not binary chatbot/agentic Gahow Wang 2026-05-27 16:51:38 +08:00
  • 876d09db83 Add chatbot T_external CDF; overlay on f3a vs agentic Gahow Wang 2026-05-27 14:49:44 +08:00
  • cef914ecd4 §3.1: add LMetric vs load_only design analysis (cache signal diluted by ×score) Gahow Wang 2026-05-27 14:04:14 +08:00
  • c33c825256 figs/v2: drop unified_v2 (buggy variant); re-render 4-policy panels Gahow Wang 2026-05-27 13:55:10 +08:00
  • 03d8c5d0d1 Render 4 per-policy figures on b3_replay_20260527_0114 into figs/v2/ Gahow Wang 2026-05-27 13:52:17 +08:00
  • 41232f49d3 Measure inter-turn T_external on the raw production trace; add f3a CDF Gahow Wang 2026-05-27 12:37:32 +08:00
  • 555cabcf1f f2c: switch to per-instance decode-concurrency view; correct KV pool ceiling Gahow Wang 2026-05-27 11:28:47 +08:00
  • 922d79ac95 Add full latency grid (mean/p50/p90/p99 × TTFT/TPOT/E2E) as f6 companion Gahow Wang 2026-05-27 11:15:18 +08:00
  • 5e6e98aee7 Replace max/median hotspot index with (median, max) absolute pair Gahow Wang 2026-05-27 11:07:12 +08:00
  • 9ddabee6ae Remove 'capped' references from MEETING.md and PAPER_OUTLINE.md prose Gahow Wang 2026-05-27 11:02:29 +08:00
  • 09ff1069c3 Drop 'capped' from per-policy figures (f4a, f4c×2, f6) Gahow Wang 2026-05-27 10:57:43 +08:00
  • 74e0c2157a Add solo production-trace CDF figure (f2b_session_skew_prod.png) Gahow Wang 2026-05-27 10:53:30 +08:00
  • 1220da249c f2b: regenerate CDF from production trace (1.3M sessions on dash0) Gahow Wang 2026-05-27 10:41:53 +08:00
  • 22c4aa58e4 f2b: replace top-1/5/10% bars with full CDF; align all docs to replay-trace numbers Gahow Wang 2026-05-27 10:37:22 +08:00
  • 020a5c79a7 §3.3 reframe: hot pin failure is uniformly-slow workers, not max/median ratio Gahow Wang 2026-05-27 10:10:23 +08:00
  • 18f1bd4240 Update MEETING.md + PAPER_OUTLINE.md with connector_tax substrate validation Gahow Wang 2026-05-27 09:17:31 +08:00
  • ef9e0102ec Connector tax: trace-replay confirms +45% kv_both penalty is gone; DR-fix adds 22% more Gahow Wang 2026-05-27 09:13:50 +08:00
  • df0ee5a02b Use PNG for KV memory wall figure; switch outline to inline image embeds Gahow Wang 2026-05-27 09:13:26 +08:00
  • 0bb97c9dca Add EAR meeting pitch doc Gahow Wang 2026-05-27 01:48:53 +08:00
  • 52cdb80367 EAR outline: copy reusable figures, mark migration sections deferred Gahow Wang 2026-05-27 01:44:13 +08:00
  • e2f94495a1 EAR paper outline: anchor + dispatch coupling motivation Gahow Wang 2026-05-27 01:24:02 +08:00
  • 31cf8c9b11 DR-fix A/B: env-gate hash sync drops slope from +81 to -0.7 μs/1k blocks Gahow Wang 2026-05-27 00:03:23 +08:00