Commit Graph

3 Commits

Author SHA1 Message Date
a2111b6e18 PD-disagg docs: annotated corrections for e13391e contamination
Adds dated, non-destructive correction notes to the contaminated PD-vs-colo
artifacts after the producer-eviction bug (`evict_blocks(sent_block_ids)` on
`finished_sending`, deployed over the "fresh" pip vLLM by
`scripts/deploy_vllm_patches.sh`) was found and gated behind
`VLLM_EVICT_SENT_BLOCKS` (default off).

  PD_DISAGG_RESULTS.md  top CORRECTION banner + §6 RETRACTED marker.
                        §6 (session-affinity hot-pin) was an `e13391e`
                        artifact under controlled concurrency; §3 RR, §4
                        TPOT win, §5 D-pool ceiling, §5.1 consumer crash
                        stand.
  RESULTS_SUMMARY.md    §4 confirm+refine note: clean ablation confirms
                        the D-pool capacity thesis and adds regime-
                        dependence.
  pd_separation_analysis.md  scoped caution: thesis confirmed; flags
                        only reuse-dependent figures for cross-check
                        (this study used a different stack).
  figs/mb5/CORRECTION.md  flags mb5_producer_hotspot.png as retracted;
                        §3 RR and §5 D-pool figures stand.
2026-05-31 20:14:14 +08:00
da39ab6804 Correct PD-disagg cost/benefit framing across repo
The §3.2 cost-vs-benefit math in commits 029821c (MB1 plot +
pd_cost_vs_benefit.png) and abde010 (RESULTS_SUMMARY.md) was wrong.

What was wrong:
  I framed PD-disagg's max phase-isolation benefit as "≤ decode duration
  of the new request (~50–200 ms)" — implicitly treating the benefit as
  per-request and bounded by that request's own decode. The correct
  accounting is per-prefill-event across all stalled streams:

      benefit_per_prefill = D × T_prefill × (1 − TPOT_baseline/TPOT_during)
                          ≈ D × T_prefill

  which follows from the chunked-prefill math (each of L/N chunks slows
  D ongoing decode steps from ~10 ms to t ms, summing to D × T_prefill).

Plug MB1 + MB2 numbers in:

  prefill size | T_prefill | T_transfer | D=8 benefit | cost/benefit
   2k tok      | 0.14 s    |     8 ms   |   1.1 s     |    0.7 %
  33k tok      | 4.5  s    |  320 ms    |  36   s     |    0.9 %
 125k tok      | 57   s    |  1.9 s     | 456   s     |    0.4 %

On the phase-isolation axis alone, PD-disagg WINS by 100×–250× — the
opposite of what the deleted figure showed.

The actual dominant reason static PD-disagg fails in agentic is the
D-side KV pool capacity wall (figs/f4b_pdsep_kv_wall.png) — p99
single-request KV is 11.5 GiB, per-D-instance pool is 38 GiB, so 4P+4D
halves system decode capacity. Colleague's 4P+4D experiment showed
TTFT p50 62× worse and success rate 99.5% → 52%, driven by pool
overflow + queueing, not by transfer latency.

Changes (all touched files explicitly listed; no `git add -u`):
- figs/pd_cost_vs_benefit.png : DELETED (figure built on wrong math)
- microbench/fresh_setup/plot_mb1.py : drop the pd_cost_vs_benefit
  function; keep mb1_interference.png and update its title to note
  per-prefill aggregate stall = D × T_prefill (not capped by decode)
- figs/mb1_interference.png : regenerated, no misleading band annotation
- analysis/mb1/README.md : Summary block rewritten ("what MB1 measures";
  no more "max benefit = decode duration" claim); §3.2 implications
  section replaced with the corrected per-prefill-event table; explicit
  ⚠ Correction note documents what was wrong
- analysis/mb2/README.md : Summary block + §3.2 implications section
  rewritten the same way; ⚠ Correction note links to RESULTS_SUMMARY §4
- RESULTS_SUMMARY.md §4 + §6 : §4 reordered to lead with the D-side
  capacity argument (the real failure mode), MB1/MB2 demoted from
  "kill-shot for PD-disagg" to "supporting context inputs to a
  cost-benefit table that actually favors PD-disagg on this axis";
  §6 paper-claims list reordered to remove the wrong "PD-disagg loses
  on cost-vs-benefit" claim and replace with the corrected ones

PAPER_OUTLINE.md and MEETING.md were checked and never picked up this
specific wrong claim — they already (correctly) frame §3.2 around the
D-side KV memory wall.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 22:04:49 +08:00
abde010b64 Add RESULTS_SUMMARY.md — concise Chinese summary of current findings
One-page distillation of what the paper can claim today, with figure /
data path next to each row. Sections:

  1. Workload 性质 — intra-session reuse, skew, KV footprint
  2. Dispatch Coupling — agentic vs chatbot inter-turn gap regime
  3. 现有调度三类失败 — load-balance / static PD-disagg / pure sticky
  4. PD-disagg cost vs benefit — MB2 (transfer 9.7 GB/s ceiling,
     topology-independent) + MB1 (decode halted during prefill 15-200x),
     joined into the §3.2 cost > benefit headline for any KV ≥ 80 MiB
  5. EAR 实证状态 — Pillar 1 (affinity) validated, Pillar 2 (migration)
     substrate validated + strategy-layer pending
  6. 已能写的 paper 主张(按 confidence 排序)
  7. 待做(MB3-5, migration e2e, wall-clock sweep, scale-out)

Designed to be the one doc to read when re-entering the project after
a break.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 21:38:28 +08:00