Use PNG for KV memory wall figure; switch outline to inline image embeds

- Convert figs/f4b_pdsep_kv_wall.pdf to PNG with pdftoppm at 150 DPI so
  MEETING.md and PAPER_OUTLINE.md render the figure inline on GitHub and
  any standard markdown viewer (PDF `![]()` embeds don't render).
- PAPER_OUTLINE.md F2, F4, F6: switch from backtick code references to
  proper ![]() image embeds so the doc is actually viewable as a deck.
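The PDF-to-PNG switch is mechanical enough to script. What follows is a minimal Python sketch. It assumes simple `![alt](path.pdf)` embeds with no title and no spaces or parentheses in the path. `rewrite_pdf_embeds` and `pdftoppm_cmd` are hypothetical helpers and are not code from this repo. `-png`, `-r` and `-singlefile` are standard pdftoppm (poppler-utils) options. The sketch only builds the command line and does not run it.

```python
import re
from pathlib import PurePosixPath

# Hypothetical pattern: matches ![alt](path.pdf) image embeds.
# Assumption: no link title and no spaces/parentheses in the path.
PDF_EMBED = re.compile(r"!\[([^\]]*)\]\(([^()\s]+)\.pdf\)")


def pdftoppm_cmd(pdf_path: str, dpi: int = 150) -> list[str]:
    """Build (but do not run) a pdftoppm call that renders page 1 of
    pdf_path to <same stem>.png; -singlefile drops the page-number suffix."""
    prefix = str(PurePosixPath(pdf_path).with_suffix(""))
    return ["pdftoppm", "-png", "-r", str(dpi), "-singlefile", pdf_path, prefix]


def rewrite_pdf_embeds(markdown: str) -> tuple[str, list[str]]:
    """Point every .pdf image embed at a sibling .png and return the
    list of PDFs that still need converting."""
    pdfs: list[str] = []

    def to_png(m: re.Match) -> str:
        alt, stem = m.group(1), m.group(2)
        pdfs.append(stem + ".pdf")
        return f"![{alt}]({stem}.png)"

    return PDF_EMBED.sub(to_png, markdown), pdfs


if __name__ == "__main__":
    doc = "### KV wall\n![](figs/f4b_pdsep_kv_wall.pdf)\n"
    new_doc, todo = rewrite_pdf_embeds(doc)
    print(new_doc, end="")
    for pdf in todo:
        print(" ".join(pdftoppm_cmd(pdf)))
```

Running it on the one-line doc prints the rewritten embed, followed by a `pdftoppm -png -r 150 -singlefile ...` command. That command writes `figs/f4b_pdsep_kv_wall.png` next to the PDF. Only page 1 is rendered, which is assumed to be enough for single-page figure PDFs.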

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 09:13:26 +08:00
parent 0bb97c9dca
commit df0ee5a02b
3 changed files with 23 additions and 11 deletions


@@ -43,7 +43,7 @@ LMetric 56.9%, load_only 54.1%, capped 31.6% APC, far below the 79.6% upper bound. 23
 ### Static PD-disagg: D-side KV capacity wall
-![](figs/f4b_pdsep_kv_wall.pdf)
+![](figs/f4b_pdsep_kv_wall.png)
 An average agentic request is 33.6k tokens, about 3.3 GB of KV; both 4P+4D and 6P+2D cross the 90% memory wall in the agentic regime. **TTFT p50 jumps 62-72x, and the success rate falls from 99.5% to 52-68%.**


@@ -64,9 +64,12 @@ Breakdown of KV reuse on the trace:
 Theoretical APC upper bound: any-session **80.3%**, intra-session-only **79.6%**, a gap of <1pp. **The cache is essentially session-local**: any scheduling that does not preserve session affinity throws away most of the reuse opportunity.
 **Figure 2: Workload characterization (3 panels)**, reusing existing data
-- `figs/f2a_reuse_topology.png`: intra-session 93.2% / cross-session 5.7% / shared 1.1% bar
-- `figs/f2b_session_skew.png`: input-token mass of the top 1%/5%/10% sessions
-- `figs/f2c_kv_footprint_cdf.png`: per-request KV footprint p50/p90/p95/p99 (p99 = 11.8 GiB ≈ 12% of H20)
+![F2a Reuse topology: intra 93.2% / cross 5.7% / shared 1.1%](figs/f2a_reuse_topology.png)
+![F2b Session skew: top 1% = 46.5% input mass](figs/f2b_session_skew.png)
+![F2c KV footprint CDF: p99 = 11.8 GiB ≈ 12% of H20](figs/f2c_kv_footprint_cdf.png)
 > 📝 Layout TBD: combine the three into one 1×3 figure, or spread them one each across §2.1/§2.2/§2.4.
@@ -128,11 +131,18 @@ Round-robin and load-aware routing (e.g., LMetric, OSDI'26) maximize instance
 Session-instance binding restores locality (APC **77.2%**, 97% of the upper bound), but it locks the large sessions in the skewed tail onto a single instance: **the interference index doubles, from LMetric's 6.53 to 13.65** (same trace, same hardware). This violates the skew-tolerance requirement from §2.4.
 **Figure 4: Three baselines, three failure modes**, split into three subfigures placed in §3.1, §3.2, and §3.3 respectively
-- §3.1 `figs/f4a_apc_loss.png`: measured APC vs the theoretical upper bound of 79.6% (lmetric 56.9%, load_only 54.1%, capped 31.6%, sticky 77.2%, unified 79.4%)
-- §3.2 `figs/f4b_pdsep_kv_wall.pdf`: D-side KV pool occupancy vs per-request KV footprint; both 4P+4D and 6P+2D cross the 90% memory wall in the agentic regime
-- §3.3 `figs/f4c_apc_vs_hotspot_tradeoff.png`: APC vs hotspot index scatter; unified/sticky sit at high APC and high hotspot, lmetric/load_only at low APC and low hotspot
-> 📝 Optional: `figs/f4d_pd_interference.png` ✅, prefill-decode interference (an 8k prefill on the same GPU degrades TPOT by 66x); place it in §3.3 to support the interference argument against sticky.
+§3.1: measured APC vs the theoretical upper bound of 79.6% (lmetric 56.9%, load_only 54.1%, capped 31.6%, sticky 77.2%, unified 79.4%)
+![F4a APC loss](figs/f4a_apc_loss.png)
+§3.2: D-side KV pool occupancy vs per-request KV footprint; both 4P+4D and 6P+2D cross the 90% memory wall in the agentic regime
+![F4b PD-sep KV memory wall](figs/f4b_pdsep_kv_wall.png)
+§3.3: APC vs hotspot index scatter; unified/sticky sit at high APC and high hotspot, lmetric/load_only at low APC and low hotspot
+![F4c APC vs hotspot tradeoff](figs/f4c_apc_vs_hotspot_tradeoff.png)
+> 📝 Optional supporting figure: prefill-decode interference (an 8k prefill on the same GPU degrades TPOT by 66x); place it in §3.3 to support the interference argument against sticky:
+![F4d PD interference](figs/f4d_pd_interference.png)
 ### §3.4 Takeaway
@@ -215,9 +225,11 @@ KV transfer happens on the critical path of the request that triggers the migration, but
 ### §5.2 End-to-end Performance
-**Figure 6: End-to-end performance** `figs/f6_e2e_latency_bars.png` (PARTIAL)
-> Existing: TTFT/TPOT/E2E p90 bar chart × 5 policies (lmetric / load_only / sticky / unified / capped).
-> **🚧 TBD (NEW DATA)**: the `static PD-disagg` column is missing; the EAR column is also TBD (needs migration validation). Need one more figure in the same format that covers all 6 baselines.
+**Figure 6: End-to-end performance** (PARTIAL, missing PD-disagg)
+![F6 E2E latency bars: 5 policies](figs/f6_e2e_latency_bars.png)
+> **🚧 TBD (NEW DATA)**: the figure above is missing the `static PD-disagg` column; the EAR column is also TBD (needs migration validation). Need one more figure in the same format that covers all 6 baselines.
 | Scheduler | TTFT p50 | TTFT p90 | TPOT p90 | APC | Hotspot idx | Wall-clock factor |
 |---|---|---|---|---|---|---|

Binary file added: figs/f4b_pdsep_kv_wall.png (normal file, 93 KiB, not shown)