Headline: KVC v2 + load-floor + RDMA beats naive PD-disagg on
mean/p50/p90 by 30-65% (TTFT p50 31s vs 88s, latency p50 37s vs 93s,
wall-clock 64 min vs 88 min). It loses on p99 by ~8% (TTFT 224s vs 207s).
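As a quick sanity check on the headline percentages, the relative changes can be recomputed from the quoted numbers. This is a minimal sketch: the helper name `rel_change` and the dictionary keys are made up for illustration, and it assumes the first number in each pair is the KVC v2 run (E4) and the second is naive PD-disagg (E1).

```python
def rel_change(e4: float, e1: float) -> float:
    """Relative change of E4 versus E1; negative means E4 is lower (better)."""
    return (e4 - e1) / e1

# (E4, E1) pairs copied from the headline; key names are illustrative only.
pairs = {
    "ttft_p50_s": (31, 88),
    "latency_p50_s": (37, 93),
    "wall_clock_min": (64, 88),
    "ttft_p99_s": (224, 207),
}

for name, (e4, e1) in pairs.items():
    print(f"{name}: {rel_change(e4, e1):+.1%}")
# ttft_p50_s: -64.8%
# latency_p50_s: -60.2%
# wall_clock_min: -27.3%
# ttft_p99_s: +8.2%
```

Note that the wall-clock gain comes out at about 27%, just under the 30-65% range, which the headline quotes for mean/p50/p90 only.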
Wrote 4 figures (docs/figures/):
e1_vs_e4_ttft_pdf.png — bimodal E4 fast-path peak vs E1 single peak
e1_vs_e4_latency_cdf.png — CDF + log-survival showing tail crossover
e4_path_latency.png — per-execution-mode latency breakdown
e1_vs_e4_p99_attribution.png — what makes up E4's p99 tail
P99 tail attribution (this is the key finding):
E4 p99 tail (n=65, TTFT ≥ 179.9s):
  fast-path direct-to-d         0% (0/65)
  reseed paths                  5% (3/65)
  fallback paths               88% (57/65)
    large-append-session-cap   43%  ← biggest culprit
    no-d-capacity              17%
    large-append               14%
Implication: the D→P snapshot (designed to optimize the reseed slow path)
would touch ≤5% of the p99 tail even if it worked fully. The real
bottleneck is the *fallback chain* (admission retry + seeded-router
cold start), not reseed. Improving p99 needs work on fallback,
not more D→P plumbing.
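The tail attribution above can be reproduced along these lines. This is a sketch under assumptions, not the script that generated the figures: it assumes each request is available as a `(ttft_seconds, path_label)` pair, uses a nearest-rank percentile, and the function name and labels are hypothetical.

```python
from collections import Counter

def tail_attribution(records, pct=99):
    """Attribute the slowest requests to their execution path.

    records: list of (ttft_seconds, path_label) pairs (assumed layout).
    Returns (threshold, n_tail, {path_label: share_of_tail}).
    """
    ttfts = sorted(t for t, _ in records)
    # Nearest-rank percentile, computed in integer arithmetic.
    rank = max(1, (len(ttfts) * pct + 99) // 100)
    threshold = ttfts[rank - 1]
    # The tail is every request at or above the threshold.
    tail = [label for t, label in records if t >= threshold]
    counts = Counter(tail)
    shares = {label: n / len(tail) for label, n in counts.items()}
    return threshold, len(tail), shares

# Synthetic example: 100 requests, only the two slowest leave the fast path.
demo = [(float(t), "fast-path") for t in range(1, 99)]
demo += [(99.0, "reseed"), (100.0, "fallback:no-d-capacity")]
threshold, n_tail, shares = tail_attribution(demo)
print(threshold, n_tail, shares)
```

Applied to the counts reported above, 3/65 rounds to 5% and 57/65 rounds to 88%, so a path that covers only the reseed bucket cannot move more than about 5% of the tail.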
Full analysis: docs/E4_VS_E1_RESULTS_ZH.md