agentic-pd-hybrid/docs/figures/tpot_pdf_comparison.png
kzlin 722032a13b docs(kvc): add TPOT probability density figure (KVC v2 vs 4DP)
Mirrors the TTFT PDF figure style. Inserted into V2_DEEP_ANALYSIS as a
new §3.5 immediately following §3.4 (TTFT PDF).

The figure preempts a likely reviewer challenge: "Is KVC's TTFT win
bought by sacrificing decode latency (TPOT)?" The empirical answer
is no -- the two KDE curves overlap almost perfectly.

Measured TPOT deltas (KVC v2 vs DP 4w, n>=4382 each):
  mean: +0.019 ms  (+0.34%)
  p50:  +0.035 ms  (+0.63%)
  p90:  -0.050 ms  (-0.75%, slight KVC advantage)
  p99:  +0.026 ms  (+0.34%)
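
A minimal sketch of how deltas like the ones above could be computed
from two lists of per-request TPOT values in milliseconds. The function
names, the linear-interpolation percentile rule, and the sign convention
(KVC minus DP, so negative means KVC is faster) are assumptions for
illustration, not necessarily the script that produced these numbers.

```python
import statistics


def percentile(sorted_vals, q):
    """Linearly interpolated percentile (q in [0, 100]) of a pre-sorted list."""
    if not sorted_vals:
        raise ValueError("empty sample")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def tpot_deltas(kvc_ms, dp_ms):
    """Return {stat: (abs_delta_ms, rel_delta_pct)} for KVC relative to DP."""
    kvc, dp = sorted(kvc_ms), sorted(dp_ms)
    pairs = {
        "mean": (statistics.fmean(kvc), statistics.fmean(dp)),
        "p50": (percentile(kvc, 50), percentile(dp, 50)),
        "p90": (percentile(kvc, 90), percentile(dp, 90)),
        "p99": (percentile(kvc, 99), percentile(dp, 99)),
    }
    # Positive delta: KVC is slower than DP for that statistic.
    return {k: (a - b, 100.0 * (a - b) / b) for k, (a, b) in pairs.items()}
```

As a sanity check on the reported figures, +0.019 ms at +0.34% implies a
DP mean of roughly 5.6 ms, which sits inside the left panel's
[3.5, 9.0] ms body range.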

The only visible difference is at the extreme tail (distribution max):
  KVC max = 11.32 ms  vs  DP max = 9.53 ms
(plausibly cold-start jitter on the first decode step after a reseed;
affects <= 0.1% of requests)
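
A small sketch of how the "<= 0.1% of requests" bound could be checked.
The helper name and the choice of threshold (for example the DP maximum
of 9.53 ms) are assumptions; the commit does not say which cutoff was
used.

```python
def tail_fraction(samples_ms, threshold_ms):
    """Fraction of samples strictly above threshold_ms."""
    if not samples_ms:
        raise ValueError("empty sample")
    above = sum(1 for v in samples_ms if v > threshold_ms)
    return above / len(samples_ms)
```

At n = 4382, a 0.1% share corresponds to about four requests.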

Figure layout (two panels):
  left  panel: linear x in [3.5, 9.0] ms -- body
  right panel: log x in [1, 20] ms -- full range with tail
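
The two x-ranges above can be sketched as evaluation grids for a
Gaussian KDE. This is a stdlib-only illustration under assumptions: the
helper names are hypothetical, and the bandwidth uses Silverman's rule
of thumb, which may differ from whatever the actual plotting script
uses.

```python
import math
import statistics


def silverman_bandwidth(samples):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    n = len(samples)
    sd = statistics.stdev(samples)
    q1, _, q3 = statistics.quantiles(samples, n=4)
    spread = min(sd, (q3 - q1) / 1.34) if q3 > q1 else sd
    if spread <= 0:
        raise ValueError("samples have zero spread")
    return 0.9 * spread * n ** (-0.2)


def gaussian_kde(samples, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    h = bandwidth if bandwidth is not None else silverman_bandwidth(samples)
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
        for x in grid
    ]


def linear_grid(lo, hi, n):
    """n evenly spaced points from lo to hi (inclusive)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def log_grid(lo, hi, n):
    """n points from lo to hi (inclusive), evenly spaced in log10."""
    llo, lhi = math.log10(lo), math.log10(hi)
    step = (lhi - llo) / (n - 1)
    return [10 ** (llo + i * step) for i in range(n)]


body_x = linear_grid(3.5, 9.0, 200)   # left panel: body, linear x
full_x = log_grid(1.0, 20.0, 200)     # right panel: full range, log x
```

Calling gaussian_kde(tpot_ms, body_x) and gaussian_kde(tpot_ms, full_x)
for each system yields the curves to draw; a log-spaced grid keeps the
right panel evenly sampled once the axis is set to log scale.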

Each panel annotates the percentile gaps with bbox callouts so the
reader's takeaway is "they overlap" rather than the open question
"is there a difference?".

Paper purpose: cited from V2_DEEP_ANALYSIS §3.5 as the supporting
evidence that the path-level latency win in §3.2 is concentrated in
the TTFT segment, not in decode. Since TPOT is essentially unchanged,
the TTFT gain is not offset during decode, which makes it a real
end-to-end win rather than a measurement artifact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 10:24:44 +08:00

306 KiB
2385x1021px

/gahow/agentic-pd-hybrid/raw/commit/8fc31be605c62e94e71965ac6c690afa4ab93535/docs/figures/tpot_pdf_comparison.png