From 0958823cdbafa92ea7578cecf15bb2a4cb714870 Mon Sep 17 00:00:00 2001
From: Gahow Wang <gahow.wang@gmail.com>
Date: Sat, 23 May 2026 20:58:38 +0800
Subject: [PATCH] =?UTF-8?q?REPORT:=20add=20=C2=A71.1=20errata=20flagging?=
 =?UTF-8?q?=20superseded=20sections=20(S3)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Calls out that §3.1 (old random sampler, time-scale compression, 1 req/GPU
cap) and the early elastic v3 warm-vs-fresh runs are no longer current,
and that the "--max-inflight-sessions 64+" next-step text refers to a
flag that was removed and must be restored per FIXES.md §B2 before those
numbers can be reproduced. Points readers at §3.6/§3.7 as authoritative.
---
 REPORT.md | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/REPORT.md b/REPORT.md
index 8c96df4..25745da 100644
--- a/REPORT.md
+++ b/REPORT.md
@@ -10,6 +10,26 @@
 
 For agentic LLM workloads (long input, short output, high KV cache reuse), is prefill-decode disaggregation beneficial? If full PD separation hurts (proven in §3), can **selective** disaggregation of only heavy requests improve serving latency while preserving KV cache locality?
 
+## 1.1 Errata / Superseded sections
+
+> This report has been revised several times as the methodology matured.
+> The sections listed below are kept for historical context, but their
+> numerical conclusions have been **superseded**; do not cite them in isolation.
+>
+> - **§3.1 (initial PD-sep vs PD-combined)**: ran with the old random
+>   sampler + `--time-scale` compression + `--max-inflight-sessions 8`.
+>   Cross-session KV reuse dropped from 52% to 16%, and per-GPU concurrency
+>   was capped at 1 req/GPU. Superseded by §3.6.
+> - **Earlier "elastic v3" warm-vs-fresh runs**: baselines were not
+>   restarted between trials, leaving residual KV cache that inflated
+>   baseline TTFT ~2×. Superseded by the cold-start results in §3.6/§3.7.
+> - **Any reference to running `--max-inflight-sessions 64+`**: that flag
+>   was removed when replay moved to trace-driven dispatch. The next-step
+>   experiment requires restoring the flag first (see `FIXES.md` §B2
+>   route A) before any production-concurrency numbers can be produced.
+>
+> The authoritative results are in **§3.6 and §3.7**.
+
 ## 2. Experimental Setup
 
 ### 2.1 Hardware