Document decode harness one-shot mechanism

2026-05-02 06:25:06 +08:00
parent 9e5394b557
commit 6d3459c82d
8 changed files with 185 additions and 4 deletions

@@ -77,6 +77,7 @@ Best-point latency:
- `TP1/DP8/EP8` launched, but did not beat `TP2/DP4/EP8`.
- `EP4` under `TP2/DP4` failed at launch and should be treated as negative evidence for this stack.
- After topology settled at `TP2/DP4/EP8`, the useful runtime refinement was tighter decode batching: `max-num-seqs=128`, `max-num-batched-tokens=256`.
- Harness mechanism and ablation notes are in `one-shot-mechanism-ablation-20260502.md`.
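
As an illustration only, the settled topology and decode batching values above can be captured in a small helper. This is a minimal sketch, not the harness's actual code: the function names `parse_topology` and `decode_settings` are hypothetical, and the rank-count check assumes each DP replica holds one TP group, which these notes do not state.

```python
import re


def parse_topology(label: str) -> dict[str, int]:
    """Parse a label such as 'TP2/DP4/EP8' into {'TP': 2, 'DP': 4, 'EP': 8}."""
    sizes: dict[str, int] = {}
    for part in label.split("/"):
        match = re.fullmatch(r"([A-Z]+)(\d+)", part)
        if match is None:
            raise ValueError(f"unrecognized topology component: {part!r}")
        sizes[match.group(1)] = int(match.group(2))
    return sizes


def decode_settings(label: str) -> dict[str, int]:
    """Combine the parsed topology with the decode batching values from the notes."""
    settings = parse_topology(label)
    settings["max-num-seqs"] = 128
    settings["max-num-batched-tokens"] = 256
    return settings


settings = decode_settings("TP2/DP4/EP8")
# Assumption: total ranks = TP x DP (one TP group per DP replica).
implied_ranks = settings["TP"] * settings["DP"]
print(settings, implied_ranks)
```

Keeping the topology label and the batching limits in one structure makes it easier to record which combination a given ablation run used.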

## Current recommendation