Document decode harness one-shot mechanism
@@ -77,6 +77,7 @@ Best-point latency:
- `TP1/DP8/EP8` launched, but did not beat `TP2/DP4/EP8`.
- `EP4` under `TP2/DP4` failed at launch and should be treated as negative evidence for this stack.
- After topology settled at `TP2/DP4/EP8`, the useful runtime refinement was tighter decode batching: `max-num-seqs=128`, `max-num-batched-tokens=256`.
- Harness mechanism and ablation notes are in `one-shot-mechanism-ablation-20260502.md`.
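
As a minimal sketch of how the topology labels and batching settings above could be recorded in a harness script, the following parses a label such as `TP2/DP4/EP8` and renders the two decode-batching settings. The helper names (`parse_topology`, `world_size`, `render_flags`, `DECODE_BATCHING`) are hypothetical and not taken from the harness. Treating the rank count as TP × DP, and rendering settings as `--name=value` flags, are both assumptions for illustration.

```python
import re


def parse_topology(label: str) -> dict[str, int]:
    """Parse a label such as 'TP2/DP4/EP8' into {'TP': 2, 'DP': 4, 'EP': 8}."""
    sizes: dict[str, int] = {}
    for part in label.split("/"):
        match = re.fullmatch(r"([A-Z]+)(\d+)", part)
        if match is None:
            raise ValueError(f"unrecognized topology component: {part!r}")
        sizes[match.group(1)] = int(match.group(2))
    return sizes


def world_size(sizes: dict[str, int]) -> int:
    """Assumption: the number of ranks is TP * DP (missing axes count as 1)."""
    return sizes.get("TP", 1) * sizes.get("DP", 1)


# Decode batching values taken from the notes above.
DECODE_BATCHING = {"max-num-seqs": 128, "max-num-batched-tokens": 256}


def render_flags(settings: dict[str, int]) -> list[str]:
    """Render settings as '--name=value' strings (hypothetical flag format)."""
    return [f"--{name}={value}" for name, value in settings.items()]


if __name__ == "__main__":
    chosen = parse_topology("TP2/DP4/EP8")
    print(chosen, "ranks:", world_size(chosen))
    print(" ".join(render_flags(DECODE_BATCHING)))
```

Under that assumption, `TP2/DP4` and `TP1/DP8` both occupy the same number of ranks, so the comparison between them is a comparison of layout on a fixed device count, not of device count.
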
## Current recommendation