v5 sweep (sweep_tp1_v5_optD.sh) lands the previously deferred Option D:
worker admission_mode is now authoritative for direct_append, seed, and
reseed, bypassing replay's local _decode_session_soft_cap.

Key findings now documented:
- errors collapse from 9-10% to 0.2% (mooncake timeouts gone)
- session-cap fallback rises from 33-35% to 46-51%: D's true KV pool is
  the binding constraint, not replay's estimator; v4's "low fallback" was
  hiding capacity overruns as transfer-timeout errors
- direct-to-D subset latency unchanged from v4 (admission overhead negligible)
- new bottleneck: D's physical KV pool, which points v6 at prefill backup
  release timing, priority eviction tuning, chunked seed, cross-D session
  migration, and real RDMA

Also adds a fifth lesson on errors-vs-fallback reciprocity and updates the
code index with the v5 endpoint extension and new CLI knobs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>