bf037594c4e079b42e0e7a348c6e2a1760b26e10
Updated methodology:
- Window+thin sampling preserves cross-session sharing (48% vs 16%)
- --max-single-turn-ratio 0.3 boosts multi-turn to 70%
- --window-seconds 600 for 10-min contiguous window
- Trace-driven replay (no session limit, no time compression)
- Daily config: --requests 850 (~13 min, APC ~76%)

Key result: TPOT p90=0.175s (vs 0.073s in legacy 1-req/GPU setup), confirming prefill-decode interference is real at production concurrency. APC 67.5% (vs 44%) from better KV reuse preservation.

Also fixed KV reuse breakdown: 62% intra-session / 38% cross-session (was incorrectly reported as 91% / 9%).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
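
The sampling steps named in the commit message (a contiguous window, thinning to a request budget, and a cap on single-turn sessions) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the repository's implementation: the function `window_then_thin`, the `(timestamp, session_id)` trace format, the choice to thin by whole sessions, and the reading of the single-turn ratio as a fraction of sessions (not requests) are all hypothetical. The parameter names merely mirror the `--window-seconds`, `--requests`, and `--max-single-turn-ratio` flags.

```python
import random
from collections import defaultdict


def window_then_thin(trace, window_seconds=600, max_requests=850,
                     max_single_turn_ratio=0.3, seed=0):
    """Hypothetical sketch. trace is a list of (timestamp_seconds, session_id)."""
    if not trace:
        return []
    rng = random.Random(seed)
    trace = sorted(trace)
    start = trace[0][0]

    # Step 1: keep one contiguous window starting at the first request.
    windowed = [r for r in trace if r[0] - start < window_seconds]

    # Group the windowed requests by session.
    sessions = defaultdict(list)
    for req in windowed:
        sessions[req[1]].append(req)

    # Step 2 (assumption): thin by whole sessions, in random order,
    # skipping any session that would exceed the request budget.
    order = list(sessions)
    rng.shuffle(order)
    kept, used = [], 0
    for sid in order:
        n = len(sessions[sid])
        if used + n <= max_requests:
            kept.append(sid)
            used += n

    # Step 3 (assumption): cap single-turn sessions so that they make up
    # at most max_single_turn_ratio of the kept sessions.
    multi = [s for s in kept if len(sessions[s]) > 1]
    single = [s for s in kept if len(sessions[s]) == 1]
    if max_single_turn_ratio < 1.0:
        cap = int(max_single_turn_ratio * len(multi)
                  / (1.0 - max_single_turn_ratio))
        single = single[:cap]

    # Return the surviving requests in timestamp order for replay.
    return sorted(r for s in multi + single for r in sessions[s])
```

Thinning by whole sessions keeps every turn of a retained session, so multi-turn structure inside the window survives the reduction; whether the actual tool thins this way is not stated in the commit message.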
Languages
Python 82.9%
Shell 17.1%