a473c71cac7facafe24904a90753ad6c3322f980
Per-step timing from engine_step.jsonl definitively resolves H3: plain: 53 μs/step (p50) noop_connector: 69 μs/step (+16 μs = negligible framework cost) mooncake_producer: 1461 μs/step (build_connector_meta = 1386 μs) mooncake_both: 1452 μs/step (same as producer) The substrate tax is NOT in the v1 framework — it's specifically in Mooncake's build_connector_meta() which walks set(cache.keys()) every scheduler step (O(|cache|) per step, E2 audit §6.5). Accumulated per-request tax: 256 decode steps × 1.4ms = 358ms. Observed TTFT tax at rate=1.0: plain 378ms vs mooncake_both 422ms (+12%). At rate=2.0 (near saturation): +29%, approaching trace-replay's +45%. Also fixes kill_vllm() to properly kill EngineCore subprocesses.
Description
No description provided
Languages
Python
82.9%
Shell
17.1%