Gahow Wang a473c71cac Connector tax Phase A: build_connector_meta is 1.4ms/step (the tax source)
Per-step timing from engine_step.jsonl definitively resolves H3:
  plain:            53 μs/step (p50)
  noop_connector:   69 μs/step (+16 μs = negligible framework cost)
  mooncake_producer: 1461 μs/step (build_connector_meta = 1386 μs)
  mooncake_both:    1452 μs/step (same as producer)

The substrate tax is NOT in the v1 framework — it's specifically in
Mooncake's build_connector_meta() which walks set(cache.keys()) every
scheduler step (O(|cache|) per step, E2 audit §6.5).

Accumulated per-request tax: 256 decode steps × 1.4ms = 358ms.
Observed TTFT tax at rate=1.0: plain 378ms vs mooncake_both 422ms (+12%).
At rate=2.0 (near saturation): +29%, approaching trace-replay's +45%.

Also fixes kill_vllm() to properly kill EngineCore subprocesses.
2026-05-26 19:33:15 +08:00
Description
No description provided
48 MiB
Languages
Python 82.9%
Shell 17.1%