Gahow Wang
c8ec73c548
Connector tax: high-concurrency confirms +7-9% tax, resolves trace-replay gap
High-concurrency test (512 input, 64 output, rates 4-32 req/s):
Rate=8: plain TTFT p90=94ms, mooncake_both=102ms → +9% tax
Rate=16: plain TTFT p90=144ms, mooncake_both=156ms → +8% tax
Rate=32: both saturated at ~6.1s → no distinguishable difference
Low-concurrency back-to-back retest (4096 input, 256 output):
mooncake_both_v2 vs plain_v2: tax is ≈0% (within noise)
because scheduler's 1.4ms/step is hidden behind model forward.
Decomposition of trace-replay's +45%:
+7-9% from build_connector_meta per-step cost (this microbench)
+20-30% from multi-instance coupling amplification (not measurable here)
remainder from large-cache O(|cache|) scaling (Phase B follow-up)
Also: bench_loop.py now emits mean/p50/p90/p99 for all three metrics.
2026-05-26 21:00:25 +08:00
..
2026-05-26 21:00:25 +08:00
2026-05-26 14:21:30 +08:00
2026-05-26 13:59:09 +08:00
2026-05-26 13:59:09 +08:00
2026-05-26 17:27:41 +08:00
2026-05-26 00:57:06 +08:00
2026-05-26 00:57:06 +08:00
2026-05-26 13:59:09 +08:00
2026-05-26 13:59:09 +08:00
2026-05-26 14:21:30 +08:00
2026-05-26 00:57:06 +08:00
2026-05-26 00:57:06 +08:00