8×TP1 + load_only proxy, shape 512×64, rates 32/64/128 req/s total:
Rate=32 (non-saturated, thr=0.95-0.97):
plain TTFT p90=64ms, mooncake_both=65ms → +2% (noise)
Rate=64 (non-saturated, thr=0.96):
plain TTFT p90=114ms, mooncake_both=107ms → -6% (noise)
Rate=128 (saturated, thr=0.70-0.71):
plain TTFT p90=702ms, mooncake_both=822ms → +17%
plain TTFT p50=339ms, mooncake_both=470ms → +39%
Conclusion: The elastic_migration_v2 +45% is a saturation artifact.
Under SLO-compliant load (TTFT<10s, thr_ratio>0.9), mooncake_both's
1.4ms/step build_connector_meta overhead is completely masked by the
scheduler-model async pipeline. The tax only manifests when the system
is already saturated and queueing amplifies per-step differences.
For practical deployment: enabling kv_role=kv_both has effectively zero
cost as long as the serving system stays within SLO capacity bounds.