098d86385a361cd7d4c114f9d5c4b094fedcdc68
Tracks all hypotheses tested during elastic PD disaggregation research:

- H1 (kv_both overhead): REJECTED. Zero overhead at idle.
- H2 (PS cold prefill): REJECTED. PS is slower than cached C.
- H3 (C_s+flexD): PARTIALLY VALIDATED. E2E -9%, but HEAVY p90 +117%.
- H4 (cache-aware offload): TODO. Only offload high-cache-hit HEAVY requests.
- H5 (RDMA overhead): TODO. Mooncake lacks layerwise transfer.
- H6 (session migration): TODO. Verify D's APC after migration.

Key insight: the offload decision should be cache-aware (based on new_tokens), not size-based (based on total_input). An 80k-token request with a 90% cache hit needs only 8k tokens of prefill.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
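A minimal sketch contrasting the size-based and cache-aware offload rules, assuming each request exposes its prompt length and the number of tokens already served from the prefix cache. The `Request` class, its `cached_tokens` field, and the 16k `OFFLOAD_THRESHOLD` are hypothetical illustrations; only the `new_tokens` and `total_input` quantities come from the notes above.

```python
from dataclasses import dataclass

# Hypothetical threshold (in tokens); the notes do not specify a value.
OFFLOAD_THRESHOLD = 16_000


@dataclass
class Request:
    total_input: int    # full prompt length in tokens
    cached_tokens: int  # hypothetical: tokens already covered by the prefix cache

    def __post_init__(self) -> None:
        if not 0 <= self.cached_tokens <= self.total_input:
            raise ValueError("cached_tokens must be in [0, total_input]")

    @property
    def new_tokens(self) -> int:
        # Only uncached tokens need prefill compute.
        return self.total_input - self.cached_tokens


def should_offload_size_based(req: Request, threshold: int = OFFLOAD_THRESHOLD) -> bool:
    # Size-based rule: looks only at the raw prompt length.
    return req.total_input > threshold


def should_offload_cache_aware(req: Request, threshold: int = OFFLOAD_THRESHOLD) -> bool:
    # Cache-aware rule: looks at the prefill work that actually remains.
    return req.new_tokens > threshold


if __name__ == "__main__":
    # 80k-token request with a 90% cache hit (72k cached tokens).
    req = Request(total_input=80_000, cached_tokens=72_000)
    print(req.new_tokens)
    print(should_offload_size_based(req), should_offload_cache_aware(req))
```

For the 80k-token request with a 90% cache hit, `new_tokens` is 8,000, so under this assumed threshold the size-based rule would offload it while the cache-aware rule would not.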
Languages: Python 82.9%, Shell 17.1%