Phase 1 milestone: system-level analysis + reproducible report
- REPORT.md: self-contained milestone report covering baseline vs elastic
setup, exact launch commands, benchmark params, results, log locations,
and repo structure — sufficient for anyone to reproduce
- analysis/pd_separation_analysis.md §5: elastic P2P system-level breakdown
(KV cache hit ratio, per-class TTFT, GPU util paradox explanation)
- scripts/cache_aware_proxy.py: round-robin P-instance selection replacing
argmin(ongoing_tokens) to fix GPU load imbalance (3.0x → expected ~2x)
- scripts/launch_elastic_p2p.sh: one-command launch for elastic P2P config
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>