Add comprehensive research findings document

Synthesizes all experiments into a paper-ready analysis:
- Agentic workload characteristics vs chatbot/API workloads
- Why PD-Sep, LMetric, elastic RDMA, and chunk-size tuning don't work
- Why cache-aware session-sticky routing IS the key optimization
  (-60% TTFT, +24pp APC vs round-robin)
- System-level insights: prefill-decode interference threshold,
  Mooncake limitations, effective request weight after cache hits
- GPU load balancing → HEAVY TTFT -10.5% (demonstrated)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>