Gahow Wang 8e0c6e78b0 Add comprehensive research findings document
Synthesizes all experiments into a paper-ready analysis:
- Agentic workload characteristics vs chatbot/API
- Why PD-Sep, LMetric, elastic RDMA, chunk-size tuning don't work
- Why cache-aware session-sticky routing IS the key optimization
  (-60% TTFT, +24pp APC vs round-robin)
- System-level insights: prefill-decode interference threshold,
  Mooncake limitations, effective request weight after cache
- GPU balance → HEAVY TTFT -10.5% (demonstrated)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-23 07:16:31 +08:00
Description
No description provided
48 MiB
Languages
Python 82.9%
Shell 17.1%