obsidian/phd/weekly-report/25/250608.md

Objectives
- Serverless KVCache cache
- MoE autoscaling

Key Results
- [10/10] Refine the final version of KV$ cache for ATC'25
- [8/10] Run an MoE model in Ali
- [0/10] Analyze the expert-loading patterns in the Ali trace
- [0/10] Fully understand how EP influences performance

Last Week
- Modified vLLM to trace activated experts and tested it on the Ali trace with Qwen3-32B.
- Prepared and submitted KV$ cache to arXiv.
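
For reference, a minimal sketch of what tracing activated experts amounts to, assuming standard top-k routing. This is not the vLLM change itself: the function names and the router logits below are hypothetical, and the real hook would read the router output inside the model's MoE layer.

```python
from collections import Counter

def top_k_experts(router_logits, k):
    """Return the indices of the k largest logits (experts activated for one token)."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i],
                    reverse=True)
    return ranked[:k]

def trace_activations(per_token_logits, k):
    """Count how often each expert is activated across all tokens."""
    counts = Counter()
    for logits in per_token_logits:
        counts.update(top_k_experts(logits, k))
    return counts

# Hypothetical router outputs: 3 tokens, 4 experts, top-2 routing.
logits = [
    [0.1, 2.0, 0.3, 1.5],
    [1.2, 0.2, 0.9, 0.1],
    [0.0, 1.1, 0.2, 3.0],
]
counts = trace_activations(logits, k=2)
```

Per-expert counts like these, collected per layer and per request, are the raw data for the expert pattern analysis.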

Next Week
- Analyze the expert patterns.
- Test on more MoE models.