Objectives
- Serverless KV cache
- MoE autoscaling
Key Results
- [10/10] Refine the final version of KV$ for ATC'25
- [8/10] Run an MoE model in Ali
- [0/10] Analyze the expert-loading pattern in the Ali trace
- [0/10] Fully understand how EP influences performance
Last Week
- Modify vLLM to trace activated experts, and test on the Ali trace with Qwen3-32B.
- Prepare and submit KV$ to arXiv.
Next Week
- Analyze the expert patterns.
- Test on more MoE models.