Objectives
- Serverless KVCache
- MoE autoscaling

Key Results
- [10/10] Refine the final version of the KV$ cache paper for ATC'25
- [8/10] Run an MoE model in Ali
- [0/10] Analyze the expert-loading pattern in the Ali trace
- [0/10] Fully understand how EP (expert parallelism) influences performance

Last Week
- Modified vLLM to trace the activated experts and tested it on the Ali trace with Qwen3-32B.
- Prepared and submitted the KV$ cache paper to arXiv.

Next Week
- Analyze the expert patterns.
- Test on more MoE models.
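
As a starting point for the expert-pattern analysis, the sketch below tallies how often each expert is activated and reports a simple imbalance ratio. It is only an illustration: the trace format, a sequence of `(layer, expert_ids)` records, and the names `expert_load` and `imbalance` are assumptions, not the output format of the modified vLLM or of the Ali trace.

```python
from collections import Counter


def expert_load(trace, num_experts):
    """Return each expert's share of all activations in the trace.

    `trace` is an iterable of (layer, expert_ids) records. This record
    format is hypothetical. Counts are aggregated across layers.
    """
    counts = Counter()
    for _layer, expert_ids in trace:
        counts.update(expert_ids)
    total = sum(counts.values())
    # An empty trace yields a share of 0.0 for every expert.
    return [counts[e] / total if total else 0.0 for e in range(num_experts)]


def imbalance(shares):
    """Ratio of the largest share to the uniform share (1.0 = balanced)."""
    return max(shares) * len(shares)


# Toy trace: three tokens in layer 0, each routed to two of four experts.
toy_trace = [(0, [0, 1]), (0, [0, 2]), (0, [0, 3])]
shares = expert_load(toy_trace, num_experts=4)
print(shares, imbalance(shares))
```

A per-layer breakdown would likely be more informative for real traces; grouping the counter by the `layer` field is a small change.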