obsidian/phd/weekly-report/25/250615.md

Objectives
- Serverless KVCache caching
- MoE expert-pattern characteristics
- Expert parallelism (EP) design for inference performance
Key Results
- [0/10] Prepare slides for ATC'25 presentation w/ Jinbo
- [8/10] Run MoE models in Ali
- [5/10] Analyze expert-loading patterns in the Ali trace
- [3/10] Analyze expert patterns across different models
- [0/10] Fully understand how EP influences performance
- [0/10] Verify how dynamic EP influences performance
Last Week
- Extended vLLM to trace expert patterns for DeepSeek-671B under pipeline parallelism (PP), with distributed execution via Ray.
- Analyzed the temporal locality of expert patterns.
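One way to quantify temporal locality is sketched below. It assumes a hypothetical trace format (a list of per-step sets of activated expert IDs for a single layer), which is not necessarily what the vLLM tracing emits. It reports two illustrative metrics: the mean fraction of experts at step t that were already active at step t-1, and the hit rate of an LRU expert cache of a given capacity when replaying the trace. Both the metric choice and the function names are assumptions for illustration.

```python
from collections import OrderedDict
from typing import List, Set


def consecutive_overlap(trace: List[Set[int]]) -> float:
    """Mean fraction of experts active at step t that were also active at step t-1.

    Hypothetical metric: trace[t] is the set of expert IDs activated at step t.
    """
    ratios = []
    for prev, cur in zip(trace, trace[1:]):
        if cur:  # skip empty steps to avoid division by zero
            ratios.append(len(cur & prev) / len(cur))
    return sum(ratios) / len(ratios) if ratios else 0.0


def lru_hit_rate(trace: List[Set[int]], capacity: int) -> float:
    """Replay the trace through an LRU cache holding `capacity` experts.

    Returns hits / total expert accesses. Experts within a step are visited
    in sorted order so the result is deterministic.
    """
    cache = OrderedDict()  # expert_id -> None, least recently used first
    hits = total = 0
    for step in trace:
        for expert in sorted(step):
            total += 1
            if expert in cache:
                hits += 1
                cache.move_to_end(expert)  # mark as most recently used
            else:
                cache[expert] = None
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict least recently used
    return hits / total if total else 0.0
```

For example, with `trace = [{0, 1}, {1, 2}, {1, 2}, {1, 3}]`, the consecutive overlap is 2/3 and an LRU cache of 2 experts hits on half of the accesses. A high LRU hit rate at small capacity would suggest that caching a few hot experts per layer could cut expert loading.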
Next Week
- Extend the vLLM expert tracing to support all models.
- Analyze cross-layer correlations in expert patterns.
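A minimal sketch of one possible cross-layer correlation measure follows. It assumes a hypothetical routing trace where `routing[token][layer]` is the set of experts selected for that token at that layer, and estimates the conditional probability P(expert j active at layer l+1 | expert i active at layer l) from co-occurrence counts. The format and the function name are illustrative assumptions, not part of the existing tracing code.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set


def next_layer_transition(
    routing: List[List[Set[int]]], layer: int
) -> Dict[int, Dict[int, float]]:
    """Estimate P(expert j active at layer+1 | expert i active at layer).

    routing[token][layer] is the set of experts selected for that token.
    Returns {i: {j: probability}} over expert pairs that co-occurred.
    """
    pair_counts: Dict[int, Counter] = defaultdict(Counter)
    src_counts: Counter = Counter()
    for per_layer in routing:
        for i in per_layer[layer]:
            src_counts[i] += 1  # tokens that activated expert i at `layer`
            for j in per_layer[layer + 1]:
                pair_counts[i][j] += 1  # co-activation of i then j
    return {
        i: {j: c / src_counts[i] for j, c in row.items()}
        for i, row in pair_counts.items()
    }
```

Rows that concentrate most of their probability on a few experts would hint that next-layer experts are predictable from the current layer (e.g., for prefetching); near-uniform rows would suggest weak correlation.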