Objectives
- Heterogeneous parallelism in cluster
- EP design for inference performance

Key Results
- [5/10] Profile different parallelism setups with a real trace and analyze their differences
- [0/10] Meta-analysis of the theoretical maximum improvement with a heterogeneous setup
- [0/10] Fully understand how EP influences performance
- [0/10] Verify how dynamic EP influences performance
- [4/10] Analyze correlations between MoE layers (suspended)

Last Week
- [For KR1] Read the vLLM code and understand how vLLM's TP/PP/DP works.
- [For KR1] Run profiling tests with different configs over a more complete search space.
- [Surveying] Understand the bottleneck of autoscaling in Ali.
- [Surveying] Explore the opportunity to profile kernels and derive the best compute graph to guide the parallelism config.
- [Misc] Prepare slides for the AIR project conclusion defense.

Next Week
- Survey the feasibility of a universal, kernel-based parallelism config search. (Start from the related work on NanoFlow.)
- Check the possibility of using GPU bubbles to run small models.
- Check the challenges of switching parallelism config with context.