phd/weekly-report/25/250817.md

Objectives
- Heterogeneous parallelism in clusters
- Expert parallelism (EP) design for inference performance

Key Results
- [6/10] Profile vLLM to obtain its compute graph
- [2/10] Understand the feasibility and challenges of automatically arranging the LLM inference compute graph
- [5/10] Profile different parallelism setups with real traces and analyze their differences
- [0/10] Meta-analysis of the theoretical maximum improvement from a heterogeneous setup
- [0/10] Fully understand how EP influences performance
- [0/10] Verify how dynamic EP influences performance
- [4/10] Analyze correlations between MoE layers (suspended)

Last Week
- [Surveying] Learn about compute graph arrangement in traditional streaming/batch systems and compare it with LLM inference systems.
- [KR1] Profile vLLM to obtain per-kernel execution time and overlap status.
- [Misc] Review 3 papers as a shadow PC member for Round 2.
- [Misc] Prepare slides for and complete the AIR project conclusion defense.

Next Week
- Compile a table summarizing the similarities and challenges of compute graph arrangement optimization in traditional streaming systems versus LLM inference systems.