Objectives
- Auto LLM inference config tuner
Key Results
- [6/10] Build the first version of the auto-tuner system
- [7/10] Survey the current state of parallelism config optimization
- [4/10] Understand the feasibility and challenges of automatically arranging the LLM inference compute graph
- [0/10] Trace the vLLM compute graph and data flow
- [3/10] Implement a minimal Rust inference framework
- [1/10] Define the IR for automatic optimization
- [5/10] Profile different parallelism setups with real traces and analyze their differences (see the sweep sketch after this list)
- [0/10] Meta-analysis of the theoretical maximum improvement with a heterogeneous setup [offtrack]
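As a rough illustration of what the parallelism sweep behind the profiling and auto-tuner KRs could look like: a minimal sketch, assuming an exhaustive search over tensor-parallel (TP) and pipeline-parallel (PP) degrees. All names here (`TP_DEGREES`, `run_benchmark`, `sweep`) are hypothetical; the real harness would launch vLLM or the minimal Rust framework and replay a recorded workload instead of the stub below.

```python
import itertools
import time

# Hypothetical search space: TP and PP degrees to sweep on one machine.
# Real spaces would depend on the hardware and model being tuned.
TP_DEGREES = [1, 2, 4]
PP_DEGREES = [1, 2]

def run_benchmark(tp: int, pp: int) -> float:
    """Stub: launch an inference server with the given parallelism config,
    replay a recorded workload against it, and return throughput (tokens/s).
    Returns 0.0 here so the skeleton runs end to end."""
    return 0.0

def sweep() -> dict[tuple[int, int], float]:
    """Benchmark every (tp, pp) combination; the tuner picks the argmax."""
    results: dict[tuple[int, int], float] = {}
    for tp, pp in itertools.product(TP_DEGREES, PP_DEGREES):
        start = time.monotonic()
        results[(tp, pp)] = run_benchmark(tp, pp)
        print(f"tp={tp} pp={pp}: {results[(tp, pp)]:.1f} tok/s "
              f"({time.monotonic() - start:.1f}s)")
    return results

if __name__ == "__main__":
    best = max(sweep().items(), key=lambda kv: kv[1])
    print(f"best config: tp={best[0][0]} pp={best[0][1]}")
```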
Last Week
- [KR2] Benchmarked different configs on different hardware; showed that different hardware and different workloads produce different performance trends. 5f2c1ec3 ~ 65d05520
- [KR1] Built a precise workload generator from a real workload trace; benchmarked on closely matched generated workloads and found that even similar workloads still show different performance (a generator sketch follows below).
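For reference, a minimal sketch of a trace-driven workload generator under the assumption that the generator resamples from the real trace: (prompt_len, output_len) pairs are kept together so their correlation survives, and inter-arrival gaps are drawn from the trace's empirical gap distribution. The `Request` shape and `generate` function are hypothetical stand-ins, not the actual KR1 implementation.

```python
import random
from dataclasses import dataclass

@dataclass
class Request:
    arrival_s: float   # arrival time relative to trace start, in seconds
    prompt_len: int    # input tokens
    output_len: int    # tokens to generate

def generate(trace: list[Request], n: int, seed: int = 0) -> list[Request]:
    """Resample n requests from a real trace: reuse whole
    (prompt_len, output_len) pairs and empirical inter-arrival gaps."""
    rng = random.Random(seed)
    gaps = [b.arrival_s - a.arrival_s for a, b in zip(trace, trace[1:])]
    t = 0.0
    out: list[Request] = []
    for _ in range(n):
        src = rng.choice(trace)                 # keeps lengths correlated
        t += rng.choice(gaps) if gaps else 0.0  # empirical arrival pattern
        out.append(Request(t, src.prompt_len, src.output_len))
    return out

# Example: synthesize 1000 requests from a tiny hand-written "trace".
if __name__ == "__main__":
    trace = [Request(0.0, 512, 128), Request(0.4, 2048, 64), Request(0.9, 128, 256)]
    workload = generate(trace, n=1000)
    print(workload[:3])
```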
Next Week
- Find the root cause of the performance gap under similar workloads.