Objectives

- Auto LLM inference config tuner
Key Results

- [9/10] Build the first version of the auto tuner system
- [7/10] Survey the current state of parallelism config optimization
- [4/10] Understand the possibilities and challenges of automatically arranging the LLM inference compute graph
- [0/10] Trace the vLLM compute graph and data flow
- [3/10] Implement a minimal Rust inference framework
- [1/10] Define the IR for automatic optimization
- [5/10] Profile different parallelism setups with real traces and analyze their differences
- [0/10] Meta-analysis of the theoretical maximum improvement with a heterogeneous setup [offtrack]
Last Week

- [KR1] Studied and summarized system-intelligence approaches; learned the basic way to implement an auto tuner.
- [KR1] Implemented a naive auto tuner framework that runs vLLM with sampled configs, then aggregates the benchmark results as context for an LLM, which proposes new configs for the next evolution round. [ad0b0fc3](https://ipads.se.sjtu.edu.cn:1312/wangjh/auto-tuner/-/commit/ad0b0fc3eb3dea5f91a2c75efc69894fac011301)~[420afa3c](https://ipads.se.sjtu.edu.cn:1312/wangjh/auto-tuner/-/commit/420afa3c7a48d19e2d864f212db0efcd86b40ca8)
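The sample-benchmark-aggregate-propose loop above can be sketched as follows. This is a minimal illustration, not the framework's actual code: launching vLLM and prompting the LLM are replaced by stand-in functions, and the search-space values and scoring formula are made-up placeholders (the knob names mirror vLLM engine arguments, but the grids are illustrative).

```python
import random

# Hypothetical search space over vLLM launch knobs (illustrative grids).
SEARCH_SPACE = {
    "tensor_parallel_size": [1, 2, 4],
    "max_num_seqs": [64, 128, 256],
    "gpu_memory_utilization": [0.80, 0.90],
}

def sample_config(rng):
    """Draw one config uniformly from the search space."""
    return {key: rng.choice(vals) for key, vals in SEARCH_SPACE.items()}

def benchmark(config):
    """Stand-in for launching vLLM with `config` and measuring serving
    throughput; a toy analytic score keeps the sketch self-contained."""
    return (config["max_num_seqs"]
            * config["gpu_memory_utilization"]
            / config["tensor_parallel_size"])

def llm_propose(history, rng):
    """Stand-in for prompting an LLM with the aggregated benchmark
    history and parsing its proposed config. Here we just mutate one
    knob of the best config seen so far."""
    best_cfg, _ = max(history, key=lambda entry: entry[1])
    proposal = dict(best_cfg)
    knob = rng.choice(sorted(SEARCH_SPACE))
    proposal[knob] = rng.choice(SEARCH_SPACE[knob])
    return proposal

def tune(rounds=8, seed=0):
    """Run the evolve loop and return the best (config, score) pair."""
    rng = random.Random(seed)
    cfg = sample_config(rng)
    history = [(cfg, benchmark(cfg))]       # aggregated results = context
    for _ in range(rounds):
        cfg = llm_propose(history, rng)     # proposal from "LLM"
        history.append((cfg, benchmark(cfg)))
    return max(history, key=lambda entry: entry[1])
```

In the real framework the two stand-ins are the expensive parts: `benchmark` spawns a vLLM server per sampled config, and `llm_propose` serializes the history into a prompt and parses the model's reply.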
Next Week

- Benchmark and summarize the performance of the auto tuner vs. an expert-tuned config.
- Survey heterogeneous hardware utilization at Ali.