Objectives
- Auto LLM inference config tuner
Key Results
- [6/10] Build the first version of the auto-tuner system
- [7/10] Survey the current state of parallelism config optimization
- [4/10] Understand the feasibility and challenges of automatically arranging the LLM inference compute graph
- [0/10] Trace the vLLM compute graph and data flow
- [3/10] Implement a minimal Rust inference framework
- [1/10] Define the IR for automatic optimization
- [5/10] Profile different parallelism setups with real traces and analyze their differences (see the sweep sketch after this list)
- [0/10] Meta-analysis of the theoretical maximum improvement with a heterogeneous setup [offtrack]
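As a rough illustration of what the parallelism sweep behind the profiling and auto-tuner KRs could look like: a minimal sketch, assuming an exhaustive search over tensor-parallel (TP) and pipeline-parallel (PP) degrees. All names here (`TP_DEGREES`, `run_benchmark`, `sweep`) are hypothetical; the real harness would launch vLLM or the minimal Rust framework and replay a recorded workload instead of the stub below.

```python
import itertools
import time

# Hypothetical search space: TP and PP degrees to sweep on one machine.
# Real spaces would depend on the hardware and model being tuned.
TP_DEGREES = [1, 2, 4]
PP_DEGREES = [1, 2]

def run_benchmark(tp: int, pp: int) -> float:
    """Stub: launch an inference server with the given parallelism config,
    replay a recorded workload against it, and return throughput (tokens/s).
    Returns 0.0 here so the skeleton runs end to end."""
    return 0.0

def sweep() -> dict[tuple[int, int], float]:
    """Benchmark every (tp, pp) combination; the tuner picks the argmax."""
    results: dict[tuple[int, int], float] = {}
    for tp, pp in itertools.product(TP_DEGREES, PP_DEGREES):
        start = time.monotonic()
        results[(tp, pp)] = run_benchmark(tp, pp)
        print(f"tp={tp} pp={pp}: {results[(tp, pp)]:.1f} tok/s "
              f"({time.monotonic() - start:.1f}s)")
    return results

if __name__ == "__main__":
    best = max(sweep().items(), key=lambda kv: kv[1])
    print(f"best config: tp={best[0][0]} pp={best[0][1]}")
```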
Last Week
- [KR2] Benchmarked different configs on different hardware; showed that different hardware and different workloads produce different performance trends. 5f2c1ec3 ~ 65d05520
- [KR1] Built a precise workload generator from a real workload trace; benchmarked on closely matched generated workloads and found that even similar workloads still show different performance (a generator sketch follows below).
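For reference, a minimal sketch of a trace-driven workload generator under the assumption that the generator resamples from the real trace: (prompt_len, output_len) pairs are kept together so their correlation survives, and inter-arrival gaps are drawn from the trace's empirical gap distribution. The `Request` shape and `generate` function are hypothetical stand-ins, not the actual KR1 implementation.

```python
import random
from dataclasses import dataclass

@dataclass
class Request:
    arrival_s: float   # arrival time relative to trace start, in seconds
    prompt_len: int    # input tokens
    output_len: int    # tokens to generate

def generate(trace: list[Request], n: int, seed: int = 0) -> list[Request]:
    """Resample n requests from a real trace: reuse whole
    (prompt_len, output_len) pairs and empirical inter-arrival gaps."""
    rng = random.Random(seed)
    gaps = [b.arrival_s - a.arrival_s for a, b in zip(trace, trace[1:])]
    t = 0.0
    out: list[Request] = []
    for _ in range(n):
        src = rng.choice(trace)                 # keeps lengths correlated
        t += rng.choice(gaps) if gaps else 0.0  # empirical arrival pattern
        out.append(Request(t, src.prompt_len, src.output_len))
    return out

# Example: synthesize 1000 requests from a tiny hand-written "trace".
if __name__ == "__main__":
    trace = [Request(0.0, 512, 128), Request(0.4, 2048, 64), Request(0.9, 128, 256)]
    workload = generate(trace, n=1000)
    print(workload[:3])
```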
Next Week
- Find the root cause of the performance gap under similar workloads.