obsidian/251221.md

Gahow Wang a57afa86b4 Initial commit: obsidian to gitea

2026-05-07 15:04:41 +08:00

Objectives

Automatic config tuner for LLM inference

Key Results

[9/10] Build the first version of the auto-tuner system
[7/10] Survey the current state of parallelism config optimization
[4/10] Understand the feasibility and challenges of automatically arranging the LLM inference compute graph
[0/10] Trace vLLM compute graph and data flow
[3/10] Implement a minimal Rust inference framework
[1/10] Define the IR for automatic optimization
[5/10] Profile different parallelism setups with real traces and analyze their differences
[0/10] Meta-analysis of the theoretical maximum improvement with a heterogeneous setup [offtrack]

Last Week

Refine the story. Key point: heterogeneous workloads are currently classified by labels or input length, which is not enough. We should instead define a classification method that groups workloads showing similar performance under the same config.
Prepare slides to summarize the story and what to do next.
Prepare slides for IPADS group meeting.
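
A minimal sketch of the performance-based grouping idea from the story above, not the actual tuner code. It assumes each workload has already been profiled under the same ordered set of configs; the workload names, the numbers, the Euclidean distance, and the `threshold` value are all hypothetical choices made for illustration.

```python
from math import sqrt

def normalize(profile):
    """Scale a per-config performance vector so the best config scores 1.0."""
    peak = max(profile)
    return [x / peak for x in profile]

def distance(a, b):
    """Euclidean distance between two normalized performance vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def group_by_performance(profiles, threshold=0.15):
    """Greedy grouping: a workload joins the first group whose representative
    vector lies within `threshold`; otherwise it starts a new group."""
    groups = []  # list of (representative_vector, member_names)
    for name, profile in profiles.items():
        vec = normalize(profile)
        for rep, members in groups:
            if distance(rep, vec) <= threshold:
                members.append(name)
                break
        else:
            groups.append((vec, [name]))
    return [members for _, members in groups]

# Hypothetical measurements: performance of each workload under the same
# three configs, in the same order. These are NOT real benchmark results.
profiles = {
    "chat_short": [10.0, 8.0, 5.0],
    "chat_long": [9.5, 8.0, 5.0],
    "summarize": [4.0, 7.0, 9.0],
    "rag": [4.2, 7.5, 9.5],
}

print(group_by_performance(profiles))
```

Normalizing each vector by its own peak makes the grouping depend on how a workload responds to config changes rather than on its absolute throughput, which is the property the classification is meant to capture. A real version would likely swap the greedy pass for a proper clustering method.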

Next Week

Run benchmarks on the current workload classification to show that different classes need different configs to maximize goodput.
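
A rough sketch of how the benchmark comparison could be scored. It assumes goodput means requests per second that complete within a latency SLO; the class names, config names, and numbers are hypothetical placeholders, not measured results.

```python
def goodput(latencies_s, slo_s, duration_s):
    """Requests per second that finished within the latency SLO."""
    return sum(1 for t in latencies_s if t <= slo_s) / duration_s

def best_config_per_class(results):
    """results: {class: {config: goodput}} -> {class: best config}."""
    return {cls: max(by_cfg, key=by_cfg.get) for cls, by_cfg in results.items()}

def best_single_config(results):
    """The one config that maximizes total goodput summed over all classes."""
    configs = next(iter(results.values())).keys()
    return max(configs, key=lambda c: sum(by_cfg[c] for by_cfg in results.values()))

# Hypothetical goodput table (req/s) per workload class and config.
results = {
    "class_a": {"cfg_1": 10.0, "cfg_2": 5.5},
    "class_b": {"cfg_1": 4.0, "cfg_2": 9.0},
}

per_class = best_config_per_class(results)
single = best_single_config(results)
per_class_total = sum(results[cls][cfg] for cls, cfg in per_class.items())
single_total = sum(by_cfg[single] for by_cfg in results.values())

print(per_class, per_class_total)
print(single, single_total)
```

If the per-class total is clearly higher than the best single-config total on real traces, that supports the claim that different classes need different configs.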