Objectives

  • Automatic configuration optimization for distributed LLM inference

Key Results

  • [3/10] Implement a minimal Rust inference framework
  • [1/10] Define an IR for automatic optimization (see the IR sketch after this list)
  • [0/10] Trace the vLLM compute graph and data flow
  • [2/10] Understand the feasibility and challenges of automatically rearranging the LLM inference compute graph
  • [5/10] Profile different parallelism setups with real traces and analyze their differences (see the cost sketch after this list)
  • [0/10] Meta-analysis of the theoretical maximum improvement with a heterogeneous setup [offtrack]
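
As a starting point for the IR item above, here is a minimal sketch of one possible shape for it: an operator graph whose nodes carry parallelism annotations that an optimizer can rewrite. All names here (`Sharding`, `OpNode`, `Graph`) are hypothetical illustrations, not the actual IR from this work.

```rust
// Hypothetical IR sketch: an operator graph whose nodes carry
// parallelism annotations that an optimizer can rewrite.

/// How a tensor is laid out across devices.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Sharding {
    Replicated,            // full copy on every device
    Split { axis: usize }, // partitioned along one axis
}

/// One node in the inference compute graph.
#[derive(Debug, Clone)]
struct OpNode {
    name: String,       // e.g. "attn.qkv_proj"
    inputs: Vec<usize>, // indices of producer nodes
    sharding: Sharding, // current parallelism decision
}

/// The whole graph, with nodes kept in topological order.
#[derive(Debug, Default)]
struct Graph {
    nodes: Vec<OpNode>,
}

impl Graph {
    /// Append a node and return its index so later nodes can refer to it.
    fn add(&mut self, name: &str, inputs: Vec<usize>, sharding: Sharding) -> usize {
        self.nodes.push(OpNode { name: name.into(), inputs, sharding });
        self.nodes.len() - 1
    }
}

fn main() {
    // Build a toy attention + MLP fragment with mixed sharding choices.
    let mut g = Graph::default();
    let x = g.add("input", vec![], Sharding::Replicated);
    let qkv = g.add("attn.qkv_proj", vec![x], Sharding::Split { axis: 0 });
    let out = g.add("attn.out_proj", vec![qkv], Sharding::Split { axis: 1 });
    let _mlp = g.add("mlp", vec![out], Sharding::Replicated);
    for (i, n) in g.nodes.iter().enumerate() {
        println!("{i}: {} {:?} <- {:?}", n.name, n.sharding, n.inputs);
    }
}
```

Keeping nodes in topological order and addressing them by index keeps the structure trivially serializable, which matters once an optimizer starts enumerating and scoring candidate sharding assignments.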
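
For the profiling item, a back-of-envelope model like the one below is one way to sanity-check the measured traces: it compares first-order per-token communication volume under tensor vs. pipeline parallelism. The formulas are standard ring all-reduce and point-to-point estimates, not numbers from the actual traces.

```rust
// Back-of-envelope comparison of per-token communication cost for
// tensor vs. pipeline parallelism. First-order estimates only.

/// Bytes moved per token per transformer layer under tensor parallelism:
/// two all-reduces (after attention and after the MLP) of the hidden-size
/// activation. A ring all-reduce moves ~2*(n-1)/n of the buffer per rank.
fn tp_bytes_per_layer(hidden: usize, world: usize, dtype_bytes: usize) -> usize {
    2 * (2 * (world - 1) * hidden * dtype_bytes) / world
}

/// Bytes moved per token at each pipeline-stage boundary: one
/// point-to-point send of the activation.
fn pp_bytes_per_boundary(hidden: usize, dtype_bytes: usize) -> usize {
    hidden * dtype_bytes
}

fn main() {
    let hidden = 8192;   // e.g. a Llama-70B-sized hidden dimension
    let dtype_bytes = 2; // fp16/bf16 activations
    for world in [2, 4, 8] {
        let tp = tp_bytes_per_layer(hidden, world, dtype_bytes);
        let pp = pp_bytes_per_boundary(hidden, dtype_bytes);
        println!("world={world}: TP ~{tp} B/token/layer, PP ~{pp} B/token/boundary");
    }
}
```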

Last Week

Next Week

  • Go through the vLLM codebase to assess the feasibility and challenges of automatically applying an execution flow to different models.