obsidian/250817.md at a57afa86b47c58aeca557e7cbcb0d38b81159d78 - obsidian - Local Gitea


Gahow Wang a57afa86b4 Initial commit: obsidian to gitea

2026-05-07 15:04:41 +08:00


Objectives

Heterogeneous parallelism in a cluster
Expert parallelism (EP) design for inference performance

Key Results

[6/10] Profile vLLM to get the compute graph
[2/10] Understand the feasibility and challenges of automatically arranging the LLM inference compute graph
[5/10] Profile different parallelism setups with real traces and analyze their differences
[0/10] Meta-analysis of the theoretical maximum improvement with a heterogeneous setup
[0/10] Fully understand how EP influences performance
[0/10] Verify how dynamic EP influences performance
[4/10] Analyze correlations between MoE layers (suspended)

Last Week

[Surveying] Learned about compute graph arrangement in traditional streaming/batch systems and compared it with LLM inference systems.
[KR1] Profiled vLLM to obtain per-kernel execution times and kernel overlap status.
[Misc] Reviewed 3 papers as a shadow PC member for Round 2.
[Misc] Prepared and completed the AIR project's concluding defense, with slides.
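
For the KR1 item, one possible way to turn a kernel trace into per-kernel time and overlap numbers is sketched below. This is only an illustration, not the analysis actually used: the `(name, start_us, end_us)` tuple format and the function name `kernel_time_summary` are assumptions, and a real trace exported from a profiler would first need to be parsed into that shape.

```python
from collections import defaultdict


def kernel_time_summary(events):
    """Summarize a kernel trace.

    events: list of (name, start_us, end_us) tuples. This format is a
    hypothetical simplification of a real profiler trace.
    Returns (per_kernel_time, busy_time, overlapped_time).
    """
    # Total execution time per kernel name.
    per_kernel = defaultdict(float)
    for name, start, end in events:
        per_kernel[name] += end - start

    # Sweep over interval boundaries. At equal timestamps the -1 (end)
    # sorts before the +1 (start), so touching intervals do not overlap.
    points = []
    for _, start, end in events:
        points.append((start, 1))
        points.append((end, -1))
    points.sort()

    busy = 0.0        # time with at least one kernel running
    overlapped = 0.0  # time with at least two kernels running
    active = 0
    prev = None
    for t, delta in points:
        if prev is not None:
            if active >= 1:
                busy += t - prev
            if active >= 2:
                overlapped += t - prev
        active += delta
        prev = t
    return dict(per_kernel), busy, overlapped


if __name__ == "__main__":
    # Made-up example trace: a GEMM overlapping with an all-reduce.
    trace = [("gemm", 0, 10), ("all_reduce", 6, 14), ("gemm", 20, 25)]
    per_kernel, busy, overlapped = kernel_time_summary(trace)
    print(per_kernel, busy, overlapped)
```

The ratio `overlapped / busy` gives a rough indication of how much compute and communication are hidden behind each other in a given parallelism setup.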

Next Week

Summarize in a table the similarities and challenges of compute graph arrangement optimization between traditional streaming systems and LLM inference systems.