obsidian/phd/weekly-report/25/250817.md

Objectives

  • Heterogeneous parallelism in clusters
  • Expert parallelism (EP) design for inference performance

Key Results

  • [6/10] Profile vLLM to obtain its compute graph
  • [2/10] Understand the feasibility and challenges of automatically arranging the LLM inference compute graph
  • [5/10] Profile different parallelism setups with real traces and analyze their differences
  • [0/10] Meta-analysis of the theoretical maximum improvement from a heterogeneous setup
  • [0/10] Fully understand how EP influences performance
  • [0/10] Verify how dynamic EP influences performance
  • [4/10] Analyze correlations between MoE layers (suspended)

Last Week

  • [Surveying] Learn about compute graph arrangement in traditional streaming/batch systems and compare it with LLM inference systems.
  • [KR1] Profile vLLM to obtain per-kernel execution time and kernel overlap status.
  • [Misc] Review 3 papers as a shadow PC member for Round 2.
  • [Misc] Prepare slides for and complete the AIR project conclusion defense.

Next Week

  • Summarize, in a table, the similarities and challenges of compute graph arrangement optimization in traditional streaming systems versus LLM inference systems.