obsidian/phd/weekly-report/25/250803.md

Objectives

  • Heterogeneous parallelism in cluster
  • EP design for inference performance

Key Results

  • [5/10] Profile different parallelism setups with real traces and analyze their differences
  • [0/10] Meta-analysis of the theoretical maximum improvement with a heterogeneous setup
  • [0/10] Fully understand how EP influences performance
  • [0/10] Verify how dynamic EP influences performance
  • [4/10] Analyze correlations between MoE layers (suspended)

Last Week

  • [For KR1] Ran the latest vLLM with different parallelism configurations (TP, PP, DP, EP) on Qwen-30B with fixed input/output lengths to measure their differences.
  • [Misc] Wrote the AIR project conclusion docs for the collaboration at Ali w/ Jinbo.
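
For the KR1 sweep, the configuration space can be listed up front. The sketch below is only an illustration, not the script used for the runs: it assumes the TP, PP and DP degrees must multiply to the GPU count, treats EP as an on/off toggle, and uses a hypothetical `num_gpus` value, since the report does not state the cluster size.

```python
from itertools import product


def parallelism_configs(num_gpus, allow_ep=True):
    """Enumerate settings whose TP * PP * DP equals num_gpus.

    Assumption: EP is modeled as a boolean toggle on top of each
    (TP, PP, DP) combination; the real constraint may differ.
    """
    # Only divisors of num_gpus can be valid parallel degrees.
    sizes = [d for d in range(1, num_gpus + 1) if num_gpus % d == 0]
    ep_options = [False, True] if allow_ep else [False]

    configs = []
    for tp, pp in product(sizes, repeat=2):
        if num_gpus % (tp * pp) != 0:
            continue  # TP * PP does not divide the GPU count
        dp = num_gpus // (tp * pp)  # remaining GPUs go to data parallelism
        for ep in ep_options:
            configs.append({"tp": tp, "pp": pp, "dp": dp, "ep": ep})
    return configs


if __name__ == "__main__":
    # num_gpus=4 is a placeholder value, not the actual cluster size.
    for cfg in parallelism_configs(num_gpus=4):
        print(cfg)
```

Each dictionary can then be mapped onto the serving launch options for one profiling run, keeping the fixed input/output lengths constant across runs.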

Next Week

  • Test different parallelism configurations with the latest Ali trace.
  • Analyze the performance patterns across different workloads.