Objectives
- Heterogeneous parallelism in a cluster
- EP design for inference performance

Key Results
- [5/10] Profile different parallelism setups with real traces and analyze their differences
- [0/10] Meta-analysis of the theoretical maximum improvement from a heterogeneous setup
- [0/10] Fully understand how EP influences performance
- [0/10] Verify how dynamic EP influences performance
- [4/10] Analyze correlations between MoE layers (suspended)

Last Week
- [For KR1] Ran the latest vLLM with different parallelism configurations (TP, PP, DP, EP) on Qwen-30B with fixed input/output lengths to measure how they differ.
- [Misc] Wrote the AIR project conclusion docs for the Ali collaboration with Jinbo.

Next Week
- Test different parallelism configurations with the latest Ali trace.
- Analyze performance patterns across different workloads.