Objectives
- Serverless KVCache cache
- MoE pattern feature
- EP design for inference performance

Key Results
- [0/10] Prepare slides for ATC'25 presentation w/ Jinbo
- [8/10] Run MoE models in Ali
- [5/10] Analyze the expert-loading pattern in the Ali trace
- [3/10] Analyze the expert pattern in different models
- [0/10] Fully understand how EP influences performance
- [0/10] Verify how dynamic EP influences performance

Last Week
- Developed vLLM support for tracing the expert pattern with PP and Ray-based distributed execution for DeepSeek-671B.
- Analyzed the temporal locality of the expert pattern.

Next Week
- Finish the vLLM development so that it supports all models.
- Analyze the expert pattern's correlations between layers.