obsidian/phd/weekly-report/25/250615.md

Objectives

  • Serverless KV cache
  • MoE pattern feature
  • EP design for inference performance

Key Results

  • [0/10] Prepare slides for ATC'25 presentation w/ Jinbo
  • [8/10] Run MoE models in Ali
  • [5/10] Analyze the expert-loading pattern in the Ali trace
  • [3/10] Analyze the expert pattern across different models
  • [0/10] Fully understand how EP influences performance
  • [0/10] Verify how dynamic EP influences performance

Last Week

  • Extended vLLM to trace expert patterns under PP and Ray-based distributed execution for DeepSeek-671B.
  • Analyzed the temporal locality of expert patterns.

Next Week

  • Finish the vLLM development so that it covers all models.
  • Analyze correlations in the expert pattern between layers.