obsidian/phd/weekly-report/25/250608.md

Objectives

  • Serverless KVCache cache
  • MoE autoscaling

Key Results

  • [10/10] Refine the final version of the KV$ cache paper for ATC'25
  • [8/10] Run an MoE model at Ali
  • [0/10] Analyze the expert-loading pattern in the Ali trace
  • [0/10] Fully understand how expert parallelism (EP) influences performance

Last Week

  • Modified vLLM to support tracing the activated experts, and tested it on the Ali trace with Qwen3-32B.
  • Prepared and submitted the KV$ cache paper to arXiv.

Next Week

  • Analyze the expert activation pattern.
  • Test on more MoE models.
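As a starting point for the expert-pattern analysis, here is a minimal sketch that counts per-layer expert activations and finds the smallest set of "hot" experts covering a given share of activations. It assumes the vLLM tracing output has been exported as `(layer_id, selected_expert_ids)` records; that format, the function names `expert_load` and `hot_experts`, and the 80% coverage default are all hypothetical, not the actual output of the modified vLLM.

```python
from collections import Counter
from typing import Iterable, Sequence

# Hypothetical trace format (assumption): one record per token per layer,
# holding the top-k expert IDs the router selected for that token.
TraceRecord = tuple[int, Sequence[int]]  # (layer_id, selected_expert_ids)


def expert_load(trace: Iterable[TraceRecord]) -> dict[int, Counter]:
    """Count how often each expert is activated, separately for each layer."""
    per_layer: dict[int, Counter] = {}
    for layer_id, experts in trace:
        per_layer.setdefault(layer_id, Counter()).update(experts)
    return per_layer


def hot_experts(counts: Counter, coverage: float = 0.8) -> list[int]:
    """Return the most-used experts, in order, until they cover `coverage`
    of all activations in this layer (empty list if there are none)."""
    total = sum(counts.values())
    selected: list[int] = []
    covered = 0
    for expert_id, n in counts.most_common():
        if covered / total >= coverage:
            break
        selected.append(expert_id)
        covered += n
    return selected


if __name__ == "__main__":
    # Toy trace: three tokens in layer 0 and one token in layer 1 (top-2 routing).
    trace = [(0, [1, 3]), (0, [1, 2]), (0, [1, 3]), (1, [0, 5])]
    load = expert_load(trace)
    print(load[0])               # Counter({1: 3, 3: 2, 2: 1})
    print(hot_experts(load[0]))  # [1, 3]
```

If a small hot set covers most activations across the Ali trace, that would suggest only those experts need to stay resident, which is the kind of signal the MoE autoscaling objective could use.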