obsidian/250615.md at main - obsidian - Local Gitea

Gahow Wang a57afa86b4 Initial commit: obsidian to gitea

2026-05-07 15:04:41 +08:00

Objectives

Serverless KV cache
MoE expert pattern features
Expert parallelism (EP) design for inference performance

Key Results

[0/10] Prepare slides for ATC'25 presentation w/ Jinbo
[8/10] Run MoE models in Ali
[5/10] Analyze the expert-loading pattern in the Ali trace
[3/10] Analyze the expert pattern across different models
[0/10] Fully understand how EP influences performance
[0/10] Verify how dynamic EP influences performance

Last Week

Developed vLLM support for tracing expert patterns with pipeline parallelism (PP) and Ray-based distributed execution for DeepSeek-671B.
Analyzed the temporal locality of expert patterns.
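
One possible way to quantify temporal locality, as a minimal sketch: it assumes the trace for a single MoE layer is available as a list of expert-ID sets, one set per decoding step. The `reuse_rate` function and the trace layout are hypothetical and are not taken from the vLLM tracing code.

```python
from typing import List, Set


def reuse_rate(trace: List[Set[int]], window: int = 1) -> float:
    """Fraction of expert activations that also appeared in the previous `window` steps.

    `trace[t]` is the (hypothetical) set of expert IDs activated at step t
    for one layer. A value close to 1.0 suggests strong temporal locality.
    """
    hits = 0
    total = 0
    for t in range(1, len(trace)):
        # Union of the experts seen in the last `window` steps before t.
        recent = set().union(*trace[max(0, t - window):t])
        hits += len(trace[t] & recent)
        total += len(trace[t])
    return hits / total if total else 0.0


# Toy trace: 4 steps, top-2 routing (made-up values for illustration only).
trace = [{0, 3}, {3, 5}, {5, 7}, {1, 2}]
rate = reuse_rate(trace, window=1)
print(f"reuse rate: {rate:.3f}")
```

A larger `window` approximates how well a small expert cache would perform, under the assumption that recently used experts stay resident.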

Next Week

Extend the vLLM expert-pattern tracing to all models.
Analyze the correlations between expert patterns across layers.
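
A rough sketch of how the inter-layer correlation could be estimated, assuming top-1 routing for simplicity and a trace that stores, per token, the expert chosen at each layer. The `transition_probs` helper and the `routes` layout are hypothetical; a real trace with top-k routing would need pairs enumerated per token.

```python
from collections import Counter
from typing import Dict, List, Tuple


def transition_probs(routes: List[List[int]], layer: int) -> Dict[Tuple[int, int], float]:
    """Estimate P(expert j at layer+1 | expert i at layer) by counting.

    `routes[n][l]` is the (hypothetical) expert ID chosen for token n at layer l.
    """
    pair_counts = Counter((r[layer], r[layer + 1]) for r in routes)
    src_counts = Counter(r[layer] for r in routes)
    return {(i, j): c / src_counts[i] for (i, j), c in pair_counts.items()}


# Toy routes: 4 tokens, 3 layers (made-up values for illustration only).
routes = [[0, 2, 1], [0, 2, 3], [1, 3, 3], [0, 1, 1]]
probs = transition_probs(routes, layer=0)
for (i, j), p in sorted(probs.items()):
    print(f"layer0 expert {i} -> layer1 expert {j}: {p:.2f}")
```

Conditional probabilities that are far from uniform would indicate that the expert chosen at one layer is predictive of the next, which is the kind of signal a prefetching or dynamic EP policy could exploit.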