obsidian/projects/moe-autoscaling/Ongoing.md

if sent_method not in [
+            "determine_num_available_blocks",
+            "initialize_cache",
+        ]:


ray 2.46.0 -> 2.47.1
ifconfig -a

`VLLM_USE_PRECOMPILED=1 pip install --editable .`


```
[Credentials]
language=EN
endpoint=oss-cn-hangzhou.aliyuncs.com
accessKeyID=LTAIJO7wLG9y8KJH
accessKeySecret=nbx8fIu9B94JoICuKRBhxfSQsMgYeY
```


---

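Based on tests with Qwen3-30B (128 experts, 48 layers, 8 activated experts):
- Expert activation within each layer is not load-balanced; std/mean is close to 1 in every layer
- The std in the last few layers is noticeably larger than in the earlier layers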
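
For reference, a minimal sketch of how the per-layer std/mean figure could be computed from activation counts. The function name `activation_cv` and the input layout (one list of per-expert counts per layer) are assumptions for illustration, not the actual measurement script, and it uses the population std, which may differ from what the test used.

```python
from statistics import mean, pstdev

def activation_cv(counts_per_layer):
    """Return std/mean of expert activation counts for each layer.

    counts_per_layer: list of lists, one inner list per layer, where each
    entry is how many times that expert was activated (assumed layout).
    """
    cvs = []
    for counts in counts_per_layer:
        mu = mean(counts)
        # Guard against a layer with no recorded activations.
        cvs.append(pstdev(counts) / mu if mu > 0 else 0.0)
    return cvs

# Toy data: a perfectly balanced layer and a layer where one expert takes everything.
balanced = [10, 10, 10, 10]
skewed = [40, 0, 0, 0]
cvs = activation_cv([balanced, skewed])
print(cvs)
```

A value of 0 means the experts in that layer are hit uniformly; larger values mean more skew.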


TBD
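- [ ] Whether expert activation differs significantly across workloads
- [ ] Whether expert activation in adjacent layers is correlated
- [ ] How temporal patterns relate to the global pattern
- [ ] Understand the EP bathtub curve
- [ ] Make a table surveying the key points of existing work and compare them with our test results
- [ ] Mixing reasoning and non-reasoning requests in the same session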