Initial commit: obsidian to gitea

2026-05-07 15:04:41 +08:00
commit a57afa86b4
323 changed files with 42569 additions and 0 deletions
--- a/projects/moe-autoscaling/Ongoing.md
+++ b/projects/moe-autoscaling/Ongoing.md
@@ -0,0 +1,38 @@
+if sent_method not in [
+    "determine_num_available_blocks",
+    "initialize_cache",
+]:
+
+
+
+ray 2.46.0 -> 2.47.1
+ifconfig -a
+
+`VLLM_USE_PRECOMPILED=1 pip install --editable .`
+
+
+```
+[Credentials]
+language=EN
+endpoint=oss-cn-hangzhou.aliyuncs.com
+accessKeyID=<redacted>
+accessKeySecret=<redacted>
+```
+
+
+---
+
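+Based on tests with Qwen3-30B (128 experts, 48 layers, 8 experts activated per token):
+- Expert activation is not load-balanced in any layer; the std/mean ratio is close to 1 in every layer
+- The std in the last few layers is noticeably larger than in the earlier layers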
+
+
+
+TBD
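+- [ ] Whether expert activation differs significantly across workloads
+- [ ] Whether expert activation in adjacent layers is correlated
+- [ ] How the temporal pattern relates to the global pattern
+- [ ] Understand the EP bathtub curve
+- [ ] Make a table surveying the key points of existing work and compare them with our test results
+- [ ] Mixing reasoning and non-reasoning requests in the same session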
+
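The std/mean ratio quoted in the Qwen3-30B notes in this file is the coefficient of variation of per-expert activation counts within a layer. A minimal sketch of how it could be computed is given here, assuming the counts are already collected as one list per layer. The function names, the toy data, and the use of the population standard deviation are all assumptions for illustration, not the script used for the test.

```python
from statistics import mean, pstdev


def coefficient_of_variation(counts):
    """Return std/mean of the per-expert activation counts of one layer.

    Uses the population standard deviation (an assumption; the note does
    not say which estimator was used). Returns 0.0 for an all-zero layer.
    """
    mu = mean(counts)
    if mu == 0:
        return 0.0
    return pstdev(counts) / mu


def per_layer_cv(activation_counts):
    """Compute std/mean for every layer.

    activation_counts: list of layers, each a list of per-expert counts.
    """
    return [coefficient_of_variation(layer) for layer in activation_counts]


# Hypothetical toy data: 2 layers with 4 experts each.
balanced_layer = [10, 10, 10, 10]  # perfectly balanced routing
skewed_layer = [40, 0, 0, 0]       # all tokens routed to one expert
cvs = per_layer_cv([balanced_layer, skewed_layer])
print(cvs)
```

A perfectly balanced layer gives a ratio of 0, so a value near 1 means the spread of the counts is about as large as their mean.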