Initial commit: obsidian to gitea
projects/moe-autoscaling/Ongoing.md
@@ -0,0 +1,38 @@
if sent_method not in [
+ "determine_num_available_blocks",
+ "initialize_cache",
+ ]:

ray 2.46.0 -> 2.47.1

`ifconfig -a`

`VLLM_USE_PRECOMPILED=1 pip install --editable .`
```
[Credentials]
language=EN
endpoint=oss-cn-hangzhou.aliyuncs.com
accessKeyID=<your-access-key-id>
accessKeySecret=<your-access-key-secret>
```
---
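Based on tests with Qwen3-30B (128 experts, 48 layers, 8 experts activated per token):

- Expert activation within each layer is not load-balanced: std/mean is close to 1 in every layer.
- The std in the last few layers is noticeably larger than in the earlier layers.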
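
A minimal sketch of how the per-layer std/mean figure could be computed, assuming the router's top-k expert ids are logged per token for each layer. The logging layout (`topk_ids[token]` is the list of expert ids chosen in one layer) and all function names are hypothetical, and the sketch uses the population std; the notes do not say which std variant was used.

```python
from statistics import fmean, pstdev


def count_activations(topk_ids, num_experts):
    """Count how many tokens were routed to each expert in one layer.

    topk_ids: iterable of per-token lists of selected expert ids
    (hypothetical logging layout, e.g. 8 ids per token for Qwen3-30B).
    """
    counts = [0] * num_experts
    for token_experts in topk_ids:
        for expert in token_experts:
            counts[expert] += 1
    return counts


def load_imbalance(counts):
    """Coefficient of variation (population std / mean) of expert counts.

    0.0 means perfectly balanced; values near 1 mean the per-expert load
    varies about as much as its average.
    """
    mean = fmean(counts)
    if mean == 0:
        return float("nan")  # no tokens routed in this layer
    return pstdev(counts) / mean


def per_layer_imbalance(topk_ids_per_layer, num_experts):
    """std/mean for every layer, e.g. to compare the last layers with the rest."""
    return [
        load_imbalance(count_activations(layer_ids, num_experts))
        for layer_ids in topk_ids_per_layer
    ]
```

With 128 experts and 8 active per token, a perfectly balanced layer would give every expert `8 * num_tokens / 128` activations and a std/mean of 0, so values close to 1 indicate a strongly skewed router.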

TBD

- [ ] Does expert activation differ significantly across workloads?
- [ ] Are expert activations in adjacent layers correlated?
- [ ] How do temporal patterns relate to the global pattern?
- [ ] Understand the EP bathtub curve
- [ ] Make a table surveying the key points of existing work and compare them with our test results
- [ ] Mixing reasoning and non-reasoning requests within the same session
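
For the adjacent-layer item: expert ids are not aligned across layers (expert 3 in one layer has no built-in relation to expert 3 in the next), so correlating per-expert count vectors index by index would not answer the question. One possible approach, sketched here as an assumption rather than a settled method, is to count which (layer l expert, layer l+1 expert) pairs are selected for the same token and compute their mutual information: it is 0 when routing in the two layers is independent and grows as one layer's choice predicts the next. The `routing[token][layer]` layout and the function names are hypothetical. The plug-in estimate is biased upward on small samples, and the 8×8 pairs contributed by each token are not independent, so the value is best compared against a baseline computed after shuffling tokens between the two layers.

```python
import math
from collections import Counter


def adjacent_layer_pairs(routing, layer):
    """Count co-selected (expert in `layer`, expert in `layer + 1`) pairs.

    routing[token][layer] is assumed to be the list of expert ids chosen
    for that token at that layer (hypothetical logging layout).
    """
    pairs = Counter()
    for token_routes in routing:
        for src in token_routes[layer]:
            for dst in token_routes[layer + 1]:
                pairs[(src, dst)] += 1
    return pairs


def mutual_information(pairs):
    """Plug-in mutual information (in nats) between source and destination experts."""
    total = sum(pairs.values())
    src_marginal = Counter()
    dst_marginal = Counter()
    for (src, dst), count in pairs.items():
        src_marginal[src] += count
        dst_marginal[dst] += count
    mi = 0.0
    for (src, dst), count in pairs.items():
        p_joint = count / total
        # Probability of this pair if the two layers routed independently.
        p_indep = (src_marginal[src] / total) * (dst_marginal[dst] / total)
        mi += p_joint * math.log(p_joint / p_indep)
    return mi
```

On a toy input where every token picks the same expert id in both layers (`[[[0], [0]], [[1], [1]]]`), this returns ln 2 ≈ 0.693; on one where all four combinations of experts 0 and 1 occur once, it returns 0.0.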