Initial commit: obsidian to gitea
study/conf/ChinaSys25-Spr.md
- [x] HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference
- [ ] MILLION: MasterIng Long-Context LLM Inference Via Outlier-Immunized KV Product QuaNtization
- [ ] Multiplexing Dynamic Deep Learning Workloads with SLO-awareness in GPU Clusters
- [x] SimAI: A High-Precision Simulator for Large-Scale AI Clusters
scale up: the bigger, the better
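- [ ] SoMa: Identifying, Exploring, and Understanding the DRAM Communication Scheduling Space for DNN Accelerators

---
Memory and storage optimization: problems in optimizing SSD architecture and deploying it in production

- High-performance, highly reliable asynchronous communication for multi-core systems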
![[250524-113000.jpeg]]

One angle for measuring AI-assisted development: the merge rate of AI-generated code, i.e. how much of it actually lands in the repository

mini panel
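- A startup's weak spots still have to be strong enough; otherwise it gets judged for doing poorly at features that products already on the market offer
- Formal verification is slow, which leaves a gap between it and the pace of development

KTransformers
- AMX instructions accelerate computation on the CPU

---
Throughput = latency * processing speed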
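Little's law gives a closely related identity: requests in flight = throughput × latency. The law itself is standard, but linking it to the line above is an assumption, and the function name and numbers below are hypothetical. A minimal sketch of the arithmetic:

```python
def throughput_rps(in_flight: float, latency_s: float) -> float:
    """Little's law rearranged: throughput = requests in flight / latency."""
    if latency_s <= 0:
        raise ValueError("latency_s must be positive")
    return in_flight / latency_s


# Hypothetical numbers: 64 concurrent requests, each taking 2 s end to end.
print(throughput_rps(64, 2.0))  # 32.0 requests per second

# A larger batch can raise throughput even though each request gets slower.
print(throughput_rps(128, 3.0))
```

Read this way, throughput and latency can be traded against each other by changing how much work is kept in flight.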
The world is inherently sparse: MoE
![[250525-095739.jpeg]]
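The sparsity idea behind MoE can be sketched as a router that activates only a few experts per token. The code below is a generic top-k softmax gate, not the router of any specific model from the talk; `top_k_route` and the scores are hypothetical.

```python
import math


def top_k_route(logits: list[float], k: int) -> dict[int, float]:
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    if not 0 < k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Indices of the k largest router scores (ties keep the lower index first).
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in chosen)  # subtract the max for numerical stability
    exp = {i: math.exp(logits[i] - peak) for i in chosen}
    total = sum(exp.values())
    return {i: v / total for i, v in exp.items()}


# Hypothetical router scores for 8 experts; only 2 experts run for this token.
weights = top_k_route([0.1, 2.0, -1.0, 0.5, 2.0, 0.0, -0.3, 1.2], k=2)
print(sorted(weights))  # indices of the active experts
```

The other six experts do no work for this token, which is where the compute savings come from.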
Lookahead during training makes it possible to support MTP
![[250525-100200.jpeg]]
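Looking ahead during training can be pictured as building, for every position, targets that cover several future tokens instead of only the next one. This is a minimal illustration of multi-token targets in general, assuming MTP here means multi-token prediction; `mtp_targets` and `depth` are hypothetical names, and the model-side details from the talk are not captured.

```python
def mtp_targets(tokens: list[int], depth: int) -> list[tuple[int, list[int]]]:
    """For each position, pair the current token with the next `depth` tokens."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    samples = []
    # Stop early enough that every position has a full set of future tokens.
    for t in range(len(tokens) - depth):
        samples.append((tokens[t], tokens[t + 1 : t + 1 + depth]))
    return samples


# Hypothetical token ids; depth=2 means "predict the next two tokens".
for current, future in mtp_targets([10, 11, 12, 13, 14], depth=2):
    print(current, "->", future)
```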

Per-request distribution (DP) -> per-stage disaggregation (PD disaggregation) -> per-layer distribution (DeepEP, attention offloading)

![[250525-100839.jpeg]]
Network: full-duplex; memory reads and writes: half-duplex
**Network bandwidth now exceeds memory bandwidth**
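The duplex distinction matters when traffic flows both ways at once: a full-duplex link sends and receives simultaneously, while a half-duplex channel shares its bandwidth between reads and writes. A minimal sketch of that arithmetic follows; the function name, volumes, and bandwidth are hypothetical and not taken from the talk.

```python
def transfer_time_s(out_gb: float, in_gb: float, bw_gb_per_s: float, full_duplex: bool) -> float:
    """Time to move out_gb one way and in_gb the other at bw_gb_per_s per direction."""
    if bw_gb_per_s <= 0:
        raise ValueError("bw_gb_per_s must be positive")
    if full_duplex:
        # Both directions proceed at once, so the larger transfer dominates.
        return max(out_gb, in_gb) / bw_gb_per_s
    # Reads and writes share one channel, so their volumes add up.
    return (out_gb + in_gb) / bw_gb_per_s


# Hypothetical volumes and bandwidth, chosen only to show the 2x gap.
print(transfer_time_s(4.0, 4.0, 50.0, full_duplex=True))
print(transfer_time_s(4.0, 4.0, 50.0, full_duplex=False))
```

With equal traffic in both directions, the half-duplex case takes twice as long at the same nominal bandwidth.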

Goal: balanced EP communication volume and balanced DP compute load
![[250525-101147.jpeg]]
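The compute-balancing half of that goal can be illustrated with a generic greedy heuristic: sort requests by size and always hand the next one to the least-loaded rank. This is not the scheduler described in the talk; `balance_requests` and the token counts are hypothetical.

```python
import heapq


def balance_requests(token_counts: list[int], num_ranks: int) -> list[list[int]]:
    """Greedy longest-first assignment: give each request to the least-loaded rank."""
    if num_ranks < 1:
        raise ValueError("num_ranks must be at least 1")
    heap = [(0, rank) for rank in range(num_ranks)]  # (current load, rank id)
    assignment: list[list[int]] = [[] for _ in range(num_ranks)]
    for tokens in sorted(token_counts, reverse=True):
        load, rank = heapq.heappop(heap)  # rank with the smallest load so far
        assignment[rank].append(tokens)
        heapq.heappush(heap, (load + tokens, rank))
    return assignment


# Hypothetical per-request token counts spread over 2 DP ranks.
plan = balance_requests([900, 100, 400, 300, 200], num_ranks=2)
print([sum(rank) for rank in plan])  # per-rank token load
```

Balancing EP communication volume would need a similar treatment of tokens routed per expert, which this sketch does not model.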

![[250525-101629.jpeg]]

2000+ H800 GPUs serve all DeepSeek traffic, both inside and outside China; after the Spring Festival, throughput was optimized at the expense of latency

Dynamic EP scaling (EP32 -> EP320) and dynamic switching of PD roles