Initial commit: obsidian to gitea

2026-05-07 15:04:41 +08:00
commit a57afa86b4
323 changed files with 42569 additions and 0 deletions

### prefill
For attention: consider one multi-head transformer attention layer over a sequence of length $S$, with $d = H \cdot d_{\text{head}}$.
1. **Q/K/V projections** (three linear maps applied at once):
    $\text{FLOPs}_{\text{QKV\_proj}} \approx 2 \times S \times d \times (3d) = 6Sd^2$
    using the standard $2 \cdot m \cdot n \cdot k$ approximation for multiplying an $m \times k$ matrix by a $k \times n$ matrix.
2. **Attention score matmul ($QK^T$)**:
    $\text{FLOPs}_{QK} \approx 2 \times H \times S \times d_{\text{head}} \times S = 2S^2d$
    since $H \cdot d_{\text{head}} = d$.
3. **Attention·V** (scores multiplied with $V$):
    $\text{FLOPs}_{AV} \approx 2S^2d$
4. **Output projection** (heads concatenated back to $d$, then a linear map):
    $\text{FLOPs}_{\text{out}} \approx 2Sd^2$

Total FLOPs: $8Sd^2 + 4S^2d$
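As a quick numeric sanity check (a minimal sketch; the values of $S$ and $d$ are arbitrary examples), summing the four terms reproduces the closed form:

```python
# Check that the four per-layer attention FLOPs terms sum to 8*S*d^2 + 4*S^2*d.
# S (sequence length) and d (model dim) are arbitrary example values.
S, d = 2048, 4096

qkv_proj = 2 * S * d * (3 * d)  # 6*S*d^2
qk = 2 * S * S * d              # 2*S^2*d, since H * d_head = d
av = 2 * S * S * d              # 2*S^2*d
out_proj = 2 * S * d * d        # 2*S*d^2

total = qkv_proj + qk + av + out_proj
closed_form = 8 * S * d ** 2 + 4 * S ** 2 * d
print(total == closed_form)  # True
```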
$$
T_{\text{comp}} = \frac{\text{FLOPs}_{\text{per\_GPU}}}{\text{peak\_flops\_per\_GPU} \times \text{compute\_utils}}
$$
Total memory traffic: $\text{bytes}_\text{prefill} \approx N \cdot \alpha \cdot BLd \cdot \text{elem\_bytes}$, with $\alpha \sim 6$
$$
T_{\text{mem}} = \frac{\text{bytes}_{\text{per\_GPU}}}{\text{bandwidth\_per\_GPU} \times \text{mem\_utils}}
$$
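Combining the two bounds, the per-GPU step time is the larger of the compute and memory estimates. A minimal sketch; the default peak FLOPs and bandwidth (roughly A100-class: 312 TFLOP/s BF16, 2 TB/s HBM) and the utilization factors are illustrative assumptions, not values from these notes:

```python
def step_time(flops_per_gpu, bytes_per_gpu,
              peak_flops=312e12,   # assumed BF16 peak, A100-class
              bandwidth=2.0e12,    # assumed HBM bandwidth in bytes/s
              compute_utils=0.5, mem_utils=0.8):
    """Roofline estimate: the step is bounded by the slower of
    compute time (T_comp) and memory-traffic time (T_mem)."""
    t_comp = flops_per_gpu / (peak_flops * compute_utils)
    t_mem = bytes_per_gpu / (bandwidth * mem_utils)
    return max(t_comp, t_mem)

# Example: a hypothetical prefill shard with 1.56e14 FLOPs and 1.6e12 bytes.
t = step_time(1.56e14, 1.6e12)
```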
### decode
Total FLOPs per token: $8d^2 + 4dL$
Total memory traffic: $\text{bytes}_\text{decode} \approx N \cdot \beta \cdot BLd \cdot \text{elem\_bytes}$, with $\beta \sim 4$
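Since decode is typically memory-bound, the estimate above can be sketched directly. The function name and its defaults (elem_bytes=2 for BF16, $\beta = 4$ from above, and the same illustrative bandwidth/utilization assumptions) are mine:

```python
def decode_mem_time(N, B, L, d, elem_bytes=2, beta=4,
                    bandwidth=2.0e12, mem_utils=0.8):
    """Memory-side time of one decode step:
    bytes_decode ~ N * beta * B * L * d * elem_bytes."""
    total_bytes = N * beta * B * L * d * elem_bytes
    return total_bytes / (bandwidth * mem_utils)

# Example: batch 2000, context 2048, hidden size 4096, 94 layers.
t_dec = decode_mem_time(N=94, B=2000, L=2048, d=4096)
```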
### moe
Each expert FFN computes $\text{output} = \text{SiLU}(xW_1)W_2$.
Under TP, the communication for a token $T$ activating expert $E$ is:
1. AllGather the input $x$ to all TP ranks; comm volume: hidden_size * (TP - 1)
2. Each TP rank independently computes $xW_1'$
3. After an AllGather, every rank holds the full $xW_1$; comm volume: moe_intermediate_size / TP * (TP - 1)
4. Each rank computes SiLU and $IW_2'$; an AllReduce then gives every rank the full output; comm volume: hidden_size * (TP - 1)
Under EP (dispatch + combine):
2 * hidden_size * (EP - 1) / EP (assuming balanced load)
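The two per-token communication volumes above can be compared directly. A minimal sketch; the function names and the elem_bytes parameter are my own additions, while the formulas are the ones from the steps above:

```python
def tp_comm_bytes(hidden_size, moe_intermediate_size, tp, elem_bytes=2):
    """Per-token MoE communication under TP: AllGather of x,
    AllGather of the sharded x*W1, then AllReduce of the output."""
    allgather_in = hidden_size * (tp - 1)
    allgather_mid = moe_intermediate_size // tp * (tp - 1)
    allreduce_out = hidden_size * (tp - 1)
    return (allgather_in + allgather_mid + allreduce_out) * elem_bytes

def ep_comm_bytes(hidden_size, ep, elem_bytes=2):
    """Per-token dispatch + combine under EP, assuming balanced load."""
    return 2 * hidden_size * (ep - 1) // ep * elem_bytes
```

For typical shapes (hidden_size much larger than moe_intermediate_size / TP), the EP volume is well below the TP volume, which matches the timing numbers below.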
---
With batch_size=2000, seq_len=2048 (all times in seconds):

| model | EP | attention comp time | MoE combine comm time | MoE comp time | MoE comm time with TP |
| --- | --- | --- | --- | --- | --- |
| Qwen-235B | 8 | 0.06944874306412531 | 0.00028672 | 0.0004069259060131379 | 0.00045056 |
| Qwen-30B | 8 | 0.01736218682573421 | 0.00014336 | 0.00010173147650328448 | 0.00022528 |
| Qwen-235B | 64 | 0.06944874306412531 | 0.00032256 | 0.0004069259060131379 | 0.00045056 |