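The set of configs it tests is extremely limited: as soon as one knob setting fails, Codex abandons it entirely.
For example, after TP8 failed once it was never tested again, yet TP8 + EP does run and performs well. This suggests Codex does not yet fully understand engine configuration.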
```
- The real reason "TP=8 cannot be used" is spelled out later: [codex_tuning_v2.jsonl (line 270)](https://file+.vscode-resource.vscode-cdn.net/Users/gahow/.vscode/extensions/openai.chatgpt-26.304.20706-darwin-arm64/webview/#) and [codex_tuning_v2.jsonl (line 272)](https://file+.vscode-resource.vscode-cdn.net/Users/gahow/.vscode/extensions/openai.chatgpt-26.304.20706-darwin-arm64/webview/#) state that with TP=8 on this FP8 Qwen3-MoE checkpoint, the TP-sharded MoE gate/up output size is 192, whereas FP8 weight quantization requires a block size of 128. 192 is not a multiple of 128, so the configuration is incompatible at model initialization.
- This conclusion is repeated in the final summary: [codex_tuning_v2.jsonl (line 1154)](https://file+.vscode-resource.vscode-cdn.net/Users/gahow/.vscode/extensions/openai.chatgpt-26.304.20706-darwin-arm64/webview/#).
```
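The divisibility argument quoted above can be checked with a few lines of arithmetic. This is a minimal sketch, not the engine's actual validation code. The unsharded size `FULL_GATE_UP_SIZE` is inferred as 192 × 8 from the quoted log and is an assumption, as are the helper names and the premise that EP keeps each expert whole on a rank instead of sharding it across TP ranks.

```python
FP8_BLOCK_SIZE = 128  # block size quoted from the log

# Assumption: inferred from "192 per rank at TP=8"; not read from the checkpoint.
FULL_GATE_UP_SIZE = 192 * 8


def per_rank_size(full_size: int, tp: int, expert_parallel: bool = False) -> int:
    """Per-rank MoE gate/up output size (hypothetical helper).

    Assumes TP shards the dimension evenly, and that with EP enabled
    each expert stays whole on its rank, so the dimension is not sharded.
    """
    if expert_parallel:
        return full_size
    if full_size % tp != 0:
        raise ValueError(f"{full_size} cannot be split evenly across tp={tp}")
    return full_size // tp


def is_block_aligned(size: int, block: int = FP8_BLOCK_SIZE) -> bool:
    """True when the size is a whole multiple of the FP8 block size."""
    return size % block == 0


for tp, ep in [(2, False), (4, False), (8, False), (8, True)]:
    size = per_rank_size(FULL_GATE_UP_SIZE, tp, ep)
    status = "ok" if is_block_aligned(size) else "not block-aligned"
    print(f"tp={tp} ep={'on' if ep else 'off'}: per-rank size {size} -> {status}")
```

Under these assumptions only TP=8 without EP produces a misaligned size, which would be consistent with TP8 + EP starting up fine: the failure is tied to the combination of settings, not to TP=8 as such.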
```
| Rank | Exp | Config | Tput/GPU | TTFT p95 | SLO pass |
|---|---:|---|---:|---:|---:|
| 1 | 8 | tp4 b16 bt16384 s64 gm0.92 pc on ep off | 1448.724839 | 1.337592 | 97.58% |
| 2 | 13 | tp4 b16 bt16384 s64 gm0.92 pc on ep off | 1448.724767 | 1.328592 | 97.58% |
| 3 | 16 | tp4 b16 bt12288 s48 gm0.92 pc on ep off | 1448.722459 | 1.346786 | 97.75% |
| 4 | 14 | tp4 b32 bt16384 s64 gm0.92 pc on ep off | 1448.722451 | 1.382931 | 97.58% |
| 5 | 12 | tp4 b16 bt32768 s128 gm0.92 pc on ep off | 1448.719743 | 1.324819 | 97.58% |
| 6 | 11 | tp4 b16 bt8192 s32 gm0.92 pc on ep off | 1448.718885 | 1.314879 | 97.83% |
| 7 | 15 | tp4 b16 bt16384 s64 gm0.95 pc on ep off | 1448.715778 | 1.368400 | 97.50% |
| 8 | 17 | tp4 b16 bt16384 s64 gm0.92 pc on ep on | 1448.714795 | 1.864526 | 95.58% |
| 9 | 10 | tp4 b16 bt16384 s64 gm0.92 pc off ep off | 1448.437961 | 1.764754 | 95.50% |
| 10 | 9 | tp2 b16 bt16384 s64 gm0.92 pc on ep off | startup failed | - | - |
```
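To put a number on how narrow the explored region is, the sketch below computes the relative spread of Tput/GPU across the nine runs in the table above that started successfully. The values are copied from the table; the helper name `relative_spread` is hypothetical.

```python
# Tput/GPU per experiment, copied from the table above.
# Exp 9 (tp2) is omitted because it failed at startup.
tput_per_gpu = {
    8: 1448.724839,
    13: 1448.724767,
    16: 1448.722459,
    14: 1448.722451,
    12: 1448.719743,
    11: 1448.718885,
    15: 1448.715778,
    17: 1448.714795,
    10: 1448.437961,
}


def relative_spread(values) -> float:
    """(max - min) / max over the given throughput values."""
    values = list(values)
    return (max(values) - min(values)) / max(values)


spread = relative_spread(tput_per_gpu.values())
print(f"relative spread across successful runs: {spread:.4%}")
```

Every successful run uses tp4, and the spread comes out well below 0.1%. This suggests the knobs that were varied barely move the metric in this region, while the TP dimension went largely unexplored.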