feat: new router and benchmark setup
@@ -227,6 +227,7 @@ coefficients (`flops_per_token_prefill`, `attn_quadratic_coeff`, etc.).
| Model | Path | Architecture |
|-------|------|--------------|
| GLM-5 (744B/40B-active) | `models/GLM-5/config.json` | MoE (256 routed, 8 active, 1 shared) + MLA + DSA |
| GLM-5-FP8 | `models/GLM-5-FP8/config.json` | GLM-5 architecture + upstream FP8 quantization metadata |
| Qwen3-Coder-480B-A35B FP8 | `models/Qwen3-Coder-480B-A35B-Instruct-FP8/config.json` | MoE (160 experts, 8 active) + GQA |

## Hardware configuration

@@ -248,6 +249,7 @@ Available presets:
| `h100` | 989 TFLOPS | 80 GB | 3.35 TB/s | Gen5 |
| `h800` | 989 TFLOPS | 80 GB | 3.35 TB/s | Gen5 |
| `h20` | 148 TFLOPS | 96 GB | 4.0 TB/s | Gen5 |
| `h20-141g` | 148 TFLOPS | 141 GB | 4.8 TB/s | Gen5 |
| `a100-80gb` | 312 TFLOPS | 80 GB | 2.0 TB/s | Gen4 |
| `a100-40gb` | 312 TFLOPS | 40 GB | 1.555 TB/s | Gen4 |
| `b200` | 2.25 PFLOPS | 192 GB | 8.0 TB/s | Gen6 |

@@ -297,6 +299,7 @@ memory_time = layers * weight_bytes_per_layer / gpu_mem_bw
| Config | Model | Hardware | Instances | Trace |
|--------|-------|----------|-----------|-------|
| `glm5-8xb200-hf.yaml` | GLM-5 via HF config.json | 8xB200 preset | 32 | GLM coder blk512 |
| `glm5-fp8-8xh20-141g.yaml` | GLM-5-FP8 via ModelScope config.json | 8xH20-141G preset | 128 | GLM coder blk512 |
| `glm5-nvfp4-8xb300.yaml` | GLM-5-NVFP4 via HF config.json | 8xB300 preset | 8 | GLM coder blk512 |
| `qwen3-coder-480b-8xh20.yaml` | Qwen3-Coder via HF | 8xH20 preset | 32 | Qwen coder blk16 |