chore: update ablation and clean configs

2026-04-15 14:48:59 +08:00
parent eaf574cd4e
commit 365ceac3be
15 changed files with 879 additions and 324 deletions

@@ -58,11 +58,19 @@ Prints `summary.json` to stdout and writes the full output directory
```bash
target/release/kvcache-sim ablate \
--config configs/glm5-8xb200-hf.yaml \
--routers random,least_loaded,least_tokens,min_pd,prefix_affinity \
--evict-policies lru \
--output-dir runs/glm5_ablation
```
Writes `ablation.json` with one row per `router x evict_policy` pair.
`ablate` currently accepts only `lru` for `--evict-policies`. The
aggregated output keeps the online prefill-time metrics
(`ttft_mean/p50/p95/p99`) and omits the `e2e` metrics.

The previous replay-based `belady` approximation has been removed from
the CLI because it was not an exact full-hierarchy Belady algorithm and
could produce misleading comparisons against `lru`.

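The per-cell aggregation can be sketched as follows. This is a minimal illustration under stated assumptions, not kvcache-sim's implementation: `AblationRow`, `summarize`, and the `simulate` stand-in (which returns made-up placeholder TTFTs, not real results) are hypothetical names. The nearest-rank percentile is also an assumption; the tool may interpolate differently.

```rust
/// Hypothetical summary row for one router x evict_policy grid cell.
struct AblationRow {
    router: String,
    evict_policy: String,
    ttft_mean: f64,
    ttft_p50: f64,
    ttft_p95: f64,
    ttft_p99: f64,
}

/// Nearest-rank percentile of an ascending-sorted slice, `p` in (0, 100].
/// Returns NaN for an empty slice.
fn percentile(sorted: &[f64], p: f64) -> f64 {
    if sorted.is_empty() {
        return f64::NAN;
    }
    let n = sorted.len();
    let rank = ((p / 100.0) * n as f64).ceil() as usize;
    sorted[rank.clamp(1, n) - 1]
}

/// Reduce the per-request TTFTs (seconds) of one grid cell to a summary row.
fn summarize(router: &str, evict_policy: &str, mut ttfts: Vec<f64>) -> AblationRow {
    // total_cmp gives a total order, so sorting cannot panic on NaN.
    ttfts.sort_by(|a, b| a.total_cmp(b));
    let mean = ttfts.iter().sum::<f64>() / ttfts.len() as f64;
    AblationRow {
        router: router.to_string(),
        evict_policy: evict_policy.to_string(),
        ttft_mean: mean,
        ttft_p50: percentile(&ttfts, 50.0),
        ttft_p95: percentile(&ttfts, 95.0),
        ttft_p99: percentile(&ttfts, 99.0),
    }
}

/// Hypothetical stand-in for one simulation run; the values are placeholders.
fn simulate(_router: &str, _evict_policy: &str) -> Vec<f64> {
    vec![0.12, 0.30, 0.08, 0.45, 0.21]
}

fn main() {
    let routers = ["random", "least_loaded", "prefix_affinity"];
    let evict_policies = ["lru"];
    // Cartesian product: one row per router x evict_policy.
    let mut rows = Vec::new();
    for router in routers {
        for policy in evict_policies {
            rows.push(summarize(router, policy, simulate(router, policy)));
        }
    }
    for row in &rows {
        println!(
            "{} x {}: mean={:.3} p50={:.3} p95={:.3} p99={:.3}",
            row.router, row.evict_policy, row.ttft_mean, row.ttft_p50, row.ttft_p95, row.ttft_p99
        );
    }
}
```

Every router is paired with every eviction policy. Because `lru` is the only accepted policy, the table currently has exactly one row per router.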
### 3. Compute theoretical hit-rate ceilings (oracle)
@@ -115,7 +123,8 @@ so the same config can be reused across sweeps:
| `--ttl-seconds <S>` | `cluster.meta_store.ttl_seconds` |

`oracle` additionally takes `--capacity-blocks <N>` / `--per-instance`
and `--out <PATH>`. `ablate` additionally takes `--routers <csv>` and
`--evict-policies <csv>` (currently only `lru`).

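The following sketch shows how the two CSV flags might be parsed and validated. It assumes that values are comma-separated, that surrounding whitespace is ignored, and that empty entries are dropped. `parse_csv`, `parse_evict_policies`, and `SUPPORTED_EVICT_POLICIES` are hypothetical names. The actual CLI parser and its error messages may differ.

```rust
/// Hypothetical allow-list mirroring the README: only `lru` is accepted today.
const SUPPORTED_EVICT_POLICIES: &[&str] = &["lru"];

/// Split a comma-separated flag value, trimming whitespace and dropping empty entries.
fn parse_csv(value: &str) -> Vec<String> {
    value
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(String::from)
        .collect()
}

/// Parse `--evict-policies`, rejecting an empty list or any unsupported policy.
fn parse_evict_policies(value: &str) -> Result<Vec<String>, String> {
    let policies = parse_csv(value);
    if policies.is_empty() {
        return Err("--evict-policies must list at least one policy".to_string());
    }
    for p in &policies {
        if !SUPPORTED_EVICT_POLICIES.contains(&p.as_str()) {
            return Err(format!("unsupported eviction policy `{p}` (supported: lru)"));
        }
    }
    Ok(policies)
}

fn main() {
    let routers = parse_csv("random, least_loaded,prefix_affinity");
    println!("routers: {routers:?}");
    // `belady` is no longer a valid policy, so this reports an error.
    match parse_evict_policies("lru,belady") {
        Ok(p) => println!("policies: {p:?}"),
        Err(e) => eprintln!("error: {e}"),
    }
}
```

Rejecting unknown policies at parse time is a design choice. It means a stale `belady` entry in a sweep script fails immediately, before any simulation work starts.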
## Router modes
@@ -288,12 +297,8 @@ memory_time = layers * weight_bytes_per_layer / gpu_mem_bw
| Config | Model | Hardware | Instances | Trace |
|--------|-------|----------|-----------|-------|
| `glm5-8xb200-hf.yaml` | GLM-5 via HF config.json | 8xB200 preset | 32 | GLM coder blk512 |
| `glm5-8xb200-blk512.yaml` | GLM-5 inline | 8xB200 inline | 64 | GLM coder blk512 |
| `glm5-8xb200.yaml` | GLM-5 inline | 8xB200 inline | 8 | GLM coder blk512 |
| `glm5-nvfp4-8xb300.yaml` | GLM-5-NVFP4 via HF config.json | 8xB300 preset | 8 | GLM coder blk512 |
| `qwen3-coder-480b-8xh20.yaml` | Qwen3-Coder via HF | 8xH20 preset | 32 | Qwen coder blk16 |
| `qwen2.5-coder-7b-h800.yaml` | Qwen2.5-7B inline | H800 inline | 16 | Qwen coder blk16 |
| `qwen2.5-coder-7b-preset.yaml` | Qwen2.5-7B inline | H800 preset | 16 | Qwen coder blk16 |
| `qwen2.5-coder-32b-h800.yaml` | Qwen2.5-32B inline | H800 inline | 16 | Qwen coder blk16 |
## Outputs