chore: update ablation and clean configs
21
README.md
@@ -58,11 +58,19 @@ Prints `summary.json` to stdout and writes the full output directory
target/release/kvcache-sim ablate \
--config configs/glm5-8xb200-hf.yaml \
--routers random,least_loaded,least_tokens,min_pd,prefix_affinity \
--evict-policies lru \
--output-dir runs/glm5_ablation
```

Writes one subdirectory per router plus a combined
`ablation.json` with side-by-side summaries.
Writes `ablation.json` with one row per `router x evict_policy`.

`ablate` currently supports only `lru` as a valid eviction policy. The
aggregated output keeps the online prefill-time metrics
(`ttft_mean/p50/p95/p99`) and omits `e2e`.

The previous replay-based `belady` approximation has been removed from
the CLI because it was not an exact full-hierarchy Belady algorithm and
could produce misleading comparisons against `lru`.
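As a rough illustration of consuming the `ablation.json` layout described above, the sketch below ranks rows by one TTFT metric and prints them side by side. It assumes the file is a top-level JSON array of objects with keys `router`, `evict_policy`, `ttft_mean`, `ttft_p50`, `ttft_p95` and `ttft_p99`; those key names are inferred from the README wording and may not match the real schema. The helper names and the sample numbers are hypothetical.

```python
import json
from pathlib import Path

# Assumed field names; the real ablation.json schema may differ.
TTFT_KEYS = ("ttft_mean", "ttft_p50", "ttft_p95", "ttft_p99")


def load_rows(path):
    """Load ablation rows, assuming a top-level JSON array of objects."""
    rows = json.loads(Path(path).read_text())
    if not isinstance(rows, list):
        raise ValueError("expected a top-level JSON array of rows")
    return rows


def rank_rows(rows, key="ttft_p95"):
    """Return rows sorted ascending by one TTFT metric (lower is better)."""
    return sorted(rows, key=lambda row: row[key])


def format_table(rows):
    """Render rows as aligned plain-text lines, one per router/policy pair."""
    header = f"{'router':<16}{'evict':<8}" + "".join(f"{k:>12}" for k in TTFT_KEYS)
    lines = [header]
    for row in rows:
        cells = "".join(f"{row[k]:>12.4f}" for k in TTFT_KEYS)
        lines.append(f"{row['router']:<16}{row['evict_policy']:<8}{cells}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder rows for illustration only; these are not simulator output.
    sample = [
        {"router": "router_a", "evict_policy": "lru",
         "ttft_mean": 0.9, "ttft_p50": 0.7, "ttft_p95": 2.1, "ttft_p99": 3.0},
        {"router": "router_b", "evict_policy": "lru",
         "ttft_mean": 0.5, "ttft_p50": 0.4, "ttft_p95": 1.2, "ttft_p99": 1.9},
    ]
    print(format_table(rank_rows(sample)))
```

To use it on a real run, replace `sample` with `load_rows("runs/glm5_ablation/ablation.json")`, adjusting the path and key names to whatever the simulator actually writes.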

### 3. Compute theoretical hit-rate ceilings (oracle)

@@ -115,7 +123,8 @@ so the same config can be reused across sweeps:
| `--ttl-seconds <S>` | `cluster.meta_store.ttl_seconds` |

`oracle` additionally takes `--capacity-blocks <N>` / `--per-instance`
and `--out <PATH>`. `ablate` additionally takes `--routers <csv>`.
and `--out <PATH>`. `ablate` additionally takes `--routers <csv>` and
`--evict-policies <csv>` (currently only `lru`).
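One way to picture how the two CSV flags combine is as a cross product of routers and eviction policies, with anything other than `lru` rejected up front. The sketch below is a Python illustration of that idea, not the simulator's actual argument handling; `parse_csv`, `build_grid` and `SUPPORTED_EVICT_POLICIES` are hypothetical names, and dropping blank entries is this sketch's own choice.

```python
from itertools import product

# Per the README, `lru` is the only eviction policy `ablate` accepts today.
SUPPORTED_EVICT_POLICIES = {"lru"}


def parse_csv(value):
    """Split a comma-separated flag value, trimming whitespace and dropping blanks."""
    return [item.strip() for item in value.split(",") if item.strip()]


def build_grid(routers_csv, evict_csv):
    """Return every (router, evict_policy) pair, one per expected ablation row."""
    routers = parse_csv(routers_csv)
    policies = parse_csv(evict_csv)
    unsupported = [p for p in policies if p not in SUPPORTED_EVICT_POLICIES]
    if unsupported:
        raise ValueError(f"unsupported eviction policies: {unsupported}")
    return list(product(routers, policies))


if __name__ == "__main__":
    for router, policy in build_grid("random,least_loaded,min_pd", "lru"):
        print(f"{router} x {policy}")
```

With only `lru` accepted, the grid has exactly one row per router, which matches the "one row per `router x evict_policy`" description of `ablation.json`.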

## Router modes

@@ -288,12 +297,8 @@ memory_time = layers * weight_bytes_per_layer / gpu_mem_bw
| Config | Model | Hardware | Instances | Trace |
|--------|-------|----------|-----------|-------|
| `glm5-8xb200-hf.yaml` | GLM-5 via HF config.json | 8xB200 preset | 32 | GLM coder blk512 |
| `glm5-8xb200-blk512.yaml` | GLM-5 inline | 8xB200 inline | 64 | GLM coder blk512 |
| `glm5-8xb200.yaml` | GLM-5 inline | 8xB200 inline | 8 | GLM coder blk512 |
| `glm5-nvfp4-8xb300.yaml` | GLM-5-NVFP4 via HF config.json | 8xB300 preset | 8 | GLM coder blk512 |
| `qwen3-coder-480b-8xh20.yaml` | Qwen3-Coder via HF | 8xH20 preset | 32 | Qwen coder blk16 |
| `qwen2.5-coder-7b-h800.yaml` | Qwen2.5-7B inline | H800 inline | 16 | Qwen coder blk16 |
| `qwen2.5-coder-7b-preset.yaml` | Qwen2.5-7B inline | H800 preset | 16 | Qwen coder blk16 |
| `qwen2.5-coder-32b-h800.yaml` | Qwen2.5-32B inline | H800 inline | 16 | Qwen coder blk16 |

## Outputs