Working-set sizing tool + GLM-5.1-FP8/B300 result
Configurable KV working-set analyzer (GPU model x TP/PP/EP x model config.json with MLA/GQA auto-detected x KV/weight dtype).

Computes the Denning W(T), oracle [first,last], and retain-forever footprints against a per-replica KV pool, plus the APC captured at each retention window.

GLM-5.1-FP8 (MLA, 43.9 KiB/token) on 1x B300 node (1528 GB KV pool): live KV fits trivially (~533 GB), but reaching the full 80.4% APC ceiling needs ~14 nodes (oracle) -> long-tail reuse motivates DRAM offload, not HBM.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
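For readers unfamiliar with the quantities named above, here is a minimal sketch of how a Denning working set and a per-window hit rate could be computed from an access trace. It is not the analyzer added in this commit. The trace format, the function names (`reuse_gaps`, `hit_rate_at_window`, `working_set_size`, `pool_capacity_tokens`), the reading of "APC" as a prefix-cache hit rate, and the reading of "GB" as 10^9 bytes are all assumptions made for illustration.

```python
import math

def reuse_gaps(trace):
    """trace: list of (timestamp, block_id) sorted by timestamp (hypothetical format).
    Returns one entry per access: time since the previous access to the
    same block, or None if this is the first touch of that block."""
    last_seen = {}
    gaps = []
    for ts, block in trace:
        gaps.append(ts - last_seen[block] if block in last_seen else None)
        last_seen[block] = ts
    return gaps

def hit_rate_at_window(trace, window):
    """Fraction of accesses that would hit if every block were retained
    for `window` time units after its most recent access."""
    gaps = reuse_gaps(trace)
    if not gaps:
        return 0.0
    hits = sum(1 for g in gaps if g is not None and g <= window)
    return hits / len(gaps)

def working_set_size(trace, t, window):
    """Denning working set: distinct blocks referenced in (t - window, t]."""
    return len({block for ts, block in trace if t - window < ts <= t})

def pool_capacity_tokens(pool_bytes, bytes_per_token):
    """How many tokens of KV fit in a pool of the given size."""
    return int(pool_bytes / bytes_per_token)

# Toy trace, not real data: block 'a' is reused quickly, 'b' only after a long gap.
toy_trace = [(0, "a"), (1, "b"), (2, "a"), (10, "a"), (11, "c"), (12, "b")]

short = hit_rate_at_window(toy_trace, window=2)           # only the 2-unit reuse of 'a'
ceiling = hit_rate_at_window(toy_trace, window=math.inf)  # every non-first access hits

# Figures from the commit message, with GB taken as 10**9 bytes (assumption).
capacity = pool_capacity_tokens(1528e9, 43.9 * 1024)
```

The gap between `short` and `ceiling` on the toy trace mirrors the commit's argument: reuse with long gaps only becomes a hit at retention windows whose footprint may exceed what a single pool can hold.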
BIN
figs/working_set/glm5_fp8_tp8_b300.png
Normal file
Binary file not shown. Size: 134 KiB