5 Commits

Author SHA1 Message Date
4cd71b6631 Working-set figure: extend left panel to ~50 nodes
Include T=600s/1800s points so the diminishing-returns tail is visible:
14 -> 52 nodes buys only +6pp APC (74%->79.8%), still under the 80.4%
ceiling that oracle/LRU reaches at 14 nodes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 17:11:12 +08:00
2247d1de08 Working-set figure: right panel = W(t) time series
Replace the (redundant) nodes-vs-T cost curve with the working-set
W(t) over wall-clock time for T=2/30/300s. Shows footprint is steady
(peak ~ median) after a short warm-up, so peak-based sizing is sound;
the 300s curve hugs the 14-node ceiling throughout.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 16:31:26 +08:00
c94b2e237a Working-set figure: linear node axes + benefit/cost split
Drop log node axis (decade ticks were unreadable). Left = APC vs #nodes
(linear), right = #nodes vs retention window T. Mark the 1-node budget
crossing (~7s reuse, ~8% APC) and the 14-node oracle ceiling.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 16:24:15 +08:00
3b8be5bb61 Working-set figure: express footprint in node count, not GB
Both axes now in "# nodes" (footprint / per-node KV pool) so the
cluster-size implication is direct: 1-node budget line + 14-node oracle
ceiling, instead of raw GB.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 16:16:00 +08:00
dae98c6472 Working-set sizing tool + GLM-5.1-FP8/B300 result
Configurable KV working-set analyzer (GPU model x TP/PP/EP x model
config.json with MLA/GQA auto x KV/weight dtype). Computes Denning W(T),
oracle [first,last], and retain-forever footprints vs a per-replica KV
pool, plus the APC captured at each retention window.

GLM-5.1-FP8 (MLA, 43.9 KiB/token) on 1x B300 node (1528 GB KV pool):
live KV fits trivially (~533 GB), but the full 80.4% APC ceiling needs
~14 nodes (oracle) -> long-tail reuse motivates DRAM offload, not HBM.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 16:03:25 +08:00