agentic-kvc

Author	SHA1	Message	Date
Gahow Wang	4cd71b6631	Working-set figure: extend left panel to ~50 nodes Include T=600s/1800s points so the diminishing-returns tail is visible: 14 -> 52 nodes buys only +6pp APC (74%->79.8%), still under the 80.4% ceiling that oracle/LRU reaches at 14 nodes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-28 17:11:12 +08:00
Gahow Wang	2247d1de08	Working-set figure: right panel = W(t) time series Replace the (redundant) nodes-vs-T cost curve with the working-set W(t) over wall-clock time for T=2/30/300s. Shows footprint is steady (peak ~ median) after a short warm-up, so peak-based sizing is sound; the 300s curve hugs the 14-node ceiling throughout. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-28 16:31:26 +08:00
Gahow Wang	c94b2e237a	Working-set figure: linear node axes + benefit/cost split Drop log node axis (decade ticks were unreadable). Left = APC vs #nodes (linear), right = #nodes vs retention window T. Mark the 1-node budget crossing (~7s reuse, ~8% APC) and the 14-node oracle ceiling. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-28 16:24:15 +08:00
Gahow Wang	3b8be5bb61	Working-set figure: express footprint in node count, not GB Both axes now in "# nodes" (footprint / per-node KV pool) so the cluster-size implication is direct: 1-node budget line + 14-node oracle ceiling, instead of raw GB. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-28 16:16:00 +08:00
Gahow Wang	dae98c6472	Working-set sizing tool + GLM-5.1-FP8/B300 result Configurable KV working-set analyzer (GPU model x TP/PP/EP x model config.json with MLA/GQA auto x KV/weight dtype). Computes Denning W(T), oracle [first,last], and retain-forever footprints vs a per-replica KV pool, plus the APC captured at each retention window. GLM-5.1-FP8 (MLA, 43.9 KiB/token) on 1x B300 node (1528 GB KV pool): live KV fits trivially (~533 GB), but the full 80.4% APC ceiling needs ~14 nodes (oracle) -> long-tail reuse motivates DRAM offload, not HBM. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-28 16:03:25 +08:00
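The Denning W(T) footprint and the node counts quoted in the commits above can be illustrated with a minimal sketch. It is not the repository's analyzer: the function names, the (timestamp, block_id) trace format, and the toy trace are hypothetical, and it assumes W(T) at time t is the set of distinct KV blocks referenced in the window (t - T, t]. Only the 43.9 KiB/token and 1528 GB figures come from the commit message; treating GB as 10^9 bytes is an assumption.

```python
from collections import deque
from math import ceil


def working_set_sizes(accesses, window_s):
    """Denning working-set size after each access.

    accesses: iterable of (timestamp_s, block_id), sorted by timestamp.
    Returns a list of (timestamp_s, number of distinct blocks referenced
    in the half-open window (t - window_s, t]).
    """
    last_seen = {}   # block_id -> timestamp of its most recent reference
    queue = deque()  # every reference, in arrival order
    sizes = []
    for t, block in accesses:
        last_seen[block] = t
        queue.append((t, block))
        # Drop references that have fallen out of the window.
        while queue and queue[0][0] <= t - window_s:
            old_t, old_block = queue.popleft()
            # Forget the block only if this was its latest reference.
            if last_seen.get(old_block) == old_t:
                del last_seen[old_block]
        sizes.append((t, len(last_seen)))
    return sizes


def nodes_needed(tokens, bytes_per_token, pool_bytes):
    """Nodes required to hold `tokens` of KV cache, rounded up."""
    return ceil(tokens * bytes_per_token / pool_bytes)


# Hypothetical toy trace: (seconds, block id).
trace = [(0, "a"), (1, "b"), (2, "a"), (10, "c"), (11, "a")]
sizes = working_set_sizes(trace, window_s=5)
peak_blocks = max(n for _, n in sizes)

# Figures from the commit message; GB = 10**9 bytes is an assumption.
BYTES_PER_TOKEN = 43.9 * 1024
POOL_BYTES = 1528 * 10**9
tokens_per_node = int(POOL_BYTES // BYTES_PER_TOKEN)
```

Sizing by the peak of W(t), as the later commits do, amounts to passing the peak token count (peak blocks times tokens per block) to `nodes_needed`.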

5 Commits