agentic-kvc/analysis at 2247d1de08cde07cd8d6eb21aca0bed6d5ceaeec - agentic-kvc - Local Gitea

gahow/agentic-kvc

Gahow Wang dae98c6472 Working-set sizing tool + GLM-5.1-FP8/B300 result

Configurable KV working-set analyzer (GPU model x TP/PP/EP x model
config.json with MLA/GQA auto x KV/weight dtype). Computes Denning W(T),
oracle [first,last], and retain-forever footprints vs a per-replica KV
pool, plus the APC captured at each retention window.
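
The three footprints named above can be sketched on a toy access trace. This is a minimal illustration, not the analyzer's code: the function names, the `(timestamp, block_id)` trace format, and the reading of "oracle [first,last]" as "hold a block only between its first and last access" are all assumptions made for this example.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Access = Tuple[float, Hashable]  # (timestamp, block_id); hypothetical trace format


def denning_ws(accesses: Iterable[Access], t: float, window: float) -> Set[Hashable]:
    """Denning working set W(t, T): distinct blocks referenced in (t - T, t]."""
    return {blk for ts, blk in accesses if t - window < ts <= t}


def first_last(accesses: Iterable[Access]) -> Dict[Hashable, Tuple[float, float]]:
    """Map each block to its (first access time, last access time)."""
    spans: Dict[Hashable, Tuple[float, float]] = {}
    for ts, blk in accesses:
        lo, hi = spans.get(blk, (ts, ts))
        spans[blk] = (min(lo, ts), max(hi, ts))
    return spans


def oracle_live(accesses: Iterable[Access], t: float) -> Set[Hashable]:
    """Assumed oracle policy: a block is resident only while first <= t <= last."""
    return {blk for blk, (lo, hi) in first_last(accesses).items() if lo <= t <= hi}


def retain_forever(accesses: Iterable[Access], t: float) -> Set[Hashable]:
    """Retain-forever policy: every block first seen at or before t stays resident."""
    return {blk for blk, (lo, _) in first_last(accesses).items() if lo <= t}


trace = [(0, "a"), (1, "b"), (5, "a"), (9, "c")]
print(len(denning_ws(trace, t=6, window=3)))  # blocks touched in (3, 6]
print(len(oracle_live(trace, t=3)))           # blocks with a future reuse still pending
print(len(retain_forever(trace, t=3)))        # everything seen so far
```

Multiplying any of these set sizes by the block size in bytes gives a footprint that can be compared against a per-replica KV pool.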

GLM-5.1-FP8 (MLA, 43.9 KiB/token) on 1x B300 node (1528 GB KV pool):
live KV fits trivially (~533 GB), but the full 80.4% APC ceiling needs
~14 nodes (oracle) -> long-tail reuse motivates DRAM offload, not HBM.
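
The sizing arithmetic behind these figures can be reproduced roughly as follows. This is a sketch under stated assumptions: "GB" is taken as decimal (10^9 bytes), the helper names are hypothetical, and the 21,000 GB footprint in the last call is an illustrative value chosen to land on 14 nodes, not a number from the analyzer.

```python
import math

KIB = 1024
GB = 10**9  # assumption: the commit message uses decimal gigabytes

BYTES_PER_TOKEN = 43.9 * KIB   # from the commit message (MLA, FP8)
POOL_BYTES = 1528 * GB         # per-node KV pool from the commit message
LIVE_KV_BYTES = 533 * GB       # live KV footprint from the commit message


def tokens_that_fit(pool_bytes: float, bytes_per_token: float) -> int:
    """Number of whole tokens whose KV entries fit in the pool."""
    return int(pool_bytes // bytes_per_token)


def nodes_needed(footprint_bytes: float, pool_bytes: float) -> int:
    """Smallest number of identical pools that can hold the footprint."""
    return math.ceil(footprint_bytes / pool_bytes)


pool_tokens = tokens_that_fit(POOL_BYTES, BYTES_PER_TOKEN)
live_fraction = LIVE_KV_BYTES / POOL_BYTES

print(f"pool capacity: {pool_tokens / 1e6:.1f}M tokens")
print(f"live KV uses {live_fraction:.0%} of one node's pool")
print("nodes for live KV:", nodes_needed(LIVE_KV_BYTES, POOL_BYTES))
# Illustrative footprint only (hypothetical), to show how a ~14-node answer arises.
print("nodes for a 21,000 GB footprint:", nodes_needed(21_000 * GB, POOL_BYTES))
```

Under these assumptions one node holds roughly 34M tokens of KV, and the live set occupies about a third of it, which is consistent with "fits trivially".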

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-28 16:03:25 +08:00

characterization

Add chatbot T_external CDF; overlay on f3a vs agentic

2026-05-27 14:49:44 +08:00

Correct PD-disagg cost/benefit framing across repo

2026-05-27 22:04:49 +08:00

Correct PD-disagg cost/benefit framing across repo

2026-05-27 22:04:49 +08:00

MB5 analysis: per-role KV split proves static-partition mismatch

2026-05-28 12:05:17 +08:00

pd_sep_paper_section

PD-sep matrix results: C2/C3/C4 figures + empirical mechanism refined

2026-05-25 16:23:52 +08:00

Working-set sizing tool + GLM-5.1-FP8/B300 result

2026-05-28 16:03:25 +08:00

adaptive_prefill_offload_design.md

Design doc: Adaptive Prefill Offload

2026-05-22 00:44:22 +08:00

agentic_pd_unified_story_plan.md

Agentic PD / Unified routing story plan draft

2026-05-26 01:12:42 +08:00

characterization_todo_for_interns.md

Audit package refresh: Window 1 supported claims + risk register

2026-05-25 23:25:27 +08:00

claude_characterization_work_plan.md

Characterization plan: progress snapshot + Claude work plan

2026-05-25 16:18:41 +08:00

elastic_hypotheses.md

Docs: reconcile routing docs with current hybrid direction

2026-05-25 10:47:14 +08:00

elastic_offload_design.md

Elastic P2P offload: TTFT p50 -49% vs baseline (0.551 vs 1.080)

2026-05-22 13:50:25 +08:00

kv_lifecycle_design.md

KV cache lifecycle design + eviction loss analysis

2026-05-22 01:27:22 +08:00

overnight_work_report.md

Update report: adaptive v2 confirms no KV transfer helps single-machine

2026-05-22 10:15:08 +08:00

pd_separation_analysis.md

Invalidate prior A/B results + add proper experiment harness

2026-05-22 17:54:21 +08:00

research_findings.md

Docs: reconcile routing docs with current hybrid direction

2026-05-25 10:47:14 +08:00

unified_routing_fix_review.md

Docs: reconcile routing docs with current hybrid direction

2026-05-25 10:47:14 +08:00