gahow/agentic-kvc

Gahow Wang d9cf3126c6 docs: reframe PAPER_OUTLINE to GPU-hit-first + embed v2 figures

Reorganizes the outline from the EAR / dispatch-coupling framing (kept in
git history) into the GPU-hit-first structure:

- §1 background splits PD-colo / PD-disagg / KV storage hierarchy, each with
  a forward pointer to where it is used or refuted.
- §2 leads with the metric argument (request latency / TPS / GPU util, not
  TTFT/TPOT); dispatch coupling is demoted to that justification. §2.2 embeds
  the two new v2 figures -- the measured 4-tier hit hierarchy
  (GPU < CPU-local < remote-RDMA-store << miss) and the capacity->APC/latency
  knee (Evidence #1) -- plus the cluster-scale correction to the working_set
  "14 nodes" number.
- §3 recasts the three optimizations as corollaries of GPU-hit-first:
  make PD-colocation default (3.1), biased KV-awareness routing (3.2),
  dedup via migration not replication (3.3).
- §5 related work now engages the storage-hierarchy camp directly.
- Validation-status table and work plan updated (top priority: wall-clock
  amplification sweep).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-05-30 13:34:19 +08:00

Workload characterization C1-C3 on full production trace

2026-05-29 18:19:39 +08:00

Docs: reconcile routing docs with current hybrid direction

2026-05-25 10:47:14 +08:00

Add elastic PS evaluation plan for production-realistic trace

2026-05-23 15:56:05 +08:00

Workload characterization C1-C3 on full production trace

2026-05-29 18:19:39 +08:00

PD-disagg crossover: regular synthetic trace + goodput sweep + figure

2026-05-29 18:19:23 +08:00

Add vLLM patches directory for version-controlled patch management

2026-05-22 00:26:14 +08:00

Replayer: closed-loop inter-turn think-time mode

2026-05-29 18:19:12 +08:00

Add leastwork_kappa decode-aware ablation (net-negative, documented)

2026-05-29 17:07:23 +08:00

unified_v2.1: relax gates + add unified_kv_both isolation control

2026-05-26 10:40:57 +08:00

third_party/vllm

Gate evict_sent_blocks behind VLLM_EVICT_SENT_BLOCKS

2026-05-29 18:18:59 +08:00

600s-truncated trace + LPWL 5-policy results

2026-05-29 16:08:35 +08:00

v2 exp(a): add remote KV-store (RDMA) tier

2026-05-30 12:48:37 +08:00

.gitignore

600s-truncated trace + LPWL 5-policy results

2026-05-29 16:08:35 +08:00

FIXES.md

Add FIXES.md with prioritized repo cleanup checklist

2026-05-23 20:35:56 +08:00

MEETING.md

§2.3 reframe: dispatch coupling is regime-dependent, not binary chatbot/agentic

2026-05-27 16:51:38 +08:00

PAPER_OUTLINE.md

docs: reframe PAPER_OUTLINE to GPU-hit-first + embed v2 figures

2026-05-30 13:34:19 +08:00

pyproject.toml

Fix review bugs: PD-sep counter leaks, hardcoded paths, missing deps

2026-05-26 15:54:55 +08:00

REPORT.md

Docs: reconcile routing docs with current hybrid direction

2026-05-25 10:47:14 +08:00

RESULTS_SUMMARY.md

Correct PD-disagg cost/benefit framing across repo

2026-05-27 22:04:49 +08:00

TODO.md

LMetric routing policy (OSDI'26) + A/B results vs linear baseline

2026-05-22 16:57:32 +08:00

uv.lock

Fix review P2s: lockfile, model path convention, trap robustness

2026-05-26 16:05:43 +08:00