gahow/agentic-kvc - agentic-kvc - Local Gitea



Gahow Wang a0db3cbe77 Add leastwork_kappa decode-aware ablation (net-negative, documented)

--policy leastwork_kappa + --kappa (default 2.5e-6, derived from KV ~100KB/tok
/ HBM 4TB/s / TPOT 10ms on H20+Qwen3-30B-A3B): score = prefill_work * (1 +
kappa * ongoing_decode_tokens), modelling decode as a fractional throughput tax
on a new prefill.
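A minimal Python sketch of the kappa derivation and the leastwork_kappa score, assuming the constants quoted above. The `Instance` class, its field names and `pick_instance` are hypothetical and illustrative only, not the repo's router code. The commit does not spell out exactly what `prefill_work` and `ongoing_decode_tokens` count, so the usage numbers are made up purely to show the mechanism.

```python
from dataclasses import dataclass

# Default kappa from the commit's numbers (H20 + Qwen3-30B-A3B):
# seconds to stream one token's KV from HBM, as a fraction of one TPOT.
KV_BYTES_PER_TOKEN = 100e3   # ~100 KB of KV per token
HBM_BYTES_PER_SEC = 4e12     # ~4 TB/s HBM bandwidth
TPOT_SEC = 10e-3             # ~10 ms time per output token

KAPPA = (KV_BYTES_PER_TOKEN / HBM_BYTES_PER_SEC) / TPOT_SEC  # ~2.5e-6


@dataclass
class Instance:
    # Hypothetical per-instance routing state (illustrative field names).
    name: str
    prefill_work: float          # prefill cost if the new request lands here
    ongoing_decode_tokens: int   # decode tokens already in flight here


def leastwork_kappa_score(inst: Instance, kappa: float = KAPPA) -> float:
    """score = prefill_work * (1 + kappa * ongoing_decode_tokens)."""
    return inst.prefill_work * (1.0 + kappa * inst.ongoing_decode_tokens)


def pick_instance(instances: list[Instance], kappa: float = KAPPA) -> Instance:
    # Route to the lowest score; kappa = 0 reduces to plain leastwork.
    return min(instances, key=lambda i: leastwork_kappa_score(i, kappa))


if __name__ == "__main__":
    # Illustrative only: a warm cache owner (cheap prefill, busy decoding)
    # vs. a cold instance (full re-prefill, idle decode).
    owner = Instance("owner", prefill_work=20_000, ongoing_decode_tokens=400_000)
    cold = Instance("cold", prefill_work=30_000, ongoing_decode_tokens=0)
    print(pick_instance([owner, cold], kappa=0.0).name)  # plain leastwork -> owner
    print(pick_instance([owner, cold]).name)             # with kappa -> cold
```

With these toy numbers the decode tax doubles the owner's score (20,000 to 40,000), so the request is routed away from its cache owner into a cold re-prefill. This is the bounce effect described in the result below.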

Result on the 600s trace: NET-NEGATIVE vs plain leastwork: TTFT p90 +18%,
E2E p90 +14%, balance 1.55x->1.97x, and it does NOT fix the E2E-p99 it targeted.
Decode is too cheap in agentic workloads (output p50 ~80 tokens) for the term to
help; it just bounces heavy requests off their cache-owning instance into a cold re-prefill. The E2E-p99 tail
is the structural HEAVY+>50k floor (per-class p99 ~51-52k for ALL policies), not
decode interference. Kept in-tree as a documented ablation justifying LPWL's
omission of any decode term; do not revive without a decode-heavy regime.
See analysis/lpwl_5policy_600s.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-05-29 17:07:23 +08:00


Docs: reconcile routing docs with current hybrid direction

2026-05-25 10:47:14 +08:00

Add elastic PS evaluation plan for production-realistic trace

2026-05-23 15:56:05 +08:00

PD_DISAGG_RESULTS §6.3: producer hot-pinning figure

2026-05-29 00:38:20 +08:00

600s-truncated trace + LPWL 5-policy results

2026-05-29 16:08:35 +08:00

Add vLLM patches directory for version-controlled patch management

2026-05-22 00:26:14 +08:00

EAR outline: copy reusable figures, mark migration sections deferred

2026-05-27 01:44:13 +08:00

Add leastwork_kappa decode-aware ablation (net-negative, documented)

2026-05-29 17:07:23 +08:00

unified_v2.1: relax gates + add unified_kv_both isolation control

2026-05-26 10:40:57 +08:00

third_party/vllm

A3: vLLM scheduler patch for step-level JSONL log

2026-05-25 16:19:11 +08:00

600s-truncated trace + LPWL 5-policy results

2026-05-29 16:08:35 +08:00

.gitignore

600s-truncated trace + LPWL 5-policy results

2026-05-29 16:08:35 +08:00

FIXES.md

Add FIXES.md with prioritized repo cleanup checklist

2026-05-23 20:35:56 +08:00

MEETING.md

§2.3 reframe: dispatch coupling is regime-dependent, not binary chatbot/agentic

2026-05-27 16:51:38 +08:00

PAPER_OUTLINE.md

§2.3 reframe: dispatch coupling is regime-dependent, not binary chatbot/agentic

2026-05-27 16:51:38 +08:00

pyproject.toml

Fix review bugs: PD-sep counter leaks, hardcoded paths, missing deps

2026-05-26 15:54:55 +08:00

REPORT.md

Docs: reconcile routing docs with current hybrid direction

2026-05-25 10:47:14 +08:00

RESULTS_SUMMARY.md

Correct PD-disagg cost/benefit framing across repo

2026-05-27 22:04:49 +08:00

TODO.md

LMetric routing policy (OSDI'26) + A/B results vs linear baseline

2026-05-22 16:57:32 +08:00

uv.lock

Fix review P2s: lockfile, model path convention, trap robustness

2026-05-26 16:05:43 +08:00