gahow/agentic-kvc - agentic-kvc - Local Gitea

gahow/agentic-kvc

Go to file

Gahow Wang 8d422c4301 Migration trigger validation: unified_v4 fires at 2x QPS, not at 1x

Ran unified vs unified_v4 A/B on dash2 (8×H20, kv_both+DR-fix substrate,
w600_r0.0015_st30_first600s trace). Key findings:

- At 1x QPS (~1.3 req/s): zero migrations. pending_prefill_tokens is 0 for
  95% of routing decisions because instances complete prefill before the next
  request arrives. The relative arm (src_pp > fleet_median*1.5) never fires.
- At 2x QPS (~2.7 req/s): 4 migrations (0.5%). src_pp>0 rises to 24% of
  eligible decisions. Trigger correctly identifies genuinely overloaded
  instances (src_pp 13k–73k vs fleet median 3.8k–33k).

Conclusion: mechanism is correct but migration benefit requires higher
concurrency (scale-out or >3x QPS) where queue pressure makes the signal
non-trivial. At single-node 8-instance scale, Pillar 1 (affinity routing)
is sufficient and Pillar 2 gracefully degrades to no-op.

Next: scale-out validation (16+ GPU) where session skew naturally
concentrates load and triggers migration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-05-30 15:36:58 +08:00

Migration trigger validation: unified_v4 fires at 2x QPS, not at 1x

2026-05-30 15:36:58 +08:00

Docs: reconcile routing docs with current hybrid direction

2026-05-25 10:47:14 +08:00

Add elastic PS evaluation plan for production-realistic trace

2026-05-23 15:56:05 +08:00

Workload characterization C1-C3 on full production trace

2026-05-29 18:19:39 +08:00

PD-disagg crossover: regular synthetic trace + goodput sweep + figure

2026-05-29 18:19:23 +08:00

Add vLLM patches directory for version-controlled patch management

2026-05-22 00:26:14 +08:00

Replayer: closed-loop inter-turn think-time mode

2026-05-29 18:19:12 +08:00

Add leastwork_kappa decode-aware ablation (net-negative, documented)

2026-05-29 17:07:23 +08:00

unified_v2.1: relax gates + add unified_kv_both isolation control

2026-05-26 10:40:57 +08:00

third_party/vllm

Gate evict_sent_blocks behind VLLM_EVICT_SENT_BLOCKS

2026-05-29 18:18:59 +08:00

600s-truncated trace + LPWL 5-policy results

2026-05-29 16:08:35 +08:00

v2 exp(a): add remote KV-store (RDMA) tier

2026-05-30 12:48:37 +08:00

.gitignore

600s-truncated trace + LPWL 5-policy results

2026-05-29 16:08:35 +08:00

FIXES.md

Add FIXES.md with prioritized repo cleanup checklist

2026-05-23 20:35:56 +08:00

MEETING.md

§2.3 reframe: dispatch coupling is regime-dependent, not binary chatbot/agentic

2026-05-27 16:51:38 +08:00

PAPER_OUTLINE.md

§2.3 reframe: dispatch coupling is regime-dependent, not binary chatbot/agentic

2026-05-27 16:51:38 +08:00

pyproject.toml

Fix review bugs: PD-sep counter leaks, hardcoded paths, missing deps

2026-05-26 15:54:55 +08:00

REPORT.md

Docs: reconcile routing docs with current hybrid direction

2026-05-25 10:47:14 +08:00

RESULTS_SUMMARY.md

Correct PD-disagg cost/benefit framing across repo

2026-05-27 22:04:49 +08:00

TODO.md

LMetric routing policy (OSDI'26) + A/B results vs linear baseline

2026-05-22 16:57:32 +08:00

uv.lock

Fix review P2s: lockfile, model path convention, trap robustness

2026-05-26 16:05:43 +08:00