gahow/agentic-kvc - agentic-kvc - Local Gitea

gahow/agentic-kvc

Go to file

Gahow Wang 3957c2df86 MB5 patch: clamp PD-consumer metrics counter underflow

Root cause of the 6P+2D run-to-run collapse (rep1 100%, rep2 56%,
rep3 80%, session-routing 6.6%): not load-shedding, but a consumer
EngineCore crash.

Failure chain observed in the consumer logs:
  1. D-pool fills to ~97% (decode-side capacity ceiling, the H1 story)
  2. a large request's KV transfer fails: "Mooncake transfer engine
     returned -1" (112k-token request, pool full)
  3. scheduler fails the request (kv_load_failure_policy=fail)
  4. PromptTokenStats.local_cache_hit = num_cached + recomputed -
     num_external_computed goes NEGATIVE (external transfer exceeded
     cached count)
  5. loggers.record() calls Counter.inc(negative) -> prometheus raises
     "Counters can only be incremented by non-negative amounts."
  6. EngineCore dies -> every subsequent request fails (the cliff:
     all successes in the first ~110s, zero after)

This turns ONE failed request into a total config collapse, and is
what made the round-robin 6P+2D reps look randomly variable.

Fix: clamp the three per-source prompt-token counts to >= 0 in
loggers.record() before they hit Counter.inc(). Pure insertion,
revertible via the existing sentinel mechanism. Lets a transfer
failure stay a single failed request instead of killing the engine,
so routing arms can be compared on equal footing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-28 13:01:23 +08:00

MB5 analysis: per-role KV split proves static-partition mismatch

2026-05-28 12:05:17 +08:00

Docs: reconcile routing docs with current hybrid direction

2026-05-25 10:47:14 +08:00

Add elastic PS evaluation plan for production-realistic trace

2026-05-23 15:56:05 +08:00

MB5 analysis: per-role KV split proves static-partition mismatch

2026-05-28 12:05:17 +08:00

MB5 patch: clamp PD-consumer metrics counter underflow

2026-05-28 13:01:23 +08:00

Add vLLM patches directory for version-controlled patch management

2026-05-22 00:26:14 +08:00

EAR outline: copy reusable figures, mark migration sections deferred

2026-05-27 01:44:13 +08:00

Proxy/runner support for Nixl connector + unified_v3 (offload-decode) policy

2026-05-27 22:05:19 +08:00

unified_v2.1: relax gates + add unified_kv_both isolation control

2026-05-26 10:40:57 +08:00

third_party/vllm

A3: vLLM scheduler patch for step-level JSONL log

2026-05-25 16:19:11 +08:00

.gitignore

Phase 1 milestone: system-level analysis + reproducible report

2026-05-22 16:17:41 +08:00

FIXES.md

Add FIXES.md with prioritized repo cleanup checklist

2026-05-23 20:35:56 +08:00

MEETING.md

§2.3 reframe: dispatch coupling is regime-dependent, not binary chatbot/agentic

2026-05-27 16:51:38 +08:00

PAPER_OUTLINE.md

§2.3 reframe: dispatch coupling is regime-dependent, not binary chatbot/agentic

2026-05-27 16:51:38 +08:00

pyproject.toml

Fix review bugs: PD-sep counter leaks, hardcoded paths, missing deps

2026-05-26 15:54:55 +08:00

REPORT.md

Docs: reconcile routing docs with current hybrid direction

2026-05-25 10:47:14 +08:00

RESULTS_SUMMARY.md

Correct PD-disagg cost/benefit framing across repo

2026-05-27 22:04:49 +08:00

TODO.md

LMetric routing policy (OSDI'26) + A/B results vs linear baseline

2026-05-22 16:57:32 +08:00

uv.lock

Fix review P2s: lockfile, model path convention, trap robustness

2026-05-26 16:05:43 +08:00