gahow/agentic-kvc - agentic-kvc - Local Gitea

Gahow Wang 020a5c79a7 §3.3 reframe: hot pin failure is uniformly-slow workers, not max/median ratio

User pointed out the apparent paradox: in fig_b3_per_worker_ttft_p90, unified
has hotspot index 3.67 while sticky has 2.73, yet unified e2e p90 is roughly
half of sticky's. Resolution: hotspot index (max/median) is a *ratio* and
misleading on its own. Per-worker absolute TTFT p90:

  sticky : median 20.3s, max 55.4s -> system e2e p90 34.6s
  unified: median 10.3s, max 37.7s -> system e2e p90 18.0s
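
The arithmetic behind the paradox can be checked with a minimal sketch. It uses only the rounded numbers quoted above; the `STATS` dict and `hotspot_index` helper are illustrative names, not code from this repo. Recomputing from the rounded inputs gives about 3.66 for unified rather than the reported 3.67, presumably because the figure was computed from unrounded values.

```python
# Per-worker TTFT p90 summary and system e2e p90, in seconds, as quoted above.
STATS = {
    "sticky":  {"median": 20.3, "max": 55.4, "e2e_p90": 34.6},
    "unified": {"median": 10.3, "max": 37.7, "e2e_p90": 18.0},
}


def hotspot_index(median_s: float, max_s: float) -> float:
    """Hotspot index as defined in the message: max / median."""
    return max_s / median_s


for policy, s in STATS.items():
    ratio = hotspot_index(s["median"], s["max"])
    print(f"{policy:8s} ratio={ratio:.2f} median={s['median']}s "
          f"max={s['max']}s e2e_p90={s['e2e_p90']}s")

# Unified has the larger ratio yet is lower on every absolute number.
e2e_gap = STATS["sticky"]["e2e_p90"] / STATS["unified"]["e2e_p90"]
print(f"sticky/unified e2e p90 = {e2e_gap:.2f}x")
```

The ratio grows when the median shrinks, so a policy that speeds up most workers can look worse on max/median even though it is better in absolute terms.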

Mechanism: the top 1% of sessions account for 46.5% of input mass, and
there are more hot sessions than instances (8). Sticky's hash binding
therefore gives *every* worker its own hot session, so even the median
worker is slow. Unified's LMetric fallback re-routes cold/new sessions
away from hot affinity instances, which keeps 7 of the 8 workers fast.
System p90 is dominated by the majority of requests landing on those
fast workers, hence the ~2x e2e gap.
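
The sticky half of this mechanism can be illustrated with a toy sketch. Everything here is an assumption for illustration: the modulo binding stands in for whatever hash the sticky policy actually uses, the count of 12 hot sessions is made up, and LMetric's fallback is not modeled at all. A real hash would spread sessions less evenly than modulo over consecutive ids.

```python
from collections import Counter

NUM_WORKERS = 8  # instance count from the commit message


def sticky_worker(session_id: int, num_workers: int = NUM_WORKERS) -> int:
    """Hypothetical sticky binding: a session always maps to the same worker."""
    return session_id % num_workers


def hot_sessions_per_worker(hot_ids, num_workers: int = NUM_WORKERS) -> list:
    """Count how many hot sessions are bound to each worker."""
    counts = Counter(sticky_worker(s, num_workers) for s in hot_ids)
    return [counts.get(w, 0) for w in range(num_workers)]


# Illustrative input: 12 hot sessions, i.e. more hot sessions than workers.
hot_ids = range(12)
per_worker = hot_sessions_per_worker(hot_ids)

print(per_worker)
print("every worker has a hot session:", all(c >= 1 for c in per_worker))
```

Under these assumptions no worker is left without a hot session, which is the condition under which the median worker is slow as well as the worst one.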

Changes:
- Replace §3.3 figure with figs/f4c_per_worker_ttft.png (per-worker bars)
  instead of figs/f4c_apc_vs_hotspot_tradeoff.png (the ratio scatter)
- §3.3 narrative in PAPER_OUTLINE.md and MEETING.md rewritten around
  absolute median + max + system e2e p90 instead of hotspot ratio
- Add a §3.3 sub-finding: "hot pin failure must be measured with
  per-worker absolute latency, not normalized ratio"
- Keep the scatter as supplementary for §5 multi-policy summary

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-27 10:10:23 +08:00

Add NIXL substrate isolation control + attribution decomposition

2026-05-26 16:02:12 +08:00

Docs: reconcile routing docs with current hybrid direction

2026-05-25 10:47:14 +08:00

Add elastic PS evaluation plan for production-realistic trace

2026-05-23 15:56:05 +08:00

§3.3 reframe: hot pin failure is uniformly-slow workers, not max/median ratio

2026-05-27 10:10:23 +08:00

Connector tax: trace-replay confirms +45% kv_both penalty is gone; DR-fix adds 22% more

2026-05-27 09:13:50 +08:00

Add vLLM patches directory for version-controlled patch management

2026-05-22 00:26:14 +08:00

EAR outline: copy reusable figures, mark migration sections deferred

2026-05-27 01:44:13 +08:00

Fix review P2s: lockfile, model path convention, trap robustness

2026-05-26 16:05:43 +08:00

unified_v2.1: relax gates + add unified_kv_both isolation control

2026-05-26 10:40:57 +08:00

third_party/vllm

A3: vLLM scheduler patch for step-level JSONL log

2026-05-25 16:19:11 +08:00

.gitignore

Phase 1 milestone: system-level analysis + reproducible report

2026-05-22 16:17:41 +08:00

FIXES.md

Add FIXES.md with prioritized repo cleanup checklist

2026-05-23 20:35:56 +08:00

MEETING.md

§3.3 reframe: hot pin failure is uniformly-slow workers, not max/median ratio

2026-05-27 10:10:23 +08:00

PAPER_OUTLINE.md

§3.3 reframe: hot pin failure is uniformly-slow workers, not max/median ratio

2026-05-27 10:10:23 +08:00

pyproject.toml

Fix review bugs: PD-sep counter leaks, hardcoded paths, missing deps

2026-05-26 15:54:55 +08:00

REPORT.md

Docs: reconcile routing docs with current hybrid direction

2026-05-25 10:47:14 +08:00

TODO.md

LMetric routing policy (OSDI'26) + A/B results vs linear baseline

2026-05-22 16:57:32 +08:00

uv.lock

Fix review P2s: lockfile, model path convention, trap robustness

2026-05-26 16:05:43 +08:00