gahow/agentic-kvc - agentic-kvc - Local Gitea

Gahow Wang 6b255fad91 Unified routing: single argmin(expected_latency) over all instances

Replace two-phase routing (pick_instance → offload gate) with a single
cost function evaluated per instance:

  latency(D) = queue(D) + prefill_time(D) + transfer_cost(D)

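  - If D has a local cache hit: prefill_time = (input - local_hit) / prefill_throughput, transfer_cost = 0
  - If D can receive a PUSH from the cache source: prefill_time = (input - push_hit) / prefill_throughput, transfer_cost = rdma_overhead_s
  - Otherwise (cold): prefill_time = input / prefill_throughput, transfer_cost = 0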

Choose argmin(latency). If the winner needs PUSH → trigger migration.
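
A minimal sketch of the selection rule described above, not the repository's actual code. The `Instance` fields, the function names, and the idea that queueing delay is already available as an estimate in seconds are all assumptions made for illustration; only the two constants come from this commit message.

```python
from dataclasses import dataclass
from typing import List, Tuple

# The two measured parameters quoted in this commit message.
PREFILL_THROUGHPUT = 7000.0  # tokens/s (H20 measured)
RDMA_OVERHEAD_S = 0.1        # seconds (RDMA PUSH measured)


@dataclass
class Instance:
    """Hypothetical per-instance view used only for this sketch."""
    name: str
    queue_s: float      # assumed: estimated queueing delay, in seconds
    local_hit: int = 0  # tokens already cached on this instance
    push_hit: int = 0   # tokens the cache source could PUSH to this instance


def expected_latency(inst: Instance, input_tokens: int) -> Tuple[float, bool]:
    """Return (expected latency in seconds, whether a PUSH is required)."""
    if inst.local_hit > 0:
        # Local cache: only the uncached suffix is prefilled, no transfer.
        prefill = (input_tokens - inst.local_hit) / PREFILL_THROUGHPUT
        return inst.queue_s + prefill, False
    if inst.push_hit > 0:
        # PUSH from the cache source: shorter prefill plus the RDMA overhead.
        prefill = (input_tokens - inst.push_hit) / PREFILL_THROUGHPUT
        return inst.queue_s + prefill + RDMA_OVERHEAD_S, True
    # Cold: the whole input is prefilled.
    return inst.queue_s + input_tokens / PREFILL_THROUGHPUT, False


def route(instances: List[Instance], input_tokens: int) -> Tuple[Instance, float, bool]:
    """Pick the argmin of expected latency over all instances."""
    best = min(instances, key=lambda i: expected_latency(i, input_tokens)[0])
    latency, needs_push = expected_latency(best, input_tokens)
    return best, latency, needs_push


if __name__ == "__main__":
    candidates = [
        Instance("A", queue_s=1.5, local_hit=7000),  # warm but busy
        Instance("B", queue_s=0.0, push_hit=7000),   # idle, can receive PUSH
        Instance("C", queue_s=0.5),                  # cold
    ]
    winner, latency, needs_push = route(candidates, input_tokens=14000)
    print(winner.name, round(latency, 3), needs_push)
```

In this made-up scenario the idle instance that can receive a PUSH wins over the busy instance holding the cache, so the caller would trigger a migration before dispatching the request.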

Removed:
- WARM/MEDIUM/HEAVY classification (no routing purpose)
- heavy_threshold, overload_factor, max_offload_inflight, cache_gate_ratio
- Interference penalty magic number (0.3)
- Separate pick_instance + offload gate stages

Only 2 measured parameters remain:
- prefill_throughput = 7000 tokens/s (H20 measured)
- rdma_overhead_s = 0.1s (RDMA PUSH measured)
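
One consequence of these two numbers, worked out as a small sketch. It assumes the comparison is between PUSH and cold prefill on the same instance, so the queue term cancels; the function name is hypothetical. Under that assumption, PUSH pays off only when the prefill time it saves exceeds the RDMA overhead, i.e. when push_hit > prefill_throughput * rdma_overhead_s.

```python
def push_break_even_tokens(prefill_throughput: float, rdma_overhead_s: float) -> float:
    """Cached-token count at which PUSH and cold prefill cost the same.

    Cold: input / throughput
    PUSH: (input - push_hit) / throughput + rdma_overhead_s
    Setting the two equal gives push_hit = throughput * rdma_overhead_s.
    """
    return prefill_throughput * rdma_overhead_s


def push_beats_cold(push_hit: int, prefill_throughput: float, rdma_overhead_s: float) -> bool:
    """True when pushing push_hit tokens is faster than recomputing them."""
    return push_hit > push_break_even_tokens(prefill_throughput, rdma_overhead_s)


if __name__ == "__main__":
    # Values quoted in the commit message: 7000 tokens/s and 0.1 s.
    print(push_break_even_tokens(7000.0, 0.1))
    print(push_beats_cold(500, 7000.0, 0.1), push_beats_cold(2000, 7000.0, 0.1))
```

With the quoted values the break-even point is about 700 tokens, so a PUSH that covers only a few hundred tokens would lose to a cold prefill on the same instance.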

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-05-24 02:21:34 +08:00

Add comprehensive research findings document

2026-05-23 07:16:31 +08:00

Add elastic PS evaluation plan for production-realistic trace

2026-05-23 15:56:05 +08:00

Add vLLM patches directory for version-controlled patch management

2026-05-22 00:26:14 +08:00

replayer: wire --max-inflight-sessions cap into replay loop (B2)

2026-05-23 21:04:09 +08:00

Unified routing: single argmin(expected_latency) over all instances

2026-05-24 02:21:34 +08:00

proxy: Settings dataclass + cache-ratio gate + P-pick offload penalty (B4, M2, M3, D5)

2026-05-23 21:11:17 +08:00

third_party/vllm

Switch from RDMA READ to bootstrap-triggered PUSH

2026-05-24 01:47:49 +08:00

.gitignore

Phase 1 milestone: system-level analysis + reproducible report

2026-05-22 16:17:41 +08:00

FIXES.md

Add FIXES.md with prioritized repo cleanup checklist

2026-05-23 20:35:56 +08:00

pyproject.toml

tests: add minimal coverage for percentile + proxy routing (S1)

2026-05-23 21:07:14 +08:00

REPORT.md

Report §3.8: Document direct KV cache migration architecture + bugs fixed

2026-05-24 01:52:38 +08:00

TODO.md

LMetric routing policy (OSDI'26) + A/B results vs linear baseline

2026-05-22 16:57:32 +08:00

uv.lock

Fix NONE_HASH import: use module ref instead of from-import (value binding bug)

2026-05-24 01:32:19 +08:00