gahow/agentic-kvc

Gahow Wang cdf83493ab Fix A+C: real cache sync + cached-prefill-on-C architecture

A: Add /estimate_hit endpoint to bootstrap server for real-time cache
   probing. Proxy queries this before committing to PUSH, eliminating
   24% zero-match PUSH requests (shadow cache divergence).

C: Add _handle_cached_prefill_offload: C (cache source) does fast
   cached prefill → KV to Mooncake → D pulls and decodes.
   Replaces broken direct_read PUSH where D waited for RDMA transfer
   while occupying KV blocks without doing compute.

Also: update §3.9 baseline to plain vLLM with full mean/p50/p90/p99.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-05-24 11:22:38 +08:00
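
The probe-before-PUSH idea from fix A in the commit message above can be sketched as follows. This is a minimal illustration, not the repository's proxy code: the commit only states that the proxy queries `/estimate_hit` before committing to PUSH, so the `probe` callable (a stand-in for that HTTP call), the `HitEstimate` type, the `choose_route` function, the route names, and the `min_ratio` threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class HitEstimate:
    """Result of a cache probe (hypothetical shape, not the real response)."""
    matched_tokens: int  # prompt tokens the cache source reports as cached
    prompt_tokens: int   # total tokens in the incoming prompt

    @property
    def ratio(self) -> float:
        # Guard against empty prompts so the ratio is always defined.
        if self.prompt_tokens == 0:
            return 0.0
        return self.matched_tokens / self.prompt_tokens


def choose_route(
    probe: Callable[[List[int]], int],
    prompt_token_ids: List[int],
    min_ratio: float = 0.5,  # assumed threshold, not taken from the repo
) -> str:
    """Pick a route only after asking the cache source what it really holds.

    `probe` stands in for a call to the /estimate_hit endpoint and returns
    the number of matched prompt tokens. A zero-match or low-ratio result
    falls back to a normal prefill instead of committing to PUSH.
    """
    estimate = HitEstimate(probe(prompt_token_ids), len(prompt_token_ids))
    if estimate.matched_tokens == 0 or estimate.ratio < min_ratio:
        return "local_prefill"
    return "cached_prefill_offload"


if __name__ == "__main__":
    prompt = [101, 102, 103, 104]
    # Stale shadow cache: the real cache reports no match, so no PUSH.
    print(choose_route(lambda ids: 0, prompt))
    # Real cache holds 3 of 4 tokens: offload the cached prefill.
    print(choose_route(lambda ids: 3, prompt))
```

Passing the probe in as a callable keeps the routing decision testable without a running bootstrap server; a real proxy would replace it with the actual request to `/estimate_hit`.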

Add comprehensive research findings document

2026-05-23 07:16:31 +08:00

Add elastic PS evaluation plan for production-realistic trace

2026-05-23 15:56:05 +08:00

Add vLLM patches directory for version-controlled patch management

2026-05-22 00:26:14 +08:00

replayer: wire --max-inflight-sessions cap into replay loop (B2)

2026-05-23 21:04:09 +08:00

Fix A+C: real cache sync + cached-prefill-on-C architecture

2026-05-24 11:22:38 +08:00

proxy: Settings dataclass + cache-ratio gate + P-pick offload penalty (B4, M2, M3, D5)

2026-05-23 21:11:17 +08:00

third_party/vllm

Fix A+C: real cache sync + cached-prefill-on-C architecture

2026-05-24 11:22:38 +08:00

.gitignore

Phase 1 milestone: system-level analysis + reproducible report

2026-05-22 16:17:41 +08:00

FIXES.md

Add FIXES.md with prioritized repo cleanup checklist

2026-05-23 20:35:56 +08:00

pyproject.toml

tests: add minimal coverage for percentile + proxy routing (S1)

2026-05-23 21:07:14 +08:00

REPORT.md

Fix A+C: real cache sync + cached-prefill-on-C architecture

2026-05-24 11:22:38 +08:00

TODO.md

LMetric routing policy (OSDI'26) + A/B results vs linear baseline

2026-05-22 16:57:32 +08:00

uv.lock

Fix NONE_HASH import: use module ref instead of from-import (value binding bug)

2026-05-24 01:32:19 +08:00