agentic-kvc

Gahow Wang 5c66f500fc Fix offload gate: remove cache_gate for direct RDMA read, fix cost model

The cache_gate_ratio=0.3 check blocked 83/112 HEAVY requests (74%)
because they were cold (cache_ratio=0). But with direct RDMA read,
D reads C's cached blocks via RDMA regardless of cache ratio; the
gate was protecting against the OLD flow (C does prefill + push).

Also fixed cost model: offload_cost now reflects direct read reality:
  OLD: P_queue + P_full_prefill + RDMA (P has no cache → expensive)
  NEW: D_queue + RDMA_read + D_local_prefill(new_tokens)

Offload wins when C_s queue > RDMA_overhead (~2s).
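
The two cost formulas and the break-even rule above can be sketched as follows. This is a minimal illustration, not the repository's code: the function names, the argument names, and the use of seconds as the unit are all assumptions, and the 2.0 default is only the "~2s" estimate quoted in the commit message.

```python
# Hypothetical sketch of the cost model described in the commit message.
# All names are illustrative; times are assumed to be in seconds.

RDMA_OVERHEAD_S = 2.0  # assumed from the "~2s" estimate in the commit message


def old_offload_cost(p_queue_s: float, p_full_prefill_s: float, rdma_s: float) -> float:
    """OLD flow: P_queue + P_full_prefill + RDMA (P has no cache)."""
    return p_queue_s + p_full_prefill_s + rdma_s


def new_offload_cost(d_queue_s: float, rdma_read_s: float, d_local_prefill_s: float) -> float:
    """NEW flow: D_queue + RDMA_read + D_local_prefill(new_tokens)."""
    return d_queue_s + rdma_read_s + d_local_prefill_s


def offload_wins(c_queue_s: float, rdma_overhead_s: float = RDMA_OVERHEAD_S) -> bool:
    """Break-even rule: offload wins when C's queue delay exceeds the RDMA overhead."""
    return c_queue_s > rdma_overhead_s


if __name__ == "__main__":
    # Made-up inputs, chosen only to exercise the functions.
    print(new_offload_cost(d_queue_s=0.5, rdma_read_s=2.0, d_local_prefill_s=0.25))
    print(offload_wins(c_queue_s=3.0))
```

Note that the break-even rule compares only C's queue delay against the RDMA overhead; it does not consult the cache ratio, which matches the removal of the cache gate.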

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-05-23 22:01:43 +08:00

analysis

Add comprehensive research findings document

2026-05-23 07:16:31 +08:00

experiments

Add elastic PS evaluation plan for production-realistic trace

2026-05-23 15:56:05 +08:00

patches

Add vLLM patches directory for version-controlled patch management

2026-05-22 00:26:14 +08:00

replayer

replayer: wire --max-inflight-sessions cap into replay loop (B2)