gahow/agentic-kvc - agentic-kvc - Local Gitea

gahow/agentic-kvc

Go to file

Gahow Wang c8ec73c548 Connector tax: high-concurrency confirms +7-9% tax, resolves trace-replay gap

High-concurrency test (512 input, 64 output, rates 4-32 req/s):
  Rate=8:  plain TTFT p90=94ms, mooncake_both=102ms → +9% tax
  Rate=16: plain TTFT p90=144ms, mooncake_both=156ms → +8% tax
  Rate=32: both saturated at ~6.1s → no distinguishable difference

Low-concurrency back-to-back retest (4096 input, 256 output):
  mooncake_both_v2 vs plain_v2: tax is ≈0% (within noise)
  because scheduler's 1.4ms/step is hidden behind model forward.

Decomposition of trace-replay's +45%:
  +7-9% from build_connector_meta per-step cost (this microbench)
  +20-30% from multi-instance coupling amplification (not measurable here)
  remainder from large-cache O(|cache|) scaling (Phase B follow-up)

Also: bench_loop.py now emits mean/p50/p90/p99 for all three metrics.

2026-05-26 21:00:25 +08:00

Add NIXL substrate isolation control + attribution decomposition

2026-05-26 16:02:12 +08:00

Docs: reconcile routing docs with current hybrid direction

2026-05-25 10:47:14 +08:00

Add elastic PS evaluation plan for production-realistic trace

2026-05-23 15:56:05 +08:00

Connector tax: high-concurrency confirms +7-9% tax, resolves trace-replay gap

2026-05-26 21:00:25 +08:00

Add vLLM patches directory for version-controlled patch management

2026-05-22 00:26:14 +08:00

A4: open-loop session-causal SRR loadgen

2026-05-25 16:19:20 +08:00

Fix review P2s: lockfile, model path convention, trap robustness

2026-05-26 16:05:43 +08:00

unified_v2.1: relax gates + add unified_kv_both isolation control

2026-05-26 10:40:57 +08:00

third_party/vllm

A3: vLLM scheduler patch for step-level JSONL log

2026-05-25 16:19:11 +08:00

.gitignore

Phase 1 milestone: system-level analysis + reproducible report

2026-05-22 16:17:41 +08:00

FIXES.md

Add FIXES.md with prioritized repo cleanup checklist

2026-05-23 20:35:56 +08:00

pyproject.toml

Fix review bugs: PD-sep counter leaks, hardcoded paths, missing deps

2026-05-26 15:54:55 +08:00

REPORT.md

Docs: reconcile routing docs with current hybrid direction

2026-05-25 10:47:14 +08:00

TODO.md

LMetric routing policy (OSDI'26) + A/B results vs linear baseline

2026-05-22 16:57:32 +08:00

uv.lock

Fix review P2s: lockfile, model path convention, trap robustness

2026-05-26 16:05:43 +08:00