gahow/agentic-kvc - agentic-kvc - Local Gitea

gahow/agentic-kvc

Go to file

Gahow Wang a66f24d242 MB5 aggregate: cross-config KV-pool + latency comparison

Reads sweep root + tag, for each (config, rep):
- merges per-PID snapshots into cluster-wide KV timeline (carry-forward
  for PIDs without a sample in the bin)
- computes peak (max) and steady-state (10-90% median) pool utilization
- pulls latency p50/p90/p99 from replay_metrics.summary.json

Produces 4 outputs in --out-dir:
- mb5_kv_timeline.png    — N-panel cluster KV % over time, one panel per
                            config, faint per-rep lines + bold median
- mb5_peak_utilization.png — bar chart (peak vs steady) with ±std error bars
- mb5_latency_compare.png  — bar chart p50/p90/p99 e2e latency per config
- mb5_summary.csv          — flat per-(config, rep) table for the writeup

Validated on 4P+4D × 20-req smoke:
  4P+4D rep1: peak=12.8% steady=10.7% peak_wait=1
  p50=1.3s p90=10.5s p99=17.1s (vs. <1s for 8C — expected gap).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-28 00:49:21 +08:00

Correct PD-disagg cost/benefit framing across repo

2026-05-27 22:04:49 +08:00

Docs: reconcile routing docs with current hybrid direction

2026-05-25 10:47:14 +08:00

Add elastic PS evaluation plan for production-realistic trace

2026-05-23 15:56:05 +08:00

Correct PD-disagg cost/benefit framing across repo

2026-05-27 22:04:49 +08:00

MB5 aggregate: cross-config KV-pool + latency comparison

2026-05-28 00:49:21 +08:00

Add vLLM patches directory for version-controlled patch management

2026-05-22 00:26:14 +08:00

EAR outline: copy reusable figures, mark migration sections deferred

2026-05-27 01:44:13 +08:00

Proxy/runner support for Nixl connector + unified_v3 (offload-decode) policy

2026-05-27 22:05:19 +08:00

unified_v2.1: relax gates + add unified_kv_both isolation control

2026-05-26 10:40:57 +08:00

third_party/vllm

A3: vLLM scheduler patch for step-level JSONL log

2026-05-25 16:19:11 +08:00

.gitignore

Phase 1 milestone: system-level analysis + reproducible report

2026-05-22 16:17:41 +08:00

FIXES.md

Add FIXES.md with prioritized repo cleanup checklist

2026-05-23 20:35:56 +08:00

MEETING.md

§2.3 reframe: dispatch coupling is regime-dependent, not binary chatbot/agentic

2026-05-27 16:51:38 +08:00

PAPER_OUTLINE.md

§2.3 reframe: dispatch coupling is regime-dependent, not binary chatbot/agentic

2026-05-27 16:51:38 +08:00

pyproject.toml

Fix review bugs: PD-sep counter leaks, hardcoded paths, missing deps

2026-05-26 15:54:55 +08:00

REPORT.md

Docs: reconcile routing docs with current hybrid direction

2026-05-25 10:47:14 +08:00

RESULTS_SUMMARY.md

Correct PD-disagg cost/benefit framing across repo

2026-05-27 22:04:49 +08:00

TODO.md

LMetric routing policy (OSDI'26) + A/B results vs linear baseline

2026-05-22 16:57:32 +08:00

uv.lock

Fix review P2s: lockfile, model path convention, trap robustness

2026-05-26 16:05:43 +08:00