agentic-kvc

Author	SHA1	Message	Date
Gahow Wang	f42c715ec1	A4: open-loop session-causal SRR loadgen New replayer/srr.py drives a Poisson session-arrival load against the existing proxy, with strict per-session turn sequentiality, explicit warmup/steady/drain windows, and per-arrival fresh session_id + request_id so APC/session-affinity counters are not contaminated by repeated draws from the trace pool. Writes window_summary.json with attempted/completed/errored split by window so latency tails can be read on the steady-state window only. Required by Batch 4 SRR sweep; trace-timestamp dispatch in replay.py cannot drive arrival rate independently. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 16:19:20 +08:00
Gahow Wang	5816aad731	A3: vLLM scheduler patch for step-level JSONL log When AGENTIC_STEP_LOG_PATH is set, the scheduler emits one JSONL line per scheduler step with t_unix, worker_id, prefill/decode token counts, n_running/n_waiting, preempted ids, and per-request phase labels. No-op when the env var is unset, so production engines are not impacted. bench.sh now threads AGENTIC_STEP_LOG_DIR through to each per-engine launch so step logs end up at engine_${i}.jsonl. Required by Batch 2 (PD-colo interference index) and Batch 5 (same-worker overlap attribution); engine /metrics polling cannot provide per-step granularity. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 16:19:11 +08:00
Gahow Wang	fe556b5d98	A2: proxy worker-state snapshot and request-id passthrough Honor incoming X-Request-Id so replayer metrics and proxy breakdown share a join key. Each route decision now captures session_id, the full per-worker candidate-score snapshot (ongoing/pending/num_requests/cached_blocks plus both linear and lmetric scores), the chosen score, and unix timestamps for first-token and done events. A separate _worker_state_log records one row per decision and is exposed via GET /worker_state; GET /worker_state/latest returns a live snapshot without recording it. Required by Batch 3 (session hot-spot proof) and Batch 5 (failure attribution); existing breakdown.json had no per-worker state at decision time. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 16:19:01 +08:00
Gahow Wang	d57e338366	A1: replayer instrumentation for cross-process join RequestMetrics gains absolute unix timestamps (t_dispatch_unix, t_first_token_unix, t_finish_unix), the proxy_request_id, the chosen endpoint URL, and the trace hash_ids. Replayer sends X-Request-Id: <session_id>:<turn_id>:<chat_id>:<idx> so proxy breakdown rows can be joined to metrics by exact key. Required by Batch 0 (online sequentiality proof) and Batch 1 reuse decomposition; existing metrics.jsonl couldn't establish either. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 16:18:52 +08:00
Gahow Wang	ac6534c3ff	Cleanup: retire dead PUSH path + extract hybrid picker - Delete unreachable best_needs_push block in _handle_combined and the four orphaned helpers (_handle_cached_prefill_offload, _handle_direct_read_offload, _query_bootstrap_hit, _get_bootstrap_client). Their only caller was the retired PUSH gate; see REPORT §3.9 errata for the rejected experiments (`cc6e562`, `4c583f2`). - Extract pick_instance_unified_hybrid as a pure function returning (chosen, idx, decision_dict). The decision dict carries the review #7 breakdown fields (decision, affinity_idx/chosen_idx, cache_hit/ratio, avg_num_requests, fallback_score, tie_break_used). - Add LMetric-fallback tie-breaker (primary score, then new_uncached, num_requests, round-robin) so new sessions don't all pin to inst 0 when BS=0 across the board. - Drop the lmetric-policy affinity write so --policy lmetric stays affinity-free per review #3. - Mark --max-offload-inflight / --offload-mode / --cache-gate-ratio / --decode-iteration-s as [DEPRECATED] in --help; flags remain accepted so scripts/bench.sh and legacy launchers don't break. - Revert uncommitted overload_factor 2.0->1.5 default; H7 sweep already rejected this knob (within noise). Future sweeps should go via CLI. Tests: add 6 hybrid-policy tests in tests/test_proxy_pick.py covering affinity-hit, overload break, low-cache fallback, tie-break rotation, lmetric purity, and breakdown field shape. 19/19 pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 10:46:57 +08:00
Gahow Wang	c843f2e3db	proxy: Settings dataclass + cache-ratio gate + P-pick offload penalty (B4, M2, M3, D5) - Replace mutable module constants (HEAVY_THRESHOLD/OVERLOAD_FACTOR/MAX_OFFLOAD_INFLIGHT/PREFILL_THROUGHPUT/RDMA_OVERHEAD_S/CACHE_CAPACITY_BLOCKS) with a Settings dataclass + SETTINGS singleton. __main__ now mutates SETTINGS so CLI overrides survive even when the module is imported as a library (e.g. by tests/) (D5). - Add --max-offload-inflight CLI flag (M3) and read it from SETTINGS. - Add --cache-gate-ratio CLI flag and a real gate before the cost-model branch: if cache_hit/input_length < ratio, mark cache_gate_REASON and fall back to colocated. cache_ratio is no longer a write-only field (B4). - P candidate selection penalises instances already running offloaded HEAVY prefills, so back-to-back HEAVY requests don't pile onto the same P (M2). - bench.sh forwards --max-offload-inflight / --cache-gate-ratio to the proxy. - Tests cover SETTINGS knobs + the heavy_threshold-driven P-offload penalty.	2026-05-23 21:11:17 +08:00
Gahow Wang	0701f84c00	tests: add minimal coverage for percentile + proxy routing (S1) - tests/test_metrics.py asserts the new linear-interp _percentile against hand-computed expected values (single value, two-value interpolation, endpoints, numpy-equivalent linear default, on-integer rank). - tests/test_proxy_pick.py exercises InstanceState LRU eviction and move-to-end on hit, plus session-affinity stickiness, the overload fallback, the active_p_offloads penalty, and lmetric scoring. The proxy is loaded by file path with stub fastapi/uvicorn/httpx modules so the suite runs without the FastAPI server deps installed. - pyproject.toml gets a hatchling wheel target and a [tool.pytest] section so `uv run --extra dev pytest` works out of the box.	2026-05-23 21:07:14 +08:00
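
Commit 0701f84c00 tests a linear-interpolating `_percentile` against hand-computed values. The repository's actual implementation is not shown in this log, so the following is only a minimal sketch of that technique. The name `percentile_linear` and its signature are assumptions made for illustration.

```python
import math


def percentile_linear(values, q):
    """Return the q-th percentile (0 <= q <= 100) by linear interpolation.

    Hypothetical helper: it illustrates the rank = (n - 1) * q / 100 rule
    that NumPy's default "linear" method also uses. It is not the
    repository's _percentile.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 <= q <= 100:
        raise ValueError("q must be in [0, 100]")

    ordered = sorted(values)
    # Fractional position of the requested percentile in the sorted data.
    rank = (len(ordered) - 1) * q / 100.0
    lo = math.floor(rank)
    hi = math.ceil(rank)

    # The rank lands exactly on a sample: no interpolation needed.
    if lo == hi:
        return float(ordered[lo])

    # Otherwise blend the two neighbouring samples.
    frac = rank - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac
```

The cases named in the commit message (single value, two-value interpolation, endpoints, on-integer rank) map onto calls such as `percentile_linear([5], 99)`, `percentile_linear([10, 20], 50)`, `percentile_linear(xs, 0)` / `percentile_linear(xs, 100)`, and `percentile_linear([1, 2, 3, 4, 5], 50)`.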

7 Commits
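
Commit f42c715ec1 describes an open-loop load generator that draws Poisson session arrivals and splits results into warmup, steady, and drain windows. The sketch below illustrates that idea under stated assumptions and is not the code in `replayer/srr.py`. The function names, parameters, and the use of `uuid4` for fresh session IDs are all hypothetical.

```python
import random
import uuid


def poisson_arrivals(rate_per_s, duration_s, seed=0):
    """Return sorted arrival offsets (seconds) for a Poisson process.

    Inter-arrival gaps of a Poisson process with rate `rate_per_s` are
    exponentially distributed, so gaps are accumulated until the
    schedule reaches `duration_s`.
    """
    rng = random.Random(seed)  # seeded for reproducible schedules
    t = 0.0
    times = []
    while True:
        t += rng.expovariate(rate_per_s)
        if t >= duration_s:
            return times
        times.append(t)


def window_of(t, warmup_s, steady_s):
    """Classify an arrival offset into warmup / steady / drain."""
    if t < warmup_s:
        return "warmup"
    if t < warmup_s + steady_s:
        return "steady"
    return "drain"


def build_schedule(rate_per_s, warmup_s, steady_s, drain_s, seed=0):
    """Pair every arrival with a fresh session ID and its window label.

    A new ID per arrival (assumed here to be a uuid4 hex string) keeps
    repeated draws from a trace pool from sharing a session identity.
    """
    total = warmup_s + steady_s + drain_s
    return [
        {
            "t": t,
            "session_id": uuid.uuid4().hex,
            "window": window_of(t, warmup_s, steady_s),
        }
        for t in poisson_arrivals(rate_per_s, total, seed)
    ]
```

Because the schedule is computed up front and does not depend on response times, the arrival rate can be varied independently of how quickly the system under test completes requests. The schedule could then be filtered with `window == "steady"` to read latency tails on the steady-state window only.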