4 Commits

Author SHA1 Message Date
645b067dd4 Fix review bugs: PD-sep counter leaks, hardcoded paths, missing deps
Critical:
- cache_aware_proxy: _handle_pd_sep leaked p_inst.num_requests (never
  decremented) and never managed d_inst.num_requests; fix media_type
  from application/json to text/event-stream for SSE stream

High:
- b3_sweep/b3_isolated_policy/b3_analyze: replace hardcoded
  /home/admin/cpfs/wjh/ ROOT with script-relative $(dirname "$0")/..
- b3_analyze: replace hardcoded 8-port WORKER_MAP with dynamic
  generation from BASE_PORT and N_INSTANCES

Medium:
- analyze_breakdown: warn on stderr when records are skipped (was silent)
- deploy_vllm_patches: fail-fast on SSH/SCP errors instead of
  continuing with empty VENV_SITE
- pyproject.toml: declare fastapi and uvicorn as runtime dependencies
- launch_elastic_p2p: kill EngineCore and proxy in trap handler to
  prevent GPU memory leaks on exit
2026-05-26 15:54:55 +08:00
0e82612100 Fix B3 analysis bugs from subagent audit (median + percentile + sweep)
Three fixes from the B3 audit:

1) joined_analysis.hotspot_index used sorted[n//2] as median, which
   returns the ~60th percentile for n=8 (even-length). Systematically
   under-states the hotspot index. Recomputed values:
       lmetric   2.238 -> 2.253  (+0.7%)
       load_only 1.140 -> 1.294  (+13.5%)
       sticky    2.349 -> 2.728  (+16.1%)
       unified   3.350 -> 3.667  (+9.5%)
       capped    1.937 -> 2.020  (+4.3%)
   Qualitative ranking preserved; "capped only modestly reduces hotspot"
   story holds with ~10% drop instead of the previously reported 13%.
   Added test_hotspot_index_uses_true_median_for_even_n to lock in the
   fix.

2) b3_analyze.sh's pct() helper used floor-indexed percentile
   sorted[int(p*(n-1))], inconsistent with metrics._percentile and
   joined_analysis._percentile which both use linear interpolation.
   Now matches.

3) b3_sweep.sh's capped step called run_policy "capped", but the
   proxy's argparse has no "capped" choice, so the hot-sweep variant
   would have crashed on this step. The actual capped data was
   produced via b3_isolated_policy.sh with --policy lmetric. Replace
   the broken inline call with an explicit launch_proxy lmetric +
   inline replayer block so the sweep script matches the data path
   it documents.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 01:08:37 +08:00
df3249925b B3 analyze: prefer per-policy engine_state over slicing shared dir
The hot-sweep variant of B3 writes one shared engine_state across
all policies; the isolated variant writes per-policy. Previously
slice_engine_state.py was called unconditionally and would
overwrite an isolated policy's real data with an empty slice (the
isolated policy's run-window doesn't overlap with the shared dir's
contents).

Now we check the policy directory's engine_state for any non-empty
engine_*.jsonl first; if present, use it directly; else slice from
the shared one as before.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 22:19:43 +08:00
92db1c4370 B3 post-run helpers: engine_state slicer + per-policy aggregator
scripts/slice_engine_state.py filters a shared engine_*.jsonl by a
[t_start_unix, t_end_unix] window. Needed because the patched
scheduler appends to one file per engine across the whole sweep;
per-policy analysis requires the per-policy slice.

scripts/b3_analyze.sh drives the slice + joined_analysis loop for
every policy directory in a completed sweep, then aggregates one row
per policy (latency percentiles, APC, interference_index,
hotspot_index, reuse fractions, failure-cause counts) into
b3_policy_comparison.json.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 18:51:33 +08:00