Critical:
- cache_aware_proxy: _handle_pd_sep leaked p_inst.num_requests (never
decremented) and never managed d_inst.num_requests; fix media_type
from application/json to text/event-stream for SSE stream
High:
- b3_sweep/b3_isolated_policy/b3_analyze: replace hardcoded
/home/admin/cpfs/wjh/ ROOT with script-relative $(dirname "$0")/..
- b3_analyze: replace hardcoded 8-port WORKER_MAP with dynamic
generation from BASE_PORT and N_INSTANCES
Medium:
- analyze_breakdown: warn on stderr when records are skipped (was silent)
- deploy_vllm_patches: fail-fast on SSH/SCP errors instead of
continuing with empty VENV_SITE
- pyproject.toml: declare fastapi and uvicorn as runtime dependencies
- launch_elastic_p2p: kill EngineCore and proxy in trap handler to
prevent GPU memory leaks on exit
Three fixes from the B3 audit:
1) joined_analysis.hotspot_index used sorted[n//2] as median, which
returns the ~60th percentile for n=8 (even-length). Systematically
under-states the hotspot index. Recomputed values:
lmetric 2.238 -> 2.253 (+0.7%)
load_only 1.140 -> 1.294 (+13.5%)
sticky 2.349 -> 2.728 (+16.1%)
unified 3.350 -> 3.667 (+9.5%)
capped 1.937 -> 2.020 (+4.3%)
Qualitative ranking preserved; "capped only modestly reduces hotspot"
story holds with ~10% drop instead of the previously reported 13%.
Added test_hotspot_index_uses_true_median_for_even_n to lock in the
fix.
2) b3_analyze.sh's pct() helper used floor-indexed percentile
sorted[int(p*(n-1))], inconsistent with metrics._percentile and
joined_analysis._percentile which both use linear interpolation.
Now matches.
3) b3_sweep.sh's capped step called run_policy "capped", but the
proxy's argparse has no "capped" choice, so the hot-sweep variant
would have crashed on this step. The actual capped data was
produced via b3_isolated_policy.sh with --policy lmetric. Replace
the broken inline call with an explicit launch_proxy lmetric +
inline replayer block so the sweep script matches the data path
it documents.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The hot-sweep variant of B3 writes one shared engine_state across
all policies; the isolated variant writes per-policy. Previously
slice_engine_state.py was called unconditionally and would
overwrite an isolated policy's real data with an empty slice (the
isolated policy's run-window doesn't overlap with the shared dir's
contents).
Now we check the policy directory's engine_state for any non-empty
engine_*.jsonl first; if present, use it directly; else slice from
the shared one as before.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
scripts/slice_engine_state.py filters a shared engine_*.jsonl by a
[t_start_unix, t_end_unix] window. Needed because the patched
scheduler appends to one file per engine across the whole sweep;
per-policy analysis requires the per-policy slice.
scripts/b3_analyze.sh drives the slice + joined_analysis loop for
every policy directory in a completed sweep, then aggregates one row
per policy (latency percentiles, APC, interference_index,
hotspot_index, reuse fractions, failure-cause counts) into
b3_policy_comparison.json.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>