Fix B3 analysis bugs from subagent audit (median + percentile + sweep)

Three fixes from the B3 audit:

1) joined_analysis.hotspot_index used sorted[n//2] as median, which
   returns the ~60th percentile for n=8 (even-length). Systematically
   under-states the hotspot index. Recomputed values:

     lmetric    2.238 -> 2.253 (+0.7%)
     load_only  1.140 -> 1.294 (+13.5%)
     sticky     2.349 -> 2.728 (+16.1%)
     unified    3.350 -> 3.667 (+9.5%)
     capped     1.937 -> 2.020 (+4.3%)

   Qualitative ranking preserved; "capped only modestly reduces
   hotspot" story holds with ~10% drop instead of the previously
   reported 13%. Added test_hotspot_index_uses_true_median_for_even_n
   to lock in the fix.

2) b3_analyze.sh's pct() helper used floor-indexed percentile
   sorted[int(p*(n-1))], inconsistent with metrics._percentile and
   joined_analysis._percentile which both use linear interpolation.
   Now matches.

3) b3_sweep.sh's capped step called run_policy "capped", but the
   proxy's argparse has no "capped" choice, so the hot-sweep variant
   would have crashed on this step. The actual capped data was
   produced via b3_isolated_policy.sh with --policy lmetric. Replace
   the broken inline call with an explicit launch_proxy lmetric +
   inline replayer block so the sweep script matches the data path it
   documents.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
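
For reviewers, the arithmetic behind fixes 1 and 2 can be reproduced with the standalone sketch below. It does not import the repository's code: `percentile_floor` and `percentile_linear` are hypothetical stand-ins for the old `pct()` helper and for `_percentile`, and the interpolation position `p * (n - 1)` is an assumption chosen to match the floor-indexed formula quoted above. The input is the 8-worker example from the new test.

```python
from statistics import median


def percentile_floor(values, p):
    """Floor-indexed percentile: picks one sample, never interpolates."""
    s = sorted(values)
    return s[int(p * (len(s) - 1))]


def percentile_linear(values, p):
    """Linear-interpolation percentile at position p * (n - 1), p in [0, 1].

    Assumed convention; the repository's _percentile may differ in detail.
    """
    s = sorted(values)
    if not s:
        raise ValueError("percentile of empty sequence")
    pos = p * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


# Per-worker TTFT p90 values, as in the new test case.
p90s = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 80.0]

# Fix 1: sorted[n//2] picks the upper middle element when n is even.
buggy_median = sorted(p90s)[len(p90s) // 2]   # 5.0
true_median = median(p90s)                    # (4 + 5) / 2 = 4.5

hotspot_buggy = max(p90s) / buggy_median      # 16.0
hotspot_true = max(p90s) / true_median        # about 17.78

# Fix 2: the two percentile definitions disagree on the same data.
p90_floor = percentile_floor(p90s, 0.9)       # sample at index 6
p90_linear = percentile_linear(p90s, 0.9)     # between index 6 and 7

print(f"median: buggy={buggy_median} true={true_median}")
print(f"hotspot: buggy={hotspot_buggy:.2f} true={hotspot_true:.2f}")
print(f"p90: floor={p90_floor} linear={p90_linear:.2f}")
```

Because the larger median sits in the denominator, the buggy index can only be lower than or equal to the true one, which is consistent with every recomputed value above moving upward.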
2026-05-26 01:08:37 +08:00
parent 8ac41a8684
commit 0e82612100
4 changed files with 64 additions and 4 deletions
--- a/tests/test_joined_analysis.py
+++ b/tests/test_joined_analysis.py
@@ -131,6 +131,26 @@ def test_hotspot_index_max_over_median_p90():
    assert out["hotspot_index_ttft_p90"] > 5.0


+def test_hotspot_index_uses_true_median_for_even_n():
+    """8 workers, sorted TTFT p90 [1,2,3,4,5,6,7,80].
+    True median = (4+5)/2 = 4.5; hotspot = 80/4.5 ≈ 17.78.
+    Previous buggy implementation used sorted[4] = 5, giving 80/5 = 16.0.
+    """
+    rows = []
+    ttfts = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 80.0]
+    for i, t in enumerate(ttfts):
+        for _ in range(10):
+            rows.append({
+                "request_id": f"x{i}", "routed_to": f"http://h:800{i}",
+                "endpoint_url": f"http://h:800{i}",
+                "ttft_s": t, "latency_s": 1.0, "error": None,
+            })
+    out = hotspot_index(rows)
+    assert out["status"] == "supported"
+    idx = out["hotspot_index_ttft_p90"]
+    assert abs(idx - 80.0 / 4.5) < 1e-6, f"expected ~17.78, got {idx}"
+
+
 def test_label_slow_requests_flags_overlap_and_hot_worker():
    metrics = [
        _mk_metric("slow_overlap", ttft_s=10.0,