agentic-kvc

gahow/agentic-kvc


Commit Graph


Author

SHA1

Message

Date

Gahow Wang

0e82612100

Fix B3 analysis bugs from subagent audit (median + percentile + sweep)

Three fixes from the B3 audit:

1) joined_analysis.hotspot_index used sorted[n//2] as median, which
   returns the upper-middle element (~60th percentile) for n=8
   (even-length). That inflates the denominator of max/median and so
   systematically under-states the hotspot index. Recomputed values:
       lmetric   2.238 -> 2.253  (+0.7%)
       load_only 1.140 -> 1.294  (+13.5%)
       sticky    2.349 -> 2.728  (+16.1%)
       unified   3.350 -> 3.667  (+9.5%)
       capped    1.937 -> 2.020  (+4.3%)
   Qualitative ranking preserved; "capped only modestly reduces hotspot"
   story holds with ~10% drop instead of the previously reported 13%.
   Added test_hotspot_index_uses_true_median_for_even_n to lock in the
   fix.

2) b3_analyze.sh's pct() helper used floor-indexed percentile
   sorted[int(p*(n-1))], inconsistent with metrics._percentile and
   joined_analysis._percentile which both use linear interpolation.
   Now matches.

3) b3_sweep.sh's capped step called run_policy "capped", but the
   proxy's argparse has no "capped" choice, so the hot-sweep variant
   would have crashed on this step. The actual capped data was
   produced via b3_isolated_policy.sh with --policy lmetric. Replace
   the broken inline call with an explicit launch_proxy lmetric +
   inline replayer block so the sweep script matches the data path
   it documents.
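
A minimal sketch of fixes 1 and 2, for illustration only. The function names and the sample TTFT values are hypothetical, and the `p * (n - 1)` position convention for the interpolated percentile is an assumption; the commit only states that metrics._percentile and joined_analysis._percentile use linear interpolation.

```python
from statistics import median


def upper_middle(values):
    """Pre-fix behaviour: sorted[n // 2], the upper-middle element for even n."""
    s = sorted(values)
    return s[len(s) // 2]


def percentile_floor(values, p):
    """Pre-fix pct() behaviour: floor-indexed, no interpolation."""
    s = sorted(values)
    return s[int(p * (len(s) - 1))]


def percentile_linear(values, p):
    """Linear interpolation between the two neighbouring order statistics.

    Assumes the position convention p * (n - 1); the real helpers may differ.
    """
    s = sorted(values)
    if not s:
        raise ValueError("percentile of empty sequence")
    pos = p * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def hotspot_index(worker_ttft_p90, median_fn=median):
    """max / median of per-worker TTFT-p90 values."""
    return max(worker_ttft_p90) / median_fn(worker_ttft_p90)


# Made-up per-worker TTFT-p90 values for n=8 workers (not data from the runs).
ttft = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

old_index = hotspot_index(ttft, median_fn=upper_middle)  # denominator 5.0
new_index = hotspot_index(ttft)                          # denominator 4.5
```

With an even number of workers the true median averages the two middle values, so the denominator shrinks and the index rises, which is the direction of every recomputed value listed in fix 1.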

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-26 01:08:37 +08:00

Gahow Wang

763355b825

A5 fix: worker-id resolution and vLLM cmpl- rid stripping

Smoke validation on dash0 surfaced three real bugs that broke
interference and failure-attribution labels end-to-end:

1. endpoint_url in metrics is the proxy URL (e.g. http://h:9200);
   the vLLM worker URL lives in breakdown's routed_to. The
   interference index and label path were taking endpoint_url first,
   so every request looked routed to a non-existent worker and the
   overlap counter stayed at zero.
2. _normalize_worker hard-coded base port 8000, so a smoke run on
   port 9100 resolved to engine_1100 instead of engine_0. Added a
   --worker-map URL=engine_id CLI flag and _resolve_worker() that
   prefers the explicit map and falls back to the heuristic.
3. vLLM rewrites the per-step rid as cmpl-<proxy_id>-<i>-<hash>, so
   the str equality check between per_req rid and our proxy
   request_id never matched -> every prefill step looked like
   "other request prefill", which would have flipped overlap to
   100%. Added _vllm_rid_matches() that strips the cmpl-/chatcmpl-
   prefix.

After the fix, the same smoke run reports interference_index = 22.9
across 24 overlap / 6 clean requests on a single instance, which is
the expected shape for serial dispatch into a cold engine.
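
The three fixes above can be sketched roughly as follows. This is not the code from the commit: the function bodies, the `base_port` default, the trailing-slash handling, and the record layout passed to `worker_url` are assumptions made for illustration. Only the flag format (URL=engine_id), the cmpl-/chatcmpl- prefixes, and the field names endpoint_url and routed_to come from the message.

```python
from urllib.parse import urlparse


def parse_worker_map(pairs):
    """Turn ["http://h:9100=engine_0", ...] into {url: engine_id}."""
    mapping = {}
    for pair in pairs:
        url, _, engine_id = pair.rpartition("=")
        if not url or not engine_id:
            raise ValueError(f"expected URL=engine_id, got {pair!r}")
        mapping[url.rstrip("/")] = engine_id
    return mapping


def worker_url(metrics_row, breakdown_row):
    """Prefer the proxy breakdown's routed_to over the replayer's endpoint_url."""
    return breakdown_row.get("routed_to") or metrics_row.get("endpoint_url")


def resolve_worker(url, worker_map, base_port=8000):
    """Explicit map first; otherwise fall back to the port-offset heuristic."""
    key = url.rstrip("/")
    if key in worker_map:
        return worker_map[key]
    port = urlparse(url).port
    if port is None:
        return None
    return f"engine_{port - base_port}"


def vllm_rid_matches(vllm_rid, proxy_request_id):
    """Compare a vLLM per-step rid with the proxy request_id, ignoring the prefix."""
    for prefix in ("chatcmpl-", "cmpl-"):
        if vllm_rid.startswith(prefix):
            vllm_rid = vllm_rid[len(prefix):]
            break
    # After stripping, the rid is either the proxy id itself or
    # "<proxy_id>-<i>-<hash>", so require an exact match or a "-" boundary.
    return vllm_rid == proxy_request_id or vllm_rid.startswith(proxy_request_id + "-")


wmap = parse_worker_map(["http://h:9100=engine_0"])
```

Requiring the "-" boundary after the proxy id keeps a request named req42 from matching steps that belong to req421.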

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-25 16:47:23 +08:00

Gahow Wang

25445e3d18

A5: joined analysis with reuse decomp, interference, hot-spot, labels

New analysis/characterization/joined_analysis.py joins replayer
metrics.jsonl + proxy breakdown.json + worker_state.jsonl by
request_id, plus engine_*.jsonl by worker_id, and emits:

- joined.jsonl              per-request merged record
- reuse_decomposition.json  real intra/cross/shared classification
                            using session_id + hash_ids + cached_tokens
- interference_index.json   TPOT_p90(same-worker prefill overlap)
                            / TPOT_p90(clean), per Batch 2
- hotspot_index.json        max/median worker TTFT-p90, per Batch 3
- failure_label.jsonl       per-slow-request cause label, per Batch 5
- failure_breakdown.json    label histogram
- window_summary.json       SRR warmup/steady/drain aggregates

Closes the analyzer side of Phase A; replaces the
status: unavailable placeholders the existing scaffold emits when
join sources are missing.
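
As a rough illustration of the join and of the interference index defined above, the sketch below merges two record lists by request_id and computes TPOT_p90(overlap) / TPOT_p90(clean). The field names `tpot` and `prefill_overlap`, the output dictionary shape, and the percentile convention are assumptions; only the ratio definition, the request_id join key, and the status: unavailable placeholder come from the message.

```python
def p90(values):
    """90th percentile with linear interpolation (position 0.9 * (n - 1))."""
    s = sorted(values)
    pos = 0.9 * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def join_by_request_id(metrics_rows, breakdown_rows):
    """Left-join replayer metrics with proxy breakdown rows on request_id."""
    by_id = {row["request_id"]: row for row in breakdown_rows}
    joined = []
    for m in metrics_rows:
        merged = dict(m)
        merged.update(by_id.get(m["request_id"], {}))
        joined.append(merged)
    return joined


def interference_index(joined, overlap_key="prefill_overlap"):
    """TPOT_p90 of requests that overlapped a same-worker prefill over clean ones."""
    overlap = [r["tpot"] for r in joined if r.get(overlap_key)]
    clean = [r["tpot"] for r in joined if not r.get(overlap_key)]
    if not overlap or not clean:
        # Mirror the placeholder the scaffold emits when a side is missing.
        return {"status": "unavailable"}
    return {
        "status": "ok",
        "interference_index": p90(overlap) / p90(clean),
        "n_overlap": len(overlap),
        "n_clean": len(clean),
    }


# Made-up records, not data from the smoke run.
metrics = [
    {"request_id": "a", "tpot": 10.0},
    {"request_id": "b", "tpot": 20.0},
    {"request_id": "c", "tpot": 40.0},
]
breakdown = [
    {"request_id": "a", "prefill_overlap": False},
    {"request_id": "b", "prefill_overlap": False},
    {"request_id": "c", "prefill_overlap": True},
]
result = interference_index(join_by_request_id(metrics, breakdown))
```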

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-25 16:19:33 +08:00

3 Commits