agentic-kvc

Go to file

Gahow Wang 763355b825 A5 fix: worker-id resolution and vLLM cmpl- rid stripping

Smoke validation on dash0 surfaced three real bugs that broke
interference and failure-attribution labels end-to-end:

1. endpoint_url in metrics is the proxy URL (e.g. http://h:9200);
   the vLLM worker URL lives in breakdown's routed_to. The
   interference index and label path were taking endpoint_url first,
   so every request looked routed to a non-existent worker and the
   overlap counter stayed at zero.
2. _normalize_worker hard-coded base port 8000, so a smoke run on
   port 9100 resolved to engine_1100 instead of engine_0. Added a
   --worker-map URL=engine_id CLI flag and _resolve_worker() that
   prefers the explicit map and falls back to the heuristic.
3. vLLM rewrites the per-step rid as cmpl-<proxy_id>-<i>-<hash>, so
   the str equality check between per_req rid and our proxy
   request_id never matched -> every prefill step looked like
   "other request prefill", which would have flipped overlap to
   100%. Added _vllm_rid_matches() that strips the cmpl-/chatcmpl-
   prefix.

After the fix, the same smoke run reports interference_index = 22.9
across 24 overlap / 6 clean requests on a single instance, which is
the expected shape for serial dispatch into a cold engine.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-25 16:47:23 +08:00