b9f324f2e689461f6997b7399abfdbb8e0e12edc
The first B2 run produced metrics with ttft_s=null/tpot_s=null for every decode request because the OpenAI-style payload did not set return_token_ids: true, and the parser only inspected choices[0].token_ids. With token_ids missing the loop skipped every chunk, so no per-token timestamps were captured and the aggregator returned interference_index=null on all 10 cells. Fix: - send return_token_ids: true in the payload (matches replayer.replay) - also accept text-delta chunks as token signals (fallback for servers that drop token_ids despite the flag) vLLM engine_state was fine; only the load-gen metric capture was broken. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Description
No description provided
Languages
Python
82.9%
Shell
17.1%