User-requested comparison of the inter-turn external gap distribution
between the production agentic trace (Qwen3-Coder) and a production
chatbot trace (qwen3-max chat). Both are computed as

  T_external = next_turn.start_ms - prev_turn.end_ms

on the same kind of pipeline (raw input + raw output join on
request_id, session structure from the formatted trace's
parent_chat_id chains).

The chatbot trace lives as two files on dash0:

  input : bailian-trace/qwen-trace-260321-260327/qwen3-max-input-032309-032311.jsonl
  output: bailian-trace/qwen-trace-260321-260327/qwen3-max-output-032109-032711.jsonl

The raw input has no session_id (uuid is per-record, user_id has only
4 distinct tenant values for 346 k requests). We recover session
structure from the formatted file
(qwen_chat_blksz_64_032309-032311.jsonl, which groups requests by
parent_chat_id), matching each formatted record to a raw record by
(timestamp, output_length). prompt_token_num is anonymized to 0 in
this trace, so we use generate_token_num as the join key. End time is
derived from time_to_finish_token (ms duration), not the "time" string
field (which is the log-write time, not request completion).

Numbers (chatbot, 42 228 inter-turn gaps over 32 262 multi-turn
sessions):

  p25 4.85 s   p50 7.18 s   p75 8.22 s   p90 15.0 s   p99 43 s
  4% gaps < 1 s   29% < 5 s   78% < 10 s   98% < 30 s

Compare to agentic (same metric,
scripts/compute_inter_turn_gap_remote.py):

  p25 0.69 s   p50 1.6 s   p75 8.6 s   p90 44 s   p99 738 s
  39% gaps < 1 s   67% < 5 s   77% < 10 s   87% < 30 s

Distributions differ in shape, not just location:
- Chatbot is tight, unimodal around 5–10 s (human interaction).
- Agentic is bimodal: a sub-second autonomous tool-call mode
  (39% < 1 s) plus a long-pause tail (13% > 30 s, p99 = 738 s) for
  sessions where the operator steps away.
- The sub-second tool-call mass is where dispatch coupling lives:
  those turns have W_turn ≫ T_external for any current scheduler.

The earlier "chatbot has T_human ≈ 30 s" hand-wave was wrong
empirically.
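
For reference, a minimal sketch of the gap metric and the summary rows
above. This is not the code of the aggregator scripts. The record keys
session_id, start_ms and duration_ms are hypothetical names;
duration_ms stands in for a per-request duration such as
time_to_finish_token and is assumed to be measured from start_ms. The
nearest-rank percentile is also an assumption; the real scripts may
interpolate.

```python
import math
from collections import defaultdict


def inter_turn_gaps_ms(turns):
    """Return T_external = next_turn.start_ms - prev_turn.end_ms per session.

    `turns` is an iterable of dicts with the hypothetical keys
    session_id, start_ms and duration_ms (all times in milliseconds).
    """
    by_session = defaultdict(list)
    for turn in turns:
        by_session[turn["session_id"]].append(turn)

    gaps = []
    for session_turns in by_session.values():
        # Order turns in time; single-turn sessions contribute no gap.
        ordered = sorted(session_turns, key=lambda t: t["start_ms"])
        for prev, nxt in zip(ordered, ordered[1:]):
            # End time comes from start + duration, not from a log-write time.
            prev_end_ms = prev["start_ms"] + prev["duration_ms"]
            gaps.append(nxt["start_ms"] - prev_end_ms)
    return gaps


def percentile(values, p):
    """Nearest-rank percentile (assumed method), p in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def fraction_below(values, threshold):
    """Share of values strictly below threshold, e.g. gaps < 1 s."""
    return sum(v < threshold for v in values) / len(values)
```

Calling percentile(gaps, 50) and fraction_below(gaps, 1000) on the
output of inter_turn_gaps_ms gives the p50 and "< 1 s" entries of a
summary row, in milliseconds.
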
The right framing for §2.3 is "agentic has a sub-second tool-call mode
that chatbot doesn't", not "chatbot has think-time and agentic
doesn't".

Adds:
- scripts/compute_inter_turn_gap_chatbot.py: dash0-side aggregator
  (raw input/output join + formatted alignment by ts + output_length)
- analysis/characterization/data/chatbot_inter_turn_gap.json: CDF cache
- scripts/plot_inter_turn_gap.py: overlays both curves on log-x

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>