agentic-kvc/scripts/plot_inter_turn_gap.py
Gahow Wang 41232f49d3 Measure inter-turn T_external on the raw production trace; add f3a CDF
The earlier conversation suggested that agentic workloads might "have no
human think-time" and therefore live in a strict closed-loop regime. The
user pushed back: tool calls also take time and might restore a
chatbot-like buffer between turns. To resolve this, we go to the actual data.

The previously published per-record formatted trace carries only arrival
timestamps, so an arrival-to-arrival diff measures W_turn + T_external
and cannot separate the two.
The raw trace (/home/admin/cpfs/wjh/ali-trace/trace-glm5.1-formatted/
051315-051317-raw.jsonl on dash0) additionally carries request_end_time_ms,
which lets us compute the pure inter-turn external gap
T_external = next.request_ready_time_ms - prev.request_end_time_ms
for each session's consecutive turn pair.
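
As an illustration of the definition above, here is a minimal sketch of the per-session gap computation. It is not the dash0 aggregator itself: the `session_id` field name, the helper name `inter_turn_gaps_s`, and ordering turns by `request_ready_time_ms` are assumptions; only `request_ready_time_ms` and `request_end_time_ms` come from the raw trace described here.

```python
from collections import defaultdict


def inter_turn_gaps_s(records):
    """Return T_external (seconds) for each consecutive turn pair per session."""
    by_session = defaultdict(list)
    for rec in records:
        # "session_id" is an assumed field name for grouping turns.
        by_session[rec["session_id"]].append(rec)

    gaps = []
    for turns in by_session.values():
        # Assumption: arrival order within a session defines turn order.
        turns.sort(key=lambda r: r["request_ready_time_ms"])
        for prev, nxt in zip(turns, turns[1:]):
            gap_ms = nxt["request_ready_time_ms"] - prev["request_end_time_ms"]
            gaps.append(gap_ms / 1000.0)
    return gaps
```

Single-turn sessions contribute no pairs, which matches counting gaps only over multi-turn sessions. How overlapping or out-of-order turns (negative gaps) should be treated is not specified here, so the sketch leaves them untouched.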

Headline numbers (n = 783 k inter-turn gaps over 127 k multi-turn sessions):

  p25  = 0.69 s
  p50  = 1.6  s
  p75  = 8.6  s
  p90  = 44   s
  mean = 37   s   (heavy long-tail; paused/abandoned sessions)

  39 % of gaps < 1 s
  67 % of gaps < 5 s
  87 % of gaps < 30 s

The bulk of the distribution is dominated by tool-call latencies ranging
from sub-second to a few seconds. Under current schedulers, tail W_turn
(unified TTFT p90 = 7.3 s, lmetric 15.7 s) is already close to or above
the 75th percentile of T_external (8.6 s), so dispatch coupling is the
dominant regime for the majority of turns, not a corner case.
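
To make the comparison between scheduler TTFT and the T_external distribution concrete, the sketch below looks up where a given latency falls on the CDF by interpolating in log-space. The knots are only the headline percentiles and fraction-below figures quoted in this message, so the result is a coarse estimate; the full 500-point cache would give a finer answer. The helper name `rank_pct_of` is hypothetical.

```python
import numpy as np

# (gap in seconds, cumulative %) knots taken from the headline numbers
# in this commit message; a coarse stand-in for the 500-point CDF cache.
KNOTS = [(0.69, 25), (1.0, 39), (1.6, 50), (5.0, 67),
         (8.6, 75), (30.0, 87), (44.0, 90)]


def rank_pct_of(gap_s, knots=KNOTS):
    """Estimate the cumulative % of T_external gaps at or below gap_s."""
    xs = np.log10([g for g, _ in knots])
    ys = np.array([p for _, p in knots], dtype=float)
    # np.interp clamps outside the knot range, so values below 0.69 s or
    # above 44 s return the end-point percentiles (25 and 90).
    return float(np.interp(np.log10(gap_s), xs, ys))


for name, ttft_p90_s in [("unified", 7.3), ("lmetric", 15.7)]:
    print(f"{name}: TTFT p90 sits near T_external p{rank_pct_of(ttft_p90_s):.0f}")
```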

This corrects the earlier conflated arrival-to-arrival "median gap 11 s"
figure (which folded W_turn into T_external). The true T_external median
is 1.6 s.

Adds:
- scripts/compute_inter_turn_gap_remote.py: dash0-side aggregator
- analysis/characterization/data/agentic_inter_turn_gap.json: 500-point
  CDF cache + summary stats, scp'd back from dash0
- scripts/plot_inter_turn_gap.py: local figure renderer
- figs/f3a_inter_turn_gap.png: log-x CDF with p25/p50/p75/p90 anchors and
  unified/lmetric TTFT p90 reference lines
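
For orientation, a minimal sketch of how a cache with the keys that scripts/plot_inter_turn_gap.py reads (`cdf_samples`, `n_gaps`, `n_sessions`, `stats_s`, `fraction_below`) could be built from a list of gaps. The key names are taken from the plotting script; the function name `build_cache`, the evenly spaced percentile sampling, and the strict `<` threshold are assumptions, and the actual aggregator may differ.

```python
import json

import numpy as np


def build_cache(gaps_s, n_sessions, n_points=500):
    """Summarise gaps (seconds) into a small JSON-serialisable CDF cache."""
    gaps = np.sort(np.asarray(gaps_s, dtype=float))
    # Assumption: sample the CDF at evenly spaced percentile ranks.
    ranks = np.linspace(0.0, 100.0, n_points)
    values = np.percentile(gaps, ranks)

    stats = {f"p{q}": float(np.percentile(gaps, q)) for q in (25, 50, 75, 90)}
    stats["mean"] = float(gaps.mean())

    return {
        "n_gaps": int(gaps.size),
        "n_sessions": int(n_sessions),
        "stats_s": stats,
        # Keys render as "1.0s", "5.0s", "30.0s", as the plot title expects.
        "fraction_below": {
            f"{t}s": float((gaps < t).mean()) for t in (1.0, 5.0, 30.0)
        },
        "cdf_samples": [
            {"gap_s": float(v), "rank_pct": float(r)}
            for v, r in zip(values, ranks)
        ],
    }


if __name__ == "__main__":
    # Toy input only, to show the output shape; not trace data.
    print(json.dumps(build_cache([0.5, 2.0, 10.0, 100.0], n_sessions=2,
                                 n_points=5), indent=2))
```

Caching a fixed number of CDF points rather than the raw gaps keeps the file small enough to copy back and commit, at the cost of resolution in the tail.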

Next step (per user): pull a chatbot trace through the same pipeline and
compare distributions side by side; this will let §2.3 stop hand-waving
about "no think-time" and instead present the regime split empirically.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 12:37:32 +08:00

91 lines
3.1 KiB
Python

#!/usr/bin/env python3
"""Plot the production trace inter-turn gap distribution.

Inter-turn gap = next_turn.request_ready_time_ms - prev_turn.request_end_time_ms
(i.e. T_external: the wall-clock time between a turn finishing and the next
turn of the same session arriving). This is the tool-call latency plus any
pause, not the conflated arrival-to-arrival interval.

Data is pre-computed on dash0 by scripts/compute_inter_turn_gap_remote.py and
cached under ``analysis/characterization/data/agentic_inter_turn_gap.json``
(~23 KB).
"""
from __future__ import annotations

import argparse
import json
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np


def load(cache_path: Path) -> tuple[np.ndarray, np.ndarray, dict]:
    """Return the CDF as (gap_s, rank_pct) arrays plus the full cache dict."""
    d = json.loads(cache_path.read_text())
    samples = d["cdf_samples"]
    xs = np.array([s["gap_s"] for s in samples])
    ys = np.array([s["rank_pct"] for s in samples])
    return xs, ys, d


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--data",
        default="analysis/characterization/data/agentic_inter_turn_gap.json",
    )
    parser.add_argument("--out", default="figs/f3a_inter_turn_gap.png")
    args = parser.parse_args()

    xs, ys, d = load(Path(args.data))

    fig, ax = plt.subplots(figsize=(9, 5.2))
    ax.plot(xs, ys, color="#1f77b4", lw=2.2,
            label=f"agentic trace (n={d['n_gaps']:,} gaps, "
                  f"{d['n_sessions']:,} multi-turn sessions)")

    # Mark the p25/p50/p75/p90 anchors on the CDF.
    p = d["stats_s"]
    for pct, key in [(25, "p25"), (50, "p50"), (75, "p75"), (90, "p90")]:
        v = p[key]
        ax.scatter([v], [pct], color="#c44e52", s=55, zorder=5)
        ax.annotate(f"p{pct} = {v:.2g}s",
                    xy=(v, pct), xytext=(8, -4),
                    textcoords="offset points",
                    fontsize=10, color="#7a1d1d")

    # Reference vertical lines: scheduler W_turn (TTFT p90 from our window_1 runs)
    refs = [
        ("lmetric TTFT p90 = 15.7s", 15.7, "#888"),
        ("unified TTFT p90 = 7.3s", 7.3, "#444"),
    ]
    for label, v, color in refs:
        ax.axvline(v, color=color, ls=":", lw=1.2, alpha=0.85)
        ax.text(v * 1.05, 8, label, fontsize=8.5, color=color,
                rotation=90, va="bottom")

    ax.set_xscale("log")
    ax.set_xlim(0.05, 2000)
    ax.set_ylim(0, 102)
    ax.set_xlabel(
        "Inter-turn gap T_external (s, log scale): "
        "next_turn.ready - prev_turn.end"
    )
    ax.set_ylabel("Cumulative % of inter-turn intervals")
    ax.set_title(
        "Inter-turn external gap CDF: production agentic trace\n"
        f"median T_external = {p['p50']:.2g}s; "
        f"{int(d['fraction_below']['1.0s']*100)}% gaps < 1s, "
        f"{int(d['fraction_below']['5.0s']*100)}% < 5s, "
        f"{int(d['fraction_below']['30.0s']*100)}% < 30s"
    )
    ax.grid(True, which="both", alpha=0.3)
    ax.legend(loc="lower right", framealpha=0.92, fontsize=9)

    out_path = Path(args.out)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(out_path, dpi=150, bbox_inches="tight")
    print(f"wrote {out_path}")


if __name__ == "__main__":
    main()