The earlier conversation suggested that agentic workloads might "have no human think-time" and therefore live in a strict closed-loop regime. The user pushed back: tool calls also take time and might restore a chatbot-like buffer between turns.

To resolve this, we go to the actual data. The previously published per-record formatted trace only carries arrival timestamps, so an arrival-to-arrival diff measures W_turn + T_external and cannot separate the two. The raw trace (/home/admin/cpfs/wjh/ali-trace/trace-glm5.1-formatted/051315-051317-raw.jsonl on dash0) additionally carries request_end_time_ms, which lets us compute the pure inter-turn external gap for each session's consecutive turn pair:

    T_external = next.request_ready_time_ms - prev.request_end_time_ms

Headline numbers (n = 783 k inter-turn gaps over 127 k multi-turn sessions):

    p25  = 0.69 s
    p50  = 1.6 s
    p75  = 8.6 s
    p90  = 44 s
    mean = 37 s  (heavy long-tail; paused/abandoned sessions)

    39 % of gaps < 1 s
    67 % of gaps < 5 s
    87 % of gaps < 30 s

The bulk of the distribution is dominated by sub-second to a-few-seconds tool-call latencies. Under any current scheduler (e.g. unified TTFT p90 = 7.3 s, lmetric 15.7 s), W_turn is already close to or above the 75th percentile of T_external (8.6 s), so dispatch coupling is the dominant regime for the majority of turns, not a corner case.

This corrects the earlier conflated arrival-to-arrival "median gap 11 s" figure (which folded W_turn into T_external). The true T_external median is 1.6 s.
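The gap definition above can be sketched in a few lines. This is a minimal illustration, not the dash0 aggregator itself: the field name "session_id" and the in-memory record list are assumptions (the commit message only names request_ready_time_ms and request_end_time_ms), and the sample records are made up.

```python
from collections import defaultdict


def inter_turn_gaps_s(records):
    """Return T_external in seconds for every consecutive turn pair of each session."""
    by_session = defaultdict(list)
    for rec in records:
        # "session_id" is a hypothetical field name used for grouping.
        by_session[rec["session_id"]].append(rec)

    gaps = []
    for turns in by_session.values():
        # Order turns by when they became ready, then pair neighbours.
        turns.sort(key=lambda r: r["request_ready_time_ms"])
        for prev, nxt in zip(turns, turns[1:]):
            gap_ms = nxt["request_ready_time_ms"] - prev["request_end_time_ms"]
            gaps.append(gap_ms / 1000.0)
    return gaps


# Made-up records: session "a" has three turns, session "b" has one (no gap).
records = [
    {"session_id": "a", "request_ready_time_ms": 0, "request_end_time_ms": 4000},
    {"session_id": "a", "request_ready_time_ms": 5600, "request_end_time_ms": 9000},
    {"session_id": "a", "request_ready_time_ms": 9700, "request_end_time_ms": 12000},
    {"session_id": "b", "request_ready_time_ms": 100, "request_end_time_ms": 900},
]
gaps = inter_turn_gaps_s(records)
print(gaps)  # [1.6, 0.7]
```

For the first pair of session "a", an arrival-to-arrival diff would report 5.6 s, while the end-to-ready gap is 1.6 s; the 4 s difference is time spent inside the serving system, which is the conflation this commit removes.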
Adds:
- scripts/compute_inter_turn_gap_remote.py: dash0-side aggregator
- analysis/characterization/data/agentic_inter_turn_gap.json: 500-point CDF cache + summary stats, scp'd back from dash0
- scripts/plot_inter_turn_gap.py: local figure renderer
- figs/f3a_inter_turn_gap.png: log-x CDF with p25/p50/p75/p90 anchors and unified/lmetric TTFT p90 reference lines

Next step (per user): pull a chatbot trace through the same pipeline and compare distributions side by side; this will let §2.3 stop hand-waving about "no think-time" and instead present the regime split empirically.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
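The reference lines in the figure place each scheduler's TTFT p90 on the T_external CDF. A rough way to read off where they land is to interpolate the CDF in log-x. The sketch below uses only the seven (gap, rank) points quoted in the commit message, not the real 500-point cache, so the results are coarse estimates; the helper name percentile_rank is hypothetical.

```python
import numpy as np

# (gap_s, rank_pct) points copied from the headline numbers in the commit
# message. This is a coarse stand-in for the cached 500-point CDF.
points = [(0.69, 25), (1.0, 39), (1.6, 50), (5.0, 67),
          (8.6, 75), (30.0, 87), (44.0, 90)]
gap_s = np.array([g for g, _ in points])
rank_pct = np.array([r for _, r in points], dtype=float)


def percentile_rank(latency_s: float) -> float:
    """Interpolate the CDF at latency_s, linearly in log10(gap)."""
    return float(np.interp(np.log10(latency_s), np.log10(gap_s), rank_pct))


for name, ttft_p90_s in [("unified", 7.3), ("lmetric", 15.7)]:
    print(f"{name}: TTFT p90 = {ttft_p90_s}s sits near the "
          f"{percentile_rank(ttft_p90_s):.0f}th percentile of T_external")
```

Because the interpolation is monotone, the unified value must fall between the 67th and 75th percentiles and the lmetric value between the 75th and 87th, whatever the exact shape of the curve between the quoted points.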
91 lines
3.1 KiB
Python
#!/usr/bin/env python3
"""Plot the production trace inter-turn gap distribution.

Inter-turn gap = next_turn.request_ready_time_ms - prev_turn.request_end_time_ms
(i.e. T_external: the wall-clock between a turn finishing and the next turn
of the same session arriving). This is the tool-call latency + any pause,
not the conflated arrival-to-arrival interval.

Data is pre-computed on dash0 by scripts/agentic_gap.py and cached under
``analysis/characterization/data/agentic_inter_turn_gap.json`` (~23 KB).
"""
from __future__ import annotations

import argparse
import json
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np


def load(cache_path: Path) -> tuple[np.ndarray, np.ndarray, dict]:
    d = json.loads(cache_path.read_text())
    samples = d["cdf_samples"]
    xs = np.array([s["gap_s"] for s in samples])
    ys = np.array([s["rank_pct"] for s in samples])
    return xs, ys, d


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--data",
        default="analysis/characterization/data/agentic_inter_turn_gap.json",
    )
    parser.add_argument("--out", default="figs/f3a_inter_turn_gap.png")
    args = parser.parse_args()

    xs, ys, d = load(Path(args.data))

    fig, ax = plt.subplots(figsize=(9, 5.2))
    ax.plot(xs, ys, color="#1f77b4", lw=2.2,
            label=f"agentic trace (n={d['n_gaps']:,} gaps, "
                  f"{d['n_sessions']:,} multi-turn sessions)")

    p = d["stats_s"]
    for pct, key in [(25, "p25"), (50, "p50"), (75, "p75"), (90, "p90")]:
        v = p[key]
        ax.scatter([v], [pct], color="#c44e52", s=55, zorder=5)
        ax.annotate(f"p{pct} = {v:.2g}s",
                    xy=(v, pct), xytext=(8, -4),
                    textcoords="offset points",
                    fontsize=10, color="#7a1d1d")

    # Reference vertical lines: scheduler W_turn (TTFT p90 from our window_1 runs)
    refs = [
        ("lmetric TTFT p90 = 15.7s", 15.7, "#888"),
        ("unified TTFT p90 = 7.3s", 7.3, "#444"),
    ]
    for label, v, color in refs:
        ax.axvline(v, color=color, ls=":", lw=1.2, alpha=0.85)
        ax.text(v * 1.05, 8, label, fontsize=8.5, color=color,
                rotation=90, va="bottom")

    ax.set_xscale("log")
    ax.set_xlim(0.05, 2000)
    ax.set_ylim(0, 102)
    ax.set_xlabel(
        "Inter-turn gap T_external (s, log scale) "
        "— next_turn.ready − prev_turn.end"
    )
    ax.set_ylabel("Cumulative % of inter-turn intervals")
    ax.set_title(
        "Inter-turn external gap CDF — production agentic trace\n"
        f"median T_external = {p['p50']:.2g}s; "
        f"{int(d['fraction_below']['1.0s']*100)}% gaps < 1s, "
        f"{int(d['fraction_below']['5.0s']*100)}% < 5s, "
        f"{int(d['fraction_below']['30.0s']*100)}% < 30s"
    )
    ax.grid(True, which="both", alpha=0.3)
    ax.legend(loc="lower right", framealpha=0.92, fontsize=9)

    out_path = Path(args.out)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(out_path, dpi=150, bbox_inches="tight")
    print(f"wrote {out_path}")


if __name__ == "__main__":
    main()
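The renderer reads five keys from the cache: n_gaps, n_sessions, stats_s (p25/p50/p75/p90), fraction_below ("1.0s", "5.0s", "30.0s") and cdf_samples (a list of gap_s/rank_pct pairs). The sketch below shows one way a cache with that shape could be built from a list of gaps. The function name build_cache, the choice of evenly spaced percentile ranks for the 500 points, and the synthetic lognormal input are all assumptions for illustration; the actual dash0-side aggregator may sample the CDF differently.

```python
import json

import numpy as np


def build_cache(gaps_s, n_sessions, n_points=500):
    """Summarise inter-turn gaps into the JSON shape the plot script reads."""
    gaps = np.sort(np.asarray(gaps_s, dtype=float))
    # Assumption: sample the CDF at evenly spaced percentile ranks.
    ranks = np.linspace(0.0, 100.0, n_points)
    values = np.percentile(gaps, ranks)
    return {
        "n_gaps": int(gaps.size),
        "n_sessions": int(n_sessions),
        "stats_s": {
            f"p{q}": float(np.percentile(gaps, q)) for q in (25, 50, 75, 90)
        },
        # Keys render as "1.0s", "5.0s", "30.0s", matching the title lookup.
        "fraction_below": {
            f"{t}s": float(np.mean(gaps < t)) for t in (1.0, 5.0, 30.0)
        },
        "cdf_samples": [
            {"gap_s": float(v), "rank_pct": float(r)}
            for v, r in zip(values, ranks)
        ],
    }


# Synthetic heavy-tailed gaps, NOT trace data; only used to exercise the code.
rng = np.random.default_rng(0)
synthetic_gaps = rng.lognormal(mean=0.5, sigma=2.0, size=10_000)
cache = build_cache(synthetic_gaps, n_sessions=1_500)
print(json.dumps(cache["stats_s"], indent=2))
```

Writing the result with json.dumps(cache) to the path passed as --data would let the plot script render it unchanged, under the schema assumptions stated above.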