Adds `scripts/add_ttp_streaming.py`: one streaming pass over the 522 GB raw
glm5.1 trace to build {chat_id: (ready_ms, end_ms)} and join the real
inter-turn gap onto the COMPLETE formatted trace (no early-exit, low memory).
time_to_parent_chat = (this.request_ready_time_ms - parent.request_end_time_ms) / 1000 (seconds)
= tool-exec time + agent think-time; turn 1 -> null, negative values clamped to 0.
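
The gap computation above can be sketched roughly as follows. This is not the code in `scripts/add_ttp_streaming.py`, only an illustration under stated assumptions. The function names `time_to_parent_chat` and `annotate` are hypothetical. The `timing` dict is assumed to already hold {chat_id: (ready_ms, end_ms)} from the streaming pass. Rows whose own or parent timing is missing are assumed to get null.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

Timing = Dict[int, Tuple[int, int]]  # chat_id -> (ready_ms, end_ms)


def time_to_parent_chat(ready_ms: int, parent_end_ms: int) -> float:
    """Inter-turn gap in seconds, with negative gaps clamped to 0."""
    return max(0.0, (ready_ms - parent_end_ms) / 1000.0)


def annotate(rows: Iterable[dict], timing: Timing) -> Iterator[dict]:
    """Yield each formatted row with a time_to_parent_chat field added.

    Rows are processed one at a time, so only `timing` is held in memory.
    """
    for row in rows:
        this = timing.get(row["chat_id"])
        parent = timing.get(row["parent_chat_id"])  # -1 (turn 1) is never a key
        ttp: Optional[float] = None
        if this is not None and parent is not None:
            ttp = time_to_parent_chat(this[0], parent[1])
        yield {**row, "time_to_parent_chat": ttp}
```

Generating rows lazily rather than building a list keeps the join streaming, which matches the low-memory intent described above.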
Ships the two ttp-annotated sampled traces (same anonymized data + one
timestamp-derived field; regenerated via sample_trace.py --seed 42 so they are
row-for-row identical to the non-ttp variants on all 9 shared fields):
traces/w600_r0.0015_st30_ttp.jsonl (1214 reqs)
traces/w600_r0.0015_st30_first600s_ttp.jsonl (807 reqs)
They are needed to replay with --dispatch-mode thinktime without the
non-redistributable raw trace, so they are added to the .gitignore allowlist.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Benchmark trace
w600_r0.0015_st30.jsonl
The primary replay trace for the routing / connector experiments (1214 requests, 274 sessions). One JSON object per request:
{"chat_id": 1237198, "parent_chat_id": -1, "timestamp": 0.0,
"input_length": 8228, "output_length": 21, "type": "coder", "turn": 1,
"hash_ids": [12292995, ...], "session_id": "1237198"}
| field | meaning |
|---|---|
| input_length / output_length | token counts only |
| hash_ids | opaque integer KV-block hashes; shared ids ⇒ shared prefix (drives prefix-cache reuse in replay) |
| timestamp | arrival offset (s) from trace start |
| turn / parent_chat_id / session_id | multi-turn session structure |
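
As a quick sanity check on the format, here is a minimal sketch that parses the one-object-per-line layout and counts requests and sessions. The helper name `summarize` is hypothetical and not part of this repo. It relies only on the `timestamp` and `session_id` fields shown above.

```python
import json
from typing import Iterable


def summarize(lines: Iterable[str]) -> dict:
    """Parse JSONL request lines and return basic trace statistics."""
    rows = [json.loads(line) for line in lines if line.strip()]
    timestamps = [r["timestamp"] for r in rows]
    return {
        "requests": len(rows),
        "sessions": len({r["session_id"] for r in rows}),
        # Span between the first and last arrival, in seconds.
        "span_s": max(timestamps) - min(timestamps),
    }
```

Passing an open file handle works directly, for example `summarize(open("traces/w600_r0.0015_st30.jsonl"))`.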
No cleartext. There are no prompts, no model outputs, and no PII — only
token counts, opaque block hashes, timing, and session structure. The replayer
synthesizes dummy token sequences consistent with hash_ids so prefix-cache
hit rates match the original workload.
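
One way such synthesis could work is sketched below. This is an illustration, not the replayer's actual implementation. `BLOCK_SIZE`, `VOCAB_SIZE`, and the function `synth_tokens` are assumptions made for the example. The idea is that each block hash deterministically seeds the tokens for its block, so two requests that share leading hash_ids get identical leading tokens.

```python
import random
from typing import List, Sequence

BLOCK_SIZE = 16      # assumption: tokens per KV block (not stated in the trace)
VOCAB_SIZE = 32000   # assumption: size of the dummy vocabulary


def synth_tokens(hash_ids: Sequence[int], input_length: int,
                 block_size: int = BLOCK_SIZE,
                 vocab_size: int = VOCAB_SIZE) -> List[int]:
    """Build a dummy token sequence whose blocks are determined by hash_ids.

    The same hash id always yields the same block of tokens, so a shared
    prefix of hash_ids produces a shared token prefix. The result is
    truncated to input_length; this sketch does not pad if the blocks
    cover fewer than input_length tokens.
    """
    tokens: List[int] = []
    for block_hash in hash_ids:
        rng = random.Random(block_hash)  # deterministic per block hash
        tokens.extend(rng.randrange(vocab_size) for _ in range(block_size))
    return tokens[:input_length]
```

Seeding per block rather than per request is what keeps the prefix property: the tokens for a block depend only on its hash, never on which request it appears in.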
Provenance
Sampled from the internal Alibaba GLM-5.1-formatted production trace
(~/ali-trace/trace-glm5.1-formatted/051315-051317.jsonl on dash0, ~2.1 M
requests, 2 h) — not redistributable; only this anonymized sample is shipped.
The filename encodes the sampling params: w=window-seconds, r=sample-ratio,
st=max-single-turn-ratio.
w600 is the 600 s window of session start times, not the trace duration.
The sampler keeps every session whose first request falls in a 600 s window,
then includes all of that session's turns. Because agentic sessions are
long-lived and multi-turn (inter-turn gaps up to ~700 s), the actual trace spans
~2912 s (~48.5 min) even though all 274 sessions start within the first
598 s; 34 % of requests are later turns that arrive after t=600 s.
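
The window rule can be illustrated with the sketch below. It is not the code in `scripts/sample_trace.py`, and it deliberately omits the sample-ratio and max-single-turn-ratio steps. The function name `sample_by_session_start` and the `window_start` parameter are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def sample_by_session_start(rows: Iterable[dict],
                            window_start: float = 0.0,
                            window_seconds: float = 600.0) -> List[dict]:
    """Keep whole sessions whose FIRST request arrives inside the window."""
    sessions: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        sessions[row["session_id"]].append(row)

    kept: List[dict] = []
    window_end = window_start + window_seconds
    for requests in sessions.values():
        first_arrival = min(r["timestamp"] for r in requests)
        if window_start <= first_arrival < window_end:
            # Later turns are kept even if they arrive after the window ends.
            kept.extend(requests)
    kept.sort(key=lambda r: r["timestamp"])
    return kept
```

Because the filter looks only at each session's first arrival, the output can extend well past `window_seconds`, which is why the shipped trace is much longer than 600 s.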
Regenerate (requires the dash0 source):
python scripts/sample_trace.py \
--input ~/ali-trace/trace-glm5.1-formatted/051315-051317.jsonl \
--output traces/w600_r0.0015_st30.jsonl \
--window-seconds 600 --sample-ratio 0.0015 --max-single-turn-ratio 0.30 --seed 42
Replay
python -m replayer --trace traces/w600_r0.0015_st30.jsonl ...
See replayer/ and scripts/cache_aware_proxy.py.