trace: time_to_parent_chat annotation + thinktime trace variants

Adds `scripts/add_ttp_streaming.py`, which makes one streaming pass over the
522 GB raw glm5.1 trace to build {chat_id: (ready_ms, end_ms)}, then joins the
real inter-turn gap onto the COMPLETE formatted trace (no early exit, low memory).
  time_to_parent_chat = (this.request_ready_time_ms - parent.request_end_time_ms)/1000
  = tool-exec + agent think-time, in seconds; turn 1 -> null, negatives clamped to 0.
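
A minimal sketch of the index-then-join idea, not the actual contents of
`scripts/add_ttp_streaming.py`. It assumes both traces are JSONL and that each
formatted record names its parent in a `parent_chat_id` field; that field name,
the function names, and the choice to return null when a parent is missing from
the index are all assumptions made for illustration.

```python
import json
from typing import Dict, Iterable, Iterator, Optional, Tuple

TimeIndex = Dict[str, Tuple[int, int]]  # chat_id -> (ready_ms, end_ms)


def build_time_index(raw_lines: Iterable[str]) -> TimeIndex:
    """Single streaming pass: keep only two integers per chat_id."""
    index: TimeIndex = {}
    for line in raw_lines:
        if not line.strip():
            continue  # skip blank lines
        rec = json.loads(line)
        index[rec["chat_id"]] = (
            rec["request_ready_time_ms"],
            rec["request_end_time_ms"],
        )
    return index


def time_to_parent_chat(
    chat_id: str, parent_chat_id: Optional[str], index: TimeIndex
) -> Optional[float]:
    """Gap in seconds between the parent's end and this request's ready time."""
    if parent_chat_id is None:
        return None  # turn 1 has no parent
    if chat_id not in index or parent_chat_id not in index:
        return None  # assumption: unknown timestamps are treated like turn 1
    ready_ms = index[chat_id][0]
    parent_end_ms = index[parent_chat_id][1]
    return max(0.0, (ready_ms - parent_end_ms) / 1000.0)  # clamp negatives to 0


def annotate(formatted_lines: Iterable[str], index: TimeIndex) -> Iterator[dict]:
    """Stream the formatted trace and attach the gap to every record."""
    for line in formatted_lines:
        if not line.strip():
            continue
        rec = json.loads(line)
        # `parent_chat_id` is a hypothetical field name for the parent link.
        rec["time_to_parent_chat"] = time_to_parent_chat(
            rec["chat_id"], rec.get("parent_chat_id"), index
        )
        yield rec
```

Passing open file handles as `raw_lines` and `formatted_lines` keeps memory
proportional to the number of chats rather than to the size of the raw trace.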

Ships the two ttp-annotated sampled traces (same anonymized data + one
timestamp-derived field; regenerated via sample_trace.py --seed 42 so they are
row-for-row identical to the non-ttp variants on all 9 shared fields):
  traces/w600_r0.0015_st30_ttp.jsonl          (1214 reqs)
  traces/w600_r0.0015_st30_first600s_ttp.jsonl (807 reqs)
They are needed to replay with --dispatch-mode thinktime without the
non-redistributable raw trace, so they are added to the .gitignore allowlist.
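
The row-for-row claim above could be spot-checked with something like the
following sketch. It assumes both files are JSONL and that
`time_to_parent_chat` is the only extra key in the ttp variant; the function
names are hypothetical and are not part of `sample_trace.py`.

```python
import json
from itertools import zip_longest
from typing import Iterable, Iterator


def _records(lines: Iterable[str]) -> Iterator[dict]:
    """Parse non-blank JSONL lines into dicts."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


def rows_match_except_ttp(
    base_lines: Iterable[str],
    ttp_lines: Iterable[str],
    extra_field: str = "time_to_parent_chat",
) -> bool:
    """True if both traces have the same rows, in order, ignoring extra_field."""
    pairs = zip_longest(_records(base_lines), _records(ttp_lines))
    for base_rec, ttp_rec in pairs:
        if base_rec is None or ttp_rec is None:
            return False  # one trace has more rows than the other
        shared = {k: v for k, v in ttp_rec.items() if k != extra_field}
        if shared != base_rec:
            return False
    return True
```

For example, it could be called with open handles on
`traces/w600_r0.0015_st30_first600s.jsonl` and
`traces/w600_r0.0015_st30_first600s_ttp.jsonl`.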

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 20:58:49 +08:00
parent 8a6b22c11c
commit 075f5bbc22
4 changed files with 2157 additions and 0 deletions

3
.gitignore vendored

@@ -11,3 +11,6 @@ traces/*
.claude/
# third_party/vllm tracked in git for patch management
!traces/w600_r0.0015_st30_first600s.jsonl
# + time_to_parent_chat annotation (for --dispatch-mode thinktime); same anon data
!traces/w600_r0.0015_st30_ttp.jsonl
!traces/w600_r0.0015_st30_first600s_ttp.jsonl