agentic-kvc

Author	SHA1	Message	Date
Gahow Wang	075f5bbc22	trace: time_to_parent_chat annotation + thinktime trace variants Adds `scripts/add_ttp_streaming.py`: one streaming pass over the 522 GB raw glm5.1 trace to build {chat_id: (ready_ms, end_ms)} and join the real inter-turn gap onto the COMPLETE formatted trace (no early-exit, low memory). time_to_parent_chat = (this.request_ready_time_ms - parent.request_end_time_ms)/1000 = tool-exec + agent think-time; turn-1 -> null, negatives clamped to 0. Ships the two ttp-annotated sampled traces (same anonymized data + one timestamp-derived field; regenerated via sample_trace.py --seed 42 so they are row-for-row identical to the non-ttp variants on all 9 shared fields): traces/w600_r0.0015_st30_ttp.jsonl (1214 reqs) traces/w600_r0.0015_st30_first600s_ttp.jsonl (807 reqs) They are needed to replay with --dispatch-mode thinktime without the non-redistributable raw trace, so they are added to the .gitignore allowlist. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 20:58:49 +08:00
Gahow Wang	71b0747b3b	600s-truncated trace + LPWL 5-policy results traces/w600_r0.0015_st30_first600s.jsonl: first-600s cut of the shipped w600 trace (807 reqs, 274 sessions, all turn-1s + early later-turns; theoretical APC ceiling ~70% vs 80% full). Faster iteration (~18 min/arm) but a colder, lower-locality regime; whitelisted alongside the parent anonymized trace. analysis/lpwl_5policy_600s.md: LPWL vs LMetric/sticky/unified/unified+A+B on the 600s trace (dash1 8xH20, cold APC, n=1). LPWL is overall best with zero knobs — TTFT p90 7983ms vs tuned A+B 11562 (-31%), E2E p90 -16%, best request balance; APC 0.648 (emergent affinity, far above LMetric 0.507); only loss is E2E p99 from heavy-class decode concentration. Demonstrates anti-overfit: A+B was tuned on full w600 yet is beaten by the knob-free policy on this regime. Includes the run_5policy_600s.sh repro driver. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-29 16:08:35 +08:00
Gahow Wang	8a876e90d1	traces/README: clarify w600 is the session-start window, not span The trace actually spans ~2912 s (~48.5 min): all 274 sessions START within the 600 s --window-seconds window, but their later multi-turn requests (34% of rows, inter-turn gaps up to ~700 s) extend well past t=600 s. Remove the misleading "~600 s span". Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-29 12:04:14 +08:00
Gahow Wang	08c3cf48aa	Ship anonymized benchmark trace w600_r0.0015_st30 + provenance Whitelist the sampled replay trace (1214 reqs / 274 sessions / ~600 s) past the traces/ ignore so the repo is runnable without dash0 access. Metadata only (token counts, opaque KV-block hashes, timing, session structure) — no prompts/outputs/PII. traces/README documents schema, provenance (sampled from the internal GLM-5.1 production trace via scripts/sample_trace.py), and the regeneration command. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-29 11:54:43 +08:00

4 Commits
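
The `time_to_parent_chat` rule described in commit 075f5bbc22 can be sketched as below. This is a minimal illustration under stated assumptions, not the contents of `scripts/add_ttp_streaming.py`: the field names `chat_id`, `request_ready_time_ms` and `request_end_time_ms` come from the commit message, while `parent_chat_id` and all function names are hypothetical, and the rows are plain dicts rather than lines streamed from the raw trace.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple


def build_timing_index(raw_rows: Iterable[dict]) -> Dict[str, Tuple[int, int]]:
    """One pass over the raw rows, keeping only (ready_ms, end_ms) per chat_id."""
    index: Dict[str, Tuple[int, int]] = {}
    for row in raw_rows:
        index[row["chat_id"]] = (
            row["request_ready_time_ms"],
            row["request_end_time_ms"],
        )
    return index


def time_to_parent_chat(ready_ms: int, parent_end_ms: Optional[int]) -> Optional[float]:
    """Gap in seconds between the parent's end and this request becoming ready.

    Turn-1 requests (no parent) map to None; negative gaps are clamped to 0.
    """
    if parent_end_ms is None:
        return None
    return max(0.0, (ready_ms - parent_end_ms) / 1000)


def annotate(
    formatted_rows: Iterable[dict], index: Dict[str, Tuple[int, int]]
) -> Iterator[dict]:
    """Join the gap onto each formatted row without materializing the whole trace."""
    for row in formatted_rows:
        own = index.get(row["chat_id"])
        # `parent_chat_id` is an assumed field name for the parent link.
        parent = index.get(row.get("parent_chat_id"))
        if own is None or parent is None:
            ttp = None
        else:
            ttp = time_to_parent_chat(own[0], parent[1])
        yield {**row, "time_to_parent_chat": ttp}


if __name__ == "__main__":
    raw = [
        {"chat_id": "a", "request_ready_time_ms": 0, "request_end_time_ms": 1000},
        {"chat_id": "b", "request_ready_time_ms": 3500, "request_end_time_ms": 5000},
        {"chat_id": "c", "request_ready_time_ms": 4800, "request_end_time_ms": 6000},
    ]
    formatted = [
        {"chat_id": "a", "parent_chat_id": None},
        {"chat_id": "b", "parent_chat_id": "a"},
        {"chat_id": "c", "parent_chat_id": "b"},
    ]
    for out_row in annotate(formatted, build_timing_index(raw)):
        print(out_row["chat_id"], out_row["time_to_parent_chat"])
```

Keeping only two integers per `chat_id` is what lets the index stay small relative to the raw trace; the formatted rows are then processed one at a time through a generator.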