The trace actually spans ~2912 s (~48.5 min): all 274 sessions START within the 600 s --window-seconds window, but their later multi-turn requests (34% of rows, inter-turn gaps up to ~700 s) extend well past t=600 s. Remove the misleading "~600 s span". Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
# Benchmark trace

## `w600_r0.0015_st30.jsonl`

The primary replay trace for the routing / connector experiments
(1214 requests, 274 sessions). One JSON object per request:

```json
{"chat_id": 1237198, "parent_chat_id": -1, "timestamp": 0.0,
 "input_length": 8228, "output_length": 21, "type": "coder", "turn": 1,
 "hash_ids": [12292995, ...], "session_id": "1237198"}
```

| field | meaning |
|---|---|
| `input_length` / `output_length` | token **counts** only |
| `hash_ids` | opaque integer KV-block hashes; shared ids ⇒ shared prefix (drives prefix-cache reuse in replay) |
| `timestamp` | arrival offset (s) from trace start |
| `turn` / `parent_chat_id` / `session_id` | multi-turn session structure |

**No cleartext.** There are no prompts, no model outputs, and no PII: only
token counts, opaque block hashes, timing, and session structure. The replayer
synthesizes dummy token sequences consistent with `hash_ids` so prefix-cache
hit rates match the original workload.

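As a rough illustration of how these fields fit together, the sketch below parses JSONL records and counts how many leading `hash_ids` two requests share. The helper names (`load_trace`, `shared_prefix_blocks`) and the two sample records are hypothetical, and comparing leading runs of ids is only an assumption about how prefix sharing could be measured; it is not the replayer's actual logic.

```python
import json
from typing import Iterable


def load_trace(lines: Iterable[str]) -> list[dict]:
    """Parse JSONL text: one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def shared_prefix_blocks(a: list[int], b: list[int]) -> int:
    """Count the leading block hashes that two requests have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Hypothetical two-turn session; the values are illustrative, not from the trace.
sample = [
    '{"chat_id": 1, "parent_chat_id": -1, "timestamp": 0.0, "input_length": 1024, '
    '"output_length": 16, "type": "coder", "turn": 1, "hash_ids": [11, 12], "session_id": "1"}',
    '{"chat_id": 2, "parent_chat_id": 1, "timestamp": 42.5, "input_length": 2048, '
    '"output_length": 32, "type": "coder", "turn": 2, "hash_ids": [11, 12, 13, 14], "session_id": "1"}',
]

records = load_trace(sample)
first, second = records
reused = shared_prefix_blocks(first["hash_ids"], second["hash_ids"])
print(f"{reused} of {len(second['hash_ids'])} blocks shared with the previous turn")
```

To run the same helpers on the shipped file, pass an open file handle to `load_trace` instead of the in-memory list.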
### Provenance
Sampled from the internal Alibaba GLM-5.1-formatted production trace
(`~/ali-trace/trace-glm5.1-formatted/051315-051317.jsonl` on dash0, ~2.1 M
requests, 2 h). The source is not redistributable; only this anonymized sample
is shipped. The filename encodes the sampling params: `w`=window-seconds,
`r`=sample-ratio, `st`=max-single-turn-ratio (as a percentage, so `st30` is 0.30).

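The naming scheme can be decoded mechanically. The following is a minimal sketch, not part of the repository: `parse_trace_name` is a hypothetical helper, and reading `st30` as a percentage is an assumption inferred from the `--max-single-turn-ratio 0.30` flag in the regenerate command.

```python
import re
from pathlib import Path

# Matches names such as "w600_r0.0015_st30" (file stem, without extension).
_NAME_RE = re.compile(r"^w(?P<w>\d+)_r(?P<r>[\d.]+)_st(?P<st>\d+)$")


def parse_trace_name(path: str) -> dict:
    """Recover the sampling parameters encoded in a trace filename."""
    stem = Path(path).stem
    m = _NAME_RE.match(stem)
    if m is None:
        raise ValueError(f"unrecognized trace name: {stem!r}")
    return {
        "window_seconds": int(m["w"]),
        "sample_ratio": float(m["r"]),
        # Assumption: `st` is stored as a percentage (st30 means 0.30).
        "max_single_turn_ratio": int(m["st"]) / 100,
    }


params = parse_trace_name("traces/w600_r0.0015_st30.jsonl")
print(params)
```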
**`w600` is the 600 s window of session _start_ times, not the trace duration.**
The sampler keeps every session whose *first* request falls in a 600 s window,
then includes *all* of that session's turns. Because agentic sessions are
long-lived and multi-turn (inter-turn gaps up to ~700 s), the actual trace **spans
~2912 s (~48.5 min)** even though all 274 sessions start within the first
598 s; 34 % of requests are later turns occurring after t=600 s.

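The difference between the start window and the full span can be checked from the `timestamp` and `session_id` fields alone. The sketch below is illustrative only: `span_stats` is a hypothetical helper, the toy records are invented, and counting a request as "late" when its timestamp exceeds the window is an assumption about how the 34 % figure is defined.

```python
def span_stats(records: list[dict], window_seconds: float = 600.0) -> dict:
    """Summarize session start times versus the overall trace span."""
    first_seen: dict[str, float] = {}
    for r in records:
        sid, t = r["session_id"], r["timestamp"]
        # Track the earliest timestamp seen for each session.
        first_seen[sid] = min(t, first_seen.get(sid, t))

    timestamps = [r["timestamp"] for r in records]
    late = sum(t > window_seconds for t in timestamps)
    return {
        "sessions": len(first_seen),
        "last_session_start": max(first_seen.values()),
        "span_seconds": max(timestamps) - min(timestamps),
        "late_fraction": late / len(timestamps),
    }


# Toy data: two sessions that both start inside the window but continue past it.
toy = [
    {"session_id": "a", "timestamp": 0.0},
    {"session_id": "b", "timestamp": 598.0},
    {"session_id": "a", "timestamp": 650.0},
    {"session_id": "b", "timestamp": 1200.0},
]
stats = span_stats(toy)
print(stats)
```

In the toy data every session starts by t=598 s, yet the span is 1200 s and half of the requests arrive after the window closes, which mirrors the pattern described above on a smaller scale.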
Regenerate (requires the dash0 source):
```bash
python scripts/sample_trace.py \
  --input ~/ali-trace/trace-glm5.1-formatted/051315-051317.jsonl \
  --output traces/w600_r0.0015_st30.jsonl \
  --window-seconds 600 --sample-ratio 0.0015 --max-single-turn-ratio 0.30 --seed 42
```

### Replay
```bash
python -m replayer --trace traces/w600_r0.0015_st30.jsonl ...
```

See `replayer/` and `scripts/cache_aware_proxy.py`.