# Sources
Checked on 2026-06-24.
## Local Repositories
| Source | Local path | Commit / HEAD | Notes |
|---|---|---|---|
| Qwen Bailian usage traces | `/home/gahow/phd/qwen-bailian-usagetraces-anon` | `5f7439c51ec248a0c585f7d90a41a6f57773b912` | Primary RS0 input is `qwen_coder_blksz_16.jsonl`. |
| Frontier | `/tmp/toc-llm-sim-research/Frontier` | `d9cfeb6d8791fbf2f295dd9744c56a666171776e` | Primary RS1 simulator candidate. |
| Vidur | `/tmp/toc-llm-sim-research/vidur` | `8383d2935bc62723a212090baa9f98ada206fc14` | Baseline simulator candidate for arrival and length replay. |
| AIConfigurator | `/tmp/toc-llm-sim-research/aiconfigurator` | `e46ece7510e727fafefb8212e5846172145a30ea` | Configuration search reference, not per-request faithful replay. |

All four local repositories were present when RS0 was generated. No external
repository was cloned for RS0.
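
A pin check along the following lines could confirm that each local checkout still matches the table. This is only a sketch: `PINS`, `git_head`, and `find_mismatches` are hypothetical names rather than part of ReplayServe, and the only external command it relies on is `git -C <path> rev-parse HEAD`.

```python
import subprocess

# Paths and commits copied from the table above.
PINS = {
    "/home/gahow/phd/qwen-bailian-usagetraces-anon": "5f7439c51ec248a0c585f7d90a41a6f57773b912",
    "/tmp/toc-llm-sim-research/Frontier": "d9cfeb6d8791fbf2f295dd9744c56a666171776e",
    "/tmp/toc-llm-sim-research/vidur": "8383d2935bc62723a212090baa9f98ada206fc14",
    "/tmp/toc-llm-sim-research/aiconfigurator": "e46ece7510e727fafefb8212e5846172145a30ea",
}


def git_head(path):
    """Return the HEAD commit of the repository at `path`."""
    result = subprocess.run(
        ["git", "-C", path, "rev-parse", "HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def find_mismatches(pins, resolve_head=git_head):
    """Map each path whose HEAD differs from its pin to the observed HEAD.

    A missing repository (or a missing git binary) is reported as None.
    `resolve_head` is injectable so the check can be exercised without git.
    """
    mismatches = {}
    for path, expected in pins.items():
        try:
            actual = resolve_head(path)
        except (OSError, subprocess.CalledProcessError):
            actual = None
        if actual != expected:
            mismatches[path] = actual
    return mismatches
```

Calling `find_mismatches(PINS)` returns an empty dict when every checkout is present and at its recorded commit.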
## Frontier Findings
- Frontier trace replay reads CSV columns `arrived_at`, `num_prefill_tokens`,
and `num_decode_tokens`.
- It also parses optional `session_id` and `block_hash_ids`; `block_hash_ids`
can be `|` separated, matching `examples/fixtures/prefix_cache_shared_session_trace.csv`.
- Frontier's trace replay generator can clip prefill tokens when total tokens
exceed `trace_request_generator_config_max_tokens`. ReplayServe fixtures
hard-fail before Frontier sees the trace, so the RS1 smoke test cannot silently
clip.
- Frontier has a built-in `Qwen/Qwen3-32B` model config.
- Frontier has A800 network profiles:
`data/profiling/network/a800_dgx/` and
`data/profiling/network/a800_pairwise_nvlink/`.
- Current public A800 compute profiles in this checkout include Llama2-7B and
Qwen3 MoE / Qwen3-Next reduced variants, but no dense `Qwen/Qwen3-32B`
compute profile. RS1 Qwen3-32B A800 latency and throughput results are
therefore only a plumbing smoke test until matching compute profiles or
calibration data are added.
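
The CSV layout and the hard-fail behaviour noted in the list above can be illustrated with a small validator. This is a sketch under stated assumptions, not Frontier's or ReplayServe's actual code: the function name `parse_trace`, the integer type of the hash IDs, and the `max_tokens` values are all hypothetical.

```python
import csv
import io

# Columns Frontier's trace replay requires, per the findings above.
REQUIRED_COLUMNS = ("arrived_at", "num_prefill_tokens", "num_decode_tokens")


def parse_trace(text, max_tokens):
    """Parse a Frontier-style CSV trace, failing hard instead of clipping.

    `max_tokens` stands in for the generator's token limit (assumed value).
    """
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")

    requests = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        prefill = int(row["num_prefill_tokens"])
        decode = int(row["num_decode_tokens"])
        if prefill + decode > max_tokens:
            # Reject the trace rather than let the simulator clip prefill.
            raise ValueError(
                f"line {line_no}: {prefill + decode} tokens exceeds {max_tokens}"
            )
        raw_hashes = row.get("block_hash_ids") or ""
        requests.append({
            "arrived_at": float(row["arrived_at"]),
            "num_prefill_tokens": prefill,
            "num_decode_tokens": decode,
            # Optional columns: empty or absent values become None / [].
            "session_id": row.get("session_id") or None,
            # Assumes hash IDs are integers separated by `|`.
            "block_hash_ids": [int(h) for h in raw_hashes.split("|") if h],
        })
    return requests
```

Raising on the first over-limit row keeps the simulated request lengths identical to the trace, at the cost of rejecting the whole file.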
## Qwen Trace Findings
- The released JSONL rows contain `chat_id`, `parent_chat_id`, `timestamp`,
`input_length`, `output_length`, `type`, `turn`, and `hash_ids`.
- The trace README documents `hash_ids` as salted SipHash blocks with 16 tokens
per block.
- The released input lengths and hashes already reflect the model-specific
chat template. ReplayServe does not apply chat templates.
- The final input block can be padded. ReplayServe records per-block token
counts in the sidecar so partial final blocks can be accounted for by true
token count.
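
The per-block accounting described in the last item can be sketched as below. The sidecar layout, the function names, and the assumption that a row carries exactly `ceil(input_length / 16)` hash IDs are illustrative only; the real ReplayServe sidecar format may differ.

```python
import json

BLOCK_SIZE = 16  # tokens per hashed block, per the trace README


def block_token_counts(input_length, num_blocks, block_size=BLOCK_SIZE):
    """Return the true token count of each block; only the last may be partial."""
    expected_blocks = -(-input_length // block_size)  # ceiling division
    if num_blocks != expected_blocks:
        raise ValueError(
            f"{num_blocks} blocks cannot hold {input_length} tokens "
            f"at {block_size} tokens per block"
        )
    if num_blocks == 0:
        return []
    counts = [block_size] * (num_blocks - 1)
    # The final block holds whatever remains (between 1 and block_size tokens).
    counts.append(input_length - block_size * (num_blocks - 1))
    return counts


def sidecar_entry(jsonl_line):
    """Build a hypothetical sidecar record for one released JSONL row."""
    row = json.loads(jsonl_line)
    counts = block_token_counts(row["input_length"], len(row["hash_ids"]))
    return {"chat_id": row["chat_id"], "block_token_counts": counts}
```

For a row with `input_length` 40 and three hash IDs, the counts are `[16, 16, 8]`, so the padded final block contributes 8 tokens rather than 16.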