# Sources

Checked on 2026-06-24.

## Local Repositories

| Source | Local path | Commit / HEAD | Notes |
|---|---|---|---|
| Qwen Bailian usage traces | `/home/gahow/phd/qwen-bailian-usagetraces-anon` | `5f7439c51ec248a0c585f7d90a41a6f57773b912` | Primary RS0 input is `qwen_coder_blksz_16.jsonl`. |
| Frontier | `/tmp/toc-llm-sim-research/Frontier` | `d9cfeb6d8791fbf2f295dd9744c56a666171776e` | Primary RS1 simulator candidate. |
| Vidur | `/tmp/toc-llm-sim-research/vidur` | `8383d2935bc62723a212090baa9f98ada206fc14` | Baseline simulator candidate for arrival and length replay. |
| AIConfigurator | `/tmp/toc-llm-sim-research/aiconfigurator` | `e46ece7510e727fafefb8212e5846172145a30ea` | Configuration search reference, not per-request faithful replay. |

All four local repositories were present when RS0 was generated. No external
repository was cloned for RS0.

## Frontier Findings

- Frontier trace replay reads the CSV columns `arrived_at`, `num_prefill_tokens`,
  and `num_decode_tokens`.
- It also parses the optional columns `session_id` and `block_hash_ids`;
  `block_hash_ids` can be `|`-separated, matching
  `examples/fixtures/prefix_cache_shared_session_trace.csv`.
- Frontier's trace replay generator can clip prefill tokens when a request's
  total tokens exceed `trace_request_generator_config_max_tokens`. ReplayServe
  fixtures hard-fail before Frontier sees the trace, so the RS1 smoke test
  cannot silently clip.
- Frontier has a built-in `Qwen/Qwen3-32B` model config.
- Frontier has A800 network profiles:
  `data/profiling/network/a800_dgx/` and
  `data/profiling/network/a800_pairwise_nvlink/`.
- The public A800 compute profiles in this checkout include Llama2-7B and
  Qwen3 MoE / Qwen3-Next reduced variants, but no dense `Qwen/Qwen3-32B`
  compute profile. RS1 Qwen3-32B A800 latency and throughput results are
  therefore only a plumbing smoke test until matching compute profiles or
  calibration data are added.

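The column layout and the hard-fail rule above can be sketched in a few lines of Python. This is a minimal illustration, not ReplayServe's or Frontier's actual code: only the five column names come from the findings above, while `write_trace_csv`, the `max_tokens` parameter, and the sample rows are hypothetical.

```python
import csv
import io

# Column names are taken from the findings above; everything else is hypothetical.
REQUIRED = ["arrived_at", "num_prefill_tokens", "num_decode_tokens"]
OPTIONAL = ["session_id", "block_hash_ids"]


def write_trace_csv(rows, out, max_tokens):
    """Write rows as a trace CSV, raising instead of clipping oversized requests."""
    writer = csv.DictWriter(out, fieldnames=REQUIRED + OPTIONAL, lineterminator="\n")
    writer.writeheader()
    for i, row in enumerate(rows):
        total = row["num_prefill_tokens"] + row["num_decode_tokens"]
        if total > max_tokens:
            # Fail loudly so a downstream replayer never gets a chance to clip.
            raise ValueError(f"row {i}: {total} tokens exceeds max_tokens={max_tokens}")
        writer.writerow({
            "arrived_at": row["arrived_at"],
            "num_prefill_tokens": row["num_prefill_tokens"],
            "num_decode_tokens": row["num_decode_tokens"],
            "session_id": row.get("session_id", ""),
            # Block hashes are joined with "|" inside a single CSV field.
            "block_hash_ids": "|".join(str(h) for h in row.get("block_hash_ids", [])),
        })


# Illustrative rows only; these values are not taken from any real trace.
rows = [
    {"arrived_at": 0.0, "num_prefill_tokens": 48, "num_decode_tokens": 16,
     "session_id": "s1", "block_hash_ids": [101, 102, 103]},
    {"arrived_at": 1.5, "num_prefill_tokens": 32, "num_decode_tokens": 8},
]
buf = io.StringIO()
write_trace_csv(rows, buf, max_tokens=4096)
trace_text = buf.getvalue()
```

Raising on an oversized row, rather than truncating it, keeps any length mismatch visible at fixture-generation time instead of surfacing later as a quietly shortened prefill.
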
## Qwen Trace Findings

- The released JSONL rows contain `chat_id`, `parent_chat_id`, `timestamp`,
  `input_length`, `output_length`, `type`, `turn`, and `hash_ids`.
- The trace README documents `hash_ids` as salted SipHash blocks with 16 tokens
  per block.
- The released input lengths and hashes already reflect the model-specific
  chat template. ReplayServe does not apply chat templates.
- The final input block can be padded. ReplayServe records per-block token
  counts in the sidecar so that a partial final block is accounted for by its
  true token count.
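
The per-block accounting described above can be sketched as follows. This is an assumption-laden illustration, not ReplayServe's sidecar code: it assumes that the number of entries in `hash_ids` equals the ceiling of `input_length / 16`, and `block_token_counts` and the sample row values are hypothetical.

```python
import json

BLOCK_SIZE = 16  # tokens per block, as documented in the trace README


def block_token_counts(input_length, num_blocks, block_size=BLOCK_SIZE):
    """Return the true token count of each block; only the last may be partial."""
    expected = -(-input_length // block_size)  # ceiling division
    if num_blocks != expected:
        # Assumption: one hash per started block. Fail rather than guess.
        raise ValueError(f"expected {expected} blocks, got {num_blocks}")
    if num_blocks == 0:
        return []
    last = input_length - block_size * (num_blocks - 1)
    return [block_size] * (num_blocks - 1) + [last]


# Illustrative row using the released field names; the values are made up.
line = (
    '{"chat_id": 1, "parent_chat_id": -1, "timestamp": 0.0, '
    '"input_length": 40, "output_length": 12, "type": "text", '
    '"turn": 1, "hash_ids": [7, 8, 9]}'
)
row = json.loads(line)
counts = block_token_counts(row["input_length"], len(row["hash_ids"]))
```

With 40 input tokens and three hashed blocks, the first two blocks are full and the last holds the remaining 8 tokens, so the counts sum back to `input_length` rather than to the padded 48.
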