Sources
Checked on 2026-06-24.
Local Repositories
| Source | Local path | Commit / HEAD | Notes |
|---|---|---|---|
| Qwen Bailian usage traces | `/home/gahow/phd/qwen-bailian-usagetraces-anon` | `5f7439c51ec248a0c585f7d90a41a6f57773b912` | Primary RS0 input is `qwen_coder_blksz_16.jsonl`. |
| Frontier | `/tmp/toc-llm-sim-research/Frontier` | `d9cfeb6d8791fbf2f295dd9744c56a666171776e` | Primary RS1 simulator candidate. |
| Vidur | `/tmp/toc-llm-sim-research/vidur` | `8383d2935bc62723a212090baa9f98ada206fc14` | Baseline simulator candidate for arrival and length replay. |
| AIConfigurator | `/tmp/toc-llm-sim-research/aiconfigurator` | `e46ece7510e727fafefb8212e5846172145a30ea` | Configuration search reference, not per-request faithful replay. |
All four local repositories were present when RS0 was generated. No external repository was cloned for RS0.
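
A minimal sketch of how the pinned commits in the table could be re-checked against the local checkouts. The helper names `git_head` and `find_mismatches` are hypothetical and not part of any of the listed repositories; the only external call is the standard `git -C <path> rev-parse HEAD`.

```python
import subprocess
from typing import Callable, Dict

# Paths and commits copied from the table above.
PINNED: Dict[str, str] = {
    "/home/gahow/phd/qwen-bailian-usagetraces-anon": "5f7439c51ec248a0c585f7d90a41a6f57773b912",
    "/tmp/toc-llm-sim-research/Frontier": "d9cfeb6d8791fbf2f295dd9744c56a666171776e",
    "/tmp/toc-llm-sim-research/vidur": "8383d2935bc62723a212090baa9f98ada206fc14",
    "/tmp/toc-llm-sim-research/aiconfigurator": "e46ece7510e727fafefb8212e5846172145a30ea",
}


def git_head(path: str) -> str:
    """Return the HEAD commit hash of the repository at `path`."""
    result = subprocess.run(
        ["git", "-C", path, "rev-parse", "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def find_mismatches(
    expected: Dict[str, str],
    rev_parse: Callable[[str], str] = git_head,
) -> Dict[str, str]:
    """Map each path whose HEAD differs from its pinned commit to the actual HEAD."""
    mismatches: Dict[str, str] = {}
    for path, pinned in expected.items():
        actual = rev_parse(path)
        if actual != pinned:
            mismatches[path] = actual
    return mismatches
```

Calling `find_mismatches(PINNED)` on the machine that holds the checkouts would return an empty dict when every repository is still at its recorded commit. The `rev_parse` parameter exists only so the comparison can be exercised without touching the filesystem.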
Frontier Findings
- Frontier trace replay reads CSV columns `arrived_at`, `num_prefill_tokens`, and `num_decode_tokens`.
- It also parses optional `session_id` and `block_hash_ids`; `block_hash_ids` can be `|` separated, matching `examples/fixtures/prefix_cache_shared_session_trace.csv`.
- Frontier's trace replay generator can clip prefill tokens when total tokens exceed `trace_request_generator_config_max_tokens`. ReplayServe fixtures hard fail before Frontier sees the trace, so the RS1 smoke cannot silently clip.
- Frontier has a built-in `Qwen/Qwen3-32B` model config.
- Frontier has A800 network profiles: `data/profiling/network/a800_dgx/` and `data/profiling/network/a800_pairwise_nvlink/`.
- Current public A800 compute profiles in this checkout include Llama2-7B and Qwen3 MoE / Qwen3-Next reduced variants, but no dense `Qwen/Qwen3-32B` compute profile. RS1 Qwen3-32B A800 latency and throughput results are only plumbing smoke until matching compute profiles or calibration data are added.
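
The replay columns and the hard-fail-instead-of-clip behaviour listed above can be sketched as follows. This is an illustration only, not Frontier's or ReplayServe's actual code: `TraceRow`, `parse_trace`, and the `max_tokens` argument are hypothetical names, and treating `arrived_at` as a float and `block_hash_ids` as integers is an assumption.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TraceRow:
    arrived_at: float                 # assumed to be a numeric timestamp
    num_prefill_tokens: int
    num_decode_tokens: int
    session_id: Optional[str] = None  # optional column
    block_hash_ids: List[int] = field(default_factory=list)  # optional column


def parse_trace(text: str, max_tokens: int) -> List[TraceRow]:
    """Parse a trace CSV and raise instead of clipping over-long requests."""
    rows: List[TraceRow] = []
    # Data starts on line 2 because line 1 is the CSV header.
    for line_no, rec in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        prefill = int(rec["num_prefill_tokens"])
        decode = int(rec["num_decode_tokens"])
        if prefill + decode > max_tokens:
            raise ValueError(
                f"line {line_no}: {prefill + decode} tokens exceeds {max_tokens}"
            )
        # A missing or empty column yields an empty hash list.
        raw_hashes = rec.get("block_hash_ids") or ""
        hashes = [int(h) for h in raw_hashes.split("|") if h]
        rows.append(
            TraceRow(
                arrived_at=float(rec["arrived_at"]),
                num_prefill_tokens=prefill,
                num_decode_tokens=decode,
                session_id=rec.get("session_id") or None,
                block_hash_ids=hashes,
            )
        )
    return rows
```

Failing early keeps the request lengths seen by the simulator identical to those in the fixture, which is the property the RS1 smoke relies on.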
Qwen Trace Findings
- The released JSONL rows contain `chat_id`, `parent_chat_id`, `timestamp`, `input_length`, `output_length`, `type`, `turn`, and `hash_ids`.
- The trace README documents `hash_ids` as salted SipHash blocks with 16 tokens per block.
- The released input lengths and hashes already reflect the model-specific chat template. ReplayServe does not apply chat templates.
- The final input block can be padded. ReplayServe records per-block token counts in the sidecar so partial final blocks can be accounted for by true token count.
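
A minimal sketch of the per-block accounting described in the last item, assuming each row carries one hash per 16-token block so that the number of `hash_ids` equals `ceil(input_length / 16)`. That relationship, the helper names `block_token_counts` and `sidecar_entry`, and the shape of the returned dict are assumptions for illustration; the real sidecar format is not specified here.

```python
import json
from typing import List

BLOCK_SIZE = 16  # tokens per hashed block, per the trace README


def block_token_counts(input_length: int, block_size: int = BLOCK_SIZE) -> List[int]:
    """Split an input length into true per-block token counts."""
    if input_length < 0:
        raise ValueError("input_length must be non-negative")
    full_blocks, remainder = divmod(input_length, block_size)
    counts = [block_size] * full_blocks
    if remainder:
        counts.append(remainder)  # partial final block keeps its true size
    return counts


def sidecar_entry(jsonl_line: str) -> dict:
    """Build a hypothetical sidecar record for one released JSONL row."""
    row = json.loads(jsonl_line)
    counts = block_token_counts(row["input_length"])
    # Assumed invariant: one hash per block, including a partial final block.
    if len(counts) != len(row["hash_ids"]):
        raise ValueError(
            f"chat {row['chat_id']}: {len(row['hash_ids'])} hashes "
            f"for {len(counts)} blocks"
        )
    return {"chat_id": row["chat_id"], "block_token_counts": counts}
```

For example, an `input_length` of 37 splits into two full blocks and a final block of 5 tokens, so a consumer summing `block_token_counts` recovers 37 rather than the padded 48.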