Sources

Checked on 2026-06-24.

Local Repositories

| Source | Local path | Commit / HEAD | Notes |
| --- | --- | --- | --- |
| Qwen Bailian usage traces | /home/gahow/phd/qwen-bailian-usagetraces-anon | 5f7439c51ec248a0c585f7d90a41a6f57773b912 | Primary RS0 input is qwen_coder_blksz_16.jsonl. |
| Frontier | /tmp/toc-llm-sim-research/Frontier | d9cfeb6d8791fbf2f295dd9744c56a666171776e | Primary RS1 simulator candidate. |
| Vidur | /tmp/toc-llm-sim-research/vidur | 8383d2935bc62723a212090baa9f98ada206fc14 | Baseline simulator candidate for arrival and length replay. |
| AIConfigurator | /tmp/toc-llm-sim-research/aiconfigurator | e46ece7510e727fafefb8212e5846172145a30ea | Configuration search reference, not per-request faithful replay. |

All four local repositories were present when RS0 was generated. No external repository was cloned for RS0.

Frontier Findings

  • Frontier trace replay reads CSV columns arrived_at, num_prefill_tokens, and num_decode_tokens.
  • It also parses the optional columns session_id and block_hash_ids; block_hash_ids entries can be separated by |, as in examples/fixtures/prefix_cache_shared_session_trace.csv.
  • Frontier's trace replay generator can clip prefill tokens when total tokens exceed trace_request_generator_config_max_tokens. ReplayServe fixtures hard-fail before Frontier sees the trace, so the RS1 smoke test cannot silently clip.
  • Frontier has a built-in Qwen/Qwen3-32B model config.
  • Frontier has A800 network profiles: data/profiling/network/a800_dgx/ and data/profiling/network/a800_pairwise_nvlink/.
  • Current public A800 compute profiles in this checkout include Llama2-7B and Qwen3 MoE / Qwen3-Next reduced variants, but no dense Qwen/Qwen3-32B compute profile. Until matching compute profiles or calibration data are added, RS1 Qwen3-32B A800 latency and throughput results serve only as a plumbing smoke test.
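
The column layout and the hard-fail behavior noted above can be sketched as a small pre-flight check. This is a minimal illustration, not ReplayServe's or Frontier's actual code: the function name load_trace_rows, the max_tokens parameter (a stand-in for Frontier's trace_request_generator_config_max_tokens), and the assumption that arrived_at is numeric are all hypothetical.

```python
import csv
import io

# Column names taken from the Frontier findings above.
REQUIRED_COLUMNS = ("arrived_at", "num_prefill_tokens", "num_decode_tokens")


def load_trace_rows(csv_text, max_tokens):
    """Parse trace CSV text, failing hard instead of allowing clipping.

    Hypothetical helper: max_tokens stands in for the simulator's
    token limit; any row exceeding it raises rather than being clipped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing required columns: {missing}")

    rows = []
    for line_no, raw in enumerate(reader, start=2):  # line 1 is the header
        prefill = int(raw["num_prefill_tokens"])
        decode = int(raw["num_decode_tokens"])
        if prefill + decode > max_tokens:
            raise ValueError(
                f"line {line_no}: {prefill + decode} total tokens "
                f"exceeds limit {max_tokens}"
            )
        hashes = raw.get("block_hash_ids") or ""
        rows.append({
            # Assumption: arrived_at is a numeric timestamp.
            "arrived_at": float(raw["arrived_at"]),
            "num_prefill_tokens": prefill,
            "num_decode_tokens": decode,
            # Optional columns: absent or empty values become None / [].
            "session_id": raw.get("session_id") or None,
            # Hash IDs are kept as strings; their numeric type is not assumed.
            "block_hash_ids": hashes.split("|") if hashes else [],
        })
    return rows
```

Raising on the first oversized row, rather than truncating it, mirrors the intent that a trace never reaches the simulator in a state where clipping could occur.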

Qwen Trace Findings

  • The released JSONL rows contain chat_id, parent_chat_id, timestamp, input_length, output_length, type, turn, and hash_ids.
  • The trace README documents hash_ids as salted SipHash blocks with 16 tokens per block.
  • The released input lengths and hashes already reflect the model-specific chat template. ReplayServe does not apply chat templates.
  • The final input block can be padded. ReplayServe records per-block token counts in the sidecar so partial final blocks can be accounted for by true token count.
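
The partial-final-block accounting can be illustrated with a short sketch. It assumes, as a labeled assumption rather than a documented guarantee, that a row's hash_ids list has one entry per 16-token block and that only the last block may be partial; the function name per_block_token_counts is hypothetical and does not describe the actual sidecar format.

```python
BLOCK_SIZE = 16  # tokens per block, per the trace README


def per_block_token_counts(input_length, num_blocks, block_size=BLOCK_SIZE):
    """Return the true token count of each block for one request.

    Assumption: num_blocks == ceil(input_length / block_size), and every
    block except possibly the last is full.
    """
    if input_length <= 0 or num_blocks <= 0:
        raise ValueError("input_length and num_blocks must be positive")
    expected_blocks = -(-input_length // block_size)  # ceiling division
    if num_blocks != expected_blocks:
        raise ValueError(
            f"expected {expected_blocks} blocks for {input_length} tokens, "
            f"got {num_blocks}"
        )
    # Tokens left over for the final block after the full blocks.
    last = input_length - (num_blocks - 1) * block_size
    return [block_size] * (num_blocks - 1) + [last]
```

For example, a request with input_length 40 and three hash IDs yields [16, 16, 8], so the final block is counted as 8 tokens rather than a padded 16.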