# RS3 Sweep Harness

RS3 adds a reproducible Frontier sweep harness and a tiny smoke. This is not the full TP/EP/DP/config scan.

## Files

- Config: `configs/rs3_tiny_sweep.json`
- Runner: `tools/run_frontier_sweep.py`
- Aggregator: `tools/aggregate_runs.py`
- Tiny smoke outputs: `runs/rs3_tiny_smoke_20260624/`

The output layout is:

```text
runs/////
  command.txt
  env.txt
  run_manifest.json
  run_status.json
  stdout.log
  stderr.log
  exit_code.txt
  runtime_seconds.txt
  frontier_metrics/...
  postprocess_summary.json
  postprocess_summary.md
runs//summary.csv
runs//summary.md
```

## Config Scheme

`configs/rs3_tiny_sweep.json` is intentionally small JSON:

- `suite_id`: output suite under `runs/`.
- `sim`: simulator/mode name used in the run path.
- `frontier`: Frontier checkout metadata. The tiny smoke points at patched scratch `/tmp/replayserve-frontier-rs1b`, not canonical Frontier.
- `fixtures`: fixture names under `traces/fixtures/`.
- `defaults`: fixed Frontier knobs shared by each config.
- `configs`: named variants with optional `overrides`.

The exposed Frontier knobs include:

- parallelism: `attn_tensor_parallel_size`, `attn_data_parallel_size`, `moe_tensor_parallel_size`, `moe_expert_parallel_size`, `num_pipeline_stages`, `num_replicas`
- scheduler: `batch_size_cap` / max-num-seqs equivalent, `max_tokens_in_batch` / max-batch-tokens equivalent, `block_size`, `enable_prefix_caching`, `enable_chunked_prefill`, `long_prefill_token_threshold`
- fixed smoke context: model, device, network device, trace max tokens, memory-planner mode, GPU memory utilization, non-KV overhead, and dummy execution time

For dense `Qwen/Qwen3-32B`, the EP-like knobs stay at `1` in the tiny smoke. They are present so later MoE configs can be represented without changing the harness schema.
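To make the `defaults` / `configs` / `overrides` relationship concrete, here is a minimal sketch of how a config of this shape could be expanded into a run matrix. It is not the code in `tools/run_frontier_sweep.py`. It assumes `configs` is a list of objects with a `name` key and that overrides are merged shallowly over `defaults`. The function name `expand_matrix` and the fixture name `example_fixture` are hypothetical.

```python
import copy


def expand_matrix(config: dict) -> list[dict]:
    """Return one resolved run per (config variant, fixture) pair.

    Assumption: `configs` is a list of {"name": ..., "overrides": {...}}
    objects; the real schema may differ.
    """
    runs = []
    for variant in config["configs"]:
        # Start from the shared defaults, then apply this variant's overrides.
        knobs = copy.deepcopy(config.get("defaults", {}))
        knobs.update(variant.get("overrides", {}))
        for fixture in config["fixtures"]:
            runs.append(
                {
                    "suite_id": config["suite_id"],
                    "sim": config["sim"],
                    "config": variant["name"],
                    "fixture": fixture,
                    "knobs": dict(knobs),  # per-run copy
                }
            )
    return runs


# Illustrative config only; values are placeholders, not the checked-in file.
example_config = {
    "suite_id": "rs3_tiny_smoke_20260624",
    "sim": "example_sim",
    "fixtures": ["example_fixture"],
    "defaults": {"enable_prefix_caching": True, "enable_chunked_prefill": True},
    "configs": [
        {"name": "fixed_prefix_on"},
        {"name": "prefix_cache_off", "overrides": {"enable_prefix_caching": False}},
    ],
}

runs = expand_matrix(example_config)
for run in runs:
    print(run["config"], run["fixture"], run["knobs"])
```

A shallow merge is enough while the knobs are flat key/value pairs. Nested knob groups would need a recursive merge instead.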
## Run Commands

From `/home/gahow/phd/replayserve`:

```bash
python3 tools/run_frontier_sweep.py \
  --config configs/rs3_tiny_sweep.json \
  --suite-id rs3_tiny_smoke_20260624
python3 tools/aggregate_runs.py runs/rs3_tiny_smoke_20260624
```

The runner refuses to replace an existing selected run directory unless `--force` is passed. Use `--dry-run` to emit commands/manifests without running Frontier, and `--only-config` / `--only-fixture` to narrow the selected matrix.

## Frontier Mode

The RS3 tiny smoke uses:

- `frontier.root=/tmp/replayserve-frontier-rs1b`
- `frontier.mode=patched_scratch`
- patch file `patches/frontier-vllm-v1-prefix-cache-chunked-prefill.patch`

The canonical checkout `/tmp/toc-llm-sim-research/Frontier` remains clean and is not modified by the harness. `summary.csv` records `frontier_dirty=true` for the patched scratch because the local patch is applied there; that is expected.

To run canonical mode for a safe config, copy the JSON config, set `frontier.root` to `/tmp/toc-llm-sim-research/Frontier`, change `sim`, and run a small selected config. Do not use canonical fixed `coder_2000` until the prefix-cache chunked-prefill bug is fixed upstream.

## Tiny Smoke Results

Command:

```bash
python3 tools/run_frontier_sweep.py \
  --config configs/rs3_tiny_sweep.json \
  --suite-id rs3_tiny_smoke_20260624
python3 tools/aggregate_runs.py runs/rs3_tiny_smoke_20260624
```

Results:

| config | status | runtime | prefix cache | chunked prefill | Frontier block hit ratio | ReplayServe token hit ratio | preemptions |
|---|---:|---:|---:|---:|---:|---:|---:|
| `fixed_prefix_on` | pass | 8s | on | on | `0.049486618` | `0.049562326` | 0 |
| `prefix_cache_off` | pass | 7s | off | on | n/a | n/a | 0 |

Aggregated files:

- `runs/rs3_tiny_smoke_20260624/summary.csv`
- `runs/rs3_tiny_smoke_20260624/summary.md`

The prefix-off run does not have Frontier cache columns in `request_metrics.csv`; `summary.csv` records `cache_metrics_available=false` and the missing-column reason.
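The missing-column handling described above can be illustrated with a short sketch. This is not the logic in `tools/aggregate_runs.py`. The column names in `CACHE_COLUMNS` and the `cache_metrics_reason` field name are hypothetical placeholders, since the actual Frontier column names are not listed here.

```python
import csv
import io

# Hypothetical column names; substitute the real Frontier cache columns.
CACHE_COLUMNS = ("prefix_cache_hit_blocks", "prefix_cache_total_blocks")


def cache_metrics_status(handle, required=CACHE_COLUMNS) -> dict:
    """Inspect only the CSV header and report whether cache columns exist."""
    header = csv.DictReader(handle).fieldnames or []
    missing = [name for name in required if name not in header]
    if missing:
        return {
            "cache_metrics_available": False,
            "cache_metrics_reason": "missing columns: " + ", ".join(missing),
        }
    return {"cache_metrics_available": True, "cache_metrics_reason": ""}


# In-memory stand-ins for two request_metrics.csv files.
with_cache = io.StringIO(
    "request_id,prefix_cache_hit_blocks,prefix_cache_total_blocks\n0,1,20\n"
)
without_cache = io.StringIO("request_id\n0\n")

status_on = cache_metrics_status(with_cache)
status_off = cache_metrics_status(without_cache)
print(status_on)
print(status_off)
```

Recording the reason alongside the boolean keeps an `n/a` in the summary table distinguishable from a run whose metrics were simply lost.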
TTFT/TPOT/E2E/throughput fields are aggregated from Frontier `system_metrics.json` when present. In this tiny smoke they are dummy-predictor plumbing outputs, not performance results.

## Not Yet Run

- No `coder_2000` sweep was run in RS3.
- No TP/DP/EP matrix was swept.
- No batch cap, max batch tokens, block size, chunked-prefill, or threshold matrix was swept beyond the two-config smoke.
- No canonical Frontier patched-vs-unpatched comparison was rerun.
- No Vidur or AIConfigurator run is part of this harness yet.

## Next Harness Work

- Add a small checked-in config for a real RS3 candidate grid only after deciding the patch/upstream policy.
- Add guardrails for invalid dense/MoE parallelism combinations before launching larger matrices.
- Investigate `coder_2000` missing request-level cache fields before using request-level hit ratio as a headline sweep metric.
- Keep latency/throughput result tables clearly separated by predictor/profile mode: dummy smoke, profiled Frontier, or calibrated run.
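As a starting point for the dense/MoE guardrail item above, a pre-launch check might look like the sketch below. It is not implemented in the harness. It assumes the only rules are that every parallelism knob is a positive integer and that the two `moe_*` knobs stay at `1` for dense models. The function name `parallelism_errors`, the `is_moe` flag, and the default of `1` for missing knobs are all hypothetical.

```python
PARALLELISM_KNOBS = (
    "attn_tensor_parallel_size",
    "attn_data_parallel_size",
    "moe_tensor_parallel_size",
    "moe_expert_parallel_size",
    "num_pipeline_stages",
    "num_replicas",
)
# Assumption: these are the "EP-like" knobs that must stay at 1 for dense models.
MOE_ONLY_KNOBS = ("moe_tensor_parallel_size", "moe_expert_parallel_size")


def parallelism_errors(knobs: dict, is_moe: bool) -> list[str]:
    """Return human-readable problems; an empty list means the config passes."""
    errors = []
    for name in PARALLELISM_KNOBS:
        value = knobs.get(name, 1)  # assumption: a missing knob means 1
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            errors.append(f"{name} must be a positive integer, got {value!r}")
    if not is_moe:
        for name in MOE_ONLY_KNOBS:
            if knobs.get(name, 1) != 1:
                errors.append(f"{name} must be 1 for a dense model, got {knobs[name]!r}")
    return errors


dense_ok = parallelism_errors({"attn_tensor_parallel_size": 2}, is_moe=False)
dense_bad = parallelism_errors({"moe_expert_parallel_size": 4}, is_moe=False)
moe_ok = parallelism_errors({"moe_expert_parallel_size": 4}, is_moe=True)
print(dense_ok, dense_bad, moe_ok)
```

Returning a list of messages instead of raising on the first problem lets a dry run report every invalid variant in a matrix at once.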