Scale ablation early-stop caps to the compressed window (scale=0.2)
At replay_time_scale=0.2 the 600s arrival window compresses to 120s, so the inherited 900s wall-clock elapsed cap let overloaded TP1 probes burn ~15min each (the tractability hazard the brief flagged).

Scale the caps proportionately to the time axis: early_stop_max_elapsed_s 900->180, early_stop_max_lag_s 120->30. Feasible probes (~120s arrival + drain) finish well inside 180s; overloaded probes die in ~3min.

Both configs still differ only in use_harness + study_id. Adds the ablation doc skeleton and a read-only trajectory-extraction helper.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
62
docs/harness-ablation/harness-vs-naive-20260616.md
Normal file
@@ -0,0 +1,62 @@
# Harness vs naive agentic tuner: controlled ablation on dense Qwen3.5-27B (2026-06-16)

Branch `main`. Quantifies the value of the paper's **harness** (domain-knowledge
knob-family guidance) by running the agentic tuning loop twice on the *same*
workload, identical in every respect except `llm.use_harness`:

- **Harness ON** (`dash0_qwen27b_ablation_harness_on.json`, study
  `dash0-qwen27b-ablation-harness-on`): the prompt carries the `Harnesses:`
  section (ranked bottleneck hypotheses + per-knob-family use-when / procedure /
  guards, with an `active_now` flag), the loop can emit a deterministic
  harness-guided first probe, and a **Stop-B validator** gates the LLM's
  `should_stop` (an unauthorized stop is vetoed).
- **Naive OFF** (`dash0_qwen27b_ablation_naive_off.json`, study
  `dash0-qwen27b-ablation-naive-off`): `use_harness=false`. No harness prompt
  section, no deterministic guided/stop proposals, and the LLM's own `should_stop`
  is honored without a validator veto. The prompt still tells the LLM that
  TP/DP/EP are tunable and gives the full study/SLO/trial-history context, so the
  difference is purely the harness guidance. This is the paper's "naive agentic
  tuner."

The two config files differ in **exactly two keys** (`llm.use_harness` and
`study_id`); verified by diff.

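The two-key claim can also be checked structurally rather than by a textual diff. The following is a minimal sketch, not the repository's own tooling: `flatten` and `differing_keys` are hypothetical helper names, and the sample dicts only mirror the key names mentioned in this document (their nesting is an assumption).

```python
import json
from typing import Any

_MISSING = object()  # sentinel so "key absent" differs from "key present with value None"


def flatten(obj: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted key paths, e.g. llm.use_harness."""
    if not isinstance(obj, dict) or not obj:
        return {prefix: obj}
    flat: dict[str, Any] = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        flat.update(flatten(value, path))
    return flat


def differing_keys(a: dict, b: dict) -> set[str]:
    """Return the dotted key paths whose values differ between two configs."""
    fa, fb = flatten(a), flatten(b)
    return {
        k for k in fa.keys() | fb.keys()
        if fa.get(k, _MISSING) != fb.get(k, _MISSING)
    }


def load_config(path: str) -> dict:
    """Load one JSON config file from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


if __name__ == "__main__":
    # Illustrative stand-ins for the two config files (structure assumed).
    harness_on = {
        "study_id": "dash0-qwen27b-ablation-harness-on",
        "llm": {"use_harness": True},
        "trace": {"replay_time_scale": 0.2},
    }
    naive_off = {
        "study_id": "dash0-qwen27b-ablation-naive-off",
        "llm": {"use_harness": False},
        "trace": {"replay_time_scale": 0.2},
    }
    print(sorted(differing_keys(harness_on, naive_off)))
```

To run it against the real files, pass the two paths through `load_config` and compare the result with `{"llm.use_harness", "study_id"}`.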
## Substrate (why these knobs, and the comparability caveat)

This ablation measures the **tuning process** (proposal path + convergence), not
absolute peak-rate, so a faster replay substrate is used to keep it tractable
(at `replay_time_scale=1.0` a single TP4 trial took ~3 h; see
`stop-b-e2e-27b-20260616.md`).

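To make the effect of `replay_time_scale` concrete, here is a small sketch of the scaling rule `arrival_s = timestamp * time_scale`. The function names and the sample timestamps are hypothetical. The 600 s window is the arrival window quoted in the commit message, and the load multiplier assumes the request set itself is unchanged.

```python
def scale_arrivals(timestamps_s: list[float], time_scale: float) -> list[float]:
    """Map trace timestamps to replay arrival times: arrival_s = timestamp * time_scale."""
    if time_scale <= 0:
        raise ValueError("time_scale must be positive")
    return [t * time_scale for t in timestamps_s]


def offered_load_multiplier(time_scale: float) -> float:
    """The same requests arriving in time_scale of the wall-clock raise the rate by 1/time_scale."""
    if time_scale <= 0:
        raise ValueError("time_scale must be positive")
    return 1.0 / time_scale


if __name__ == "__main__":
    # Hypothetical timestamps spanning a 600 s arrival window.
    trace = [0.0, 150.0, 300.0, 600.0]
    arrivals = scale_arrivals(trace, 0.2)
    print(f"last arrival: {arrivals[-1]:.1f} s")                      # last arrival: 120.0 s
    print(f"load multiplier: {offered_load_multiplier(0.2):.1f}x")    # load multiplier: 5.0x
```

Because the request count is fixed while the window shrinks, every rate measured on this substrate is inflated relative to a `replay_time_scale=1.0` run.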
| knob | value | rationale |
| --- | --- | --- |
| `trace.replay_time_scale` | **0.2** | arrival times are multiplied by 0.2, i.e. the same request set arrives in 1/5 the wall-clock → ~5× higher effective offered load. `arrival_s = timestamp * time_scale` (`trace.py:223`). Mild arrival-time compression: the lever the brief prescribes (compress time, do **not** just cut the elapsed cap). |
| `search.high` | 0.25 | upper bound of the sampling_u binary search |
| `search.max_probes` | 5 | probe budget per trial |
| `--max-trials` | 8 | iteration budget |
| Stop-A | **enabled** (unchanged) | converged-prefix replay truncation stays on for both runs |
| SLO | length-aware TTFT (4s + L_in/8k) + TPOT ≤ 50 ms | unchanged from base |
| GPUs | `CUDA_VISIBLE_DEVICES=2,3,4,5,6,7` | GPUs 0/1 avoided |

**Comparability caveat.** Because arrival times are compressed 5×, the absolute
`request_rate_per_gpu` values are **not** comparable to the scale=1.0 ground-truth
climb (TP1 0.123 → TP2 0.29 → TP4 1.00). The ablation reads the **trajectory
shape** (which knob family each iteration tries, whether the incumbent climbs
monotonically, where each run stops) and the **relative** per-GPU ordering across
topologies, not the absolute numbers.

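The SLO row in the table is terse, so here is one possible reading of it as code. This is a sketch under stated assumptions, not the project's SLO implementation: `L_in` is taken to be the input token count, `8k` is read as 8000 tokens per second, both limits are treated as inclusive, and a request must satisfy both. All names are hypothetical.

```python
TTFT_BASE_S = 4.0               # fixed part of the TTFT budget ("4s")
PREFILL_TOKENS_PER_S = 8000.0   # assumption: "8k" read as 8000 tokens per second
TPOT_LIMIT_S = 0.050            # "TPOT <= 50 ms"


def ttft_budget_s(input_tokens: int) -> float:
    """Length-aware TTFT budget in seconds: 4 s plus L_in / 8k."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return TTFT_BASE_S + input_tokens / PREFILL_TOKENS_PER_S


def meets_slo(input_tokens: int, ttft_s: float, tpot_s: float) -> bool:
    """True when both the TTFT budget and the TPOT limit hold (assumed inclusive)."""
    return ttft_s <= ttft_budget_s(input_tokens) and tpot_s <= TPOT_LIMIT_S


if __name__ == "__main__":
    # A 16000-token prompt gets a 4 + 2 = 6 s TTFT budget under these assumptions.
    print(ttft_budget_s(16000))            # 6.0
    print(meets_slo(16000, 5.5, 0.040))    # True
    print(meets_slo(16000, 6.5, 0.040))    # False
```

Under this reading the SLO depends only on per-request latencies, so it is the same for both runs and is not itself rescaled by `replay_time_scale`.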
## Run 1: Harness ON

<!-- TRAJECTORY_ON -->

## Run 2: Naive OFF

<!-- TRAJECTORY_OFF -->

## The five comparison metrics

<!-- METRICS -->

## Analysis & caveats

<!-- ANALYSIS -->