Fig18 substrate: real output_length + criterion-A time_scale + Stop-A drain deadline

Replace the out=128 / scale=0.5 ablation substrate with a paper-faithful one:

- Use the trace's real output_length (drop completion_tokens_override=128).
  The 0-8k chat window has p50=531 / p99=2436 / max=35168 output tokens, so
  decode (TPOT) becomes the dominant bottleneck instead of an artificial
  128-token cap.
- replay_time_scale=0.8775, chosen by criterion-A: binary-search the smallest
  scale whose A-family L-C-A similarity to the real (scale=1.0) arrivals stays
  >= tau (0.90). The old scale=0.5 had sim_A=0.56, distorting the arrival axis
  far below the tau bar used everywhere else. New calibrator:
  scripts/calibrate_time_scale.py.
- Per-probe Stop-A-consistent drain deadline (worker._probe_drain_deadline):
  the wall-clock a *feasible* config needs to drain the LCA-admitted set
  (last_arrival + worst-case TTFT + p99_out * TPOT budget + margin). With real
  outputs decode dominates wall-clock, so the old fixed 320s cap would truncate
  the Stop-A offered window mid-decode. early_stop_max_elapsed_s (1000s) is now
  a hard ceiling; the per-probe deadline governs. The lag cap still cuts
  overload.

12-iter paired driver (both arms on dash1, removes the dash0/dash1 host
confound): scripts/run_ablation_pair_d1.sh.

115 tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
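
For readers who want the criterion-A search spelled out, here is a minimal sketch of a binary search for the smallest scale whose similarity stays >= tau. It is not the code in scripts/calibrate_time_scale.py: the function name, the `similarity` callable, the search bounds, and the tolerance are all hypothetical, and the search assumes similarity is non-decreasing in the scale over [lo, hi].

```python
from typing import Callable


def smallest_scale_meeting_tau(
    similarity: Callable[[float], float],  # hypothetical: scale -> sim_A vs. scale=1.0
    tau: float = 0.90,
    lo: float = 0.5,    # assumed lower bound (the old scale)
    hi: float = 1.0,    # real arrivals
    tol: float = 1e-4,  # assumed resolution of the search
) -> float:
    """Return a scale within `tol` of the smallest one with similarity >= tau.

    Assumes similarity(scale) is non-decreasing on [lo, hi].
    """
    if similarity(hi) < tau:
        raise ValueError("even the upper bound does not reach tau")
    if similarity(lo) >= tau:
        return lo  # the lower bound already satisfies the criterion

    # Invariant: similarity(lo) < tau <= similarity(hi).
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if similarity(mid) >= tau:
            hi = mid  # mid is acceptable, try smaller scales
        else:
            lo = mid  # mid distorts arrivals too much
    return hi
```

Returning `hi` rather than `lo` keeps the result on the accepted side of the threshold, so the chosen scale always satisfies the tau bar.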
2026-06-17 17:24:00 +08:00
parent 816765071f
commit 0c23285f39
6 changed files with 197 additions and 10 deletions
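
The drain deadline formula from the commit message can be sketched as follows. This is an illustration only, not the body of worker._probe_drain_deadline: the function name, parameter names, and the numeric inputs in the usage note are assumptions, and only the formula's terms and the 1000s ceiling come from the message.

```python
def probe_drain_deadline_s(
    last_arrival_s: float,      # arrival time of the last admitted request
    worst_ttft_s: float,        # worst-case time to first token
    p99_output_tokens: float,   # p99 output length of the trace window
    tpot_budget_s: float,       # per-output-token time budget
    margin_s: float,            # safety margin
    hard_ceiling_s: float = 1000.0,  # early_stop_max_elapsed_s
) -> float:
    """Wall-clock a feasible config needs to drain the admitted set.

    deadline = last_arrival + worst-case TTFT + p99_out * TPOT budget + margin,
    clipped to the hard ceiling.
    """
    deadline = (
        last_arrival_s
        + worst_ttft_s
        + p99_output_tokens * tpot_budget_s
        + margin_s
    )
    return min(deadline, hard_ceiling_s)
```

With purely illustrative inputs (last arrival at 300s, 5s TTFT, p99_out=2436, a 0.05s TPOT budget, 30s margin) the deadline comes to roughly 456.8s, which is past the old fixed 320s cap and well under the 1000s ceiling.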
--- a/configs/examples/dash0_qwen27b_ablation_harness_on.json
+++ b/configs/examples/dash0_qwen27b_ablation_harness_on.json
@@ -130,9 +130,9 @@
      "min_input_tokens": 0,
      "max_input_tokens": 8192
    },
-    "replay_time_scale": 0.5,
+    "replay_time_scale": 0.8775,
    "early_stop_max_lag_s": 45.0,
-    "early_stop_max_elapsed_s": 320.0,
+    "early_stop_max_elapsed_s": 1000.0,
    "adaptive_stop": {
      "enabled": true,
      "tau": 0.9,
@@ -141,8 +141,7 @@
      "max_checks": 20,
      "min_fraction": 0.1,
      "boundary_delta": 0.02
-    },
-    "completion_tokens_override": 128
+    }
  },
  "slo": {
    "target_pass_rate": 0.95,
--- a/configs/examples/dash0_qwen27b_ablation_naive_off.json
+++ b/configs/examples/dash0_qwen27b_ablation_naive_off.json
@@ -130,9 +130,9 @@
      "min_input_tokens": 0,
      "max_input_tokens": 8192
    },
-    "replay_time_scale": 0.5,
+    "replay_time_scale": 0.8775,
    "early_stop_max_lag_s": 45.0,
-    "early_stop_max_elapsed_s": 320.0,
+    "early_stop_max_elapsed_s": 1000.0,
    "adaptive_stop": {
      "enabled": true,
      "tau": 0.9,
@@ -141,8 +141,7 @@
      "max_checks": 20,
      "min_fraction": 0.1,
      "boundary_delta": 0.02
-    },
-    "completion_tokens_override": 128
+    }
  },
  "slo": {
    "target_pass_rate": 0.95,