Files
aituner/scripts/run_ablation_pair_d1.sh
Gahow Wang 0c23285f39 Fig18 substrate: real output_length + criterion-A time_scale + Stop-A drain deadline
Replace the out=128 / scale=0.5 ablation substrate with a paper-faithful one:
- Use the trace's real output_length (drop completion_tokens_override=128). The
  0-8k chat window has p50=531 / p99=2436 / max=35168 output tokens, so decode
  (TPOT) becomes the dominant bottleneck instead of an artificial 128-token cap.
- replay_time_scale=0.8775, chosen by criterion-A: binary-search the smallest
  scale whose A-family L-C-A similarity to the real (scale=1.0) arrivals stays
  >= tau (0.90). The old scale=0.5 had sim_A=0.56, distorting the arrival axis
  far below the tau bar used everywhere else. New calibrator:
  scripts/calibrate_time_scale.py.
- Per-probe Stop-A-consistent drain deadline (worker._probe_drain_deadline): the
  wall-clock a *feasible* config needs to drain the LCA-admitted set
  (last_arrival + worst-case TTFT + p99_out * TPOT budget + margin). With real
  outputs decode dominates wall-clock, so the old fixed 320s cap would truncate
  the Stop-A offered window mid-decode. early_stop_max_elapsed_s (1000s) is now a
  hard ceiling; the per-probe deadline governs. The lag cap still cuts overload.

12-iter paired driver (both arms on dash1, removes the dash0/dash1 host confound):
scripts/run_ablation_pair_d1.sh. 115 tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 17:24:00 +08:00

28 lines
1.5 KiB
Bash

#!/usr/bin/env bash
# 12-iteration harness-vs-naive ablation, both arms on dash1 (clean paired run,
# no host confound). Substrate: real output_length (no completion override),
# replay_time_scale=0.8775 (criterion-A, sim_A>=0.90), Stop-A on (LCA offered
# window), per-probe Stop-A-consistent drain deadline. Harness stops early; naive
# runs the full budget. Run from the repo root on dash1.
set -u
export OPENAI_API_KEY=$(python3 -c 'import json,pathlib;print(json.load(open(pathlib.Path.home()/".codex/auth.json"))["OPENAI_API_KEY"])')
# codex config.toml points at a dash0-local proxy (127.0.0.1:11235); on dash1 the
# LLM endpoint is reachable directly, so force a direct connection.
export http_proxy= https_proxy= all_proxy= HTTP_PROXY= HTTPS_PROXY= ALL_PROXY= no_proxy='*'
mkdir -p .aituner
rm -rf .aituner/abl12-harness .aituner/abl12-naive .aituner/ABLATION12_DONE
echo "=== harness ON (12-iter) start $(date -Is) ==="
PYTHONPATH=src python3 -m aituner.cli study tune \
--spec configs/examples/dash0_qwen27b_ablation_harness_on.json \
--store-root .aituner/abl12-harness --max-trials 12 --skip-baseline > .aituner/abl12-harness.log 2>&1
echo "=== harness ON (12-iter) done $(date -Is) ==="
echo "=== naive OFF (12-iter) start $(date -Is) ==="
PYTHONPATH=src python3 -m aituner.cli study tune \
--spec configs/examples/dash0_qwen27b_ablation_naive_off.json \
--store-root .aituner/abl12-naive --max-trials 12 --skip-baseline > .aituner/abl12-naive.log 2>&1
echo "=== naive OFF (12-iter) done $(date -Is) ==="
touch .aituner/ABLATION12_DONE