Replace the out=128 / scale=0.5 ablation substrate with a paper-faithful one:

- Use the trace's real output_length (drop completion_tokens_override=128).
  The 0-8k chat window has p50=531 / p99=2436 / max=35168 output tokens, so
  decode (TPOT) becomes the dominant bottleneck instead of an artificial
  128-token cap.
- replay_time_scale=0.8775, chosen by criterion-A: binary-search the smallest
  scale whose A-family L-C-A similarity to the real (scale=1.0) arrivals stays
  >= tau (0.90). The old scale=0.5 had sim_A=0.56, distorting the arrival axis
  far below the tau bar used everywhere else. New calibrator:
  scripts/calibrate_time_scale.py.
- Per-probe Stop-A-consistent drain deadline (worker._probe_drain_deadline):
  the wall-clock a *feasible* config needs to drain the LCA-admitted set
  (last_arrival + worst-case TTFT + p99_out * TPOT budget + margin). With real
  outputs decode dominates wall-clock, so the old fixed 320s cap would truncate
  the Stop-A offered window mid-decode. early_stop_max_elapsed_s (1000s) is now
  a hard ceiling; the per-probe deadline governs. The lag cap still cuts
  overload.

12-iter paired driver (both arms on dash1, removes the dash0/dash1 host
confound): scripts/run_ablation_pair_d1.sh.

115 tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
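The two calculations named in the commit message can be illustrated with a minimal sketch. It is not the code in scripts/calibrate_time_scale.py or worker._probe_drain_deadline. The function names (`smallest_scale`, `probe_drain_deadline`), the `sim_at` callback, and the tolerance are hypothetical. The sketch assumes the similarity is non-decreasing in the scale, and that the hard ceiling is applied as a simple `min`.

```python
from typing import Callable


def smallest_scale(
    sim_at: Callable[[float], float],
    tau: float = 0.90,
    lo: float = 0.0,
    hi: float = 1.0,
    tol: float = 1e-4,
) -> float:
    """Binary-search the smallest time scale whose similarity stays >= tau.

    Assumption: sim_at(scale) is non-decreasing in scale, and the
    unscaled replay (scale = hi) already meets the tau bar.
    """
    if sim_at(hi) < tau:
        raise ValueError("similarity at the upper bound is below tau")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if sim_at(mid) >= tau:
            hi = mid  # mid still meets the bar; try a smaller scale
        else:
            lo = mid  # mid distorts the arrivals too much
    return hi  # smallest scale known to satisfy sim >= tau (within tol)


def probe_drain_deadline(
    last_arrival_s: float,
    worst_ttft_s: float,
    p99_out_tokens: int,
    tpot_budget_s: float,
    margin_s: float,
    hard_ceiling_s: float = 1000.0,
) -> float:
    """Wall-clock a feasible config needs to drain the admitted requests.

    Follows the formula quoted in the commit message:
    last_arrival + worst-case TTFT + p99_out * TPOT budget + margin.
    Assumption: the hard ceiling simply caps the result.
    """
    deadline = (
        last_arrival_s + worst_ttft_s + p99_out_tokens * tpot_budget_s + margin_s
    )
    return min(deadline, hard_ceiling_s)


if __name__ == "__main__":
    # Toy similarity curve for illustration only; the real one comes from
    # comparing scaled arrivals against the scale=1.0 trace.
    toy_sim = lambda s: min(1.0, 0.2 + 0.8 * s)
    print(smallest_scale(toy_sim))

    # p99_out=2436 is from the commit message; the other inputs are made up.
    print(probe_drain_deadline(120.0, 5.0, 2436, 0.0625, 10.0))
```

With real output lengths the `p99_out_tokens * tpot_budget_s` term is usually the largest, which is consistent with the commit's point that a fixed 320s cap could cut a probe off mid-decode.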
28 lines
1.5 KiB
Bash
#!/usr/bin/env bash
# 12-iteration harness-vs-naive ablation, both arms on dash1 (clean paired run,
# no host confound). Substrate: real output_length (no completion override),
# replay_time_scale=0.8775 (criterion-A, sim_A>=0.90), Stop-A on (LCA offered
# window), per-probe Stop-A-consistent drain deadline. Harness stops early; naive
# runs the full budget. Run from the repo root on dash1.
set -u
export OPENAI_API_KEY=$(python3 -c 'import json,pathlib;print(json.load(open(pathlib.Path.home()/".codex/auth.json"))["OPENAI_API_KEY"])')
# codex config.toml points at a dash0-local proxy (127.0.0.1:11235); on dash1 the
# LLM endpoint is reachable directly, so force a direct connection.
export http_proxy= https_proxy= all_proxy= HTTP_PROXY= HTTPS_PROXY= ALL_PROXY= no_proxy='*'
mkdir -p .aituner
rm -rf .aituner/abl12-harness .aituner/abl12-naive .aituner/ABLATION12_DONE

echo "=== harness ON (12-iter) start $(date -Is) ==="
PYTHONPATH=src python3 -m aituner.cli study tune \
    --spec configs/examples/dash0_qwen27b_ablation_harness_on.json \
    --store-root .aituner/abl12-harness --max-trials 12 --skip-baseline > .aituner/abl12-harness.log 2>&1
echo "=== harness ON (12-iter) done $(date -Is) ==="

echo "=== naive OFF (12-iter) start $(date -Is) ==="
PYTHONPATH=src python3 -m aituner.cli study tune \
    --spec configs/examples/dash0_qwen27b_ablation_naive_off.json \
    --store-root .aituner/abl12-naive --max-trials 12 --skip-baseline > .aituner/abl12-naive.log 2>&1
echo "=== naive OFF (12-iter) done $(date -Is) ==="

touch .aituner/ABLATION12_DONE