Files
aituner/docs/harness-ablation
Gahow Wang 9f52812753 Document Stop-A validation: calibration + GPU fidelity check
CPU calibration (chat vs coder) reproduces the paper's C-slowest ordering and
shows C-convergence difficulty is driven by signal noise (low-reuse chat) not
reuse magnitude. GPU fidelity check on Qwen3-30B-A3B: truncating at the L-C-A
convergence prefix saves ~52% replay (tau_c=0.90) with 3/4 probe verdicts
preserved; the one mismatch is a boundary false-positive at the feasibility knee
(prefix 0.96 vs full 0.946), caused by second-half engine-state drift the offered
L-C-A cannot see. Argues for revisiting the SLO-boundary guard before enabling.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 16:03:16 +08:00
..