aituner

Files

Gahow Wang 9f52812753 Document Stop-A validation: calibration + GPU fidelity check

CPU calibration (chat vs coder) reproduces the paper's C-slowest ordering and
shows that C-convergence difficulty is driven by signal noise (low-reuse chat),
not by reuse magnitude. GPU fidelity check on Qwen3-30B-A3B: truncating at the L-C-A
convergence prefix saves ~52% replay (tau_c=0.90) with 3/4 probe verdicts
preserved; the one mismatch is a boundary false-positive at the feasibility knee
(prefix 0.96 vs full 0.946), caused by second-half engine-state drift the offered
L-C-A cannot see. Argues for revisiting the SLO-boundary guard before enabling.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-15 16:03:16 +08:00

profile-driven-harness-design.md

Document profile-driven harness design

2026-05-12 21:09:29 +08:00

profile-driven-harness-implementation-20260512.md

Document 8-GPU harness ablation results for qwen27b and qwen235b prefill

2026-05-16 21:23:16 +08:00

qwen27b-chat-0-8k-ttft4s-tpot25-20260510.md

Document 8-GPU harness ablation results for qwen27b and qwen235b prefill

2026-05-16 21:23:16 +08:00

qwen27b-chat-0-8k-ttft4s-tpot25-gpu8-20260513.md

Document 8-GPU harness ablation results for qwen27b and qwen235b prefill