aituner

Gahow Wang e7d1b3ba01 Harness-vs-naive ablation result: harness steers to TP & converges; naive wanders

Controlled use_harness on/off ablation on dense 27B (same workload, SLO, and substrate;
only the flag differs).

Harness ON: moved TP2 -> TP4 (0.34 req/s/GPU) in 2 iterations, rejected two worse
refinements, and vetoed a premature LLM stop before honoring it; converged with no
regression.

Naive OFF: kept TP=1 and cranked runtime knobs (mbt 16k -> 65k, seqs, caching). All 5
trials were infeasible (same TPOT/TTFT compute bottleneck), with one engine OOM crash
and no feasible config found.

The bottleneck is compute: the harness steered to the knob family that adds compute
(TP), while naive wandered among knobs that cannot. This reproduces the paper's Fig-18
finding.

Caveats: the substrate is compressed (a process comparison, not a peak-rate one), and
the naive run was infra-interrupted at trial 5 (already conclusive). Results were read
from cpfs via dash1.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-17 09:51:56 +08:00

harness-vs-naive-20260616.md

Harness-vs-naive ablation result: harness steers to TP & converges; naive wanders

2026-06-17 09:51:56 +08:00

profile-driven-harness-design.md

Document profile-driven harness design

2026-05-12 21:09:29 +08:00

profile-driven-harness-implementation-20260512.md

Document 8-GPU harness ablation results for qwen27b and qwen235b prefill

2026-05-16 21:23:16 +08:00

qwen27b-chat-0-8k-ttft4s-tpot25-20260510.md

Document 8-GPU harness ablation results for qwen27b and qwen235b prefill