aituner

Files

Gahow Wang 816765071f Complete harness-vs-naive ablation: harness 3x faster + stops; naive nondeterministic

Full naive run (dash1) reached the same TP4=0.34 optimum as the harness but took 6
iterations (vs. 2), never stopped early (it used the full budget), and spent trials 2-5
on worse TP2+runtime detours. The other naive run (dash0) wandered through runtime-only
changes on TP1, found nothing, and crashed the engine. Refined conclusion (matches paper
§7.3): a strong model can sometimes find the right knob unaided, so the harness's value
is reliability + speed + stop discipline, not that naive always fails. Harness: 2
iters-to-best, stopped at 4, no regression. Naive: 3x slower at best, no stop, failed at
worst.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-17 13:03:26 +08:00

harness-ablation

Complete harness-vs-naive ablation: harness 3x faster + stops; naive nondeterministic

2026-06-17 13:03:26 +08:00

qwen27b-chat-0-8k-7day-compare

docs: expand qwen27b 0-8k compare summary

2026-04-17 20:45:24 +08:00

qwen27b-chat-pd-colocation

Add qwen27b and qwen235b tuning notes

2026-04-11 12:07:42 +08:00

qwen30b-community-vllm020

Add open source project metadata