Record qwen235b harness convergence test

2026-04-27 18:59:25 +08:00
parent bc884f6701
commit 71902b9fc2
3 changed files with 82 additions and 4 deletions

@@ -45,7 +45,7 @@ The speedup comes from reducing wasted proposal families, not from changing the
- Example: qwen27b 0-8k chat reached `TP=2, DP=1` at iter 2 under harness replay, while the original run spent iter 2 on `DP=2` and iter 3 on `DP=4`.
2. Guarded stop after a strong incumbent
-  - If the newest trial is the incumbent and improves per-GPU throughput by at least `3x` over baseline, the harness requires direct evidence before trying runtime-only tweaks.
+  - If the newest trial is the incumbent and improves per-GPU throughput by at least `1.8x` over baseline, the harness requires direct evidence before trying runtime-only tweaks.
- Without that guard, the LLM still proposed weak MBT trials after finding the qwen27b best config.
- With the guard, it emits `should_stop=true`.
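The guarded-stop rule above can be sketched as follows. This is a minimal illustration, not the harness's actual implementation; the function and field names (`per_gpu_tput`, `baseline_tput`) are hypothetical, while the `1.8x` ratio comes from the updated guard in this diff:

```python
def guarded_should_stop(trials, baseline_tput, guard_ratio=1.8):
    """Return True when the strong-incumbent guard fires.

    The guard fires when the newest trial is also the incumbent (best
    per-GPU throughput so far) and beats the baseline by guard_ratio,
    at which point runtime-only tweaks need direct evidence to proceed.
    """
    newest = trials[-1]
    incumbent = max(trials, key=lambda t: t["per_gpu_tput"])
    return (newest is incumbent
            and newest["per_gpu_tput"] >= guard_ratio * baseline_tput)

# A 2.1x incumbent found on the newest trial fires the guard;
# a 1.5x improvement does not.
fires = guarded_should_stop(
    [{"per_gpu_tput": 100.0}, {"per_gpu_tput": 210.0}], baseline_tput=100.0)
```
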
@@ -79,5 +79,5 @@ Result:
## Current Risks
- The harness is prompt-guided, not a hard verifier for every rule. If future LLM outputs ignore a fired guard, proposal validation should reject the blocked family explicitly.
-- Strong-incumbent stopping is deliberately conservative for the qwen27b pattern. Workloads with narrow runtime sweet spots, such as qwen235b thinking prefill-only, may need a weaker stop threshold or a "continue local refinement" exception.
+- Strong-incumbent stopping is intentionally biased toward fewer GPU trials once a large gain is already reached. Workloads with very narrow runtime sweet spots may still need a "continue local refinement" exception when the user wants absolute best throughput rather than fastest convergence to a good config.
- Full fresh reruns on large models are expensive. Strict replay is useful for measuring proposal-path improvements when the proposed configs already exist in prior measured runs, but publication-quality claims still need fresh no-relaunch runs when time allows.
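The first risk above suggests hard-rejecting proposals from a blocked family instead of trusting the prompt. A minimal sketch of that validation step, with hypothetical names (`family`, `blocked_families`) since the harness's data model is not shown in this diff:

```python
def validate_proposal(proposal, fired_guards):
    """Hard verifier for fired guards: reject any proposal whose family
    a fired guard has blocked, rather than relying on the LLM to comply
    with the prompt alone."""
    blocked = {fam for guard in fired_guards
               for fam in guard["blocked_families"]}
    if proposal["family"] in blocked:
        raise ValueError(
            f"proposal family {proposal['family']!r} is blocked by a fired guard")
    return proposal

# A runtime-only tweak family blocked by the strong-incumbent guard is
# rejected; an unblocked family passes through unchanged.
guards = [{"blocked_families": ["runtime_only"]}]
ok = validate_proposal({"family": "parallelism"}, guards)
```
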