Document decode harness one-shot mechanism

2026-05-02 06:25:06 +08:00
parent 9e5394b557
commit 6d3459c82d
8 changed files with 185 additions and 4 deletions
--- a/docs/qwen235b-thinking-decode/harness-20260428.md
+++ b/docs/qwen235b-thinking-decode/harness-20260428.md
@@ -124,4 +124,6 @@ Local verification: `PYTHONPATH=src python3 -m unittest discover -s tests` passe

 For qwen235b decode-only, the harness still accelerates convergence: before harness, the best observed 12-iter result appeared at iter 9 with 0.2817 request/s; with harness, iter 2 reached 0.3767 request/s and later validation did not find a better adjacent or same-topology runtime point.

-The remaining optimization is validation cost, not convergence quality. `trial-0005` took a long time because early-stopped decode-only probes still had to wait for in-flight long-output requests unless the engine is restarted after early stop. Future harness/study templates for long decode-only validation should use or automatically recommend `trace.restart_engine_after_early_stop=true` when repeated SLO-unrecoverable probes are expected.
+The remaining optimization is validation cost, not convergence quality. `trial-0005` took a long time because, unless the engine is restarted after early stop, early-stopped decode-only probes still have to wait for in-flight long-output requests to finish. As of 2026-05-02, decode-only studies default to `trace.restart_engine_after_early_stop=true` when the field is not explicitly set, and the qwen235b decode examples set it explicitly.
+
+See `docs/qwen235b-thinking-decode/one-shot-mechanism-ablation-20260502.md` for the detailed mechanism explanation and harness ablation.
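
The defaulting rule described in the added paragraph can be illustrated with a minimal sketch. This is not the harness's actual code: the function name `apply_decode_only_defaults`, the `decode_only` flag, and the dict-shaped study config are assumptions made for illustration. Only the field name `trace.restart_engine_after_early_stop` comes from the text above.

```python
from copy import deepcopy


def apply_decode_only_defaults(study: dict, decode_only: bool) -> dict:
    """Return a copy of `study` with the restart default filled in.

    Hypothetical helper: how the real harness detects a decode-only
    study and where it stores its config are not shown in this commit.
    """
    resolved = deepcopy(study)
    trace = resolved.setdefault("trace", {})
    # Fill in the default only when the field is absent, so an explicit
    # true or false set by the study author is always preserved.
    if decode_only and "restart_engine_after_early_stop" not in trace:
        trace["restart_engine_after_early_stop"] = True
    return resolved


# Field not set on a decode-only study: the default is applied.
implicit = apply_decode_only_defaults({"trace": {}}, decode_only=True)

# Field explicitly set to false: the explicit value is kept.
explicit = apply_decode_only_defaults(
    {"trace": {"restart_engine_after_early_stop": False}}, decode_only=True
)

# Not a decode-only study: the field is left unset.
other = apply_decode_only_defaults({"trace": {}}, decode_only=False)
```

Checking for the key's presence rather than its truthiness is what lets an explicit `false` override the default in this sketch.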