Document decode harness one-shot mechanism
@@ -124,4 +124,6 @@ Local verification: `PYTHONPATH=src python3 -m unittest discover -s tests` passe
For qwen235b decode-only, the harness still accelerates convergence: before harness, the best observed 12-iter result appeared at iter 9 with 0.2817 request/s; with harness, iter 2 reached 0.3767 request/s and later validation did not find a better adjacent or same-topology runtime point.
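To put the two throughput figures side by side, the short Python sketch below computes the relative gain from the numbers quoted above. The variable names are illustrative only and are not part of the harness.

```python
# Figures quoted in the paragraph above (qwen235b decode-only).
baseline_rps = 0.2817   # best pre-harness result, reached at iter 9
harness_rps = 0.3767    # harness result, reached at iter 2
baseline_iter = 9
harness_iter = 2

# Relative throughput gain of the harness run over the pre-harness best.
relative_gain = harness_rps / baseline_rps - 1.0
# How many iterations earlier the harness run reached its best point.
iters_saved = baseline_iter - harness_iter

print(f"throughput gain: {relative_gain:.1%}")
print(f"iterations saved: {iters_saved}")
```

On these figures the harness run is about 33.7% faster in request/s and reaches its best point 7 iterations earlier.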
The remaining optimization is validation cost, not convergence quality. `trial-0005` took a long time because early-stopped decode-only probes still had to wait for in-flight long-output requests unless the engine is restarted after early stop. Future harness/study templates for long decode-only validation should use or automatically recommend `trace.restart_engine_after_early_stop=true` when repeated SLO-unrecoverable probes are expected.
The remaining optimization is validation cost, not convergence quality. `trial-0005` took a long time because early-stopped decode-only probes still had to wait for in-flight long-output requests unless the engine is restarted after early stop. As of 2026-05-02, decode-only studies default to `trace.restart_engine_after_early_stop=true` when the field is not explicitly set, and the qwen235b decode examples set it explicitly.
See `docs/qwen235b-thinking-decode/one-shot-mechanism-ablation-20260502.md` for the detailed mechanism explanation and harness ablation.
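The defaulting rule for `trace.restart_engine_after_early_stop` can be sketched as below. This is a minimal illustration, not the harness's actual code: it assumes the study config is a nested mapping, that a hypothetical `mode` key marks decode-only studies, and that other studies default to `False` (the text above does not say what they default to). The function name is also hypothetical.

```python
from typing import Any, Mapping

FIELD = "restart_engine_after_early_stop"


def resolve_restart_after_early_stop(study: Mapping[str, Any]) -> bool:
    """Return the effective value of trace.restart_engine_after_early_stop.

    Assumptions (not confirmed by the source): the config is a nested
    mapping, and `mode == "decode-only"` identifies decode-only studies.
    """
    trace = study.get("trace") or {}
    # An explicitly set value always wins, whether true or false.
    if FIELD in trace:
        return bool(trace[FIELD])
    # Field not set: decode-only studies default to restarting the engine.
    # The False default for other studies is an assumption of this sketch.
    return study.get("mode") == "decode-only"


# Usage: an unset field on a decode-only study resolves to True.
print(resolve_restart_after_early_stop({"mode": "decode-only"}))
# An explicit false is respected even for decode-only studies.
print(resolve_restart_after_early_stop(
    {"mode": "decode-only", "trace": {FIELD: False}}
))
```

Checking for the key's presence, rather than its truthiness, is what keeps an explicit `false` from being overridden by the default.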