From 46e9040613dd3d2c9fa8a058df6ba61fa79f6775 Mon Sep 17 00:00:00 2001
From: Gahow Wang
Date: Tue, 28 Apr 2026 21:20:41 +0800
Subject: [PATCH] Record decode validation follow-up

---
 docs/qwen235b-thinking-decode/harness-20260428.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/docs/qwen235b-thinking-decode/harness-20260428.md b/docs/qwen235b-thinking-decode/harness-20260428.md
index ae3c6cb..ab997fc 100644
--- a/docs/qwen235b-thinking-decode/harness-20260428.md
+++ b/docs/qwen235b-thinking-decode/harness-20260428.md
@@ -82,6 +82,12 @@ Follow-up implementation after this result:
 - The proposal rules now explicitly say not to stop solely because a strong incumbent appeared.
 - Proposal parsing now accepts structured `observation`/`diagnosis` by converting them to text, so a usable validation proposal is not dropped only because the LLM used an object instead of a string.
 
+After the implementation fix, the previously rejected `proposal-0004` was resumed as a validation trial:
+
+- `trial-0004`: same topology validation with `max-num-seqs=160`.
+- Remote tmux: `aituner_qwen235b_decode_harness_validate_20260428`.
+- Status as of 2026-04-28 13:20 UTC on dash0: running; no result has been written yet.
+
 ## Follow-up Fix
 
 The seeded prompt exposed a generic diagnosis issue: if the best feasible probe had no latency failures, the harness could miss the prior infeasible probe that showed the real bottleneck at higher load. The harness now scans the probe sequence backward and uses the nearest non-trivial bottleneck before falling back to the best feasible probe. This keeps decode-only runs focused on `decode_tpot` after a feasible low-load point, without adding testcase thresholds.
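
Reviewer note: the backward scan described under "Follow-up Fix" could be sketched roughly as below. This is a minimal illustration, not the harness's actual code. The `Probe` dataclass, its fields, and the functions `dominant_bottleneck` and `diagnose` are hypothetical names introduced here. The sketch also assumes that a "non-trivial bottleneck" means a probe with at least one latency failure, and that the "best feasible probe" is the feasible probe with the highest load.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Sequence, Tuple


@dataclass
class Probe:
    """Hypothetical record of one load probe (not the harness's real type)."""
    load: float
    feasible: bool
    # Metric name -> number of latency failures observed at this load.
    latency_failures: Dict[str, int] = field(default_factory=dict)


def dominant_bottleneck(probe: Probe) -> Optional[str]:
    """Return the metric with the most failures, or None if there are none."""
    failures = {m: n for m, n in probe.latency_failures.items() if n > 0}
    if not failures:
        return None
    return max(failures, key=failures.get)


def diagnose(probes: Sequence[Probe]) -> Tuple[Optional[Probe], Optional[str]]:
    """Pick the probe and bottleneck that the diagnosis should be built on."""
    # Walk from the most recent probe back to the first one and stop at the
    # nearest probe that shows a real bottleneck.
    for probe in reversed(probes):
        bottleneck = dominant_bottleneck(probe)
        if bottleneck is not None:
            return probe, bottleneck
    # No probe failed on latency: fall back to the best feasible probe.
    feasible = [p for p in probes if p.feasible]
    best = max(feasible, key=lambda p: p.load) if feasible else None
    return best, None


if __name__ == "__main__":
    history = [
        Probe(load=8.0, feasible=False, latency_failures={"decode_tpot": 12}),
        Probe(load=4.0, feasible=True),
    ]
    probe, bottleneck = diagnose(history)
    print(probe.load, bottleneck)
```

In this sketch the last probe is feasible and has no failures, so the scan steps back to the earlier infeasible probe and reports `decode_tpot`. It uses no per-testcase thresholds, only the recorded failure counts.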