Record plateau guard verification

2026-04-25 18:50:23 +08:00
parent 6bac389aae
commit 440f5b491b
1 changed file with 10 additions and 3 deletions
--- a/docs/harness-tuning-progress.md
+++ b/docs/harness-tuning-progress.md
@@ -113,10 +113,17 @@ Improve AITuner convergence for the `dash0` internal vLLM + Qwen3.5-27B 0-8k cha
 - Added unit coverage for:
  - TTFT failure classification under `slo_pass_rate_unrecoverable`;
  - blocking a repeat of the DP family after DP4 and DP8 show no material improvement at the same sampling threshold.
 - Pulled the commit on `dash0` and reran remote verification:
  - `python3 -m compileall -q src tests`: passed.
  - `PYTHONPATH=src python3 -m unittest discover -s tests -p "test_*.py"`: passed, 62 tests.
 - Regenerated a prompt against the real smoke v2 history:
  - `convergence_guard.reason`: `data-parallel-size_plateau_on_infeasible_trials`.
  - `should_stop_if_no_harness_can_justify_a_new_adjacent_probe`: `true`.
  - blocked primary family: `data-parallel-size`.
  - latest two active bottlenecks after ignoring `probe_elapsed_s>` for voting: `ttft_prefill`, `ttft_prefill`.
 - Current status: the harness now has the mechanism needed to avoid continuing the exact DP-only direction seen in the smoke v2 plateau. The next real experiment should either switch to a bottleneck-justified mixed TP/DP candidate or return `should_stop=true`.
 Remaining next steps:
-1. Push/pull the plateau-guard commit to `dash0`.
+1. Start the next real tuning run only after deciding whether to spend a full multi-hour run on the production SLO or a shorter prefill-only confirmation of the new plateau guard.
-2. Re-run the remote unit suite.
+2. If the LLM proposes another DP-only change after this guard fires, tighten validation to reject proposals that repeat `convergence_guard.infeasible_progress.blocked_primary_family`.
 3. Start the next real tuning run only after deciding whether to spend a full multi-hour run on the production SLO or a shorter prefill-only confirmation of the new plateau guard.
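
The stricter validation described in step 2 could look roughly like the sketch below. It is illustrative only and is not the harness's actual code. The function name `validate_proposal`, the `changed_keys` argument, and the shape of the `guard` mapping are assumptions. Only the `infeasible_progress.blocked_primary_family` path and the `data-parallel-size` family name come from the log above. The sketch rejects a proposal that changes nothing except the blocked family, so a mixed TP/DP candidate still passes.

```python
from typing import Iterable, Mapping, Tuple


def validate_proposal(changed_keys: Iterable[str], guard: Mapping) -> Tuple[bool, str]:
    """Reject a proposal that only repeats the family blocked by the guard.

    `changed_keys` is the (hypothetical) set of tunables the proposal modifies;
    `guard` is assumed to mirror the `convergence_guard` object in the prompt.
    """
    progress = guard.get("infeasible_progress") or {}
    blocked = progress.get("blocked_primary_family")
    if not blocked:
        # Guard has not fired, so nothing is blocked.
        return True, "no blocked family"
    if set(changed_keys) == {blocked}:
        # The proposal touches only the blocked family, e.g. another DP-only step.
        return False, f"proposal only changes blocked family '{blocked}'"
    # Mixed proposals (e.g. TP plus DP) are allowed through.
    return True, "ok"


# Example guard state, assuming the structure implied by the regenerated prompt.
guard = {
    "reason": "data-parallel-size_plateau_on_infeasible_trials",
    "infeasible_progress": {"blocked_primary_family": "data-parallel-size"},
}
dp_only = validate_proposal(["data-parallel-size"], guard)
mixed = validate_proposal(["tensor-parallel-size", "data-parallel-size"], guard)
```

Here `dp_only` is rejected and `mixed` is accepted. Whether a mixed proposal should also have to be bottleneck-justified is a separate check that this sketch does not attempt.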