diff --git a/docs/harness-tuning-progress.md b/docs/harness-tuning-progress.md
index 85a7c46..c4f181d 100644
--- a/docs/harness-tuning-progress.md
+++ b/docs/harness-tuning-progress.md
@@ -113,10 +113,17 @@ Improve AITuner convergence for the `dash0` internal vLLM + Qwen3.5-27B 0-8k cha
 - Added unit coverage for:
   - TTFT failure classification under `slo_pass_rate_unrecoverable`;
   - blocking a repeat of the DP family after DP4 and DP8 show no material improvement at the same sampling threshold.
+- Pulled the commit on `dash0` and reran remote verification:
+  - `python3 -m compileall -q src tests`: passed.
+  - `PYTHONPATH=src python3 -m unittest discover -s tests -p "test_*.py"`: passed, 62 tests.
+- Regenerated a prompt against the real smoke v2 history:
+  - `convergence_guard.reason`: `data-parallel-size_plateau_on_infeasible_trials`.
+  - `should_stop_if_no_harness_can_justify_a_new_adjacent_probe`: `true`.
+  - blocked primary family: `data-parallel-size`.
+  - latest two active bottlenecks after ignoring `probe_elapsed_s>` for voting: `ttft_prefill`, `ttft_prefill`.
 - Current status: the harness now has the mechanism needed to avoid continuing the exact DP-only direction seen in the smoke v2 plateau. The next real experiment should either switch to a bottleneck-justified mixed TP/DP candidate or return `should_stop=true`.
 
 Remaining next steps:
 
-1. Push/pull the plateau-guard commit to `dash0`.
-2. Re-run the remote unit suite.
-3. Start the next real tuning run only after deciding whether to spend a full multi-hour run on the production SLO or a shorter prefill-only confirmation of the new plateau guard.
+1. Start the next real tuning run only after deciding whether to spend a full multi-hour run on the production SLO or a shorter prefill-only confirmation of the new plateau guard.
+2. If the LLM proposes another DP-only change after this guard fires, tighten validation to reject proposals that repeat `convergence_guard.infeasible_progress.blocked_primary_family`.
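
Review note on the new step 2: the tightened validation could look roughly like the sketch below. This is not the harness's actual code. The function names `blocked_family` and `is_rejected` are hypothetical, and the sketch assumes that a proposal is available as a mapping of changed flag names to values and that the guard is a plain dict mirroring the `convergence_guard.infeasible_progress.blocked_primary_family` path named in the diff. It rejects only proposals whose sole change is the blocked family, so a mixed TP/DP candidate still passes.

```python
from typing import Any, Mapping, Optional


def blocked_family(guard: Mapping[str, Any]) -> Optional[str]:
    """Return the blocked primary family, or None if the guard did not fire.

    Assumes the guard dict mirrors convergence_guard.infeasible_progress
    (the exact shape is an assumption, not taken from the harness).
    """
    progress = guard.get("infeasible_progress") or {}
    return progress.get("blocked_primary_family")


def is_rejected(changed_flags: Mapping[str, Any], guard: Mapping[str, Any]) -> bool:
    """Reject a proposal whose only changed flag is the blocked family."""
    blocked = blocked_family(guard)
    if blocked is None:
        return False  # guard did not fire, nothing to enforce
    return set(changed_flags) == {blocked}


# Hypothetical guard payload, using the values reported in the diff.
guard = {
    "reason": "data-parallel-size_plateau_on_infeasible_trials",
    "infeasible_progress": {"blocked_primary_family": "data-parallel-size"},
}

dp_only = {"data-parallel-size": 8}
mixed = {"data-parallel-size": 4, "tensor-parallel-size": 2}

print(is_rejected(dp_only, guard))  # DP-only repeat of the blocked family
print(is_rejected(mixed, guard))    # mixed TP/DP candidate
```

Matching on the exact set of changed flags keeps the check narrow. If the harness should also block mixed proposals that still move the blocked family, the comparison would become a membership test instead.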