Record plateau guard verification

2026-04-25 18:50:23 +08:00
parent 6bac389aae
commit 440f5b491b

@@ -113,10 +113,17 @@ Improve AITuner convergence for the `dash0` internal vLLM + Qwen3.5-27B 0-8k cha
- Added unit coverage for:
  - TTFT failure classification under `slo_pass_rate_unrecoverable`;
  - blocking a repeat of the DP family after DP4 and DP8 show no material improvement at the same sampling threshold.
- Pulled the commit on `dash0` and reran remote verification:
  - `python3 -m compileall -q src tests`: passed.
  - `PYTHONPATH=src python3 -m unittest discover -s tests -p "test_*.py"`: passed, 62 tests.
- Regenerated a prompt against the real smoke v2 history:
  - `convergence_guard.reason`: `data-parallel-size_plateau_on_infeasible_trials`.
  - `should_stop_if_no_harness_can_justify_a_new_adjacent_probe`: `true`.
  - blocked primary family: `data-parallel-size`.
  - latest two active bottlenecks after ignoring `probe_elapsed_s>` for voting: `ttft_prefill`, `ttft_prefill`.
- Current status: the harness now has the mechanism needed to avoid continuing the exact DP-only direction seen in the smoke v2 plateau. The next real experiment should either switch to a bottleneck-justified mixed TP/DP candidate or return `should_stop=true`.
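For illustration, the plateau check described above could look roughly like the sketch below. This is not the harness's actual code: the `Trial` fields, the `plateau_guard` name, and the `min_gain` threshold are all hypothetical. Only the output keys (`reason`, `should_stop_if_no_harness_can_justify_a_new_adjacent_probe`, `infeasible_progress.blocked_primary_family`) are taken from the notes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trial:
    # All fields are hypothetical stand-ins for the real trial record.
    family: str       # primary knob changed, e.g. "data-parallel-size"
    value: int        # knob value, e.g. 4 or 8
    feasible: bool    # whether the trial met the SLO
    pass_rate: float  # SLO pass rate at the shared sampling threshold


def plateau_guard(history: List[Trial], min_gain: float = 0.02) -> Optional[dict]:
    """Return a guard record if the last two trials plateau within one family.

    Assumed rule: both trials changed the same family, both were infeasible,
    and the pass rate improved by less than `min_gain`.
    """
    if len(history) < 2:
        return None
    prev, last = history[-2], history[-1]
    same_family = prev.family == last.family
    both_infeasible = not prev.feasible and not last.feasible
    no_material_gain = (last.pass_rate - prev.pass_rate) < min_gain
    if same_family and both_infeasible and no_material_gain:
        return {
            "reason": f"{last.family}_plateau_on_infeasible_trials",
            "should_stop_if_no_harness_can_justify_a_new_adjacent_probe": True,
            "infeasible_progress": {"blocked_primary_family": last.family},
        }
    return None


# Usage: DP4 then DP8, both infeasible, with a negligible pass-rate change.
history = [
    Trial("data-parallel-size", 4, False, 0.40),
    Trial("data-parallel-size", 8, False, 0.41),
]
guard = plateau_guard(history)
```

With this history the guard fires and names `data-parallel-size` as the blocked family. A real implementation would presumably also compare trials at the same sampling threshold, which this sketch leaves out.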
Remaining next steps:
1. Push/pull the plateau-guard commit to `dash0`.
2. Re-run the remote unit suite.
3. Start the next real tuning run only after deciding whether to spend a full multi-hour run on the production SLO or a shorter prefill-only confirmation of the new plateau guard.
1. Start the next real tuning run only after deciding whether to spend a full multi-hour run on the production SLO or a shorter prefill-only confirmation of the new plateau guard.
2. If the LLM proposes another DP-only change after this guard fires, tighten validation to reject proposals that repeat `convergence_guard.infeasible_progress.blocked_primary_family`.
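If step 2 becomes necessary, the tightened validation might be sketched as follows. The `validate_proposal` function and the shape of `changes` (a mapping from knob family to proposed value) are assumptions made for illustration. Only the `convergence_guard.infeasible_progress.blocked_primary_family` path comes from the notes.

```python
from typing import Dict, Optional, Tuple


def validate_proposal(
    changes: Dict[str, int],
    convergence_guard: Optional[dict],
) -> Tuple[bool, str]:
    """Reject a proposal whose only change is the blocked primary family.

    `changes` is a hypothetical mapping of knob family -> proposed value.
    Mixed proposals (e.g. TP and DP together) are allowed through here.
    """
    guard = convergence_guard or {}
    blocked = guard.get("infeasible_progress", {}).get("blocked_primary_family")
    if blocked is None:
        return True, "no blocked family"
    if set(changes) == {blocked}:
        return False, f"proposal repeats blocked family: {blocked}"
    return True, "ok"


# Usage with a guard record naming the DP family as blocked.
guard = {"infeasible_progress": {"blocked_primary_family": "data-parallel-size"}}
dp_only = validate_proposal({"data-parallel-size": 16}, guard)
mixed = validate_proposal(
    {"tensor-parallel-size": 2, "data-parallel-size": 4}, guard
)
```

Here the DP-only proposal is rejected while the mixed TP/DP proposal passes. Whether a mixed candidate is actually bottleneck-justified would still need a separate check.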