Record plateau guard verification
@@ -113,10 +113,17 @@ Improve AITuner convergence for the `dash0` internal vLLM + Qwen3.5-27B 0-8k cha
 - Added unit coverage for:
 - TTFT failure classification under `slo_pass_rate_unrecoverable`;
 - blocking a repeat of the DP family after DP4 and DP8 show no material improvement at the same sampling threshold.
+- Pulled the commit on `dash0` and reran remote verification:
+- `python3 -m compileall -q src tests`: passed.
+- `PYTHONPATH=src python3 -m unittest discover -s tests -p "test_*.py"`: passed, 62 tests.
+- Regenerated a prompt against the real smoke v2 history:
+- `convergence_guard.reason`: `data-parallel-size_plateau_on_infeasible_trials`.
+- `should_stop_if_no_harness_can_justify_a_new_adjacent_probe`: `true`.
+- blocked primary family: `data-parallel-size`.
+- latest two active bottlenecks after ignoring `probe_elapsed_s>` for voting: `ttft_prefill`, `ttft_prefill`.
 - Current status: the harness now has the mechanism needed to avoid continuing the exact DP-only direction seen in the smoke v2 plateau. The next real experiment should either switch to a bottleneck-justified mixed TP/DP candidate or return `should_stop=true`.

 Remaining next steps:

-1. Push/pull the plateau-guard commit to `dash0`.
-2. Re-run the remote unit suite.
-3. Start the next real tuning run only after deciding whether to spend a full multi-hour run on the production SLO or a shorter prefill-only confirmation of the new plateau guard.
+1. Start the next real tuning run only after deciding whether to spend a full multi-hour run on the production SLO or a shorter prefill-only confirmation of the new plateau guard.
+2. If the LLM proposes another DP-only change after this guard fires, tighten validation to reject proposals that repeat `convergence_guard.infeasible_progress.blocked_primary_family`.
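For readers without the harness source at hand, the plateau rule recorded in this hunk (block a repeat of the DP family once DP4 and DP8 show no material improvement) might look roughly like the sketch below. It is not the harness's actual code: the `Trial` fields, the `blocked_family` helper, the two-trial window, and the `min_gain` threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Trial:
    family: str           # parameter family that was changed, e.g. "data-parallel-size"
    label: str            # human-readable label, e.g. "DP4" (illustrative)
    slo_pass_rate: float  # hypothetical metric field
    feasible: bool        # whether the trial met the SLO


def blocked_family(
    trials: Sequence[Trial], min_gain: float = 0.02, window: int = 2
) -> Optional[str]:
    """Return the family to block, or None if no plateau is detected.

    A plateau is assumed when the last `window` trials all changed the same
    family, none of them was feasible, and the best of them improved the
    pass rate over the best earlier trial by less than `min_gain`.
    """
    if len(trials) < window:
        return None
    recent = trials[-window:]
    same_family = len({t.family for t in recent}) == 1
    if not same_family or any(t.feasible for t in recent):
        return None
    # With no earlier trial, the comparison point defaults to 0.0.
    earlier_best = max((t.slo_pass_rate for t in trials[:-window]), default=0.0)
    recent_best = max(t.slo_pass_rate for t in recent)
    if recent_best - earlier_best < min_gain:
        return recent[0].family
    return None


history = [
    Trial("baseline", "DP2", 0.10, False),
    Trial("data-parallel-size", "DP4", 0.11, False),
    Trial("data-parallel-size", "DP8", 0.105, False),
]
print(blocked_family(history))  # prints "data-parallel-size"
```

Keeping the check as a pure function over the trial history makes it easy to cover with the kind of unit tests the log mentions, without starting a real tuning run.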
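The tightened validation proposed in the second remaining step could take a shape like the following. This is only a sketch under stated assumptions: the nested-dict layout is inferred from the dotted name `convergence_guard.infeasible_progress.blocked_primary_family` in the log, and `validate_proposal` and the `tensor-parallel-size` parameter name in the usage lines are hypothetical.

```python
from typing import Iterable, Mapping, Tuple


def validate_proposal(
    changed_params: Iterable[str], convergence_guard: Mapping
) -> Tuple[bool, str]:
    """Reject a proposal that changes only the blocked primary family.

    `convergence_guard` is assumed to be a nested mapping; the key path is
    taken from the log, the surrounding structure is a guess.
    """
    progress = convergence_guard.get("infeasible_progress") or {}
    blocked = progress.get("blocked_primary_family")
    if blocked is None:
        return True, "no family is blocked"
    changed = set(changed_params)
    if changed == {blocked}:
        return False, f"proposal only repeats blocked family {blocked!r}"
    # A mixed candidate (for example TP plus DP) is allowed through.
    return True, "proposal changes more than the blocked family"


guard = {"infeasible_progress": {"blocked_primary_family": "data-parallel-size"}}
ok, reason = validate_proposal(["data-parallel-size"], guard)
print(ok, reason)
ok, reason = validate_proposal(["tensor-parallel-size", "data-parallel-size"], guard)
print(ok, reason)
```

Rejecting only the DP-only case keeps the bottleneck-justified mixed TP/DP candidate mentioned in the status entry available to the next run.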