Record Stop-A boundary-guard A/B: correct verdict, ~38% replay saved

With the guard enabled the binary search recovers best sampling_u=0.078125
(rate 2.30 req/s), identical to the full-replay baseline. The guard fired on
exactly the one feasibility-knee probe (0.08594, re-measured full -> infeasible);
the other three probes truncated to ~45-50%. Net ~38% replay saved on the trial
with no peak-rate overestimate. Stop-A + boundary guard is safe to enable.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 16:57:53 +08:00
parent 03e556f0ab
commit f31e9ccfd5


@@ -79,6 +79,31 @@ L-C-A converges. It targets exactly this knee case at low extra cost (it only
extends replay on probes sitting on the feasibility boundary). Recommend adding it
as a small Stop-A enhancement before enabling Stop-A in production studies.

## 4. SLO-boundary guard (implemented + validated)

Added `trace.adaptive_stop.boundary_delta` (default 0.02): when a truncated probe's
measured pass-rate lands within ±δ of the SLO target, re-measure on the full window
and use that verdict. Re-ran the same config with `adaptive_stop` enabled
(τ=0.9, τ_c=0.90, δ=0.02):

| threshold | feasible | pass | selected | replayed | boundary_extended |
| --- | --- | --- | --- | --- | --- |
| 0.06250 | True | 1.000 | 1086 | 487 (45%) | — |
| 0.09375 | False | 0.444 | 1656 | 822 (50%) | — |
| 0.07812 | True | 0.994 | 1378 | 682 (49%) | — |
| 0.08594 | **False** | 0.947 | 1523 | **1523 (100%)** | **True** |

Result: best feasible `sampling_u=0.078125` (rate 2.30 req/s), **identical to the
full-replay baseline**. The guard fired on exactly the one knee probe and
re-measured it to the correct infeasible verdict; the other three probes truncated
to ~45-50%. Net: 3514 of 5643 selected requests were replayed, so **about 38% of
replay was saved on this trial while recovering the correct peak rate** (no
one-step overestimate).
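
The savings figure can be re-derived from the table. A minimal sketch, with the per-probe counts copied from the rows above; the variable names are illustrative and not taken from the tuner's code:

```python
# (threshold, selected, replayed) copied from the table above.
probes = [
    (0.06250, 1086, 487),
    (0.09375, 1656, 822),
    (0.07812, 1378, 682),
    (0.08594, 1523, 1523),  # boundary-extended probe: replayed in full
]

total_selected = sum(selected for _, selected, _ in probes)
total_replayed = sum(replayed for _, _, replayed in probes)
saved_fraction = 1 - total_replayed / total_selected

print(f"replayed {total_replayed}/{total_selected}, saved {saved_fraction:.1%}")
# prints: replayed 3514/5643, saved 37.7%
```

The 37.7% rounds to the ~38% quoted above; the single fully replayed knee probe accounts for 1523 of the 3514 replayed requests.
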

**Conclusion: Stop-A with the boundary guard is correct (verdict matches full
replay) and still saves replay time. Safe to enable.** Configs:
`dash0_qwen30b_a3b_stopA_fulldata.json` (OFF baseline) and
`dash0_qwen30b_a3b_stopA_on.json` (ON).

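
For readers who want the guard rule in code form, here is a minimal sketch of the decision described in this section. It is not the actual implementation: `judge_probe`, `ProbeVerdict`, and `measure_full` are hypothetical names, and treating a probe as feasible when its pass-rate is at or above the SLO target is an assumption. Only `boundary_delta` and its default of 0.02 come from the text above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProbeVerdict:
    feasible: bool
    pass_rate: float          # pass-rate the verdict was based on
    boundary_extended: bool   # True if the full window was re-measured


def judge_probe(
    truncated_pass_rate: float,
    measure_full: Callable[[], float],  # hypothetical: replays the full window
    slo_target: float,
    boundary_delta: float = 0.02,
) -> ProbeVerdict:
    """Decide feasibility, re-measuring only when the estimate is near the SLO."""
    if abs(truncated_pass_rate - slo_target) <= boundary_delta:
        # Too close to call from the truncated replay: use the full-window verdict.
        full_rate = measure_full()
        return ProbeVerdict(full_rate >= slo_target, full_rate, True)
    # Clearly above or below the target: trust the truncated estimate.
    return ProbeVerdict(truncated_pass_rate >= slo_target, truncated_pass_rate, False)


# Illustrative values only: the 0.95 target and the 0.96 truncated estimate are
# assumptions; 0.947 and 0.994 are pass-rates reported in the table.
knee = judge_probe(0.96, lambda: 0.947, slo_target=0.95)
clear = judge_probe(0.994, lambda: 0.994, slo_target=0.95)
```

Under these assumed inputs, `knee` is re-measured and judged infeasible, while `clear` keeps its truncated verdict, so the extra replay cost is paid only on probes near the boundary.
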
## Repro

```