Fig18 substrate: real output_length + criterion-A time_scale + Stop-A drain deadline
Replace the out=128 / scale=0.5 ablation substrate with a paper-faithful one:

- Use the trace's real output_length (drop completion_tokens_override=128).
  The 0-8k chat window has p50=531 / p99=2436 / max=35168 output tokens, so
  decode (TPOT) becomes the dominant bottleneck instead of an artificial
  128-token cap.
- replay_time_scale=0.8775, chosen by criterion-A: binary-search the smallest
  scale whose A-family L-C-A similarity to the real (scale=1.0) arrivals stays
  >= tau (0.90). The old scale=0.5 had sim_A=0.56, distorting the arrival axis
  far below the tau bar used everywhere else.
  New calibrator: scripts/calibrate_time_scale.py.
- Per-probe Stop-A-consistent drain deadline (worker._probe_drain_deadline):
  the wall-clock a *feasible* config needs to drain the LCA-admitted set
  (last_arrival + worst-case TTFT + p99_out * TPOT budget + margin). With real
  outputs decode dominates wall-clock, so the old fixed 320s cap would truncate
  the Stop-A offered window mid-decode. early_stop_max_elapsed_s (1000s) is now
  a hard ceiling; the per-probe deadline governs. The lag cap still cuts
  overload.

12-iter paired driver (both arms on dash1, removes the dash0/dash1 host
confound): scripts/run_ablation_pair_d1.sh.

115 tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
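For readers unfamiliar with the criterion-A search, the following is a minimal sketch of a bisection for the smallest scale that still meets tau. It is not the code in scripts/calibrate_time_scale.py: the function name, the `similarity` callable, the tolerance, and the toy linear similarity are all assumptions made for illustration, and the search is only valid if similarity is monotone non-decreasing in the scale.

```python
from typing import Callable


def smallest_scale_meeting_tau(
    similarity: Callable[[float], float],  # hypothetical: sim_A(scale) vs. scale=1.0
    tau: float = 0.90,
    lo: float = 0.0,
    hi: float = 1.0,
    tol: float = 1e-4,
) -> float:
    """Bisect for the smallest scale in [lo, hi] with similarity(scale) >= tau.

    Assumes similarity is monotone non-decreasing in scale (an assumption,
    not something the commit message states).
    """
    if similarity(hi) < tau:
        raise ValueError("even the largest scale does not reach tau")
    if similarity(lo) >= tau:
        return lo
    # Invariant: similarity(lo) < tau <= similarity(hi).
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if similarity(mid) >= tau:
            hi = mid  # mid is feasible, try a smaller scale
        else:
            lo = mid  # mid is infeasible, move up
    return hi


def toy_similarity(scale: float) -> float:
    # Stand-in only: the real L-C-A similarity is not reproduced here.
    return scale


scale = smallest_scale_meeting_tau(toy_similarity, tau=0.90)
print(round(scale, 4))
```

Returning `hi` rather than `lo` keeps the result on the feasible side of the threshold, so the chosen scale never falls below the tau bar.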
@@ -55,6 +55,7 @@ from aituner.store import StudyStore
 from aituner.trace import load_trace_requests, summarize_window
 from aituner.worker import (
     _adaptive_replay_set,
+    _probe_drain_deadline,
     _install_sigterm_as_keyboardinterrupt,
     _restore_sigterm,
     _should_extend_on_boundary,
@@ -535,6 +536,38 @@ class CoreFlowTests(unittest.TestCase):
             )
         )
 
+    def test_probe_drain_deadline_tracks_admitted_set_and_caps_at_ceiling(self) -> None:
+        slo = SloSpec.from_dict(
+            {
+                "target_pass_rate": 0.95,
+                "ttft_rule": {"kind": "linear_ms", "intercept_ms": 4000, "per_token_ms": 0.125},
+                "tpot_rule": {"kind": "fixed_ms", "threshold_ms": 50},
+            }
+        )
+
+        def req(arrival_s: float, in_tok: int, out_tok: int) -> TraceRequest:
+            return TraceRequest(
+                row_id="r",
+                arrival_s=arrival_s,
+                sampling_u=0.1,
+                body={},
+                prompt_tokens_hint=in_tok,
+                completion_tokens_hint=out_tok,
+                metadata={},
+            )
+
+        # 100 requests, last arrival 495s, p99 in=8000 / out=2000.
+        reqs = [req(float(i * 5), 8000, 2000) for i in range(100)]
+        # deadline = last_arrival + (ttft_ms + p99_out*tpot_ms)/1000 + margin
+        #          = 495 + (5000 + 2000*50)/1000 + 30 = 495 + 105 + 30 = 630
+        self.assertAlmostEqual(
+            _probe_drain_deadline(reqs, slo, ceiling=1000.0), 630.0, places=3
+        )
+        # Ceiling caps a deadline that would otherwise exceed it.
+        self.assertEqual(_probe_drain_deadline(reqs, slo, ceiling=400.0), 400.0)
+        # No requests or no TPOT rule -> fall back to the ceiling.
+        self.assertEqual(_probe_drain_deadline([], slo, ceiling=400.0), 400.0)
+
     def test_linear_ms_ttft_rule_scales_with_input_length(self) -> None:
         slo = SloSpec.from_dict(
             {
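The arithmetic in the test comments can be reproduced with a standalone sketch. This is not the body of worker._probe_drain_deadline, which the diff does not show. The `Req` dataclass, the nearest-rank p99 helper, the keyword parameters, and the 30 s default margin (taken from the test comment) are assumptions for illustration, as is applying the linear TTFT rule to the p99 input length.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Req:
    # Hypothetical stand-in for the fields of TraceRequest used here.
    arrival_s: float
    prompt_tokens: int
    completion_tokens: int


def p99(values: Sequence[float]) -> float:
    """Nearest-rank 99th percentile (an assumed percentile definition)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(0.99 * len(ordered)))
    return float(ordered[rank - 1])


def drain_deadline_sketch(
    reqs: Sequence[Req],
    *,
    ttft_intercept_ms: float,
    ttft_per_token_ms: float,
    tpot_ms: float,
    ceiling_s: float,
    margin_s: float = 30.0,  # assumed; matches the "+ 30" in the test comment
) -> float:
    """last_arrival + worst-case TTFT + p99_out * TPOT + margin, capped at ceiling."""
    if not reqs:
        return ceiling_s  # nothing admitted: fall back to the hard ceiling
    last_arrival_s = max(r.arrival_s for r in reqs)
    worst_ttft_ms = ttft_intercept_ms + ttft_per_token_ms * p99(
        [r.prompt_tokens for r in reqs]
    )
    decode_ms = p99([r.completion_tokens for r in reqs]) * tpot_ms
    deadline_s = last_arrival_s + (worst_ttft_ms + decode_ms) / 1000.0 + margin_s
    return min(deadline_s, ceiling_s)


reqs = [Req(float(i * 5), 8000, 2000) for i in range(100)]
slo = dict(ttft_intercept_ms=4000, ttft_per_token_ms=0.125, tpot_ms=50)
print(drain_deadline_sketch(reqs, ceiling_s=1000.0, **slo))  # 630.0
```

With the test's inputs the decode term (2000 tokens at 50 ms each, 100 s) dwarfs the 5 s TTFT term, which is the commit's stated reason a fixed 320 s cap no longer fits once real output lengths are replayed.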