aituner

Files

Gahow Wang b3156a382a Harness: gate gpu-mem-util/seqs-raise on 'no untested TP increase' (frontier-closed)

The first gpt-5.5 verification run exposed a bug in the prior gate. With
topology_settled = cur_tp>base_tp, gpu-memory-utilization could fire on a TP2 incumbent
(TP2 > baseline TP1) and preempt the still-open TP4 frontier: the harness proposed
TP2+gpu-mem-util=0.92 at iter 2 instead of climbing to TP4. The candidate path runs
before the topology-frontier check, so any runtime candidate with score>=0.35 wins.

Fix: gate runtime micro-tuning (gpu-mem-util, raising max-num-seqs) on the TP frontier
being closed -- topology_settled = no untested _next_allowed_tp remains (respects GPU
count, so TP4 is the real ceiling on 6 GPUs). New regression test: TP2 incumbent with
TP4 reachable must climb TP and must NOT propose gpu-mem-util. 116 tests pass.
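
A minimal sketch of the old and new gate, for illustration only. It is not the
harness code: the function names, the `tested` set, and the power-of-two TP ladder
are assumptions chosen to match the "TP4 is the ceiling on 6 GPUs" example, and the
real `_next_allowed_tp` may differ in signature and behaviour.

```python
from typing import Optional, Set, Tuple


def next_allowed_tp(cur_tp: int, gpu_count: int, tested: Set[int]) -> Optional[int]:
    """Smallest untested TP size above cur_tp that still fits on gpu_count GPUs.

    Assumption: TP sizes double (1, 2, 4, 8, ...), so 6 GPUs cap out at TP4.
    """
    tp = cur_tp * 2
    while tp <= gpu_count:
        if tp not in tested:
            return tp
        tp *= 2
    return None


def topology_settled_old(cur_tp: int, base_tp: int) -> bool:
    # Prior gate: any TP above the baseline counted as "settled".
    return cur_tp > base_tp


def topology_settled_new(cur_tp: int, gpu_count: int, tested: Set[int]) -> bool:
    # New gate: settled only when no untested TP increase remains.
    return next_allowed_tp(cur_tp, gpu_count, tested) is None


def propose(cur_tp: int, gpu_count: int, tested: Set[int]) -> Tuple[str, float]:
    """Climb TP while the frontier is open; micro-tune runtime knobs only after."""
    nxt = next_allowed_tp(cur_tp, gpu_count, tested)
    if nxt is not None:
        return ("tp", float(nxt))
    # Frontier closed: runtime micro-tuning is now allowed (0.92 is illustrative).
    return ("gpu-memory-utilization", 0.92)
```

With a TP2 incumbent on 6 GPUs and TP1/TP2 already tested, the old gate reports
settled while the new one does not, so `propose` climbs to TP4 first.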

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-19 13:33:29 +08:00

conftest.py

Initial AITuner study orchestrator

2026-04-04 21:26:37 +08:00

test_core_flow.py

Harness: gate gpu-mem-util/seqs-raise on 'no untested TP increase' (frontier-closed)

2026-06-19 13:33:29 +08:00

test_prepare_trace_windows.py

Make the offered-load axis session-coherent

2026-06-15 14:16:06 +08:00