aituner

Author	SHA1	Message	Date
Gahow Wang	0c23285f39	Fig18 substrate: real output_length + criterion-A time_scale + Stop-A drain deadline Replace the out=128 / scale=0.5 ablation substrate with a paper-faithful one: - Use the trace's real output_length (drop completion_tokens_override=128). The 0-8k chat window has p50=531 / p99=2436 / max=35168 output tokens, so decode (TPOT) becomes the dominant bottleneck instead of an artificial 128-token cap. - replay_time_scale=0.8775, chosen by criterion-A: binary-search the smallest scale whose A-family L-C-A similarity to the real (scale=1.0) arrivals stays >= tau (0.90). The old scale=0.5 had sim_A=0.56, distorting the arrival axis far below the tau bar used everywhere else. New calibrator: scripts/calibrate_time_scale.py. - Per-probe Stop-A-consistent drain deadline (worker._probe_drain_deadline): the wall-clock a feasible config needs to drain the LCA-admitted set (last_arrival + worst-case TTFT + p99_out * TPOT budget + margin). With real outputs decode dominates wall-clock, so the old fixed 320s cap would truncate the Stop-A offered window mid-decode. early_stop_max_elapsed_s (1000s) is now a hard ceiling; the per-probe deadline governs. The lag cap still cuts overload. 12-iter paired driver (both arms on dash1, removes the dash0/dash1 host confound): scripts/run_ablation_pair_d1.sh. 115 tests pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-17 17:24:00 +08:00
Gahow Wang	5965f4fbbc	Ablation substrate: scale=0.5 + out=128 + 6 probes (TP1 measurable, tractable) scale=0.2 made TP1 uniformly infeasible (no baseline); bound decode to 128 tokens and use mild 2x compression so TP1 registers a real, fast baseline, with 6 probes to span TP1's low and TP4's high feasibility boundaries. Both configs identical except use_harness. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 20:29:30 +08:00
Gahow Wang	0794efa249	Reduce ablation probe budget to 3 per trial for tractability First TP1 baseline probe under scale=0.2 ran ~6min (severe overload, 260 preemptions on the lighter half of the trace; TP1 is decode-bound and the arrival-lag early-stop does not cut a decode-drain-bound probe). Cut search.max_probes 5->3 to bound binary-search steps per trial. Caps stay at elapsed=180/lag=30. Both configs still differ only in use_harness + study_id. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 20:01:19 +08:00
Gahow Wang	d975e57bb5	Scale ablation early-stop caps to the compressed window (scale=0.2) At replay_time_scale=0.2 the 600s arrival window compresses to 120s, so the inherited 900s wall-clock elapsed cap let overloaded TP1 probes burn ~15min each (the tractability hazard the brief flagged). Scale the caps proportionately to the time axis: early_stop_max_elapsed_s 900->180, early_stop_max_lag_s 120->30. Feasible probes (~120s arrival + drain) finish well inside 180s; overloaded probes die in ~3min. Both configs still differ only in use_harness + study_id. Adds the ablation doc skeleton and a read-only trajectory-extraction helper. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 19:49:57 +08:00
Gahow Wang	a16016a876	Add harness vs naive ablation configs (27b, scale=0.2 substrate) Two configs identical except llm.use_harness and study_id, for the controlled harness-ON vs naive-OFF tuning-trajectory ablation on dense Qwen3.5-27B. Faster substrate (replay_time_scale=0.2, search.high=0.25, max_probes=5) keeps the ablation tractable; Stop-A stays enabled. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 19:31:23 +08:00
Gahow Wang	4a64196a99	Add 27B Stop-B agentic-loop config (harness-driven, GPUs 2-7) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 09:08:46 +08:00
Gahow Wang	b1b74318f6	Pin 27B A/B to GPUs 2-7 (route around leaked GPU0/1 memory) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 23:01:22 +08:00
Gahow Wang	3541065675	Speed up 27B TP A/B: request_timeout 180s, search.high 0.125 The wide 0.5 range made TP1 (low-capacity) waste many infeasible high-theta probes, and the 900s request timeout made overloaded probes drain hung requests for 15min each. Cap drain at 180s and bound the search to where the boundaries actually are. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 22:40:42 +08:00
Gahow Wang	7678c7d5e8	Switch 27B TP A/B to length-aware TTFT SLO (4s + L_in/8k), widen search Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 20:35:23 +08:00
Gahow Wang	4f45b546a1	Add 27B TP A/B (deterministic ground-truth: does TP2 beat TP1 per-GPU) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 18:39:54 +08:00
Gahow Wang	0b6beafeb8	Phase 5: widen search.high to 1.0 to force multi-iteration Stop-B convergence Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 17:12:32 +08:00
Gahow Wang	d4aff81691	Add Stop-B end-to-end config (agentic loop, Stop-A enabled) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 17:05:39 +08:00
Gahow Wang	03e556f0ab	Add Stop-A ON config (adaptive_stop enabled + boundary guard) for A/B Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 16:25:24 +08:00
Gahow Wang	958739027a	Fix Stop-A validation config: system vllm, cap max-model-len Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 15:22:48 +08:00
Gahow Wang	0f57ee96a9	Drop LLM endpoint from Stop-A full-data config (baseline-only run) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 15:19:46 +08:00
Gahow Wang	3af1d84ac0	Add Stop-A full-data validation config (real-time replay, no cap) A single-config baseline run with adaptive_stop disabled and replay_time_scale=1.0, so per-request probe_details capture the full 600s window for offline analysis of whether truncating at the L-C-A convergence prefix preserves the feasibility verdict. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 15:15:12 +08:00
Gahow Wang	ccbf24ac47	Use time-compressed community vllm ablation	2026-05-02 10:03:59 +08:00
Gahow Wang	d3d4c234f6	Bound community vllm ablation replay	2026-05-02 09:58:56 +08:00
Gahow Wang	4ef69cce78	Make harness stop conservative for ablation	2026-05-02 09:47:16 +08:00
Gahow Wang	664aeb49b2	Use local cache for qwen30b vllm runs	2026-05-02 08:47:16 +08:00
Gahow Wang	1880e859b5	Use vllm cu129 wheel on dash0	2026-05-02 08:28:23 +08:00
Gahow Wang	e215827503	Use uv auto torch backend for vllm 0.20	2026-05-02 08:21:27 +08:00
Gahow Wang	a7c9518ef6	Use local vllm venv for dash0 community run	2026-05-02 08:17:04 +08:00
Gahow Wang	1a3d628268	Add harness early stop ablation	2026-05-02 08:08:14 +08:00
Gahow Wang	6d3459c82d	Document decode harness one-shot mechanism	2026-05-02 06:25:06 +08:00
Gahow Wang	9919b9a7bd	configs: add q235b prefill 1s 2s 0-32k study	2026-04-17 19:25:32 +08:00
Gahow Wang	34eb495b3e	configs: add qwen235b prefill 0-32k study	2026-04-17 19:20:44 +08:00
Gahow Wang	26f3b46966	compare: add multi-candidate runner	2026-04-13 20:50:39 +08:00
Gahow Wang	18ff644b32	configs: add qwen235b prefill tight ttft 0323 study	2026-04-13 09:39:32 +08:00
Gahow Wang	edfd61a696	Add qwen235b prefill docs and tight TTFT spec	2026-04-12 11:24:23 +08:00
Gahow Wang	3f20ddf87e	Add qwen235b prefill-only tuning support	2026-04-11 21:00:02 +08:00
Gahow Wang	5e54e9c8f5	Add multi-window baseline vs tuned compare flow	2026-04-11 13:51:54 +08:00
Gahow Wang	31dd44c54b	Align qwen27b baseline proposal to TP1 run script	2026-04-11 00:40:05 +08:00
Gahow Wang	a4d54442db	Fix topology-aware incumbents for qwen27b tuning	2026-04-11 00:32:41 +08:00
Gahow Wang	06d4c380b3	Align qwen27b baseline proposal with topology study	2026-04-10 17:43:02 +08:00
Gahow Wang	8d0777e5e2	Add topology-aware qwen27b 0-8k tuning	2026-04-10 17:41:54 +08:00
Gahow Wang	9422d43737	Prioritize topology exploration in decode tuning	2026-04-10 10:25:41 +08:00
Gahow Wang	d582a8ed1b	Validate served model name consistency	2026-04-09 22:50:23 +08:00
Gahow Wang	ef78fe7eb5	Add topology-aware tuning constraints	2026-04-09 21:07:51 +08:00
Gahow Wang	581ef7ccea	Add qwen235b decode TPOT40 study config	2026-04-09 12:57:05 +08:00
Gahow Wang	c158807fac	Add decode-only study mode support	2026-04-09 11:23:17 +08:00
Gahow Wang	94c89e1103	Add codex and bailian LLM provider presets	2026-04-07 11:31:26 +08:00
Gahow Wang	46ed688ace	Add trace length bucket tuning support	2026-04-07 11:03:16 +08:00
Gahow Wang	e9b5e9b957	Add targeted low-threshold probe specs	2026-04-05 02:08:27 +08:00
Gahow Wang	84c5d6bd80	Add deeper infeasible probe diagnostics	2026-04-05 01:44:38 +08:00
Gahow Wang	8b024c72f1	Tighten LLM proposal schema	2026-04-04 23:24:32 +08:00
Gahow Wang	7e8523fdaa	Add probe early stop guards	2026-04-04 22:58:33 +08:00
Gahow Wang	56fa6747d2	Add replay time scaling for smoke tuning	2026-04-04 22:40:49 +08:00
Gahow Wang	dcb972014a	Enable BLADNN for dash0 fp4 smoke study	2026-04-04 22:32:55 +08:00
Gahow Wang	f192c741ed	Add study tune loop and smoke configs	2026-04-04 22:29:59 +08:00
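
Note on commit 0c23285f39: its message describes the per-probe drain deadline as last_arrival + worst-case TTFT + p99_out * TPOT budget + margin, with early_stop_max_elapsed_s (1000s) acting as a hard ceiling. The sketch below only restates that arithmetic and is not the code in worker._probe_drain_deadline. The function name, the dataclass, and every numeric input except p99_out=2436, the 1000s ceiling, and replay_time_scale=0.8775 are assumptions chosen for illustration. The 600s arrival window is taken from earlier commits and may not apply to this substrate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrainBudget:
    """Inputs to the drain-deadline formula quoted in commit 0c23285f39.

    All field names are hypothetical; only the formula shape comes from the log.
    """
    worst_case_ttft_s: float        # assumed SLO-derived TTFT bound, seconds
    tpot_budget_s: float            # assumed per-output-token decode budget, seconds
    p99_output_tokens: int          # p99 of the trace's real output_length
    margin_s: float                 # assumed safety margin, seconds
    hard_ceiling_s: float = 1000.0  # early_stop_max_elapsed_s from the commit


def probe_drain_deadline(last_arrival_s: float, budget: DrainBudget) -> float:
    """Wall-clock seconds a feasible config needs to drain the admitted set."""
    decode_s = budget.p99_output_tokens * budget.tpot_budget_s
    deadline = (
        last_arrival_s
        + budget.worst_case_ttft_s
        + decode_s
        + budget.margin_s
    )
    # The per-probe deadline governs; the global cap only acts as a ceiling.
    return min(deadline, budget.hard_ceiling_s)


if __name__ == "__main__":
    # Assumed example: a 600s arrival window replayed at scale 0.8775,
    # a 5s TTFT bound, a 40ms TPOT budget, and a 30s margin.
    budget = DrainBudget(
        worst_case_ttft_s=5.0,
        tpot_budget_s=0.040,
        p99_output_tokens=2436,
        margin_s=30.0,
    )
    print(round(probe_drain_deadline(600 * 0.8775, budget), 2))
```

Under these assumed inputs the decode term alone is roughly 97s, which is consistent with the commit's remark that decode dominates wall-clock once real output lengths are used and that a fixed 320s cap could cut a probe mid-decode.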


52 Commits