aituner

Author	SHA1	Message	Date
Gahow Wang	4a64196a99	Add 27B Stop-B agentic-loop config (harness-driven, GPUs 2-7) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 09:08:46 +08:00
Gahow Wang	b1b74318f6	Pin 27B A/B to GPUs 2-7 (route around leaked GPU0/1 memory) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 23:01:22 +08:00
Gahow Wang	3541065675	Speed up 27B TP A/B: request_timeout 180s, search.high 0.125. The wide 0.5 range made TP1 (low-capacity) waste many infeasible high-theta probes, and the 900s request timeout made overloaded probes drain hung requests for 15min each. Cap drain at 180s and bound the search to where the boundaries actually are. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 22:40:42 +08:00
Gahow Wang	7678c7d5e8	Switch 27B TP A/B to length-aware TTFT SLO (4s + L_in/8k), widen search Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 20:35:23 +08:00
Gahow Wang	4f45b546a1	Add 27B TP A/B (deterministic ground-truth: does TP2 beat TP1 per-GPU) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 18:39:54 +08:00
Gahow Wang	0b6beafeb8	Phase 5: widen search.high to 1.0 to force multi-iteration Stop-B convergence Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 17:12:32 +08:00
Gahow Wang	d4aff81691	Add Stop-B end-to-end config (agentic loop, Stop-A enabled) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 17:05:39 +08:00
Gahow Wang	03e556f0ab	Add Stop-A ON config (adaptive_stop enabled + boundary guard) for A/B Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 16:25:24 +08:00
Gahow Wang	958739027a	Fix Stop-A validation config: system vllm, cap max-model-len Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 15:22:48 +08:00
Gahow Wang	0f57ee96a9	Drop LLM endpoint from Stop-A full-data config (baseline-only run) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 15:19:46 +08:00
Gahow Wang	3af1d84ac0	Add Stop-A full-data validation config (real-time replay, no cap). A single-config baseline run with adaptive_stop disabled and replay_time_scale=1.0, so per-request probe_details capture the full 600s window for offline analysis of whether truncating at the L-C-A convergence prefix preserves the feasibility verdict. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 15:15:12 +08:00
Gahow Wang	ccbf24ac47	Use time-compressed community vllm ablation	2026-05-02 10:03:59 +08:00
Gahow Wang	d3d4c234f6	Bound community vllm ablation replay	2026-05-02 09:58:56 +08:00
Gahow Wang	4ef69cce78	Make harness stop conservative for ablation	2026-05-02 09:47:16 +08:00
Gahow Wang	664aeb49b2	Use local cache for qwen30b vllm runs	2026-05-02 08:47:16 +08:00
Gahow Wang	1880e859b5	Use vllm cu129 wheel on dash0	2026-05-02 08:28:23 +08:00
Gahow Wang	e215827503	Use uv auto torch backend for vllm 0.20	2026-05-02 08:21:27 +08:00
Gahow Wang	a7c9518ef6	Use local vllm venv for dash0 community run	2026-05-02 08:17:04 +08:00
Gahow Wang	1a3d628268	Add harness early stop ablation	2026-05-02 08:08:14 +08:00
Gahow Wang	6d3459c82d	Document decode harness one-shot mechanism	2026-05-02 06:25:06 +08:00
Gahow Wang	9919b9a7bd	configs: add q235b prefill 1s 2s 0-32k study	2026-04-17 19:25:32 +08:00
Gahow Wang	34eb495b3e	configs: add qwen235b prefill 0-32k study	2026-04-17 19:20:44 +08:00
Gahow Wang	26f3b46966	compare: add multi-candidate runner	2026-04-13 20:50:39 +08:00
Gahow Wang	18ff644b32	configs: add qwen235b prefill tight ttft 0323 study	2026-04-13 09:39:32 +08:00
Gahow Wang	edfd61a696	Add qwen235b prefill docs and tight TTFT spec	2026-04-12 11:24:23 +08:00
Gahow Wang	3f20ddf87e	Add qwen235b prefill-only tuning support	2026-04-11 21:00:02 +08:00
Gahow Wang	5e54e9c8f5	Add multi-window baseline vs tuned compare flow	2026-04-11 13:51:54 +08:00
Gahow Wang	31dd44c54b	Align qwen27b baseline proposal to TP1 run script	2026-04-11 00:40:05 +08:00
Gahow Wang	a4d54442db	Fix topology-aware incumbents for qwen27b tuning	2026-04-11 00:32:41 +08:00
Gahow Wang	06d4c380b3	Align qwen27b baseline proposal with topology study	2026-04-10 17:43:02 +08:00
Gahow Wang	8d0777e5e2	Add topology-aware qwen27b 0-8k tuning	2026-04-10 17:41:54 +08:00
Gahow Wang	9422d43737	Prioritize topology exploration in decode tuning	2026-04-10 10:25:41 +08:00
Gahow Wang	d582a8ed1b	Validate served model name consistency	2026-04-09 22:50:23 +08:00
Gahow Wang	ef78fe7eb5	Add topology-aware tuning constraints	2026-04-09 21:07:51 +08:00
Gahow Wang	581ef7ccea	Add qwen235b decode TPOT40 study config	2026-04-09 12:57:05 +08:00
Gahow Wang	c158807fac	Add decode-only study mode support	2026-04-09 11:23:17 +08:00
Gahow Wang	94c89e1103	Add codex and bailian LLM provider presets	2026-04-07 11:31:26 +08:00
Gahow Wang	46ed688ace	Add trace length bucket tuning support	2026-04-07 11:03:16 +08:00
Gahow Wang	e9b5e9b957	Add targeted low-threshold probe specs	2026-04-05 02:08:27 +08:00
Gahow Wang	84c5d6bd80	Add deeper infeasible probe diagnostics	2026-04-05 01:44:38 +08:00
Gahow Wang	8b024c72f1	Tighten LLM proposal schema	2026-04-04 23:24:32 +08:00
Gahow Wang	7e8523fdaa	Add probe early stop guards	2026-04-04 22:58:33 +08:00
Gahow Wang	56fa6747d2	Add replay time scaling for smoke tuning	2026-04-04 22:40:49 +08:00
Gahow Wang	dcb972014a	Enable BLADNN for dash0 fp4 smoke study	2026-04-04 22:32:55 +08:00
Gahow Wang	f192c741ed	Add study tune loop and smoke configs	2026-04-04 22:29:59 +08:00
Gahow Wang	7b7eaafd78	Use time-based trace window ids	2026-04-04 22:09:43 +08:00
gahow	cdcca1d9d7	Initial AITuner study orchestrator	2026-04-04 21:26:37 +08:00

47 Commits