Commit Graph

47 Commits

Author SHA1 Message Date
4a64196a99 Add 27B Stop-B agentic-loop config (harness-driven, GPUs 2-7)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 09:08:46 +08:00
b1b74318f6 Pin 27B A/B to GPUs 2-7 (route around leaked GPU0/1 memory)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 23:01:22 +08:00
3541065675 Speed up 27B TP A/B: request_timeout 180s, search.high 0.125
The wide 0.5 range made TP1 (low-capacity) waste many infeasible high-theta probes,
and the 900s request timeout made overloaded probes drain hung requests for 15min
each. Cap drain at 180s and bound the search to where the boundaries actually are.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 22:40:42 +08:00
7678c7d5e8 Switch 27B TP A/B to length-aware TTFT SLO (4s + L_in/8k), widen search
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 20:35:23 +08:00
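The length-aware SLO named in the commit subject above can be read as a fixed 4 s budget plus an allowance that grows with prompt length. A minimal sketch, assuming `L_in` is the input length in tokens and `8k` means 8000 tokens per additional second (it could equally mean 8192); the function names and parameters are hypothetical and not taken from the repository:

```python
def ttft_slo_seconds(input_tokens: int,
                     base_s: float = 4.0,
                     tokens_per_extra_s: float = 8000.0) -> float:
    """Return the TTFT budget in seconds for a prompt of `input_tokens` tokens.

    Assumed reading of "4s + L_in/8k": a 4 s base plus one extra second
    per 8000 input tokens. The divisor is a parameter because the log
    does not say whether "8k" is 8000 or 8192.
    """
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return base_s + input_tokens / tokens_per_extra_s


def meets_ttft_slo(ttft_s: float, input_tokens: int) -> bool:
    """True if the observed time-to-first-token is within the length-aware budget."""
    return ttft_s <= ttft_slo_seconds(input_tokens)
```

Under these assumptions an 8000-token prompt gets a 5 s budget and a 32000-token prompt gets 8 s, so long prompts are not judged against the same fixed limit as short ones.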
4f45b546a1 Add 27B TP A/B (deterministic ground-truth: does TP2 beat TP1 per-GPU)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 18:39:54 +08:00
0b6beafeb8 Phase 5: widen search.high to 1.0 to force multi-iteration Stop-B convergence
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 17:12:32 +08:00
d4aff81691 Add Stop-B end-to-end config (agentic loop, Stop-A enabled)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 17:05:39 +08:00
03e556f0ab Add Stop-A ON config (adaptive_stop enabled + boundary guard) for A/B
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 16:25:24 +08:00
958739027a Fix Stop-A validation config: system vllm, cap max-model-len
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 15:22:48 +08:00
0f57ee96a9 Drop LLM endpoint from Stop-A full-data config (baseline-only run)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 15:19:46 +08:00
3af1d84ac0 Add Stop-A full-data validation config (real-time replay, no cap)
A single-config baseline run with adaptive_stop disabled and replay_time_scale=1.0,
so per-request probe_details capture the full 600s window for offline analysis of
whether truncating at the L-C-A convergence prefix preserves the feasibility verdict.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 15:15:12 +08:00
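The offline analysis described in the commit body above amounts to comparing the verdict from the full 600 s window with the verdict from a truncated prefix. A rough sketch, assuming each probe detail reduces to an `(arrival_s, ttft_s)` pair, a fixed TTFT limit, and a verdict defined as "at least a target fraction of requests meet the limit"; the record layout, the 0.95 target, and all names are hypothetical, since the log does not show the actual `probe_details` schema or feasibility rule:

```python
from typing import List, Tuple

# Hypothetical record layout: (arrival time in seconds, observed TTFT in seconds).
Record = Tuple[float, float]


def is_feasible(records: List[Record], ttft_limit_s: float,
                target: float = 0.95) -> bool:
    """Assumed verdict: feasible if at least `target` of requests meet the limit."""
    if not records:
        return False  # no evidence is treated as infeasible in this sketch
    ok = sum(1 for _, ttft in records if ttft <= ttft_limit_s)
    return ok / len(records) >= target


def prefix_preserves_verdict(records: List[Record], prefix_end_s: float,
                             ttft_limit_s: float, target: float = 0.95) -> bool:
    """Compare the verdict on requests arriving by `prefix_end_s` with the full-window verdict."""
    prefix = [r for r in records if r[0] <= prefix_end_s]
    return (is_feasible(prefix, ttft_limit_s, target)
            == is_feasible(records, ttft_limit_s, target))
```

A prefix that ends before a late overload appears would report feasible while the full window reports infeasible, which is the kind of disagreement a full-data baseline run makes visible.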
ccbf24ac47 Use time-compressed community vllm ablation 2026-05-02 10:03:59 +08:00
d3d4c234f6 Bound community vllm ablation replay 2026-05-02 09:58:56 +08:00
4ef69cce78 Make harness stop conservative for ablation 2026-05-02 09:47:16 +08:00
664aeb49b2 Use local cache for qwen30b vllm runs 2026-05-02 08:47:16 +08:00
1880e859b5 Use vllm cu129 wheel on dash0 2026-05-02 08:28:23 +08:00
e215827503 Use uv auto torch backend for vllm 0.20 2026-05-02 08:21:27 +08:00
a7c9518ef6 Use local vllm venv for dash0 community run 2026-05-02 08:17:04 +08:00
1a3d628268 Add harness early stop ablation 2026-05-02 08:08:14 +08:00
6d3459c82d Document decode harness one-shot mechanism 2026-05-02 06:25:06 +08:00
9919b9a7bd configs: add q235b prefill 1s 2s 0-32k study 2026-04-17 19:25:32 +08:00
34eb495b3e configs: add qwen235b prefill 0-32k study 2026-04-17 19:20:44 +08:00
26f3b46966 compare: add multi-candidate runner 2026-04-13 20:50:39 +08:00
18ff644b32 configs: add qwen235b prefill tight ttft 0323 study 2026-04-13 09:39:32 +08:00
edfd61a696 Add qwen235b prefill docs and tight TTFT spec 2026-04-12 11:24:23 +08:00
3f20ddf87e Add qwen235b prefill-only tuning support 2026-04-11 21:00:02 +08:00
5e54e9c8f5 Add multi-window baseline vs tuned compare flow 2026-04-11 13:51:54 +08:00
31dd44c54b Align qwen27b baseline proposal to TP1 run script 2026-04-11 00:40:05 +08:00
a4d54442db Fix topology-aware incumbents for qwen27b tuning 2026-04-11 00:32:41 +08:00
06d4c380b3 Align qwen27b baseline proposal with topology study 2026-04-10 17:43:02 +08:00
8d0777e5e2 Add topology-aware qwen27b 0-8k tuning 2026-04-10 17:41:54 +08:00
9422d43737 Prioritize topology exploration in decode tuning 2026-04-10 10:25:41 +08:00
d582a8ed1b Validate served model name consistency 2026-04-09 22:50:23 +08:00
ef78fe7eb5 Add topology-aware tuning constraints 2026-04-09 21:07:51 +08:00
581ef7ccea Add qwen235b decode TPOT40 study config 2026-04-09 12:57:05 +08:00
c158807fac Add decode-only study mode support 2026-04-09 11:23:17 +08:00
94c89e1103 Add codex and bailian LLM provider presets 2026-04-07 11:31:26 +08:00
46ed688ace Add trace length bucket tuning support 2026-04-07 11:03:16 +08:00
e9b5e9b957 Add targeted low-threshold probe specs 2026-04-05 02:08:27 +08:00
84c5d6bd80 Add deeper infeasible probe diagnostics 2026-04-05 01:44:38 +08:00
8b024c72f1 Tighten LLM proposal schema 2026-04-04 23:24:32 +08:00
7e8523fdaa Add probe early stop guards 2026-04-04 22:58:33 +08:00
56fa6747d2 Add replay time scaling for smoke tuning 2026-04-04 22:40:49 +08:00
dcb972014a Enable BLADNN for dash0 fp4 smoke study 2026-04-04 22:32:55 +08:00
f192c741ed Add study tune loop and smoke configs 2026-04-04 22:29:59 +08:00
7b7eaafd78 Use time-based trace window ids 2026-04-04 22:09:43 +08:00
gahow cdcca1d9d7 Initial AITuner study orchestrator 2026-04-04 21:26:37 +08:00