4a64196a99
Add 27B Stop-B agentic-loop config (harness-driven, GPUs 2-7)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 09:08:46 +08:00
b1b74318f6
Pin 27B A/B to GPUs 2-7 (route around leaked GPU0/1 memory)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 23:01:22 +08:00
3541065675
Speed up 27B TP A/B: request_timeout 180s, search.high 0.125
The wide 0.5 range made TP1 (low-capacity) waste many infeasible high-theta probes,
and the 900s request timeout made overloaded probes drain hung requests for 15min
each. Cap drain at 180s and bound the search to where the boundaries actually are.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 22:40:42 +08:00
7678c7d5e8
Switch 27B TP A/B to length-aware TTFT SLO (4s + L_in/8k), widen search
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 20:35:23 +08:00
4f45b546a1
Add 27B TP A/B (deterministic ground-truth: does TP2 beat TP1 per-GPU)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 18:39:54 +08:00
0b6beafeb8
Phase 5: widen search.high to 1.0 to force multi-iteration Stop-B convergence
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 17:12:32 +08:00
d4aff81691
Add Stop-B end-to-end config (agentic loop, Stop-A enabled)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 17:05:39 +08:00
03e556f0ab
Add Stop-A ON config (adaptive_stop enabled + boundary guard) for A/B
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 16:25:24 +08:00
958739027a
Fix Stop-A validation config: system vllm, cap max-model-len
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 15:22:48 +08:00
0f57ee96a9
Drop LLM endpoint from Stop-A full-data config (baseline-only run)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 15:19:46 +08:00
3af1d84ac0
Add Stop-A full-data validation config (real-time replay, no cap)
A single-config baseline run with adaptive_stop disabled and replay_time_scale=1.0,
so per-request probe_details capture the full 600s window for offline analysis of
whether truncating at the L-C-A convergence prefix preserves the feasibility verdict.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 15:15:12 +08:00
ccbf24ac47
Use time-compressed community vllm ablation
2026-05-02 10:03:59 +08:00
d3d4c234f6
Bound community vllm ablation replay
2026-05-02 09:58:56 +08:00
4ef69cce78
Make harness stop conservative for ablation
2026-05-02 09:47:16 +08:00
664aeb49b2
Use local cache for qwen30b vllm runs
2026-05-02 08:47:16 +08:00
1880e859b5
Use vllm cu129 wheel on dash0
2026-05-02 08:28:23 +08:00
e215827503
Use uv auto torch backend for vllm 0.20
2026-05-02 08:21:27 +08:00
a7c9518ef6
Use local vllm venv for dash0 community run
2026-05-02 08:17:04 +08:00
1a3d628268
Add harness early stop ablation
2026-05-02 08:08:14 +08:00
6d3459c82d
Document decode harness one-shot mechanism
2026-05-02 06:25:06 +08:00
9919b9a7bd
configs: add q235b prefill 1s 2s 0-32k study
2026-04-17 19:25:32 +08:00
34eb495b3e
configs: add qwen235b prefill 0-32k study
2026-04-17 19:20:44 +08:00
26f3b46966
compare: add multi-candidate runner
2026-04-13 20:50:39 +08:00
18ff644b32
configs: add qwen235b prefill tight ttft 0323 study
2026-04-13 09:39:32 +08:00
edfd61a696
Add qwen235b prefill docs and tight TTFT spec
2026-04-12 11:24:23 +08:00
3f20ddf87e
Add qwen235b prefill-only tuning support
2026-04-11 21:00:02 +08:00
5e54e9c8f5
Add multi-window baseline vs tuned compare flow
2026-04-11 13:51:54 +08:00
31dd44c54b
Align qwen27b baseline proposal to TP1 run script
2026-04-11 00:40:05 +08:00
a4d54442db
Fix topology-aware incumbents for qwen27b tuning
2026-04-11 00:32:41 +08:00
06d4c380b3
Align qwen27b baseline proposal with topology study
2026-04-10 17:43:02 +08:00
8d0777e5e2
Add topology-aware qwen27b 0-8k tuning
2026-04-10 17:41:54 +08:00
9422d43737
Prioritize topology exploration in decode tuning
2026-04-10 10:25:41 +08:00
d582a8ed1b
Validate served model name consistency
2026-04-09 22:50:23 +08:00
ef78fe7eb5
Add topology-aware tuning constraints
2026-04-09 21:07:51 +08:00
581ef7ccea
Add qwen235b decode TPOT40 study config
2026-04-09 12:57:05 +08:00
c158807fac
Add decode-only study mode support
2026-04-09 11:23:17 +08:00
94c89e1103
Add codex and bailian LLM provider presets
2026-04-07 11:31:26 +08:00
46ed688ace
Add trace length bucket tuning support
2026-04-07 11:03:16 +08:00
e9b5e9b957
Add targeted low-threshold probe specs
2026-04-05 02:08:27 +08:00
84c5d6bd80
Add deeper infeasible probe diagnostics
2026-04-05 01:44:38 +08:00
8b024c72f1
Tighten LLM proposal schema
2026-04-04 23:24:32 +08:00
7e8523fdaa
Add probe early stop guards
2026-04-04 22:58:33 +08:00
56fa6747d2
Add replay time scaling for smoke tuning
2026-04-04 22:40:49 +08:00
dcb972014a
Enable BLADNN for dash0 fp4 smoke study
2026-04-04 22:32:55 +08:00
f192c741ed
Add study tune loop and smoke configs
2026-04-04 22:29:59 +08:00
7b7eaafd78
Use time-based trace window ids
2026-04-04 22:09:43 +08:00
gahow
cdcca1d9d7
Initial AITuner study orchestrator
2026-04-04 21:26:37 +08:00