8b4116fad0
Add reference paper and qwen27b tpot25 16-iter notes
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 14:02:30 +08:00
27d1c8fa92
Add L-C-A workload profile metric and CLI profile commands
Implement the paper's 10-dimensional L-C-A workload feature vector
(RobustScaler-normalized, sim=exp(-||dz||)) in lca.py, and wire it into
`aituner profile window` / `aituner profile similarity`. Covered by tests.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 14:02:24 +08:00
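The commit above describes the similarity metric only as "RobustScaler-normalized, sim=exp(-||dz||)". Below is a minimal sketch of that idea, written against these assumptions:

- Normalization follows scikit-learn's RobustScaler defaults: subtract the per-dimension median, then divide by the 25th-75th percentile IQR, with a zero IQR treated as 1.
- `dz` is the difference between two normalized vectors.
- `||.||` is the Euclidean norm.

The helper names `fit_robust_scaler`, `transform` and `similarity` are hypothetical and are not taken from lca.py. The toy rows use 3 made-up dimensions instead of the paper's 10 features.

```python
import math
import statistics
from typing import Sequence

Vector = Sequence[float]


def fit_robust_scaler(rows: Sequence[Vector]) -> tuple[list[float], list[float]]:
    """Hypothetical helper: per-dimension median and IQR, mirroring RobustScaler defaults."""
    centers, scales = [], []
    for d in range(len(rows[0])):
        column = [row[d] for row in rows]
        # "inclusive" quartiles match numpy's default linear-interpolation percentiles.
        q1, _, q3 = statistics.quantiles(column, n=4, method="inclusive")
        iqr = q3 - q1
        centers.append(statistics.median(column))
        # A constant dimension has IQR 0; use 1 so it does not divide by zero.
        scales.append(iqr if iqr > 0 else 1.0)
    return centers, scales


def transform(x: Vector, centers: Vector, scales: Vector) -> list[float]:
    """Robust-normalize one workload feature vector."""
    return [(v - c) / s for v, c, s in zip(x, centers, scales)]


def similarity(z_a: Vector, z_b: Vector) -> float:
    """sim = exp(-||z_a - z_b||_2): 1.0 for identical windows, decaying toward 0."""
    return math.exp(-math.dist(z_a, z_b))


# Toy usage with made-up 3-dimensional feature rows (not real workload data).
history = [[1.0, 10.0, 0.5], [2.0, 20.0, 0.5], [3.0, 30.0, 0.5], [4.0, 40.0, 0.5]]
centers, scales = fit_robust_scaler(history)
z_mid = transform([2.5, 25.0, 0.5], centers, scales)
z_hi = transform([4.0, 40.0, 0.5], centers, scales)
print(similarity(z_mid, z_mid), similarity(z_mid, z_hi))
```

Scaling by median and IQR, rather than by mean and standard deviation, keeps a few extreme windows from dominating the normalization. The exponential maps an unbounded distance into a (0, 1] similarity score.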
984eb1f325
Document 8-GPU harness ablation results for qwen27b and qwen235b prefill
Add completed experiment results from dash0 runs after 2026-05-13:
- qwen27b chat 0-8k: harness +118.6% over no-harness (0.2696 vs 0.1233 req/s/GPU)
- qwen235b prefill TTFT 3s/6s/9s: harness +76.8% (0.3921 vs 0.2217 req/s/GPU)
Mark the old 7-GPU docs and the pre-2026-05-13 docs as superseded. Update the
implementation log with the completed run status.
2026-05-16 21:23:16 +08:00
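The percentages above appear to be relative per-GPU throughput gains of the harness run over the no-harness run. That reading is an assumption, since the commit gives only the numbers. The sketch below recomputes them from the four-decimal req/s/GPU figures. The helper `relative_gain_pct` is hypothetical.

From these rounded inputs the result is about 118.65% and 76.86%. Both agree with the stated +118.6% and +76.8% to within the rounding of the inputs.

```python
def relative_gain_pct(harness: float, baseline: float) -> float:
    """Hypothetical helper: percent throughput gain of harness over no-harness."""
    return (harness / baseline - 1.0) * 100.0


# Per-GPU throughput (req/s/GPU) as listed in the commit message: (harness, no-harness).
runs = {
    "qwen27b chat 0-8k": (0.2696, 0.1233),
    "qwen235b prefill TTFT 3s/6s/9s": (0.3921, 0.2217),
}
for name, (harness, baseline) in runs.items():
    print(f"{name}: {relative_gain_pct(harness, baseline):+.2f}%")
```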
d0c89dac48
Clean marked trial engine processes
2026-05-16 15:51:04 +08:00
cf9b8b3f68
Clean vLLM process groups after parent exit
2026-05-16 14:52:05 +08:00
5a879a8592
Fix decode harness partial probe handling
2026-05-16 14:18:07 +08:00
f18765b235
Document eight-GPU harness rerun
2026-05-13 09:04:14 +08:00
5c2958e6c1
Constrain harness topology by visible GPUs
2026-05-13 01:25:31 +08:00
fb6d74a18c
Document harness v2 rerun criteria
2026-05-12 22:23:12 +08:00
e3ed775afd
Fix harness SLO early-stop diagnosis
2026-05-12 22:20:01 +08:00
ef359c8eea
Document profile-driven harness run
2026-05-12 21:40:19 +08:00
17e9681ca0
Add profile-driven harness planner
2026-05-12 21:28:44 +08:00
63d6a111f4
Document profile-driven harness design
2026-05-12 21:09:29 +08:00
2d03b1cd4c
Add SLO-driven topology frontier harness guard
2026-05-12 21:00:49 +08:00
e1125475ae
Minimize no-harness ablation prompt
2026-05-12 09:42:53 +08:00
ae756600ce
Support full-range and incumbent-floor search modes
2026-05-11 12:58:46 +08:00
8516cd88c0
Use full search range for every trial
2026-05-11 12:50:22 +08:00
14259fcec9
Measure lower-range performance for infeasible trials
2026-05-10 14:30:34 +08:00
bf7c02e721
Clarify qwen27b raw per-iteration performance
2026-05-10 14:24:10 +08:00
b0325ecfd9
Clarify qwen235b raw per-iteration performance
2026-05-10 14:21:49 +08:00
4cfd3757b6
Document qwen235b prefill harness ablation
2026-05-10 13:05:49 +08:00
bdb08f6edc
Handle missing streamed token metrics
2026-05-10 02:40:00 +08:00
307e2eb0e8
Document qwen27b harness ablation
2026-05-10 01:12:21 +08:00
adc4351e5d
Report latency stats for infeasible baseline
2026-05-08 11:10:34 +08:00
eb137a0b62
Document TPOT40 baseline infeasible run
2026-05-08 02:57:03 +08:00
f212673f44
Stop tuning when baseline is infeasible
2026-05-08 01:07:36 +08:00
a7a5e9ad80
Make tune trial budget resumable
2026-05-07 17:18:06 +08:00
7263587cb6
clean: ci
2026-05-06 22:56:53 +08:00
d7df1ebdac
Add open source project metadata
2026-05-06 21:18:21 +08:00
c1ff64381d
Harden trial measurement accounting
2026-05-06 21:18:09 +08:00
871c4cfc02
Document qwen27b chat setup audit
2026-05-06 20:32:09 +08:00
98cd6dd81a
Document qwen27b current config harness curve
2026-05-06 18:00:43 +08:00
f653af09a8
Stop harness when feasible probe reaches search high
2026-05-06 17:59:09 +08:00
5d96689ea6
Make harness runtime refinement memory safe
2026-05-06 17:37:31 +08:00
cf2e741550
Document high search rerun
2026-05-06 03:19:51 +08:00
0622e23817
Guide harness runtime refinement after TP
2026-05-06 02:46:07 +08:00
50067c926d
Add harness guided first topology probe
2026-05-06 02:28:46 +08:00
915861b706
Document community vllm harness ablation
2026-05-02 11:17:24 +08:00
4c066c4e4e
Stop harness when search high is saturated
2026-05-02 11:04:59 +08:00
ccbf24ac47
Use time-compressed community vllm ablation
2026-05-02 10:03:59 +08:00
d3d4c234f6
Bound community vllm ablation replay
2026-05-02 09:58:56 +08:00
4ef69cce78
Make harness stop conservative for ablation
2026-05-02 09:47:16 +08:00
664aeb49b2
Use local cache for qwen30b vllm runs
2026-05-02 08:47:16 +08:00
1880e859b5
Use vllm cu129 wheel on dash0
2026-05-02 08:28:23 +08:00
e215827503
Use uv auto torch backend for vllm 0.20
2026-05-02 08:21:27 +08:00
a7c9518ef6
Use local vllm venv for dash0 community run
2026-05-02 08:17:04 +08:00
1a3d628268
Add harness early stop ablation
2026-05-02 08:08:14 +08:00
6d3459c82d
Document decode harness one-shot mechanism
2026-05-02 06:25:06 +08:00
9e5394b557
Inherit incumbent topology for runtime validation
2026-04-30 09:33:49 +08:00
f59919e21c
Clarify base-relative validation patches
2026-04-30 06:52:09 +08:00