59 Commits

SHA1 Message Date
adc4351e5d Report latency stats for infeasible baseline 2026-05-08 11:10:34 +08:00
f212673f44 Stop tuning when baseline is infeasible 2026-05-08 01:07:36 +08:00
a7a5e9ad80 Make tune trial budget resumable 2026-05-07 17:18:06 +08:00
c1ff64381d Harden trial measurement accounting 2026-05-06 21:18:09 +08:00
f653af09a8 Stop harness when feasible probe reaches search high 2026-05-06 17:59:09 +08:00
5d96689ea6 Make harness runtime refinement memory safe 2026-05-06 17:37:31 +08:00
0622e23817 Guide harness runtime refinement after TP 2026-05-06 02:46:07 +08:00
50067c926d Add harness guided first topology probe 2026-05-06 02:28:46 +08:00
4c066c4e4e Stop harness when search high is saturated 2026-05-02 11:04:59 +08:00
4ef69cce78 Make harness stop conservative for ablation 2026-05-02 09:47:16 +08:00
1a3d628268 Add harness early stop ablation 2026-05-02 08:08:14 +08:00
6d3459c82d Document decode harness one-shot mechanism 2026-05-02 06:25:06 +08:00
9e5394b557 Inherit incumbent topology for runtime validation 2026-04-30 09:33:49 +08:00
f59919e21c Clarify base-relative validation patches 2026-04-30 06:52:09 +08:00
38ff4380e5 Make strong incumbent trigger validation phase 2026-04-28 20:54:05 +08:00
c9089cf4f0 Ignore non-SLO probe bookkeeping in bottleneck diagnosis 2026-04-28 06:58:38 +08:00
a9943e0240 Use probe sequence bottlenecks in harness 2026-04-28 06:57:45 +08:00
39aa47fbf1 Add generic decode-only harness guidance 2026-04-28 06:46:18 +08:00
29d0548e06 Stop after strong incumbent harness gains 2026-04-26 01:29:05 +08:00
6bac389aae Add infeasible plateau guard to harness 2026-04-25 18:49:23 +08:00
6c04b9dbbc Evaluate baseline before LLM tuning 2026-04-25 17:14:05 +08:00
2d7ebe50ee Drain inflight requests after early stop 2026-04-25 16:57:01 +08:00
2dc2815620 Make harness verification portable 2026-04-25 16:37:13 +08:00
2c5e9af02a Add harness-guided tuning prompts 2026-04-25 16:35:33 +08:00
4625fba487 trace: make window materialization atomic 2026-04-12 23:09:30 +08:00
631a076498 trace: include weekend legacy windows 2026-04-12 22:43:02 +08:00
3f20ddf87e Add qwen235b prefill-only tuning support 2026-04-11 21:00:02 +08:00
5e54e9c8f5 Add multi-window baseline vs tuned compare flow 2026-04-11 13:51:54 +08:00
83325b2f76 Reset new topology groups to full binary search 2026-04-11 00:36:45 +08:00
a4d54442db Fix topology-aware incumbents for qwen27b tuning 2026-04-11 00:32:41 +08:00
8d0777e5e2 Add topology-aware qwen27b 0-8k tuning 2026-04-10 17:41:54 +08:00
9422d43737 Prioritize topology exploration in decode tuning 2026-04-10 10:25:41 +08:00
d582a8ed1b Validate served model name consistency 2026-04-09 22:50:23 +08:00
ef78fe7eb5 Add topology-aware tuning constraints 2026-04-09 21:07:51 +08:00
7371d6635c Force codex stream to use chat completions 2026-04-09 14:49:40 +08:00
ceafecd8f0 Fix list flag serialization for engine launch 2026-04-09 11:52:27 +08:00
c158807fac Add decode-only study mode support 2026-04-09 11:23:17 +08:00
96140b79bb Add streaming LLM proposal support 2026-04-09 01:06:45 +08:00
46151512cd Support codex reasoning effort override 2026-04-09 00:57:33 +08:00
0990a3771e Support codex responses API 2026-04-09 00:55:05 +08:00
79ba8a50c8 Repair truncated LLM proposal JSON 2026-04-07 11:38:08 +08:00
94c89e1103 Add codex and bailian LLM provider presets 2026-04-07 11:31:26 +08:00
46ed688ace Add trace length bucket tuning support 2026-04-07 11:03:16 +08:00
84c5d6bd80 Add deeper infeasible probe diagnostics 2026-04-05 01:44:38 +08:00
0aa607a4f1 Kill engine process groups on trial cleanup 2026-04-05 01:30:05 +08:00
e00bedb466 Stop waiting on in-flight requests after early stop 2026-04-05 00:56:26 +08:00
75a9842f1a Bypass proxies for loopback engines 2026-04-04 23:50:42 +08:00
7632de8dad Record failed trial context 2026-04-04 23:35:07 +08:00
8b024c72f1 Tighten LLM proposal schema 2026-04-04 23:24:32 +08:00
00778eff42 Harden LLM proposal parsing 2026-04-04 23:19:42 +08:00