aituner

Author	SHA1	Message	Date
Gahow Wang	0f15bbc3f1	Make the offered-load axis session-coherent. Phase 1 of the two-stop work. Subsampling the trace by per-request uniform score broke multi-turn sessions (a kept turn-2 could lose its turn-1), which lowered the realized KV-cache hit rate as offered load dropped, so the feasibility boundary was measured on a workload with a different C than production, contradicting the paper's scale-stationary L-C-A premise. prepare_trace_windows now resolves each row's session root via the parent_chat_id chain in a single streaming pass and assigns sampling_u per session, so thresholding keeps or drops whole sessions and preserves intra-session prefix reuse. Rows whose parent fell outside the span fall back to grouping under the parent id. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 14:16:06 +08:00
Gahow Wang	6f8e3c95c1	Unify harness L-C-A on the canonical lca.WorkloadProfile. Phase 0 of the two-stop work. The prompt block labeled `workload_lca_profile` previously re-derived L-C-A from summarize_window's ad-hoc percentiles, diverging from the paper's 10-dim RobustScaler vector implemented in lca.py. Make that block authoritative: build_harness_context now accepts an optional workload_profile and renders the canonical 10-dim vector + per-family stats when present, falling back to the legacy rendering only when no profile is supplied (direct unit-test calls). Real call sites (study prompt/llm-propose/tune, run_baseline_then_llm) build the profile via lca.build_study_workload_profile and pass it through build_prompt. The heuristic regime classifiers keep reading window_summary; that is the heuristic layer, distinct from the similarity metric. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 14:12:17 +08:00
Gahow Wang	27d1c8fa92	Add L-C-A workload profile metric and CLI profile commands. Implement the paper's 10-dimensional L-C-A workload feature vector (RobustScaler-normalized, sim=exp(-||dz||)) in lca.py, and wire it into `aituner profile window` / `aituner profile similarity`. Covered by tests. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 14:02:24 +08:00
Gahow Wang	d0c89dac48	Clean marked trial engine processes	2026-05-16 15:51:04 +08:00
Gahow Wang	cf9b8b3f68	Clean vLLM process groups after parent exit	2026-05-16 14:52:05 +08:00
Gahow Wang	5a879a8592	Fix decode harness partial probe handling	2026-05-16 14:18:07 +08:00
Gahow Wang	5c2958e6c1	Constrain harness topology by visible GPUs	2026-05-13 01:25:31 +08:00
Gahow Wang	e3ed775afd	Fix harness SLO early-stop diagnosis	2026-05-12 22:20:01 +08:00
Gahow Wang	17e9681ca0	Add profile-driven harness planner	2026-05-12 21:28:44 +08:00
Gahow Wang	2d03b1cd4c	Add SLO-driven topology frontier harness guard	2026-05-12 21:00:49 +08:00
Gahow Wang	e1125475ae	Minimize no-harness ablation prompt	2026-05-12 09:42:53 +08:00
Gahow Wang	ae756600ce	Support full-range and incumbent-floor search modes	2026-05-11 12:58:46 +08:00
Gahow Wang	8516cd88c0	Use full search range for every trial	2026-05-11 12:50:22 +08:00
Gahow Wang	14259fcec9	Measure lower-range performance for infeasible trials	2026-05-10 14:30:34 +08:00
Gahow Wang	bdb08f6edc	Handle missing streamed token metrics	2026-05-10 02:40:00 +08:00
Gahow Wang	adc4351e5d	Report latency stats for infeasible baseline	2026-05-08 11:10:34 +08:00
Gahow Wang	f212673f44	Stop tuning when baseline is infeasible	2026-05-08 01:07:36 +08:00
Gahow Wang	a7a5e9ad80	Make tune trial budget resumable	2026-05-07 17:18:06 +08:00
Gahow Wang	c1ff64381d	Harden trial measurement accounting	2026-05-06 21:18:09 +08:00
Gahow Wang	f653af09a8	Stop harness when feasible probe reaches search high	2026-05-06 17:59:09 +08:00
Gahow Wang	5d96689ea6	Make harness runtime refinement memory safe	2026-05-06 17:37:31 +08:00
Gahow Wang	0622e23817	Guide harness runtime refinement after TP	2026-05-06 02:46:07 +08:00
Gahow Wang	50067c926d	Add harness guided first topology probe	2026-05-06 02:28:46 +08:00
Gahow Wang	4c066c4e4e	Stop harness when search high is saturated	2026-05-02 11:04:59 +08:00
Gahow Wang	4ef69cce78	Make harness stop conservative for ablation	2026-05-02 09:47:16 +08:00
Gahow Wang	1a3d628268	Add harness early stop ablation	2026-05-02 08:08:14 +08:00
Gahow Wang	6d3459c82d	Document decode harness one-shot mechanism	2026-05-02 06:25:06 +08:00
Gahow Wang	9e5394b557	Inherit incumbent topology for runtime validation	2026-04-30 09:33:49 +08:00
Gahow Wang	f59919e21c	Clarify base-relative validation patches	2026-04-30 06:52:09 +08:00
Gahow Wang	38ff4380e5	Make strong incumbent trigger validation phase	2026-04-28 20:54:05 +08:00
Gahow Wang	c9089cf4f0	Ignore non-SLO probe bookkeeping in bottleneck diagnosis	2026-04-28 06:58:38 +08:00
Gahow Wang	a9943e0240	Use probe sequence bottlenecks in harness	2026-04-28 06:57:45 +08:00
Gahow Wang	39aa47fbf1	Add generic decode-only harness guidance	2026-04-28 06:46:18 +08:00
Gahow Wang	29d0548e06	Stop after strong incumbent harness gains	2026-04-26 01:29:05 +08:00
Gahow Wang	6bac389aae	Add infeasible plateau guard to harness	2026-04-25 18:49:23 +08:00
Gahow Wang	6c04b9dbbc	Evaluate baseline before LLM tuning	2026-04-25 17:14:05 +08:00
Gahow Wang	2d7ebe50ee	Drain inflight requests after early stop	2026-04-25 16:57:01 +08:00
Gahow Wang	2dc2815620	Make harness verification portable	2026-04-25 16:37:13 +08:00
Gahow Wang	2c5e9af02a	Add harness-guided tuning prompts	2026-04-25 16:35:33 +08:00
Gahow Wang	4625fba487	trace: make window materialization atomic	2026-04-12 23:09:30 +08:00
Gahow Wang	631a076498	trace: include weekend legacy windows	2026-04-12 22:43:02 +08:00
Gahow Wang	3f20ddf87e	Add qwen235b prefill-only tuning support	2026-04-11 21:00:02 +08:00
Gahow Wang	5e54e9c8f5	Add multi-window baseline vs tuned compare flow	2026-04-11 13:51:54 +08:00
Gahow Wang	83325b2f76	Reset new topology groups to full binary search	2026-04-11 00:36:45 +08:00
Gahow Wang	a4d54442db	Fix topology-aware incumbents for qwen27b tuning	2026-04-11 00:32:41 +08:00
Gahow Wang	8d0777e5e2	Add topology-aware qwen27b 0-8k tuning	2026-04-10 17:41:54 +08:00
Gahow Wang	9422d43737	Prioritize topology exploration in decode tuning	2026-04-10 10:25:41 +08:00
Gahow Wang	d582a8ed1b	Validate served model name consistency	2026-04-09 22:50:23 +08:00
Gahow Wang	ef78fe7eb5	Add topology-aware tuning constraints	2026-04-09 21:07:51 +08:00
Gahow Wang	7371d6635c	Force codex stream to use chat completions	2026-04-09 14:49:40 +08:00
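
Commit 27d1c8fa92 describes the similarity metric only as "RobustScaler-normalized, sim=exp(-||dz||)". The following is a minimal sketch of that formula, not the code in lca.py. It assumes median/IQR scaling fitted on a set of reference vectors and a Euclidean norm for ||dz||; the function names and the toy 2-dimensional vectors are hypothetical, and the real feature vector is 10-dimensional.

```python
import math
import statistics


def fit_robust_scaler(reference):
    """Fit per-feature median and interquartile range on reference vectors."""
    columns = list(zip(*reference))
    medians = [statistics.median(col) for col in columns]
    iqrs = []
    for col in columns:
        q1, _, q3 = statistics.quantiles(col, n=4, method="inclusive")
        # A constant feature has zero spread; fall back to 1.0 to avoid dividing by zero.
        iqrs.append((q3 - q1) or 1.0)
    return medians, iqrs


def normalize(vector, medians, iqrs):
    """Map a raw feature vector to robust z-scores: (x - median) / IQR."""
    return [(x - m) / s for x, m, s in zip(vector, medians, iqrs)]


def similarity(a, b, medians, iqrs):
    """sim = exp(-||dz||), with ||.|| assumed to be the Euclidean norm."""
    za = normalize(a, medians, iqrs)
    zb = normalize(b, medians, iqrs)
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(za, zb)))
    return math.exp(-distance)


# Toy reference set (hypothetical values, 2 features instead of 10).
reference = [[0, 0], [1, 2], [2, 4], [3, 6], [4, 8]]
medians, iqrs = fit_robust_scaler(reference)
score = similarity([0, 0], [2, 4], medians, iqrs)
```

Under these assumptions identical profiles score exactly 1.0 and the score decays toward 0 as the normalized distance grows. Because both vectors share the same medians, only the IQR scaling affects the distance.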

74 Commits
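
The session-coherent sampling described in commit 0f15bbc3f1 can be sketched as follows. This is an illustration of the idea, not the actual prepare_trace_windows code: the row layout (dicts with `chat_id` and `parent_chat_id`), the hash-based score, and all function names are assumptions. It also assumes rows arrive in trace order, so a parent is seen before its children.

```python
import hashlib


def resolve_roots(rows):
    """Single pass: map each chat_id to the root of its session."""
    root_of = {}
    for row in rows:
        chat_id = row["chat_id"]
        parent = row.get("parent_chat_id")
        if parent is None:
            root_of[chat_id] = chat_id
        else:
            # Parent already seen: inherit its root.
            # Parent outside the span: group under the parent id itself.
            root_of[chat_id] = root_of.get(parent, parent)
    return root_of


def session_score(root_id, seed=0):
    """Deterministic uniform score in [0, 1) shared by every row of a session."""
    digest = hashlib.sha256(f"{seed}:{root_id}".encode()).digest()
    # Keep 53 bits so the float conversion is exact and strictly below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def subsample(rows, keep_fraction, seed=0):
    """Keep or drop whole sessions by thresholding the per-session score."""
    rows = list(rows)
    root_of = resolve_roots(rows)
    return [
        row for row in rows
        if session_score(root_of[row["chat_id"]], seed) < keep_fraction
    ]


# Hypothetical trace: session a (3 turns), session b (1 turn), and two rows
# whose ancestor "gone" lies outside the span.
trace = [
    {"chat_id": "a1", "parent_chat_id": None},
    {"chat_id": "a2", "parent_chat_id": "a1"},
    {"chat_id": "b1", "parent_chat_id": None},
    {"chat_id": "a3", "parent_chat_id": "a2"},
    {"chat_id": "o1", "parent_chat_id": "gone"},
    {"chat_id": "o2", "parent_chat_id": "o1"},
]
roots = resolve_roots(trace)
```

Because the score depends only on the session root, lowering `keep_fraction` removes complete sessions rather than individual turns, which is what preserves intra-session prefix reuse at reduced offered load.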