aituner

Author	SHA1	Message	Date
Gahow Wang	958739027a	Fix Stop-A validation config: system vllm, cap max-model-len. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 15:22:48 +08:00
Gahow Wang	08e53fd897	Add Stop-A calibration script (CPU-only convergence curve). Prints the offered-L-C-A convergence curve and the stop fraction at candidate tau_c values for a raw trace window, to calibrate Stop-A thresholds and compare how late C converges across workloads. No serving required. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 15:10:02 +08:00
Gahow Wang	0f15bbc3f1	Make the offered-load axis session-coherent. Phase 1 of the two-stop work. Subsampling the trace by per-request uniform score broke multi-turn sessions (a kept turn-2 could lose its turn-1), which lowered the realized KV-cache hit rate as offered load dropped, so the feasibility boundary was measured on a workload with a different C than production, contradicting the paper's scale-stationary L-C-A premise. prepare_trace_windows now resolves each row's session root via the parent_chat_id chain in a single streaming pass and assigns sampling_u per session, so thresholding keeps or drops whole sessions and preserves intra-session prefix reuse. Rows whose parent fell outside the span fall back to grouping under the parent id. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 14:16:06 +08:00
Gahow Wang	6f8e3c95c1	Unify harness L-C-A on the canonical lca.WorkloadProfile. Phase 0 of the two-stop work. The prompt block labeled `workload_lca_profile` previously re-derived L-C-A from summarize_window's ad-hoc percentiles, diverging from the paper's 10-dim RobustScaler vector implemented in lca.py. Make that block authoritative: build_harness_context now accepts an optional workload_profile and renders the canonical 10-dim vector + per-family stats when present, falling back to the legacy rendering only when no profile is supplied (direct unit-test calls). Real call sites (study prompt/llm-propose/tune, run_baseline_then_llm) build the profile via lca.build_study_workload_profile and pass it through build_prompt. The heuristic regime classifiers keep reading window_summary; that is the heuristic layer, distinct from the similarity metric. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 14:12:17 +08:00
Gahow Wang	c1ff64381d	Harden trial measurement accounting	2026-05-06 21:18:09 +08:00
Gahow Wang	26f3b46966	compare: add multi-candidate runner	2026-04-13 20:50:39 +08:00
Gahow Wang	4625fba487	trace: make window materialization atomic	2026-04-12 23:09:30 +08:00
Gahow Wang	631a076498	trace: include weekend legacy windows	2026-04-12 22:43:02 +08:00
Gahow Wang	edfd61a696	Add qwen235b prefill docs and tight TTFT spec	2026-04-12 11:24:23 +08:00
Gahow Wang	7b7eaafd78	Use time-based trace window ids	2026-04-04 22:09:43 +08:00
Gahow Wang	4e1401f50c	Stream trace window materialization	2026-04-04 21:49:03 +08:00
Gahow Wang	69f666593e	Speed up raw trace window extraction	2026-04-04 21:42:02 +08:00
Gahow Wang	65b122fd4b	Add raw trace window preparation script	2026-04-04 21:37:51 +08:00

13 Commits
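
The session-coherent sampling described in commit 0f15bbc3f1 can be sketched roughly as below. This is an illustration only, not the repository's `prepare_trace_windows`: the row field `chat_id`, the helpers `session_score`, `assign_sampling_u`, and `subsample`, and the hash-based score are all assumptions, and the sketch assumes a parent row appears in the stream before its children.

```python
import hashlib
from typing import Dict, Iterable, List


def session_score(root_id: str) -> float:
    """Map a session root id to a deterministic score in [0, 1) (hypothetical scheme)."""
    digest = hashlib.sha256(root_id.encode("utf-8")).digest()
    # Keep 53 bits so the quotient is exactly representable and strictly below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def assign_sampling_u(rows: Iterable[dict]) -> List[dict]:
    """Single streaming pass: resolve each row's session root, score per session."""
    root_of: Dict[str, str] = {}  # chat_id -> resolved session root
    out: List[dict] = []
    for row in rows:
        chat_id = row["chat_id"]
        parent = row.get("parent_chat_id")
        if parent is None:
            root = chat_id  # first turn of a session is its own root
        else:
            # Parent already seen: inherit its root. Parent outside the span:
            # fall back to grouping under the parent id itself.
            root = root_of.get(parent, parent)
        root_of[chat_id] = root
        out.append({**row, "session_root": root, "sampling_u": session_score(root)})
    return out


def subsample(rows: List[dict], load_fraction: float) -> List[dict]:
    """Thresholding on sampling_u keeps or drops whole sessions."""
    return [r for r in rows if r["sampling_u"] < load_fraction]
```

Because every turn of a session shares one `sampling_u`, lowering `load_fraction` never keeps a later turn while dropping an earlier one, which is the property the commit message relies on to preserve intra-session prefix reuse.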