aituner

Files

Gahow Wang 0f15bbc3f1 Make the offered-load axis session-coherent

Phase 1 of the two-stop work. Subsampling the trace by a per-request uniform score
broke multi-turn sessions (a kept turn-2 could lose its turn-1), which lowered the
realized KV-cache hit rate as offered load dropped. The feasibility boundary was
therefore measured on a workload with a different C than production, contradicting
the paper's scale-stationary L-C-A premise.

prepare_trace_windows now resolves each row's session root via the parent_chat_id
chain in a single streaming pass and assigns sampling_u per session, so thresholding
keeps or drops whole sessions and preserves intra-session prefix reuse. Rows whose
parent fell outside the span fall back to grouping under the parent id.
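
A minimal sketch of the per-session assignment described above, not the actual
prepare_trace_windows code. It assumes rows arrive in time order (so a parent is
seen before its children) and that each row carries a chat id and an optional
parent_chat_id. The names assign_sampling_u and session_u, the seed parameter, and
the hash-based score are hypothetical and chosen only for illustration.

```python
import hashlib
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical row shape: (chat_id, parent_chat_id or None).
Row = Tuple[str, Optional[str]]


def session_u(root_id: str, seed: str = "0") -> float:
    """Deterministic score in [0, 1) derived from the session root id."""
    digest = hashlib.sha256(f"{seed}:{root_id}".encode()).digest()
    # Keep the top 53 bits so the quotient is an exact float below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def assign_sampling_u(
    rows: Iterable[Row], seed: str = "0"
) -> Iterator[Tuple[str, str, float]]:
    """Yield (chat_id, session_root, sampling_u) in one streaming pass."""
    root_of: Dict[str, str] = {}
    for chat_id, parent_id in rows:
        if parent_id is None:
            # A row with no parent starts its own session.
            root = chat_id
        else:
            # Inherit the parent's root if the parent was seen; otherwise the
            # parent fell outside the span, so group under the parent id.
            root = root_of.get(parent_id, parent_id)
        root_of[chat_id] = root
        yield chat_id, root, session_u(root, seed)


if __name__ == "__main__":
    demo = [("a", None), ("b", "a"), ("c", "b"), ("x", "p"), ("z", "p")]
    threshold = 0.5
    for chat_id, root, u in assign_sampling_u(demo):
        print(chat_id, root, "keep" if u < threshold else "drop")
```

Because every row in a session shares one score, any threshold on sampling_u keeps
or drops the session as a whole. In this sketch the root_of map grows with the
number of rows in the span, which is the cost of resolving chains in one pass.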

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-15 14:16:06 +08:00

prepare_trace_windows.py

Make the offered-load axis session-coherent

2026-06-15 14:16:06 +08:00

run_baseline_then_llm.py

Unify harness L-C-A on the canonical lca.WorkloadProfile

2026-06-15 14:12:17 +08:00

run_multi_compare.py

Harden trial measurement accounting

2026-05-06 21:18:09 +08:00