Gahow Wang 51a9e4a007 Add Stop-A: offered-L-C-A convergence early-stop for replay
Phase 2 of the two-stop work. The L-C-A vector is a deterministic function of the
trace's offered metadata, so the prefix-vs-full L-C-A convergence curve (Fig. 9 in
the paper) can be computed up front instead of monitored live, with an identical
result and no per-request overhead.
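
Because the curve depends only on offered metadata, it can be produced in one offline pass. The sketch below illustrates this and assumes each request's metadata has already been reduced to numeric features per family. The L/C/A family labels come from this commit. The feature layout, the log1p-mean aggregate, the relative-distance similarity, and every function name are hypothetical placeholders chosen for illustration. They are not the metric or API in lca.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical per-request metadata: family name -> list of non-negative raw features.
# The real L/C/A feature definitions are not described here; these are placeholders.
Request = Dict[str, List[float]]


def family_vector(requests: Sequence[Request], family: str) -> List[float]:
    """Mean of log1p(feature) per dimension over a non-empty request list (assumed aggregate)."""
    dims = len(requests[0][family])
    sums = [0.0] * dims
    for req in requests:
        for i, value in enumerate(req[family]):
            sums[i] += math.log1p(value)
    return [s / len(requests) for s in sums]


def similarity(prefix_vec: List[float], full_vec: List[float]) -> float:
    """1 - relative L2 distance to the full-set vector, clipped to [0, 1] (assumed metric)."""
    dist = math.sqrt(sum((p - f) ** 2 for p, f in zip(prefix_vec, full_vec)))
    norm = math.sqrt(sum(f * f for f in full_vec))
    if norm == 0.0:
        return 1.0 if dist == 0.0 else 0.0
    return max(0.0, 1.0 - dist / norm)


def convergence_curve(requests: Sequence[Request], checkpoints: Sequence[int],
                      families=("L", "C", "A")) -> List[Dict[str, float]]:
    """Prefix-vs-full similarity per family at each checkpoint, computed up front."""
    full = {fam: family_vector(requests, fam) for fam in families}
    curve = []
    for n in checkpoints:
        prefix = requests[:n]  # requests are assumed to be arrival-ordered already
        curve.append({fam: similarity(family_vector(prefix, fam), full[fam])
                      for fam in families})
    return curve
```

Live monitoring would compute the same numbers at each checkpoint during replay. Computing them ahead of time moves that work out of the request path.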

- lca.find_convergence_prefix: returns the earliest arrival-ordered prefix whose L and
  A family similarities reach tau and whose slow C family reaches the stricter tau_c,
  sustained for stable_checks consecutive checkpoints. Self-similarity uses the raw
  log-feature vector (prefix and full set come from the same window, so their
  per-dimension spread is identical; RobustScaler is reserved for the cross-window
  Stop-C). If C never converges, the function returns the full set. This is the
  C-gate, which rules out an early stop on a cold or under-warmed cache. The
  per-checkpoint similarities double as Phase 3 calibration data.
- spec.AdaptiveStopSpec (trace.adaptive_stop), disabled by default until the
  thresholds are calibrated, so existing studies are unaffected.
- worker._adaptive_replay_set truncates each probe's replay to the convergence
  prefix and records a certificate (converged flag, prefix fraction, family
  similarity) in probe history and probe_details. The offered request_rate at the
  threshold is unchanged; only the wall-clock replay time shrinks.
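
The stop rule and the worker-side truncation can be sketched as follows. The sketch assumes the per-checkpoint family similarities have already been computed offline. StopSpec, find_prefix, and adaptive_replay_set are hypothetical names that mirror spec.AdaptiveStopSpec, lca.find_convergence_prefix, and worker._adaptive_replay_set but are not those implementations. The threshold defaults are placeholders, because the commit says the real values still await calibration. "Earliest prefix" is read here as the checkpoint that completes the stable streak, which is one possible interpretation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

Sims = Dict[str, float]  # per-family similarity at one checkpoint, e.g. {"L": 0.97, "C": ..., "A": ...}


@dataclass
class StopSpec:
    """Hypothetical stand-in for spec.AdaptiveStopSpec; field names are assumptions."""
    enabled: bool = False   # disabled by default, as the commit describes
    tau: float = 0.95       # placeholder threshold for the L and A families
    tau_c: float = 0.98     # placeholder, stricter threshold for the slow C family
    stable_checks: int = 3  # consecutive passing checkpoints required


def find_prefix(checkpoints: Sequence[int], curve: Sequence[Sims], total: int,
                spec: StopSpec) -> Tuple[int, bool, Optional[Sims]]:
    """Return (prefix_length, converged, similarities at the chosen point).

    Stops at the checkpoint that completes stable_checks consecutive passes.
    If the streak never completes (e.g. C stays below tau_c on a cold cache),
    fall back to the full length: the C-gate.
    """
    streak = 0
    last: Optional[Sims] = None
    for n, sims in zip(checkpoints, curve):
        last = sims
        passed = (sims["L"] >= spec.tau and sims["A"] >= spec.tau
                  and sims["C"] >= spec.tau_c)
        streak = streak + 1 if passed else 0  # any failing checkpoint resets the streak
        if streak >= spec.stable_checks:
            return n, True, sims
    return total, False, last


def adaptive_replay_set(requests: List[dict], checkpoints: Sequence[int],
                        curve: Sequence[Sims], spec: StopSpec) -> Tuple[List[dict], dict]:
    """Truncate an arrival-ordered replay list and build a certificate for probe history."""
    if not spec.enabled or not requests:
        return requests, {"converged": False, "fraction": 1.0, "similarity": None}
    n, converged, sims = find_prefix(checkpoints, curve, len(requests), spec)
    certificate = {"converged": converged, "fraction": n / len(requests), "similarity": sims}
    return requests[:n], certificate
```

The replayed subset is a prefix of the arrival-ordered list, so the arrival pattern inside it is untouched. This matches the commit's point that the offered rate stays the same while wall-clock replay shrinks.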

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 14:23:49 +08:00

AITuner

AITuner is a small study orchestrator for OpenAI-compatible serving engines. It replays trace windows, searches for the highest feasible offered load under configured SLOs, and records enough trial context for LLM- or harness-guided configuration proposals.
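
The README does not say how the highest feasible offered load is found. One common approach is bisection over the offered request rate, shown here only as a hedged sketch that assumes feasibility is monotone in load. is_feasible stands in for a hypothetical probe that replays a trace window at a given rate and checks the configured SLOs. It is not an AITuner API.

```python
from typing import Callable


def highest_feasible_rate(is_feasible: Callable[[float], bool], low: float, high: float,
                          tolerance: float = 0.05) -> float:
    """Bisection for the largest rate in [low, high] that passes is_feasible.

    Assumes monotonicity (if a rate meets the SLOs, every lower rate does too)
    and that low itself is feasible.
    """
    if is_feasible(high):
        return high
    while high - low > tolerance:
        mid = (low + high) / 2
        if is_feasible(mid):
            low = mid   # mid meets the SLOs: the answer is at or above mid
        else:
            high = mid  # mid violates an SLO: the answer is below mid
    return low
```

Each iteration costs one probe, so the tolerance trades replay time against rate resolution.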

Status

This repository is research tooling. Treat reported experiment numbers as valid only when the matching study spec, trial artifacts, probe history, and probe_details.jsonl files are available for audit.

Install

python3 -m pip install -e .

Test

The test suite uses the Python standard library unittest runner:

PYTHONPATH=src python3 -m unittest discover -s tests -v

If the package is installed in editable mode, PYTHONPATH=src is optional.

Basic Workflow

Initialize a study:

aituner study init --spec configs/examples/study.example.json

Run a local tuning loop:

aituner study tune --spec configs/examples/study.example.json --max-trials 2

Run a compare:

aituner compare run --spec configs/examples/compare.example.json

Remote experiment notes for this checkout live in AGENTS.md. The default remote host is dash0, and code should be synchronized through Git before remote runs.

Experiment Integrity

  • Fixed-length replay requests are scored only when completion token usage is verifiable and matches the trace expectation.
  • Each trial writes aggregate probe history and per-request probe details.
  • request_rate_per_gpu is the primary cross-topology metric: best_feasible_request_rate / (tensor_parallel_size * data_parallel_size).
  • Compare reports include failed and no-feasible window counts; do not interpret mean request rates without those counts.
  • Bounded replays using max_requests_per_probe, completion_tokens_override, or replay_time_scale are convergence tests for that bounded workload, not production benchmarks.
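
The per-GPU normalization and the token-usage rule can be written out directly. request_rate_per_gpu follows the formula in the list above. is_scorable is a hypothetical helper that assumes the engine returns an OpenAI-style usage object with a completion_tokens field.

```python
from typing import Optional


def request_rate_per_gpu(best_feasible_request_rate: float,
                         tensor_parallel_size: int, data_parallel_size: int) -> float:
    """Normalize the best feasible rate by GPU count (TP x DP)."""
    gpus = tensor_parallel_size * data_parallel_size
    if gpus <= 0:
        raise ValueError("tensor_parallel_size * data_parallel_size must be positive")
    return best_feasible_request_rate / gpus


def is_scorable(expected_completion_tokens: int, reported_usage: Optional[dict]) -> bool:
    """Hypothetical check: score a fixed-length request only if usage is reported and matches."""
    if not reported_usage or "completion_tokens" not in reported_usage:
        return False  # usage is not verifiable, so the request is not scored
    return reported_usage["completion_tokens"] == expected_completion_tokens
```

For example, 12 req/s on TP=2, DP=2 normalizes to 3.0 req/s per GPU, and 8 req/s on TP=1, DP=2 normalizes to 4.0. The smaller deployment is therefore more efficient per GPU despite its lower absolute rate.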

Configuration Notes

Example specs that use llm.endpoint.provider=codex resolve the endpoint from the local Codex configuration unless llm.endpoint.base_url or AITUNER_CODEX_BASE_URL is set. Public, reproducible examples should prefer an explicit endpoint or omit the LLM endpoint and use proposal files.
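
The resolution rule can be sketched as a small function. Only the names llm.endpoint.base_url and AITUNER_CODEX_BASE_URL come from this README. The function, its arguments, and the precedence between an explicit base_url and the environment variable are assumptions, because the text only says that either one overrides the local Codex configuration.

```python
import os
from typing import Mapping, Optional


def resolve_codex_base_url(endpoint: Mapping[str, str],
                           local_codex_base_url: Optional[str],
                           environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Hypothetical resolution order for llm.endpoint.provider=codex.

    Assumed precedence: explicit llm.endpoint.base_url, then the
    AITUNER_CODEX_BASE_URL environment variable, then the local Codex config.
    """
    if endpoint.get("base_url"):
        return endpoint["base_url"]
    if environ.get("AITUNER_CODEX_BASE_URL"):
        return environ["AITUNER_CODEX_BASE_URL"]
    return local_codex_base_url  # least reproducible: depends on this machine's setup
```

An explicit base_url in the spec is the only option here that travels with the spec itself. This is why public examples should prefer it.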

License: MIT (repository size 6.6 MiB)
Languages: Python 98.4%, Shell 1.6%