
# Characterization Protocols For Remaining Batches

Status: implementation protocol and audit checklist
Date: 2026-05-25

This file completes the `analysis/characterization` scaffold for the remaining
TODO-list batches. It separates what is already implemented from what requires
fresh GPU runs or new engine/proxy instrumentation.

## Implemented Now

### Batch 0/1 Analyzer

Use:

```bash
python3 analysis/characterization/analyze.py \
  --trace traces/w600_r0.0015_st30.jsonl \
  --kv-bytes-per-token 98304 \
  --task-name w600_local_full_trace \
  --overwrite
```
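`--kv-bytes-per-token` converts token counts into KV-cache bytes (presumably for `kv_footprint_summary.json`). Assuming a standard grouped-query attention KV cache, the value is `2 × num_layers × num_kv_heads × head_dim × dtype_bytes`. This protocol does not say which model produced 98304; the layer and head counts below are one hypothetical combination that yields it, shown only so the arithmetic can be checked.

```python
def kv_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int,
                       dtype_bytes: int = 2) -> int:
    """Per-token KV-cache bytes: one K and one V vector per layer per KV head."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def kv_footprint_bytes(num_tokens: int, bytes_per_token: int) -> int:
    """Raw KV bytes for num_tokens tokens, ignoring paging/fragmentation overhead."""
    return num_tokens * bytes_per_token


# Hypothetical config (not stated in this protocol): 48 layers, 4 KV heads,
# head_dim 128, 2-byte fp16/bf16 cache entries.
bpt = kv_bytes_per_token(num_layers=48, num_kv_heads=4, head_dim=128)
print(bpt)                                      # 98304
print(kv_footprint_bytes(32_000, bpt) / 2**30)  # 2.9296875 (GiB for 32k tokens)
```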

The analyzer writes:

- `manifest.json`
- `summary.json`
- `summary.md`
- `audit.md`
- `session_concurrency.json`
- `session_arrival_stats.json`
- `turn_interval_stats.json`
- `trace_profile.json`
- `workload_summary.json`
- `kv_footprint_summary.json`
- `reuse_decomposition.json`
- `session_skew.json`
- `append_delta_stats.json`
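`reuse_decomposition.json` splits prefix reuse by where the reused blocks came from. One plausible trace-level decomposition is sketched below. It assumes records arrive in order, each carrying `session_id` and prefix-chained block `hash_ids` (so a seen hash implies its whole prefix was seen), and an infinite cache. The intra-/cross-session split and the function name are illustrative assumptions, not necessarily what `analyze.py` computes.

```python
from collections import Counter
from typing import Iterable


def decompose_reuse(records: Iterable[dict]) -> Counter:
    """Count blocks as intra-session reuse, cross-session reuse, or new.

    Infinite-cache assumption: a block hash produced by any earlier request is
    reusable. It counts as intra-session if the same session produced it
    before, otherwise as cross-session.
    """
    producers: dict = {}  # hash_id -> set of session_ids that produced it
    counts: Counter = Counter()
    for rec in records:
        sid = rec["session_id"]
        for h in rec["hash_ids"]:
            seen_by = producers.setdefault(h, set())
            if sid in seen_by:
                counts["intra_session"] += 1
            elif seen_by:
                counts["cross_session"] += 1
            else:
                counts["new"] += 1
            seen_by.add(sid)
    return counts


trace = [
    {"session_id": "a", "hash_ids": [1, 2]},
    {"session_id": "a", "hash_ids": [1, 2, 3]},  # reuses its own prefix
    {"session_id": "b", "hash_ids": [1, 4]},     # reuses block 1 from session a
]
print(decompose_reuse(trace))  # new=4, intra_session=2, cross_session=1
```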

Limitations:

- Verifying actual online sequentiality (no turn of a session is dispatched
  before its previous turn finishes) requires per-request dispatch and
  finish/error timestamps. Existing `metrics.jsonl` artifacts generally do not
  contain these fields.
- Actual reuse decomposition requires `cached_tokens`/`cache_hit`, `hash_ids`,
  and `session_id` in the same joinable request record.
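Once those timestamps exist, the sequentiality check itself is small: within each session, no turn may be dispatched before the previous turn finished or errored. A minimal sketch, assuming hypothetical record fields `session_id`, `turn`, `dispatch_ts`, and `end_ts` (finish or error time, in seconds):

```python
from collections import defaultdict


def sequentiality_violations(records: list[dict]) -> list[tuple]:
    """Return (session_id, turn) pairs dispatched before the prior turn ended."""
    by_session: dict = defaultdict(list)
    for rec in records:
        by_session[rec["session_id"]].append(rec)
    violations = []
    for sid, turns in by_session.items():
        turns.sort(key=lambda r: r["turn"])
        for prev, cur in zip(turns, turns[1:]):
            if cur["dispatch_ts"] < prev["end_ts"]:
                violations.append((sid, cur["turn"]))
    return violations


records = [
    {"session_id": "s1", "turn": 0, "dispatch_ts": 0.0, "end_ts": 2.0},
    {"session_id": "s1", "turn": 1, "dispatch_ts": 2.5, "end_ts": 4.0},
    {"session_id": "s2", "turn": 0, "dispatch_ts": 1.0, "end_ts": 3.0},
    {"session_id": "s2", "turn": 1, "dispatch_ts": 2.0, "end_ts": 5.0},  # overlaps turn 0
]
print(sequentiality_violations(records))  # [('s2', 1)]
```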

### Existing-Run Audit

Use:

```bash
python3 analysis/characterization/summarize_runs.py
```

The script writes an audit package under:

```text
analysis/characterization/current_results/
```

It summarizes already completed runs and explicitly marks which claims are
supported, partially supported, or not yet supported.

## Batch 2 Protocol: PD-Colo Prefill/Decode Interference

Purpose:

Determine whether same-worker prefill overlap increases decode TPOT and queue delay.

Required new instrumentation:

- per-request dispatch timestamp
- per-request finish/error timestamp
- per-decode-step timestamp
- decode-step worker id
- prefill chunk start/end timestamp
- prefill worker id
- request/session id associated with each prefill chunk
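With these fields, each decode step can be labeled by whether it overlapped a prefill chunk on the same worker, only on a different worker, or not at all, which is the distinction the arms below compare. A minimal sketch, assuming hypothetical `(worker_id, start, end)` intervals for both decode steps and prefill chunks, and treating any time intersection as overlap:

```python
from typing import NamedTuple


class Interval(NamedTuple):
    worker_id: int
    start: float
    end: float


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the half-open time intervals [start, end) intersect."""
    return a.start < b.end and b.start < a.end


def label_decode_step(step: Interval, prefill_chunks: list[Interval]) -> str:
    """Label one decode step by prefill overlap; same-worker overlap wins."""
    label = "no_overlap"
    for chunk in prefill_chunks:
        if overlaps(step, chunk):
            if chunk.worker_id == step.worker_id:
                return "overlap_same_worker"
            label = "overlap_different_worker"
    return label


chunks = [Interval(0, 1.0, 1.5), Interval(1, 2.0, 2.5)]
print(label_decode_step(Interval(0, 1.2, 1.3), chunks))  # overlap_same_worker
print(label_decode_step(Interval(0, 2.1, 2.2), chunks))  # overlap_different_worker
print(label_decode_step(Interval(0, 3.0, 3.1), chunks))  # no_overlap
```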

Required arms:

1. decode-only steady load
2. decode + same-worker heavy prefill injection
3. decode + different-worker heavy prefill injection
4. trace replay with overlap labels

Required sweep:

```text
uncached_prefill_tokens in {2k, 8k, 16k, 32k, 64k}
chunked_prefill_size in available engine values
```
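The sweep is the Cartesian product of the two axes. A sketch, assuming `2k` means 2048 tokens and using placeholder chunk sizes; the real `chunked_prefill_size` values depend on what the engine supports:

```python
from itertools import product

# Assumption: "2k".."64k" are power-of-two token counts.
UNCACHED_PREFILL_TOKENS = [2 * 1024, 8 * 1024, 16 * 1024, 32 * 1024, 64 * 1024]
# Hypothetical placeholders; replace with the engine's supported values.
CHUNKED_PREFILL_SIZES = [2048, 8192]

sweep = [
    {"uncached_prefill_tokens": n, "chunked_prefill_size": c}
    for n, c in product(UNCACHED_PREFILL_TOKENS, CHUNKED_PREFILL_SIZES)
]
print(len(sweep))  # 10
```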

Required outputs:

- `interference_microbench_summary.json`
- `decode_step_timeseries.csv`
- `prefill_overlap_events.jsonl`
- `interference_index.json`
- TPOT timeline figure with prefill overlays
- same-worker vs different-worker TPOT boxplot

Pass condition:

```text
TPOT_p90(overlap_same_worker) / TPOT_p90(no_overlap) > 1
```

and the corresponding ratio for the different-worker control,
`TPOT_p90(overlap_different_worker) / TPOT_p90(no_overlap)`, must be materially
closer to 1.
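A minimal sketch of evaluating this pass condition from labeled TPOT samples. It assumes a nearest-rank p90 and a hypothetical `margin` that stands in for "materially weaker" (the protocol does not fix a threshold): the different-worker excess over 1 may be at most `margin` times the same-worker excess.

```python
def p90(samples: list[float]) -> float:
    """Nearest-rank 90th percentile (integer arithmetic avoids float rounding)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (9 * len(ordered) + 9) // 10  # ceil(0.9 * n)
    return ordered[rank - 1]


def tpot_ratio(overlap: list[float], no_overlap: list[float]) -> float:
    """TPOT_p90(overlap arm) / TPOT_p90(no_overlap)."""
    return p90(overlap) / p90(no_overlap)


def batch2_passes(same_worker: list[float], diff_worker: list[float],
                  no_overlap: list[float], margin: float = 0.5) -> bool:
    """Same-worker ratio must exceed 1, and the different-worker excess must be
    at most `margin` times the same-worker excess (margin is hypothetical)."""
    same = tpot_ratio(same_worker, no_overlap)
    diff = tpot_ratio(diff_worker, no_overlap)
    return same > 1 and (diff - 1) <= margin * (same - 1)


print(batch2_passes([30.0] * 10, [21.0] * 10, [20.0] * 10))  # True
```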

## Batch 3 Protocol: Session Hot-Spot Residual Imbalance

Purpose:

Determine whether cache-aware/LMetric routing still leaves hot workers under
session-heavy skew.

Required new instrumentation:

- route decision per request
- chosen worker
- candidate worker scores
- cache hit / estimated uncached tokens per candidate
- per-worker request queue length/delay
- per-worker decode queue length/delay
- per-worker KV occupancy
- per-worker APC/cache-hit snapshot

Required arms:

1. corrected LMetric/cache-aware
2. load-only routing
3. hard sticky routing
4. current Unified hybrid
5. session-mass capped/equalized replay

Required outputs:

- `worker_balance_summary.json`
- `session_to_worker_map.json`
- `session_mass_summary.json`
- `routing_policy_comparison.json`
- `hotspot_index.json`
- per-worker queue delay bar
- APC vs queue delay scatter
- top-session contribution bar
- policy tradeoff plot: APC vs hot-spot index

Pass condition:

LMetric/cache-aware must show measurable residual worker skew, and that skew
must correlate with session token mass or locality.

GPU utilization alone is not enough for this claim.
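This protocol does not pin down the `hotspot_index` formula. One hedged option is the ratio of the worst per-worker queue delay to the mean, paired with a Spearman rank correlation between per-worker session token mass (sum over the sessions routed to that worker) and queue delay to test the "must correlate" part. Both the index definition and the choice of Spearman are assumptions, and the simple ranking below does not average ties.

```python
import statistics


def hotspot_index(queue_delay_by_worker: dict[int, float]) -> float:
    """Max-over-mean per-worker queue delay; 1.0 means perfectly balanced."""
    delays = list(queue_delay_by_worker.values())
    return max(delays) / statistics.fmean(delays)


def _ranks(values: list[float]) -> list[float]:
    """Rank positions 1..n (ties are not averaged)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = float(r)
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Pearson correlation of ranks (Spearman's rho without tie correction)."""
    rx, ry = _ranks(x), _ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


queue_delay = {0: 10.0, 1: 12.0, 2: 40.0, 3: 18.0}     # seconds, per worker
token_mass = [100_000, 120_000, 600_000, 300_000]       # same worker order
print(hotspot_index(queue_delay))                       # 2.0
print(spearman(token_mass, list(queue_delay.values()))) # 1.0
```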

## Batch 4 Protocol: Sustainable Request Rate

Purpose:

Measure:

```text
SRR(SLO) = max arrival rate satisfying SLO in steady state
```
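Because each lambda point is a full steady-state run, SRR can be located by bisection over arrival rate. A minimal sketch, assuming SLO satisfaction is monotone in lambda (passes below SRR, fails above) and that `run_at_rate` is a hypothetical callable that runs one steady-state experiment and returns whether the SLO held:

```python
from typing import Callable


def find_srr(run_at_rate: Callable[[float], bool], lo: float, hi: float,
             tol: float = 0.01) -> float:
    """Largest rate in [lo, hi], to within tol, for which run_at_rate is True.

    Invariant: lo passes and hi fails, so the true boundary lies in [lo, hi).
    """
    if not run_at_rate(lo):
        raise ValueError("SLO already violated at the lower bound")
    if run_at_rate(hi):
        return hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if run_at_rate(mid):
            lo = mid
        else:
            hi = mid
    return lo


# Toy stand-in for a real experiment: pretend the SLO holds up to 3.7.
print(find_srr(lambda rate: rate <= 3.7, lo=1.0, hi=8.0, tol=0.01))  # just below 3.7
```

Real runs are noisy, so monotonicity may not hold exactly near the boundary; a coarse grid followed by repeated runs around the crossing is a reasonable alternative.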

Required load generator behavior:

- open-loop session arrivals, preferably Poisson
- session-internal sequentiality
- warmup window
- steady-state measurement window
- explicit attempted/completed/error counters
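Open-loop means session start times are drawn independently of request completions; sequentiality applies only inside a session. A minimal sketch of Poisson session starts (exponential inter-arrival gaps) plus a warmup cutoff; the function names, rate, and durations are hypothetical, and each session's later turns would still be dispatched only after its previous turn finishes.

```python
import random


def poisson_session_starts(rate: float, duration_s: float, seed: int = 0) -> list[float]:
    """Session start times in [0, duration_s) from a Poisson process (rate per second)."""
    rng = random.Random(seed)
    t, starts = 0.0, []
    while True:
        t += rng.expovariate(rate)  # exponential gaps => Poisson arrivals
        if t >= duration_s:
            return starts
        starts.append(t)


def in_measurement_window(ts: float, warmup_s: float, end_s: float) -> bool:
    """True if ts falls in the steady-state window after warmup."""
    return warmup_s <= ts < end_s


starts = poisson_session_starts(rate=0.5, duration_s=3600.0, seed=42)
measured = [t for t in starts if in_measurement_window(t, warmup_s=600.0, end_s=3600.0)]
```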

Provisional SLO:

```text
TTFT_p90 <= T_ttft
E2E_p90  <= T_e2e
TPOT_p90 <= T_tpot
error_rate <= epsilon
queue length stable
KV occupancy stable
```
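A sketch of this SLO as a predicate over one steady-state window. It returns the violated conditions, which maps naturally onto `slo_violation_reason.json`. The thresholds are placeholders, and "stable" is approximated by an assumed test: the least-squares growth rate of queue length and KV occupancy must stay below a hypothetical tolerance.

```python
import statistics
from dataclasses import dataclass


@dataclass
class SLO:
    ttft_p90_s: float
    e2e_p90_s: float
    tpot_p90_s: float
    max_error_rate: float
    max_queue_growth: float  # hypothetical: queued requests per second
    max_kv_growth: float     # hypothetical: occupancy fraction per second


def slope(ts: list[float], ys: list[float]) -> float:
    """Ordinary least-squares slope of ys against ts."""
    mt, my = statistics.fmean(ts), statistics.fmean(ys)
    num = sum((t - mt) * (y - my) for t, y in zip(ts, ys))
    den = sum((t - mt) ** 2 for t in ts)
    return num / den


def slo_violations(slo: SLO, p90: dict[str, float], error_rate: float,
                   ts: list[float], queue_len: list[float],
                   kv_occupancy: list[float]) -> list[str]:
    """Return violated conditions for one window; empty means the SLO holds."""
    reasons = []
    if p90["ttft"] > slo.ttft_p90_s:
        reasons.append("ttft_p90")
    if p90["e2e"] > slo.e2e_p90_s:
        reasons.append("e2e_p90")
    if p90["tpot"] > slo.tpot_p90_s:
        reasons.append("tpot_p90")
    if error_rate > slo.max_error_rate:
        reasons.append("error_rate")
    if slope(ts, queue_len) > slo.max_queue_growth:
        reasons.append("queue_growing")
    if slope(ts, kv_occupancy) > slo.max_kv_growth:
        reasons.append("kv_growing")
    return reasons


slo = SLO(2.0, 30.0, 0.1, 0.01, 0.05, 0.001)  # placeholder thresholds
ts = [0.0, 60.0, 120.0, 180.0]
p90s = {"ttft": 1.5, "e2e": 20.0, "tpot": 0.08}
print(slo_violations(slo, p90s, 0.0, ts, [4, 5, 4, 5], [0.6, 0.6, 0.61, 0.6]))  # []
```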

Required arms:

1. PD-colo corrected LMetric/cache-aware
2. static PD-disagg
3. current Unified hybrid
4. optional hard sticky
5. optional load-only

Required outputs:

- `srr_curve.json`
- `lambda_runs/<lambda>/summary.json`
- `slo_violation_reason.json`
- `goodput_vs_arrival_rate.json`
- SRR bar chart
- latency vs arrival rate curves
- goodput vs arrival rate
- queue/KV stability plot near failure point

Pass condition:

Each policy has a measured maximum sustainable lambda under the same SLO and
the same session-causal arrival process.

## Batch 5 Protocol: Failure Attribution Near SRR Boundary

Purpose:

Explain why each policy fails near SRR.

Required rates:

```text
lambda = 0.9 * SRR
lambda = 1.0 * SRR
lambda = 1.1 * SRR
```

Labels for each slow/SLO-violating request:

- same-worker prefill overlap
- hot worker queue
- high KV occupancy
- cache miss / large uncached append
- transfer wait
- P queue wait
- D admission wait
- unknown
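A rule-based labeler is one way to assign these labels. The sketch below assumes hypothetical per-request fields and placeholder thresholds, checks the causes in the order listed, allows several labels per request, and falls back to `unknown` when nothing matches. Counting labels across requests gives a simple `failure_breakdown`.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class SlowRequest:
    overlapped_same_worker_prefill: bool
    worker_queue_delay_s: float
    kv_occupancy: float        # fraction of KV capacity in use at admission
    uncached_tokens: int
    transfer_wait_s: float
    p_queue_wait_s: float
    d_admission_wait_s: float


# Placeholder thresholds (hypothetical; must be calibrated per deployment).
HOT_QUEUE_S, HIGH_KV, LARGE_UNCACHED, WAIT_S = 1.0, 0.9, 8192, 0.5


def attribute(req: SlowRequest) -> list[str]:
    """Return every matching cause label, or ["unknown"] if none match."""
    labels = []
    if req.overlapped_same_worker_prefill:
        labels.append("same_worker_prefill_overlap")
    if req.worker_queue_delay_s > HOT_QUEUE_S:
        labels.append("hot_worker_queue")
    if req.kv_occupancy > HIGH_KV:
        labels.append("high_kv_occupancy")
    if req.uncached_tokens > LARGE_UNCACHED:
        labels.append("cache_miss_large_uncached_append")
    if req.transfer_wait_s > WAIT_S:
        labels.append("transfer_wait")
    if req.p_queue_wait_s > WAIT_S:
        labels.append("p_queue_wait")
    if req.d_admission_wait_s > WAIT_S:
        labels.append("d_admission_wait")
    return labels or ["unknown"]


slow = [
    SlowRequest(True, 0.2, 0.5, 1000, 0.0, 0.0, 0.0),
    SlowRequest(False, 0.0, 0.95, 20000, 0.0, 0.0, 0.0),
    SlowRequest(False, 0.0, 0.1, 100, 0.0, 0.0, 0.0),
]
breakdown = Counter(label for r in slow for label in attribute(r))
```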

Required outputs:

- `slow_request_attribution.jsonl`
- `failure_breakdown.json`
- `case_studies.md`
- `worker_failure_windows.json`
- violation cause stacked bar
- slow request waterfall
- worker timeline near failure

Pass condition:

The analysis must explain whether PD-colo is limited by interference,
hot-spot, KV pressure, or a mixture, and whether Unified/PUSH underperforms
because of trigger quality, transfer cost, target admission, or load regime.

## Batch 6 Protocol: Audit Package

Implemented by `summarize_runs.py` for existing runs; to be extended with
fresh Batch 2-5 outputs once those experiments exist.

Required files:

- `characterization_claim_matrix.md`
- `all_figures_index.md`
- `reviewer_risk_register.md`
- `reproduction_commands.sh`
- `main_claim_allowed_runs.md`

The current package intentionally marks Batch 2/4/5 claims as not yet
supported until fresh instrumented experiments exist.