agentic-pd-hybrid

Files

kzlin 5eac9b4f6b fix(metrics): exclude aborted requests from latency/ttft/tpot stats

The old filter `if row.latency_s is not None` accepted SGLang's fast
input-length aborts (latency_s ~ 0.08s, finish_reason='abort/BadRequest')
as if they were successful near-zero-cost requests. This deflated the
reported mean/p50 of any run where SGLang rejected oversized inputs.

Impact on existing comparisons (ts=1 4-run validation + v2):
  KVC v2 has 40 aborts + 5 ReadTimeouts (was reported as just 5);
  DP 4w  has 67 aborts (was reported as 5).
Both runs abort some requests; the asymmetry (40 vs 67) comes entirely
from SGLang's max-input-len, which is derived from the memory fraction:
the KVC decode-only worker gets ~10 GB of free GPU memory ->
max-input=92098, while the DP fused worker gets ~9 GB ->
max-input=87811, because DP also needs chunked-prefill workspace.

The KVC-vs-DP latency-win direction holds and widens slightly under the
fixed filter (lat mean delta: -0.8% -> -1.4%); see V2_DEEP_ANALYSIS_ZH
§4.3 for the recomputed table.

Changes:
- metrics.py: new _is_failed_request(row) helper; latency/ttft/tpot
  stats now exclude both errors and aborts. New summary fields
  abort_count and failure_count expose the counts directly.
- scripts/analysis/recompute_summary.py: re-derives summary.json from
  existing metrics.jsonl using the fixed code, with optional --diff
  against the old buggy summary for inspection.
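
A minimal sketch of the filtering idea described above, not the actual
metrics.py code. The `Row` dataclass, its `error` field, the `summarize`
function, and the rule "finish_reason starts with 'abort'" are assumptions
made for illustration; only `latency_s`, `finish_reason`,
`_is_failed_request`, `abort_count`, and `failure_count` are names taken
from this message. `failure_count` is assumed here to mean errors plus
aborts.

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import Optional


@dataclass
class Row:
    # Hypothetical per-request record; the real schema may differ.
    latency_s: Optional[float]
    finish_reason: Optional[str] = None
    error: Optional[str] = None  # assumed field, e.g. "ReadTimeout"


def _is_abort(row: Row) -> bool:
    # Assumption: aborts are marked by a finish_reason such as "abort/BadRequest".
    return (row.finish_reason or "").startswith("abort")


def _is_failed_request(row: Row) -> bool:
    # A request counts as failed if it errored or was aborted.
    return row.error is not None or _is_abort(row)


def summarize(rows: list) -> dict:
    # Only successful requests with a recorded latency enter the stats.
    ok = [
        r.latency_s
        for r in rows
        if r.latency_s is not None and not _is_failed_request(r)
    ]
    return {
        "latency_mean_s": mean(ok) if ok else None,
        "latency_p50_s": median(ok) if ok else None,
        "abort_count": sum(1 for r in rows if _is_abort(r)),
        "failure_count": sum(1 for r in rows if _is_failed_request(r)),
    }


rows = [
    Row(latency_s=10.0, finish_reason="stop"),
    Row(latency_s=20.0, finish_reason="stop"),
    Row(latency_s=0.08, finish_reason="abort/BadRequest"),
    Row(latency_s=None, error="ReadTimeout"),
]

# Old behaviour: any row with a latency is counted, so the abort drags the mean down.
old_mean = mean(r.latency_s for r in rows if r.latency_s is not None)
summary = summarize(rows)
```

With these four made-up rows, the old filter averages the 0.08 s abort
together with the two real requests, while the sketch above averages only
the two successful ones and reports the abort and the timeout as counts.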

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-11 17:29:18 +08:00

__init__.py

feat: add agentic pd hybrid benchmark prototype

2026-04-24 12:17:46 +00:00

benchmark.py

feat(kvc): session migration with reset-on-success + direct-append threshold tuning

2026-05-09 01:18:13 +08:00

cli.py

feat(kvc): session migration with reset-on-success + direct-append threshold tuning

2026-05-09 01:18:13 +08:00

launcher.py

docs: KVC v1-v4 debug journey + raise session soft_cap to 16