agentic-pd-hybrid/scripts at 110bd68000c803e2a657f974af62e45860fa545c - agentic-pd-hybrid - Local Gitea

gahow/agentic-pd-hybrid

Gahow Wang dbb9eee471 feat(analysis): paired comparison with bootstrap CI

Implements docs/EVALUATION_PROTOCOL_ZH.md §2.2 (M2 fix):
mechanism A vs B comparisons on the same trace must be
paired on same-trial-mask, with errors and aborts surfaced
rather than silently dropped.

How it differs from scripts/analysis/compare_no_error.py:
  - works on raw request-metrics.jsonl (not pre-aggregated
    summary.json) so it can recompute paired masks
  - reports 95% bootstrap CIs for mean / p50 / p90
  - exposes intersection size + per-side failure count in
    the intersection so the reader can see how many rows
    were dropped from the comparison and whether the
    candidate's win came from selection effects

stdlib only — random.Random for bootstrap, no scipy/numpy.
Default 2000 bootstrap iterations; seed is configurable
for reproducibility.
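A stdlib-only percentile bootstrap along these lines might look like the sketch below. It assumes the pairs are resampled together and the statistic is computed per side before differencing; the commit message does not say whether the script does this or bootstraps the per-request deltas directly. The names `percentile` and `bootstrap_delta_ci` are hypothetical.

```python
import random
import statistics


def percentile(values, q):
    """Linear-interpolation percentile, q in [0, 1]."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def bootstrap_delta_ci(baseline, candidate, stat, iterations=2000, seed=0,
                       alpha=0.05):
    """CI for stat(candidate) - stat(baseline) under paired resampling."""
    if len(baseline) != len(candidate) or not baseline:
        raise ValueError("need two non-empty, equally sized paired samples")
    rng = random.Random(seed)  # seeded so repeated runs give the same CI
    n = len(baseline)
    estimates = []
    for _ in range(iterations):
        # Resample request indices so each draw keeps the pairing intact.
        idx = [rng.randrange(n) for _ in range(n)]
        b = [baseline[i] for i in idx]
        c = [candidate[i] for i in idx]
        estimates.append(stat(c) - stat(b))
    return percentile(estimates, alpha / 2), percentile(estimates, 1 - alpha / 2)


def p90(values):
    return percentile(values, 0.90)


# Usage: 95% CIs for the mean, p50, and p90 deltas.
# bootstrap_delta_ci(base, cand, statistics.fmean)
# bootstrap_delta_ci(base, cand, statistics.median)
# bootstrap_delta_ci(base, cand, p90)
```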

Verified locally on a synthetic 20-row pair (5s constant
delta + one candidate failure): correctly reports
paired_size=19, candidate_fail_in_common=1, mean delta
-5.000s, 19/0/0 win/loss/tie.
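The win/loss/tie tally in that check can be illustrated with a few lines. This sketch assumes lower is better (so a negative delta is a candidate win) and uses an exact-zero tie threshold by default; the function name `win_loss_tie` and the `tol` parameter are hypothetical, not taken from the script.

```python
def win_loss_tie(deltas, tol=0.0):
    """Count candidate wins, losses, and ties over paired deltas.

    A delta is candidate minus baseline; lower metric values are assumed
    to be better, so delta < -tol is a win for the candidate.
    """
    wins = sum(1 for d in deltas if d < -tol)
    losses = sum(1 for d in deltas if d > tol)
    ties = len(deltas) - wins - losses
    return wins, losses, ties


# 19 surviving pairs with a constant 5 s improvement, as in the synthetic check.
print(win_loss_tie([-5.0] * 19))
```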

CLI:
  scripts/analysis/paired_compare.py \
      --baseline outputs/run-dp/request-metrics.jsonl \
      --candidate outputs/run-kvc/request-metrics.jsonl \
      [--metric latency_s|ttft_s|tpot_s] \
      [--bootstrap 5000] [--seed 42] [--json]
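For orientation, an argparse definition matching the flags above could look like this. The flag names, the metric choices, and the default of 2000 bootstrap iterations come from the commit message; the default metric, the default seed, and the function name `build_parser` are assumptions.

```python
import argparse


def build_parser():
    """Argument parser mirroring the CLI shown in the commit message."""
    parser = argparse.ArgumentParser(
        description="Paired A/B comparison with bootstrap confidence intervals."
    )
    parser.add_argument("--baseline", required=True,
                        help="path to the baseline request-metrics.jsonl")
    parser.add_argument("--candidate", required=True,
                        help="path to the candidate request-metrics.jsonl")
    parser.add_argument("--metric", default="latency_s",  # default is assumed
                        choices=["latency_s", "ttft_s", "tpot_s"])
    parser.add_argument("--bootstrap", type=int, default=2000,
                        help="number of bootstrap iterations")
    parser.add_argument("--seed", type=int, default=0,  # default is assumed
                        help="RNG seed for reproducible intervals")
    parser.add_argument("--json", action="store_true",
                        help="emit machine-readable JSON instead of text")
    return parser
```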

2026-05-12 23:57:57 +08:00

..

feat(analysis): paired comparison with bootstrap CI

2026-05-12 23:57:57 +08:00

convert_audit_to_trace.py

docs: KVC v1-v4 debug journey + raise session soft_cap to 16

2026-04-28 21:10:41 +08:00

convert_inferact_to_trace.py

feat(scripts): cu12.8 env wrapper + Inferact trace converter

2026-05-12 00:10:06 +08:00

run_all_experiments.sh

docs: KVC v1-v4 debug journey + raise session soft_cap to 16

2026-04-28 21:10:41 +08:00

run_exp_a_pd_disagg.sh

docs: KVC v1-v4 debug journey + raise session soft_cap to 16

2026-04-28 21:10:41 +08:00

run_exp_b1_dp_colo_rr.sh

docs: KVC v1-v4 debug journey + raise session soft_cap to 16

2026-04-28 21:10:41 +08:00

run_exp_b2_dp_colo_cache_aware.sh

docs: KVC v1-v4 debug journey + raise session soft_cap to 16

2026-04-28 21:10:41 +08:00

run_exp_b_pd_colo.sh

docs: KVC v1-v4 debug journey + raise session soft_cap to 16

2026-04-28 21:10:41 +08:00

run_exp_c_kvcache_centric.sh

docs: KVC v1-v4 debug journey + raise session soft_cap to 16

2026-04-28 21:10:41 +08:00

sample_trace_subset.py

feat(experiments): E1 sweep on 50-session deterministic subset

2026-05-12 00:21:36 +08:00

setup_env.sh

feat(env): MC_TRANSFER_TIMEOUT=1800s default in setup_env + stack

2026-05-12 11:45:09 +08:00

smoke_test.sh

docs: KVC v1-v4 debug journey + raise session soft_cap to 16

2026-04-28 21:10:41 +08:00

sweep_backpressure_smoke.sh

feat(kvc): add backpressure smoke sweep + analyzer (and v6 p1 profile script)

2026-05-06 21:29:56 +08:00

sweep_e1_naive_1p3d.sh

feat(experiments): E1 sweep on 50-session deterministic subset

2026-05-12 00:21:36 +08:00

sweep_e2_kvc_v2_rdma.sh

feat(experiments): E2 sweep — KVC v2 + RDMA on the matched subset

2026-05-12 00:49:53 +08:00

sweep_e3_kvc_v2_loadfloor_rdma.sh

feat(experiments): E3 sweep — KVC v2 + RDMA + load-floor bonus

2026-05-12 11:45:09 +08:00

sweep_kvc_qwen3_30b.sh

docs: KVC v1-v4 debug journey + raise session soft_cap to 16

2026-04-28 21:10:41 +08:00

sweep_tp1_configs.sh

docs: KVC v1-v4 debug journey + raise session soft_cap to 16

2026-04-28 21:10:41 +08:00

sweep_tp1_v2_fixed.sh

docs: KVC v1-v4 debug journey + raise session soft_cap to 16

2026-04-28 21:10:41 +08:00

sweep_tp1_v3_kvaware.sh

docs: KVC v1-v4 debug journey + raise session soft_cap to 16

2026-04-28 21:10:41 +08:00

sweep_tp1_v4_cap16.sh

docs: KVC v1-v4 debug journey + raise session soft_cap to 16

2026-04-28 21:10:41 +08:00

sweep_tp1_v5_baseline_rerun_exp2.sh

profile(kvc): rewrite v5+profile report after critic audit + P0/P1 instrument

2026-04-29 22:29:21 +08:00

sweep_tp1_v5_optD_profile.sh

profile(kvc): add D KV pool timeseries poller + analyzer for v6 root-cause

2026-04-29 20:04:21 +08:00

sweep_tp1_v5_optD.sh

feat(kvc): Option D - delegate seed/reseed admission to D worker

2026-04-28 23:40:03 +08:00

sweep_tp1_v6_p1_profile.sh

feat(kvc): add backpressure smoke sweep + analyzer (and v6 p1 profile script)

2026-05-06 21:29:56 +08:00

sweep_ts1_kvc_n3_plus_dp.sh

feat(kvc): session migration with reset-on-success + direct-append threshold tuning

2026-05-09 01:18:13 +08:00

sweep_ts1_migration_v1.sh

feat(kvc): session migration with reset-on-success + direct-append threshold tuning

2026-05-09 01:18:13 +08:00

sweep_ts1_migration_v2.sh

feat(kvc): session migration with reset-on-success + direct-append threshold tuning

2026-05-09 01:18:13 +08:00