Files
agentic-pd-hybrid/scripts/sweep_ts1_migration_v2.sh
kzlin 2ec0debef4 feat(kvc): session migration with reset-on-success + direct-append threshold tuning
KVC v2 beats 4DP at ts=1 same-scale on 7/8 metrics:
  TTFT mean -24%, p50 -54%, p90 -64%; lat mean -0.8%, p50 -12.6%, p90 -0.7%.
  Direct-to-D rate jumped 42.8% -> 91.7%. REFACTOR_PLAN_V1 scenario C achieved.

Two-knob fix:
- reset-on-success blacklist decay: clear (sess, D) reject counter on
  successful direct-to-D path. Eliminates v1 thrashing where session 6880
  was stable on decode-1 for 70 turns then collapsed to 75 D-changes after
  cumulative transient pressure tripped the permanent blacklist.
- bump --kvcache-direct-max-uncached-tokens default 2048 -> 8192 via CLI flag.
  41% of v1 fallbacks were 'real-large-append' (>2048 token append); raising
  the threshold lets these go through the direct-to-D fast path.

Code:
- policies.py: RoutingState.session_d_rejects counter + KvAwarePolicy
  migration_reject_threshold; degenerate fallback picks least-rejected D.
- replay.py: record_admission_reject + reset-on-success in _run_request;
  _fallthrough_reason classifies turn-2+ fall-throughs as session-not-resident
  / real-large-append / etc, replacing misleading 'large-append' suffix
  (TEAM_REPORT §2.7).
- cli.py + benchmark.py: --kvcache-migration-reject-threshold flag wiring.

Docs:
- REFACTOR_PLAN_V1_ZH.md: forward-looking plan after ts=1 validation.
- MIGRATION_V1_FINDINGS_ZH.md: v1 thrashing root-cause analysis.
- V2_RESULTS_ZH.md: v2 results, scenario C achievement, attribution.
- TEAM_REPORT_AGENTIC_PD_HYBRID_ZH.md: comprehensive team report.

Scripts:
- sweep_ts1_kvc_n3_plus_dp.sh: ts=1 baseline (KVC 1P3D N=3 + 4DP CA).
- sweep_ts1_migration_v1.sh / v2.sh: validation runs.
- analyze_ts1_validation.py: 4-way comparison analyzer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 01:18:13 +08:00

77 lines
3.4 KiB
Bash
Executable File

#!/bin/bash
# Migration v2 validation: KVC 1P3D ts=1 with BOTH:
# (1) reset-on-success blacklist decay (replay.py code change)
# (2) --kvcache-direct-max-uncached-tokens 8192 (was 2048 default)
#
# v1 results (kvc_1p3d_migration_run1) showed:
# - lat mean WORSE +11.7%, TTFT mean WORSE +71.3% — thrashing tax
# - direct-to-D rate UP +10.5pp (42.8 → 53.3%)
# - Fallback breakdown surprise: 41.3% are 'real-large-append' (>2048 tok),
# NOT 'session-not-resident' as we hypothesized
#
# v2 design (REFACTOR_PLAN_V1 + MIGRATION_V1_FINDINGS):
# (1) reset-on-success: clear (sess,D) reject counter on successful direct-to-D
# — eliminates blacklist-permanence bug → kills thrashing
# (2) bump direct-append threshold 2048 → 8192: lets more large-append turns
# go direct-to-D instead of fall through to seed (which often rejects)
set -euo pipefail
cd "$(dirname "$0")/.."
MODEL=/mnt/kzlin/workflow/pd-hybrid/simm-swe-bench/models/Qwen3-30B-A3B-Instruct-2507
TRACE=outputs/qwen35-swebench-50sess.jsonl
OUTPUT=outputs/qwen3-30b-tp1-ts1-migration-v2
VENV_PYTHON=.venv/bin/python
RESULTS_FILE=$OUTPUT/sweep_results.txt
mkdir -p $OUTPUT
log() { echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a $RESULTS_FILE; }
log "=== TS=1 MIGRATION v2: reset-on-success + threshold=8192 ==="
log "Baselines:"
log " baseline (no migration): kvc_1p3d_run1 errors=5 lat_p50=0.811s ttft_p50=0.124s direct=42.8%"
log " v1 (migration permanent): kvc_1p3d_migration_run1 errors=6 lat_p50=0.773s ttft_p50=0.057s direct=53.3% lat_mean=1.758s"
log " 4DP ts=1: errors=0 lat_p50=0.659s ttft_p50=0.090s lat_mean=1.443s"
log "Goal: kill thrashing tax (lat_mean ≤ 1.5s, p99 ≤ 9s) while preserving v1's direct-to-D gains."
label=kvc_1p3d_migration_v2_run1
log ""
log "=== [migration v2] starting ==="
PYTHONPATH=src:third_party/sglang/python \
$VENV_PYTHON -m agentic_pd_hybrid.cli benchmark-live \
--trace $TRACE \
--output-root $OUTPUT \
--mechanism kvcache-centric \
--policy kv-aware \
--model-path $MODEL \
--prefill-workers 1 --decode-workers 3 \
--prefill-tp-size 1 --decode-tp-size 1 \
--prefill-gpu-ids 0 --decode-gpu-ids 1,2,3 \
--transfer-backend mooncake \
--gpu-budget 4 \
--time-scale 1 \
--session-sample-rate 1.0 \
--target-duration-s 100000 \
--concurrency-limit 32 \
--timeout-s 900 \
--request-timeout-s 300 \
--kvcache-admission-mode worker \
--kvcache-seed-min-turn-id 1 \
--kvcache-seed-max-inflight-decode -1 \
--kvcache-prefill-backup-policy release-after-transfer \
--kvcache-prefill-priority-eviction \
--kvcache-migration-reject-threshold 3 \
--kvcache-direct-max-uncached-tokens 8192
run_dir=$(ls -td $OUTPUT/kvcache-centric-*/ 2>/dev/null | head -1)
log "=== [migration v2] $label COMPLETED ==="
if [ -f "$run_dir/request-metrics.jsonl.summary.json" ]; then
cp "$run_dir/request-metrics.jsonl.summary.json" "$OUTPUT/${label}_summary.json"
cp "$run_dir/request-metrics.jsonl" "$OUTPUT/${label}_metrics.jsonl"
errs=$($VENV_PYTHON -c "import json; d=json.load(open('$OUTPUT/${label}_summary.json')); print(d.get('error_count',0))")
p50=$($VENV_PYTHON -c "import json; d=json.load(open('$OUTPUT/${label}_summary.json')); print(d.get('latency_stats_s',{}).get('p50',0))")
log " errors=$errs lat_p50=${p50}s"
cat "$run_dir/request-metrics.jsonl.summary.json" >> $RESULTS_FILE
fi
log "=== migration v2 DONE ==="