Files
agentic-pd-hybrid/scripts/sweep_ts1_kvc_n3_plus_dp.sh
kzlin 2ec0debef4 feat(kvc): session migration with reset-on-success + direct-append threshold tuning
KVC v2 beats 4DP at ts=1 same-scale on 7/8 metrics:
  TTFT mean -24%, p50 -54%, p90 -64%; lat mean -0.8%, p50 -12.6%, p90 -0.7%.
  Direct-to-D rate jumped 42.8% -> 91.7%. REFACTOR_PLAN_V1 scenario C achieved.

Two-knob fix:
- reset-on-success blacklist decay: clear (sess, D) reject counter on
  successful direct-to-D path. Eliminates v1 thrashing where session 6880
  was stable on decode-1 for 70 turns then collapsed to 75 D-changes after
  cumulative transient pressure tripped the permanent blacklist.
- bump --kvcache-direct-max-uncached-tokens default 2048 -> 8192 via CLI flag.
  41% of v1 fallbacks were 'real-large-append' (>2048 token append); raising
  the threshold lets these go through the direct-to-D fast path.

Code:
- policies.py: RoutingState.session_d_rejects counter + KvAwarePolicy
  migration_reject_threshold; degenerate fallback picks least-rejected D.
- replay.py: record_admission_reject + reset-on-success in _run_request;
  _fallthrough_reason classifies turn-2+ fall-throughs as session-not-resident
  / real-large-append / etc, replacing misleading 'large-append' suffix
  (TEAM_REPORT §2.7).
- cli.py + benchmark.py: --kvcache-migration-reject-threshold flag wiring.

Docs:
- REFACTOR_PLAN_V1_ZH.md: forward-looking plan after ts=1 validation.
- MIGRATION_V1_FINDINGS_ZH.md: v1 thrashing root-cause analysis.
- V2_RESULTS_ZH.md: v2 results, scenario C achievement, attribution.
- TEAM_REPORT_AGENTIC_PD_HYBRID_ZH.md: comprehensive team report.

Scripts:
- sweep_ts1_kvc_n3_plus_dp.sh: ts=1 baseline (KVC 1P3D N=3 + 4DP CA).
- sweep_ts1_migration_v1.sh / v2.sh: validation runs.
- analyze_ts1_validation.py: 4-way comparison analyzer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 01:18:13 +08:00

147 lines
5.6 KiB
Bash
Executable File
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

#!/bin/bash
# Time-scale=1 validation sweep, downscaled to 4 GPUs:
# - KVC v5 1P3D × N=3 (new data, validates §1/§2 structural claims at real timing)
# - 4-way DP cache-aware × 1 (sanity baseline at same scale + ts=1)
#
# Goal: per docs/AGENTIC_FIT_ANALYSIS_ZH.md §7 / TEAM_REPORT §2.6 — all v3-v6 KVC
# data was at time-scale=10 (inter-turn gap p50 = 0.25s, vs real 2.5s). This run
# tests whether the gap structurally reverses any conclusion.
#
# CONFIG NOTE: Original experiments used 8 GPUs (2P6D / 8-way DP). This host has
# only 4 H100s available, so we downscale proportionally to 1P3D / 4-way DP.
# Cross-compare against existing 2P6D ts=10 data is confounded by *both*
# time-scale and capacity. Internal comparison (1P3D KVC vs 4DP) at ts=1 is the
# clean signal. §5 (P-side imbalance) is NOT testable here — only 1 P.
#
# Capacity ratio: 3D × ~92K tok = 276K KV pool vs 52 sessions × ~50K peak input
# working set ≈ 1.5M → ~5.4× overload (vs 2.7× in original 2P6D).
# Pressure is HIGHER than original; partly offset by ts=1 letting D drain between turns.
#
# Output:
# outputs/qwen3-30b-tp1-ts1-validation/
# ├── kvc_1p3d_run{1,2,3}_summary.json
# ├── kvc_1p3d_run{1,2,3}_metrics.jsonl
# ├── dp4_summary.json
# ├── dp4_metrics.jsonl
# └── kvcache-centric-... / pd-colo-kv-aware-... (raw run dirs)
#
# Estimated GPU time: KVC ts=1 ≈ 100-180 min/run × 3 = 5-9h
# DP ts=1 ≈ 100-120 min × 1 = ~2h
# Total = 7-11h
set -euo pipefail
cd "$(dirname "$0")/.."
MODEL=/mnt/kzlin/workflow/pd-hybrid/simm-swe-bench/models/Qwen3-30B-A3B-Instruct-2507
TRACE=outputs/qwen35-swebench-50sess.jsonl
OUTPUT=outputs/qwen3-30b-tp1-ts1-validation
VENV_PYTHON=.venv/bin/python
RESULTS_FILE=$OUTPUT/sweep_results.txt
mkdir -p $OUTPUT
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a $RESULTS_FILE
}
run_kvc_1p3d() {
local run_idx=$1
local label="kvc_1p3d_run${run_idx}"
log ""
log "=== [KVC ${run_idx}/3] 1P3D KVC kv-aware Option D, time-scale=1 ==="
PYTHONPATH=src:third_party/sglang/python \
$VENV_PYTHON -m agentic_pd_hybrid.cli benchmark-live \
--trace $TRACE \
--output-root $OUTPUT \
--mechanism kvcache-centric \
--policy kv-aware \
--model-path $MODEL \
--prefill-workers 1 --decode-workers 3 \
--prefill-tp-size 1 --decode-tp-size 1 \
--prefill-gpu-ids 0 --decode-gpu-ids 1,2,3 \
--transfer-backend mooncake \
--gpu-budget 4 \
--time-scale 1 \
--session-sample-rate 1.0 \
--target-duration-s 100000 \
--concurrency-limit 32 \
--timeout-s 900 \
--request-timeout-s 300 \
--kvcache-admission-mode worker \
--kvcache-seed-min-turn-id 1 \
--kvcache-seed-max-inflight-decode -1 \
--kvcache-prefill-backup-policy release-after-transfer \
--kvcache-prefill-priority-eviction
local run_dir=$(ls -td $OUTPUT/kvcache-centric-*/ 2>/dev/null | head -1)
log "=== [KVC ${run_idx}/3] $label COMPLETED ==="
if [ -f "$run_dir/request-metrics.jsonl.summary.json" ]; then
cp "$run_dir/request-metrics.jsonl.summary.json" "$OUTPUT/${label}_summary.json"
cp "$run_dir/request-metrics.jsonl" "$OUTPUT/${label}_metrics.jsonl"
local errs=$($VENV_PYTHON -c "import json; d=json.load(open('$OUTPUT/${label}_summary.json')); print(d.get('error_count',0))")
log " errors = $errs"
cat "$run_dir/request-metrics.jsonl.summary.json" >> $RESULTS_FILE
echo "" >> $RESULTS_FILE
else
log "WARNING: no summary file in $run_dir"
fi
}
run_dp4_sanity() {
local label="dp4"
log ""
log "=== [DP] 4-way DP cache-aware sanity, time-scale=1 ==="
PYTHONPATH=src:third_party/sglang/python \
$VENV_PYTHON -m agentic_pd_hybrid.cli benchmark-live \
--trace $TRACE \
--output-root $OUTPUT \
--mechanism pd-colo \
--policy kv-aware \
--model-path $MODEL \
--prefill-workers 0 --decode-workers 0 \
--direct-workers 4 --direct-tp-size 1 \
--direct-gpu-ids 0,1,2,3 \
--gpu-budget 4 \
--time-scale 1 \
--session-sample-rate 1.0 \
--target-duration-s 100000 \
--concurrency-limit 32 \
--timeout-s 900 \
--request-timeout-s 300
local run_dir=$(ls -td $OUTPUT/pd-colo-kv-aware-*/ 2>/dev/null | head -1)
log "=== [DP] $label COMPLETED ==="
if [ -f "$run_dir/request-metrics.jsonl.summary.json" ]; then
cp "$run_dir/request-metrics.jsonl.summary.json" "$OUTPUT/${label}_summary.json"
cp "$run_dir/request-metrics.jsonl" "$OUTPUT/${label}_metrics.jsonl"
local errs=$($VENV_PYTHON -c "import json; d=json.load(open('$OUTPUT/${label}_summary.json')); print(d.get('error_count',0))")
log " errors = $errs"
cat "$run_dir/request-metrics.jsonl.summary.json" >> $RESULTS_FILE
echo "" >> $RESULTS_FILE
else
log "WARNING: no summary file in $run_dir"
fi
}
log "=== TS=1 VALIDATION (4-GPU): KVC 1P3D × N=3 + 4DP × 1 ==="
log "Model: $MODEL"
log "Trace: $TRACE (4449 requests, 52 sessions)"
log "Goal: validate whether ts=10 was the main distortion in v3-v6 KVC vs DP"
# KVC × 3 first (the new data we need); DP last (cheaper sanity at end)
for i in 1 2 3; do
run_kvc_1p3d $i
done
run_dp4_sanity
log ""
log "=== TS=1 SUMMARY ==="
for label in kvc_1p3d_run1 kvc_1p3d_run2 kvc_1p3d_run3 dp4; do
if [ -f "$OUTPUT/${label}_summary.json" ]; then
e=$($VENV_PYTHON -c "import json; d=json.load(open('$OUTPUT/${label}_summary.json')); print(d.get('error_count',0))")
p50=$($VENV_PYTHON -c "import json; d=json.load(open('$OUTPUT/${label}_summary.json')); print(d.get('latency_stats_s',{}).get('p50','n/a'))")
log " ${label}: errors=$e lat_p50=${p50}s"
fi
done
log "=== TS=1 ALL DONE ==="