KVC v2 beats 4DP at ts=1 same-scale on 7/8 metrics: TTFT mean -24%, p50 -54%, p90 -64%; lat mean -0.8%, p50 -12.6%, p90 -0.7%. Direct-to-D rate jumped 42.8% -> 91.7%. REFACTOR_PLAN_V1 scenario C achieved.

Two-knob fix:
- reset-on-success blacklist decay: clear (sess, D) reject counter on successful direct-to-D path. Eliminates v1 thrashing where session 6880 was stable on decode-1 for 70 turns then collapsed to 75 D-changes after cumulative transient pressure tripped the permanent blacklist.
- bump --kvcache-direct-max-uncached-tokens default 2048 -> 8192 via CLI flag. 41% of v1 fallbacks were 'real-large-append' (>2048 token append); raising the threshold lets these go through the direct-to-D fast path.

Code:
- policies.py: RoutingState.session_d_rejects counter + KvAwarePolicy migration_reject_threshold; degenerate fallback picks least-rejected D.
- replay.py: record_admission_reject + reset-on-success in _run_request; _fallthrough_reason classifies turn-2+ fall-throughs as session-not-resident / real-large-append / etc, replacing misleading 'large-append' suffix (TEAM_REPORT §2.7).
- cli.py + benchmark.py: --kvcache-migration-reject-threshold flag wiring.

Docs:
- REFACTOR_PLAN_V1_ZH.md: forward-looking plan after ts=1 validation.
- MIGRATION_V1_FINDINGS_ZH.md: v1 thrashing root-cause analysis.
- V2_RESULTS_ZH.md: v2 results, scenario C achievement, attribution.
- TEAM_REPORT_AGENTIC_PD_HYBRID_ZH.md: comprehensive team report.

Scripts:
- sweep_ts1_kvc_n3_plus_dp.sh: ts=1 baseline (KVC 1P3D N=3 + 4DP CA).
- sweep_ts1_migration_v1.sh / v2.sh: validation runs.
- analyze_ts1_validation.py: 4-way comparison analyzer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
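The reset-on-success decay described in the commit message can be illustrated with a minimal sketch. This is not the code in policies.py or replay.py: the class `RejectTracker` and its method names are hypothetical, and the "first allowed worker" choice stands in for the real kv-aware scoring. Only the idea (a per-(session, D) reject counter, a threshold, a reset on success, and a least-rejected fallback) comes from the commit message.

```python
from collections import defaultdict


class RejectTracker:
    """Hypothetical per-(session, decode worker) reject counter."""

    def __init__(self, reject_threshold: int = 3) -> None:
        self.reject_threshold = reject_threshold
        self.rejects: dict[tuple[str, str], int] = defaultdict(int)

    def record_reject(self, session_id: str, worker: str) -> None:
        # Count one admission reject for this (session, worker) pair.
        self.rejects[(session_id, worker)] += 1

    def record_success(self, session_id: str, worker: str) -> None:
        # Reset-on-success: a successful direct-to-D request clears the
        # counter, so transient pressure does not accumulate forever.
        self.rejects.pop((session_id, worker), None)

    def is_blacklisted(self, session_id: str, worker: str) -> bool:
        count = self.rejects.get((session_id, worker), 0)
        return count >= self.reject_threshold

    def pick_worker(self, session_id: str, workers: list[str]) -> str:
        allowed = [w for w in workers if not self.is_blacklisted(session_id, w)]
        if allowed:
            # Placeholder choice; a real policy would score by KV residency.
            return allowed[0]
        # Degenerate fallback: every worker is over the threshold,
        # so take the least-rejected one.
        return min(workers, key=lambda w: self.rejects.get((session_id, w), 0))
```

Without the `record_success` reset, two rejects early in a long session plus one more many turns later would trip a threshold of 3 and blacklist the worker permanently, which matches the v1 thrashing pattern described above.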
66 lines
2.6 KiB
Bash
Executable File
#!/bin/bash
# Migration v1 validation: KVC 1P3D ts=1 with --kvcache-migration-reject-threshold=3
# Compare against baseline outputs/qwen3-30b-tp1-ts1-validation/kvc_1p3d_run{1,2,3}
# (none of which used migration; the runs were structurally identical).
#
# Goal: verify §1 fix changes the categorical outcome: direct-to-D % up,
# fallback-session-not-resident % down, lat mean down.
#
# ts=1 is deterministic at the categorical level, so N=1 is sufficient
# (TEAM_REPORT §2.8 revised).
set -euo pipefail
cd "$(dirname "$0")/.."

MODEL=/mnt/kzlin/workflow/pd-hybrid/simm-swe-bench/models/Qwen3-30B-A3B-Instruct-2507
TRACE=outputs/qwen35-swebench-50sess.jsonl
OUTPUT=outputs/qwen3-30b-tp1-ts1-migration-v1
VENV_PYTHON=.venv/bin/python
RESULTS_FILE=$OUTPUT/sweep_results.txt

mkdir -p "$OUTPUT"

# Timestamped log line, echoed to stdout and appended to the results file.
log() { echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "$RESULTS_FILE"; }

log "=== TS=1 MIGRATION v1: KVC 1P3D --kvcache-migration-reject-threshold=3 ==="
log "Baseline reference: outputs/qwen3-30b-tp1-ts1-validation/kvc_1p3d_run1 (errors=5, lat mean=1.574s, direct-to-D=42.8%)"

label=kvc_1p3d_migration_run1
log ""
log "=== [migration v1] starting ==="
PYTHONPATH=src:third_party/sglang/python \
"$VENV_PYTHON" -m agentic_pd_hybrid.cli benchmark-live \
    --trace "$TRACE" \
    --output-root "$OUTPUT" \
    --mechanism kvcache-centric \
    --policy kv-aware \
    --model-path "$MODEL" \
    --prefill-workers 1 --decode-workers 3 \
    --prefill-tp-size 1 --decode-tp-size 1 \
    --prefill-gpu-ids 0 --decode-gpu-ids 1,2,3 \
    --transfer-backend mooncake \
    --gpu-budget 4 \
    --time-scale 1 \
    --session-sample-rate 1.0 \
    --target-duration-s 100000 \
    --concurrency-limit 32 \
    --timeout-s 900 \
    --request-timeout-s 300 \
    --kvcache-admission-mode worker \
    --kvcache-seed-min-turn-id 1 \
    --kvcache-seed-max-inflight-decode -1 \
    --kvcache-prefill-backup-policy release-after-transfer \
    --kvcache-prefill-priority-eviction \
    --kvcache-migration-reject-threshold 3

# Newest run directory. "|| true" keeps pipefail from aborting the script when
# no directory matches; the -f test below then simply skips the summary step.
run_dir=$(ls -td "$OUTPUT"/kvcache-centric-*/ 2>/dev/null | head -1 || true)
log "=== [migration v1] $label COMPLETED ==="
if [ -f "$run_dir/request-metrics.jsonl.summary.json" ]; then
    cp "$run_dir/request-metrics.jsonl.summary.json" "$OUTPUT/${label}_summary.json"
    cp "$run_dir/request-metrics.jsonl" "$OUTPUT/${label}_metrics.jsonl"
    errs=$("$VENV_PYTHON" -c "import json; d=json.load(open('$OUTPUT/${label}_summary.json')); print(d.get('error_count',0))")
    p50=$("$VENV_PYTHON" -c "import json; d=json.load(open('$OUTPUT/${label}_summary.json')); print(d.get('latency_stats_s',{}).get('p50',0))")
    log " errors=$errs lat_p50=${p50}s"
    cat "$run_dir/request-metrics.jsonl.summary.json" >> "$RESULTS_FILE"
fi
log "=== migration v1 DONE ==="
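The script launches the Python interpreter twice to pull two fields out of the same summary file. A possible alternative is to read both in one pass, sketched below. The function name `read_summary` is hypothetical; the keys `error_count`, `latency_stats_s`, and `p50` are the ones the script already queries, and the defaults of 0 mirror its `.get(..., 0)` calls.

```python
import json
from pathlib import Path


def read_summary(path: str) -> tuple[int, float]:
    """Return (error_count, latency p50 in seconds) from a summary JSON file."""
    summary = json.loads(Path(path).read_text())
    errors = summary.get("error_count", 0)
    # Missing latency stats fall back to 0, as in the shell one-liners.
    p50 = summary.get("latency_stats_s", {}).get("p50", 0)
    return errors, p50
```

Calling `read_summary` on the copied `${label}_summary.json` would give both values for the `errors=... lat_p50=...` log line without embedding shell variables inside Python source strings.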