#!/bin/bash
# TP1 configuration sweep: 8-way DP, 1P7D KVC, 2P6D KVC
# Qwen3-30B-A3B TP=1, single GPU per worker
# Most aggressive KVC admission: inflight=-1 (off), seed-min-turn=1

set -euo pipefail
cd "$(dirname "$0")/.."

MODEL=/mnt/kzlin/workflow/pd-hybrid/simm-swe-bench/models/Qwen3-30B-A3B-Instruct-2507
TRACE=outputs/qwen35-swebench-50sess.jsonl
OUTPUT=outputs/qwen3-30b-tp1-exps
VENV_PYTHON=.venv/bin/python
RESULTS_FILE=$OUTPUT/sweep_results.txt

mkdir -p "$OUTPUT"

# Print a timestamped message and append it to the results file.
log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "$RESULTS_FILE"
}

# Append a run's summary JSON to the results file and copy it to a named file.
save_result() {
    local label=$1
    local run_dir=$2
    log "=== $label COMPLETED ==="
    if [ -f "$run_dir/request-metrics.jsonl.summary.json" ]; then
        log "Summary:"
        cat "$run_dir/request-metrics.jsonl.summary.json" >> "$RESULTS_FILE"
        echo "" >> "$RESULTS_FILE"
        # Also copy summary to a named file for easy access
        cp "$run_dir/request-metrics.jsonl.summary.json" "$OUTPUT/${label}_summary.json"
        log "Saved to $OUTPUT/${label}_summary.json"
    else
        log "WARNING: No summary file found in $run_dir"
    fi
}

log "Starting TP1 configuration sweep"
log "Model: $MODEL"
log "Trace: $TRACE (4449 requests, 52 sessions)"

########################################
# Experiment 1: 8-way DP cache-aware
########################################
log ""
log "=== [EXP1] 8-way DP cache-aware (8 direct × TP1) ==="

PYTHONPATH=src:third_party/sglang/python \
"$VENV_PYTHON" -m agentic_pd_hybrid.cli benchmark-live \
    --trace "$TRACE" \
    --output-root "$OUTPUT" \
    --mechanism pd-colo \
    --policy kv-aware \
    --model-path "$MODEL" \
    --prefill-workers 0 --decode-workers 0 \
    --direct-workers 8 --direct-tp-size 1 \
    --direct-gpu-ids 0,1,2,3,4,5,6,7 \
    --gpu-budget 8 \
    --time-scale 10 \
    --session-sample-rate 1.0 \
    --target-duration-s 100000 \
    --concurrency-limit 32 \
    --timeout-s 900 \
    --request-timeout-s 300

# Find latest run dir for this experiment.
# "|| true" keeps "set -e" + pipefail from aborting when no run dir matches,
# so save_result can log its warning instead.
EXP1_DIR=$(ls -td "$OUTPUT"/pd-colo-kv-aware-*/ 2>/dev/null | head -n 1 || true)
save_result "exp1_8way_dp_cache_aware" "$EXP1_DIR"

########################################
# Experiment 2: 1P + 7D KVC (most aggressive)
########################################
log ""
log "=== [EXP2] 1P7D KVC (inflight=off, min-turn=1) ==="

PYTHONPATH=src:third_party/sglang/python \
"$VENV_PYTHON" -m agentic_pd_hybrid.cli benchmark-live \
    --trace "$TRACE" \
    --output-root "$OUTPUT" \
    --mechanism kvcache-centric \
    --policy default \
    --model-path "$MODEL" \
    --prefill-workers 1 --decode-workers 7 \
    --prefill-tp-size 1 --decode-tp-size 1 \
    --prefill-gpu-ids 0 --decode-gpu-ids 1,2,3,4,5,6,7 \
    --transfer-backend mooncake \
    --gpu-budget 8 \
    --time-scale 10 \
    --session-sample-rate 1.0 \
    --target-duration-s 100000 \
    --concurrency-limit 32 \
    --timeout-s 900 \
    --request-timeout-s 300 \
    --kvcache-admission-mode worker \
    --kvcache-seed-min-turn-id 1 \
    --kvcache-seed-max-inflight-decode -1 \
    --kvcache-prefill-backup-policy release-after-transfer \
    --kvcache-prefill-priority-eviction

# Newest kvcache-centric run dir; at this point that is the EXP2 run.
EXP2_DIR=$(ls -td "$OUTPUT"/kvcache-centric-*/ 2>/dev/null | head -n 1 || true)
save_result "exp2_1p7d_kvc_aggressive" "$EXP2_DIR"

########################################
# Experiment 3: 2P + 6D KVC (most aggressive)
########################################
log ""
log "=== [EXP3] 2P6D KVC (inflight=off, min-turn=1) ==="

PYTHONPATH=src:third_party/sglang/python \
"$VENV_PYTHON" -m agentic_pd_hybrid.cli benchmark-live \
    --trace "$TRACE" \
    --output-root "$OUTPUT" \
    --mechanism kvcache-centric \
    --policy default \
    --model-path "$MODEL" \
    --prefill-workers 2 --decode-workers 6 \
    --prefill-tp-size 1 --decode-tp-size 1 \
    --prefill-gpu-ids 0,1 --decode-gpu-ids 2,3,4,5,6,7 \
    --transfer-backend mooncake \
    --gpu-budget 8 \
    --time-scale 10 \
    --session-sample-rate 1.0 \
    --target-duration-s 100000 \
    --concurrency-limit 32 \
    --timeout-s 900 \
    --request-timeout-s 300 \
    --kvcache-admission-mode worker \
    --kvcache-seed-min-turn-id 1 \
    --kvcache-seed-max-inflight-decode -1 \
    --kvcache-prefill-backup-policy release-after-transfer \
    --kvcache-prefill-priority-eviction

# Newest kvcache-centric run dir; the EXP3 run is now more recent than EXP2's.
EXP3_DIR=$(ls -td "$OUTPUT"/kvcache-centric-*/ 2>/dev/null | head -n 1 || true)
save_result "exp3_2p6d_kvc_aggressive" "$EXP3_DIR"

########################################
log ""
log "=== ALL TP1 SWEEP EXPERIMENTS DONE ==="