feat(experiments): expose PREFILL_MEM_FRAC + plumb --prefill-mem-fraction-static

v7 with --decode-mem-fraction-static=0.8 and SGLANG_SNAPSHOT_LINK_BUF_BYTES=16GB
silently fell back to a 1 GB snapshot_buf: Prefill, at its default
mem-fraction of 0.88, left only 10.8 GB free on GPU 0. Expose
PREFILL_MEM_FRAC (script default 0.7) and pass it through as
--prefill-mem-fraction-static, so that a lower prefill mem-fraction
leaves room for the 16 GB snapshot_buf.
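
A minimal sketch of how the new knob resolves, plus the kind of size comparison behind the fallback. The helpers resolve_prefill_frac and buf_fits are hypothetical and not part of the experiment script. The value 0.6 is an arbitrary illustration, and the sizes are the figures from this message expressed in whole MB (assuming 1 GB = 1000 MB).

```shell
#!/usr/bin/env bash
# Sketch only: mirrors the "${PREFILL_MEM_FRAC:-0.7}" expansion added by this commit.

# Hypothetical helper: print the prefill mem-fraction the script would pass on.
resolve_prefill_frac() {
  # Use the caller's value when set and non-empty, otherwise the default 0.7.
  printf '%s\n' "${PREFILL_MEM_FRAC:-0.7}"
}

# Hypothetical pre-flight check: buf_fits FREE_MB WANT_MB
# Succeeds when the requested buffer is no larger than the free memory.
buf_fits() {
  [ "$2" -le "$1" ]
}

unset PREFILL_MEM_FRAC
echo "default:  $(resolve_prefill_frac)"
echo "override: $(PREFILL_MEM_FRAC=0.6; resolve_prefill_frac)"

# Reported case: 10.8 GB free, 16 GB requested.
if buf_fits 10800 16000; then
  echo "16 GB snapshot_buf fits"
else
  echo "16 GB snapshot_buf does not fit in 10.8 GB free"
fi
```

Setting PREFILL_MEM_FRAC in the environment before launching the experiment script overrides the 0.7 default in the same way DECODE_MEM_FRAC overrides 0.4.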
Claude Code Agent
2026-05-13 15:31:40 +08:00
parent 5c09a3a0cb
commit 9cca2c60c9

@@ -104,6 +104,7 @@ uv run --no-sync python -m agentic_pd_hybrid.cli benchmark-live \
 --kvcache-direct-max-uncached-tokens 8192 \
 --kvcache-load-floor-bonus "$LOAD_FLOOR_BONUS" \
 --decode-mem-fraction-static "${DECODE_MEM_FRAC:-0.4}" \
+--prefill-mem-fraction-static "${PREFILL_MEM_FRAC:-0.7}" \
 --enable-d-to-p-sync 2>&1 | tee -a "$LOG"
 run_dir=$(ls -td "$OUTPUT"/kvcache-centric-*/ 2>/dev/null | head -1)