feat(experiments): expose PREFILL_MEM_FRAC + plumb --prefill-mem-fraction-static
v7 with --decode-mem-fraction-static=0.8 and SGLANG_SNAPSHOT_LINK_BUF_BYTES=16GB silently fell back to a 1 GB snapshot_buf, because Prefill (default mem-fraction 0.88) left only 10.8 GB free on GPU 0. Lowering the prefill mem-fraction lets the 16 GB snapshot_buf fit; the script now passes --prefill-mem-fraction-static from PREFILL_MEM_FRAC (default 0.7).
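The failure described above comes down to a size comparison: a 16 GB buffer cannot fit in 10.8 GB of free memory, while the 1 GB fallback can. A minimal sketch of a pre-flight check that would surface this instead of falling back silently is shown here. The function name `fits_in_free` and the MiB rounding are assumptions for illustration and are not part of this commit.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the commit).
# fits_in_free <needed_mib> <free_mib>: succeeds when the buffer fits.
fits_in_free() {
  local needed_mib=$1 free_mib=$2
  [ "$needed_mib" -le "$free_mib" ]
}

needed_mib=$((16 * 1024))   # 16 GB snapshot_buf, treating 1 GB as 1024 MiB
free_mib=11059              # roughly the 10.8 GB reported free on GPU 0
# On a live machine the free value could instead be read with:
#   nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits -i 0

if fits_in_free "$needed_mib" "$free_mib"; then
  echo "snapshot_buf fits"
else
  echo "snapshot_buf does not fit: need ${needed_mib} MiB, have ${free_mib} MiB" >&2
fi
```

With the numbers from the message, the check takes the failure branch, which matches the observed fallback to a smaller buffer.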
@@ -104,6 +104,7 @@ uv run --no-sync python -m agentic_pd_hybrid.cli benchmark-live \
 --kvcache-direct-max-uncached-tokens 8192 \
 --kvcache-load-floor-bonus "$LOAD_FLOOR_BONUS" \
 --decode-mem-fraction-static "${DECODE_MEM_FRAC:-0.4}" \
+--prefill-mem-fraction-static "${PREFILL_MEM_FRAC:-0.7}" \
 --enable-d-to-p-sync 2>&1 | tee -a "$LOG"
 
 run_dir=$(ls -td "$OUTPUT"/kvcache-centric-*/ 2>/dev/null | head -1)
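
The added flag takes its value from the `${PREFILL_MEM_FRAC:-0.7}` expansion, so the default applies when the variable is unset or empty. A small sketch of that behaviour follows. The helper `resolve_prefill_frac` is hypothetical and exists only to print the value the script would pass.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): prints the value that
# would be passed to --prefill-mem-fraction-static.
resolve_prefill_frac() {
  printf '%s\n' "${PREFILL_MEM_FRAC:-0.7}"
}

unset PREFILL_MEM_FRAC
resolve_prefill_frac        # prints 0.7 (variable unset, default used)

PREFILL_MEM_FRAC=""
resolve_prefill_frac        # prints 0.7 (":-" also covers the empty string)

PREFILL_MEM_FRAC=0.6
resolve_prefill_frac        # prints 0.6 (explicit override wins)
```

To try a different fraction, set `PREFILL_MEM_FRAC` in the environment before invoking the experiment script; no edit to the script is needed.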