agentic-pd-hybrid/scripts
Claude Code Agent f6d6dc01ea feat(cli): per-role --mem-fraction-static + use in E4-pressured
E4-v1, E4-v2, and E4-pressured-v1 all failed to fire admission
rejections in this workload because the default mem-fraction-static
of 0.6 gives each decoder a 288K-token kv_pool, more than enough to
absorb the 50-session trace even at concurrency=32.

This commit adds:
  --decode-mem-fraction-static  (overrides per-decode SGLang arg)
  --prefill-mem-fraction-static (symmetric for completeness)
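
A minimal sketch of how two such flags could be parsed and forwarded, assuming an argparse-based CLI. The `Topology` dataclass, `build_parser`, and `apply_mem_fractions` are hypothetical names for illustration, not the repository's actual code; only the two flag names, the `{decode,prefill}_extra_server_args` fields, and SGLang's `--mem-fraction-static` server argument come from the commit itself.

```python
import argparse
from dataclasses import dataclass, field


@dataclass
class Topology:
    # Hypothetical stand-in for the repo's topology object; only the two
    # *_extra_server_args field names are taken from the commit message.
    decode_extra_server_args: list = field(default_factory=list)
    prefill_extra_server_args: list = field(default_factory=list)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # None means "not set": the server keeps its own default.
    parser.add_argument("--decode-mem-fraction-static", type=float, default=None)
    parser.add_argument("--prefill-mem-fraction-static", type=float, default=None)
    return parser


def apply_mem_fractions(args: argparse.Namespace, topology: Topology) -> Topology:
    # Forward each per-role override as an extra SGLang server argument.
    if args.decode_mem_fraction_static is not None:
        topology.decode_extra_server_args += [
            "--mem-fraction-static", str(args.decode_mem_fraction_static)
        ]
    if args.prefill_mem_fraction_static is not None:
        topology.prefill_extra_server_args += [
            "--mem-fraction-static", str(args.prefill_mem_fraction_static)
        ]
    return topology
```

Leaving both defaults at `None` keeps unmodified runs identical to the previous behaviour, since no extra argument is appended unless a flag is given.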

Plumbed via topology.{decode,prefill}_extra_server_args. The
pressured sweep now uses --decode-mem-fraction-static 0.4, which
shrinks the decoder kv_pool to ~192K tokens. That should force enough
admission rejections to actually exercise the D→P snapshot path.
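
The ~192K figure is consistent with scaling the 288K-token pool linearly by the ratio of the two fractions. The sketch below reproduces that arithmetic; the function name is hypothetical, and linear scaling is an assumption implied by the numbers above. In practice the static fraction also has to cover model weights, so the measured pool size may differ.

```python
def scaled_kv_pool_tokens(baseline_tokens: int,
                          baseline_fraction: float,
                          new_fraction: float) -> int:
    # Assumption: kv_pool size scales linearly with mem-fraction-static.
    # This ignores the fixed share taken by model weights.
    return round(baseline_tokens * new_fraction / baseline_fraction)


# 288K tokens at 0.6, rescaled to 0.4.
estimate = scaled_kv_pool_tokens(288_000, 0.6, 0.4)
```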
2026-05-13 10:43:26 +08:00