Commit Graph

4 Commits

Author SHA1 Message Date
b1b74318f6 Pin 27B A/B to GPUs 2-7 (route around leaked GPU0/1 memory)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 23:01:22 +08:00
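
The commit above does not say how the pinning is done. One common way to keep a job off GPUs 0 and 1 is to set the `CUDA_VISIBLE_DEVICES` environment variable before the serving process starts. The sketch below assumes that mechanism; `pinned_env` is a hypothetical helper name, not something taken from this repository.

```python
import os
from typing import Iterable, Mapping, Optional


def pinned_env(gpu_ids: Iterable[int],
               base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment that exposes only `gpu_ids` to CUDA.

    Assumption: the benchmark launches its server as a child process, so the
    restriction can be passed via the child's environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    # CUDA enumerates only the listed physical devices, in the listed order.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpu_ids)
    return env


# GPUs 2-7 inclusive, skipping GPU0/1 whose memory is still held.
env = pinned_env(range(2, 8))
print(env["CUDA_VISIBLE_DEVICES"])
```

Inside the pinned process the six devices are renumbered from 0, so physical GPU 2 appears as device 0.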
3541065675 Speed up 27B TP A/B: request_timeout 180s, search.high 0.125
The wide 0.5 range made TP1 (low-capacity) waste many infeasible high-theta probes,
and the 900s request timeout made overloaded probes drain hung requests for 15min
each. Cap drain at 180s and bound the search to where the boundaries actually are.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 22:40:42 +08:00
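
The commit above does not say which search strategy the harness uses. The sketch below assumes plain bisection over theta with a monotone feasibility probe, purely to illustrate why lowering `search.high` from 0.5 to 0.125 removes infeasible probes. `find_boundary`, the probe function, and the boundary value 0.05 are all hypothetical.

```python
from typing import Callable, List, Tuple


def find_boundary(is_feasible: Callable[[float], bool],
                  low: float, high: float,
                  tol: float) -> Tuple[float, List[Tuple[float, bool]]]:
    """Bisect for the largest feasible theta in [low, high].

    Assumes feasibility is monotone: feasible below the boundary,
    infeasible above it. Returns the final lower bound and the probe log.
    """
    probes: List[Tuple[float, bool]] = []
    while high - low > tol:
        mid = (low + high) / 2
        ok = is_feasible(mid)
        probes.append((mid, ok))
        if ok:
            low = mid   # boundary is at or above mid
        else:
            high = mid  # boundary is below mid
    return low, probes


def count_infeasible(probes: List[Tuple[float, bool]]) -> int:
    return sum(1 for _, ok in probes if not ok)


# Hypothetical low-capacity configuration whose boundary sits at theta = 0.05.
probe = lambda theta: theta <= 0.05

_, wide = find_boundary(probe, 0.0, 0.5, tol=0.01)
_, narrow = find_boundary(probe, 0.0, 0.125, tol=0.01)
print(count_infeasible(wide), count_infeasible(narrow))
```

Under these assumptions each infeasible probe is an overloaded run, so every one avoided also avoids one drain of up to `request_timeout` seconds.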
7678c7d5e8 Switch 27B TP A/B to length-aware TTFT SLO (4s + L_in/8k), widen search
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 20:35:23 +08:00
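
The length-aware SLO in the commit above can be read as a per-request TTFT budget that grows with prompt length. The sketch below assumes the budget is in seconds, that `L_in` is the input length in tokens, and that "8k" means 8000 rather than 8192. The function names are hypothetical.

```python
def ttft_slo_seconds(input_tokens: int,
                     base_s: float = 4.0,
                     tokens_per_extra_second: float = 8000.0) -> float:
    """TTFT budget: 4 s plus one extra second per 8000 input tokens (assumed units)."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return base_s + input_tokens / tokens_per_extra_second


def meets_ttft_slo(observed_ttft_s: float, input_tokens: int) -> bool:
    """True if the observed time to first token is within the length-aware budget."""
    return observed_ttft_s <= ttft_slo_seconds(input_tokens)


for n in (0, 8000, 32000):
    print(n, ttft_slo_seconds(n))
```

A fixed threshold would count a long prompt as a violation for prefill time it cannot avoid. Scaling the budget with `L_in` keeps requests of different lengths comparable.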
4f45b546a1 Add 27B TP A/B (deterministic ground-truth: does TP2 beat TP1 per-GPU)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 18:39:54 +08:00