Files
agentic-kvc/scripts
Gahow Wang 85b230455e H7 OVERLOAD_FACTOR sweep: negative result + H4 GPU profiling
H7: Sweeping OVERLOAD_FACTOR (2.0/1.5/1.3/1.0) has no effect on GPU
imbalance (~3.5-4x across all settings). Root cause: imbalance is from
workload skew at session placement (turn 1), not from routing at turn 2+.

H4 GPU profiling confirms: GPU balance improvement IS real (4.0x→2.0x),
and it directly improves HEAVY_COLO TTFT by 10.5%. But RDMA-offloaded
requests have bimodal transfer times (0.6s or 18-31s) that negate the
routing benefit.

Updated elastic_hypotheses.md with H7 results and next directions:
higher load experiments where contention amplifies routing differences.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-23 03:04:02 +08:00
..