Gahow Wang 4bf0b999ff Final GPU comparison: hybrid routing matches baseline latency with better APC
Complete 200-req comparison with GPU monitoring:

Config                       TTFT50  TPOT90  E2E50  GPU%  Active  APC
Combined (old cache-aware)    1.012   0.073  5.101  30.5%   64%   44.7%
Combined (hybrid routing)     1.064   0.072  5.131  27.7%   60%   49.4%
PD-Sep 4P+4D                  1.994   0.075  7.112  12.4%   24%   40.2%
PD-Sep 6P+2D                  1.481   0.077  5.949  16.9%   28%   ~37%

Hybrid routing: +4.7pp APC with comparable latency and GPU utilization.
PD-Sep: significantly worse on all dimensions for single-machine agentic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-22 03:14:05 +08:00
Description
No description provided
48 MiB
Languages
Python 82.9%
Shell 17.1%