agentic-kvc

Gahow Wang 9dee25907b Add P/D ratio ablation: 6P+2D vs 4P+4D vs Combined

6P+2D gives more GPUs to prefill, fewer to decode:
- Decode util: 7.8% (4D) -> 19.0% (2D), less waste
- TTFT: 1.99s (4P) -> 1.48s (6P), -26% from less prefill queuing
- But Combined (30.5% util, TTFT 1.01s) is still best overall
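
The percentages above can be rechecked from the raw figures. This is a minimal sketch that uses only the numbers quoted in this commit message; the helper name `relative_change` is hypothetical and is not taken from the repository's scripts.

```python
def relative_change(before: float, after: float) -> float:
    """Return the fractional change from `before` to `after` (negative = reduction)."""
    return (after - before) / before


# Figures copied from the commit message above.
ttft_4p, ttft_6p = 1.99, 1.48   # TTFT in seconds for 4P+4D vs 6P+2D
util_4d, util_2d = 7.8, 19.0    # decode utilization in percent for 4D vs 2D

ttft_change = relative_change(ttft_4p, ttft_6p)
util_ratio = util_2d / util_4d

print(f"TTFT change 4P -> 6P: {ttft_change:.0%}")
print(f"Decode util ratio 2D / 4D: {util_ratio:.2f}x")
```

The TTFT change rounds to the -26% reported above, and decode utilization rises by roughly 2.4x when moving from four decode GPUs to two.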

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-05-21 22:42:20 +08:00

analysis

Agentic workload PD separation analysis with trace-driven benchmarks

2026-05-21 21:21:57 +08:00

replayer

Agentic workload PD separation analysis with trace-driven benchmarks

2026-05-21 21:21:57 +08:00

scripts

Add P/D ratio ablation: 6P+2D vs 4P+4D vs Combined

2026-05-21 22:42:20 +08:00

.gitignore

Agentic workload PD separation analysis with trace-driven benchmarks