obsidian/241124.md at a57afa86b47c58aeca557e7cbcb0d38b81159d78 - obsidian - Local Gitea

Gahow Wang a57afa86b4 Initial commit: obsidian to gitea

2026-05-07 15:04:41 +08:00

Objective

Workload-centric KV cache scheduling
XPURemoting adaptation for PhOS

Key Results

Refactor the vLLM benchmark tools to report more precise metrics
Simulate different token lengths and KV cache hit rates to quantify the effect of hit rate on performance
Modify XPURemoting to support the new architecture

Last Week

Implemented a unified vLLM benchmark tool that reports more precise metrics and provides a unified request builder.
Measured the effect of KV cache hit rate and tried to identify a hit rate that yields a real performance improvement.
Merged XPURemoting with the new features and PhOS support.
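
A minimal sketch of how a request builder could control token length and hit rate for these measurements. It is not the actual benchmark tool: `SyntheticRequest`, `build_requests`, and `expected_hit_rate` are hypothetical names, prompts are represented as random token IDs rather than tokenized text, and hit rate is assumed to mean the fraction of prompt tokens covered by a prefix shared with earlier requests.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class SyntheticRequest:
    token_ids: List[int]      # full prompt as token IDs
    shared_prefix_len: int    # number of leading tokens shared across requests


def build_requests(num_requests: int, prompt_len: int, hit_rate: float,
                   vocab_size: int = 32000, seed: int = 0) -> List[SyntheticRequest]:
    """Build prompts of equal length whose first `hit_rate` fraction is identical.

    Assumption: a shared prefix is how cache hits are induced; the vocabulary
    size and seed are arbitrary illustrative defaults.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    rng = random.Random(seed)
    prefix_len = int(round(prompt_len * hit_rate))
    shared_prefix = [rng.randrange(vocab_size) for _ in range(prefix_len)]
    requests = []
    for _ in range(num_requests):
        # Each request gets its own random suffix after the shared prefix.
        suffix = [rng.randrange(vocab_size) for _ in range(prompt_len - prefix_len)]
        requests.append(SyntheticRequest(shared_prefix + suffix, prefix_len))
    return requests


def expected_hit_rate(requests: List[SyntheticRequest]) -> float:
    """Token-level hit rate if the first request is a cold miss and every
    later request reuses the whole shared prefix."""
    total = sum(len(r.token_ids) for r in requests)
    hits = sum(r.shared_prefix_len for r in requests[1:])
    return hits / total if total else 0.0
```

Sweeping `prompt_len` and `hit_rate` over a grid and recording latency for each setting would give the data needed to pick a target hit rate. A real engine may cache at block granularity, so the measured hit rate can differ from this token-level estimate.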

Next Week

Define a target hit rate for KV cache scheduling.
Finish the XPURemoting adaptation.