Objective
- Workload-centric KV cache scheduling
- XPURemoting adaption for PhOS
Key Results
- Refactor vLLM benchmark tools to get more precise metrics
- Simulate different token lengths and hit rates to quantify the effect of hit rate
- Modify XPURemoting to support new architecture
Last Week
- Implement a unified vLLM benchmark tool that reports more precise metrics and provides a unified request builder.
- Measure the effect of cache hit rate to identify a hit rate that yields a real performance improvement.
- Merge XPURemoting with new features and support for PhOS.
Next Week
- Define a good hit rate for KV cache scheduling.
- Finish XPURemoting adaption.