obsidian/phd/weekly-report/24/241124.md

Objective
- Workload-centric KV cache scheduling
- XPURemoting adaptation for PhOS
Key Results
- Refactor vLLM benchmark tools to get more precise metrics
- Simulate different token lengths and cache hit rates to quantify how the hit rate affects performance
- Modify XPURemoting to support new architecture
Last Week
- Implemented a unified vLLM benchmark tool that reports more precise metrics and provides a unified request builder.
- Measured the effect of the cache hit rate and began determining what hit rate yields a real performance improvement.
- Merged new features into XPURemoting and added support for PhOS.
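As a rough illustration of how a request builder can control the hit rate, here is a minimal sketch. Everything in it is hypothetical (`Request`, `build_requests`, `ideal_cached_fraction` are not the actual tool). It assumes the hit rate is controlled by giving all prompts a shared token-ID prefix, that the first request warms the cache, and that only whole KV blocks of the shared prefix are reused. The block size of 16 is also an assumption.

```python
import random
from dataclasses import dataclass


@dataclass
class Request:
    prompt_token_ids: list[int]
    max_tokens: int


def build_requests(num_requests, prompt_len, hit_rate,
                   vocab_size=32000, max_tokens=128, seed=0):
    """Hypothetical builder: every prompt shares its first
    round(hit_rate * prompt_len) token IDs; the rest is random."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    rng = random.Random(seed)  # fixed seed keeps benchmark runs reproducible
    shared_len = round(hit_rate * prompt_len)
    shared_prefix = [rng.randrange(vocab_size) for _ in range(shared_len)]
    requests = []
    for _ in range(num_requests):
        suffix = [rng.randrange(vocab_size) for _ in range(prompt_len - shared_len)]
        requests.append(Request(shared_prefix + suffix, max_tokens))
    return requests


def ideal_cached_fraction(num_requests, prompt_len, shared_len, block_size=16):
    """Fraction of all prompt tokens served from cache, assuming the first
    request is a full miss and only complete blocks of the prefix are reused."""
    reusable = (shared_len // block_size) * block_size
    return (num_requests - 1) * reusable / (num_requests * prompt_len)


reqs = build_requests(num_requests=4, prompt_len=64, hit_rate=0.5)
print(ideal_cached_fraction(4, 64, shared_len=32))  # 0.375
```

Under these assumptions the effective hit rate is lower than the nominal one (0.375 vs 0.5 here) because of the cold first request and block rounding. Measured numbers should therefore be compared against the effective rate, not the requested one.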
Next Week
- Define a `good hit rate` for KV cache scheduling.
- Finish XPURemoting adaptation.