Objective
- Workload-centric KV cache scheduling
- XPURemoting adaptation for PhOS

Key Results
- Refactor vLLM benchmark tools to get more precise metrics
- Simulate different token lengths and cache hit rates to quantify the effect of hit rate on performance
- Modify XPURemoting to support the new architecture

Last Week
- Implement a unified vLLM benchmark tool that reports more precise metrics and provides a unified request builder.
- Measure the effect of the KV cache hit rate and try to determine the hit rate needed for a real performance improvement.
- Merge new features and PhOS support into XPURemoting.

Next Week
- Define a `good hit rate` for KV cache scheduling.
- Finish the XPURemoting adaptation.
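
One way to approach the `good hit rate` item is to have the request builder generate prompts with a controlled shared prefix, so that the hit rate becomes an input of the experiment. The following is a minimal sketch under stated assumptions, not the actual benchmark tool: `build_requests`, `common_prefix_len`, `VOCAB_SIZE`, and the parameter names are all hypothetical, and prompts are modeled as plain lists of token IDs.

```python
import random

VOCAB_SIZE = 32_000  # hypothetical vocabulary size, not taken from any model


def build_requests(num_requests, prompt_len, target_hit_rate, seed=0):
    """Return synthetic prompts (lists of token IDs) of length prompt_len.

    The first round(prompt_len * target_hit_rate) tokens are identical across
    all prompts, so a prefix cache could reuse them after the first request.
    """
    if not 0.0 <= target_hit_rate <= 1.0:
        raise ValueError("target_hit_rate must be in [0, 1]")
    rng = random.Random(seed)
    shared_len = round(prompt_len * target_hit_rate)
    shared_prefix = [rng.randrange(VOCAB_SIZE) for _ in range(shared_len)]
    prompts = []
    for _ in range(num_requests):
        # Each request gets its own random suffix after the shared prefix.
        suffix = [rng.randrange(VOCAB_SIZE) for _ in range(prompt_len - shared_len)]
        prompts.append(shared_prefix + suffix)
    return prompts


def common_prefix_len(a, b):
    """Number of leading token IDs that two prompts have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


if __name__ == "__main__":
    # Sweep token lengths and target hit rates, as in the Key Results above.
    for prompt_len in (128, 1024):
        for rate in (0.0, 0.5, 0.9):
            prompts = build_requests(4, prompt_len, rate)
            reused = common_prefix_len(prompts[0], prompts[1])
            print(prompt_len, rate, reused / prompt_len)
```

Feeding such prompts to the benchmark and plotting latency or throughput against `target_hit_rate` would show where the curve starts to bend. A real engine may cache at a coarser granularity than single tokens, so the hit rate it reports can be lower than the target set here.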