Initial commit: obsidian to gitea
phd/weekly-report/24/241124.md
@@ -0,0 +1,17 @@
Objective
- Workload-centric KV cache scheduling
- XPURemoting adaptation for PhOS

Key Results
- Refactor vLLM benchmark tools to get more precise metrics
- Simulate different token lengths and hit rates to quantify the effect of hit rate
- Modify XPURemoting to support the new architecture

Last Week
- Implement a unified vLLM benchmark tool that produces more precise metrics and provides a unified request builder.
- Measure the effect of cache hit rate and try to define a hit rate that yields a real performance improvement.
- Merge XPURemoting with new features and support for PhOS.

Next Week
- Define a `good hit rate` for KV cache scheduling.
- Finish XPURemoting adaptation.