obsidian/phd/weekly-report/24/241124.md

Objective

  • Workload-centric KV cache scheduling
  • XPURemoting adaptation for PhOS

Key Results

  • Refactor the vLLM benchmark tools to get more precise metrics
  • Simulate different token lengths and cache hit rates to quantify the effect of hit rate on performance
  • Modify XPURemoting to support the new architecture

Last Week

  • Implemented a unified vLLM benchmark tool that reports more precise metrics and provides a unified request builder.
  • Measured the effect of cache hit rate and started looking for the hit rate at which performance actually improves.
  • Merged XPURemoting with the new features and added support for PhOS.
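
The hit-rate simulation above could be driven by a request builder along the following lines. This is only a minimal sketch, not the actual benchmark tool: `SyntheticRequest`, `build_requests`, and `expected_hit_rate` are hypothetical names, the vocabulary size is an arbitrary assumption, and the cache is assumed to be warm so that the shared prefix always hits. It does not call any vLLM API.

```python
import random
from dataclasses import dataclass


@dataclass
class SyntheticRequest:
    """One benchmark request expressed as token IDs (hypothetical type)."""
    token_ids: list[int]
    cached_prefix_len: int  # tokens expected to hit the KV cache


def build_requests(num_requests, prompt_len, hit_rate, vocab_size=32000, seed=0):
    """Build prompts whose leading `hit_rate` fraction is a shared prefix.

    Assumption: every request reuses the same prefix, so with a warm cache
    roughly `hit_rate` of each prompt's tokens are served from the KV cache.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    rng = random.Random(seed)
    prefix_len = round(prompt_len * hit_rate)
    shared_prefix = [rng.randrange(vocab_size) for _ in range(prefix_len)]
    requests = []
    for _ in range(num_requests):
        # The suffix is unique per request and therefore always a cache miss.
        suffix = [rng.randrange(vocab_size) for _ in range(prompt_len - prefix_len)]
        requests.append(SyntheticRequest(shared_prefix + suffix, prefix_len))
    return requests


def expected_hit_rate(requests):
    """Token-level hit rate implied by the builder, assuming a warm cache."""
    total = sum(len(r.token_ids) for r in requests)
    cached = sum(r.cached_prefix_len for r in requests)
    return cached / total if total else 0.0
```

Sweeping `prompt_len` and `hit_rate` over a grid and feeding the resulting token IDs to the benchmark would give one latency measurement per (length, hit rate) pair. The hit rate a real engine observes may differ from the nominal value, for example if the cache matches prefixes at block granularity.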

Next Week

  • Determine a target hit rate for KV cache scheduling.
  • Finish the XPURemoting adaptation.