obsidian/phd/weekly-report/24/241117.md

Objective

  • Customize vLLM (Ali version) with new features

Key Results

  • Test the modified vLLM that supports a CPU KV cache
  • Profile the modified vLLM and produce a time breakdown on synthetic data and a real Qwen trace

Last Week

  • Merged the CPU KV cache support into vLLM, then used synthetic data and a real Qwen trace to measure performance and find bugs.
  • Added breakdown measurement support on the vLLM server side to measure the time spent copying KV blocks.
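The server-side breakdown of KV block copy time can be sketched as a per-phase timer wrapped around each copy call. This is a minimal, hypothetical sketch: `BreakdownTimer`, `copy_blocks`, and the phase names `swap_out`/`swap_in` are illustrative and are not vLLM APIs, and the bytearray copy only stands in for a real GPU-to-CPU transfer. With real CUDA tensors the copy may be asynchronous, so the stream would need to be synchronized (or CUDA events used) before reading the host timer; otherwise the measured time covers only the launch.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class BreakdownTimer:
    """Hypothetical helper: accumulates wall-clock time per named phase."""

    def __init__(self):
        self.totals = defaultdict(float)  # phase name -> total seconds
        self.counts = defaultdict(int)    # phase name -> number of timed calls

    @contextmanager
    def phase(self, name):
        # Time the body of the `with` block and add it to this phase's total.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        # Map each phase to (total seconds, share of all measured time).
        grand_total = sum(self.totals.values())
        return {
            name: (total, total / grand_total if grand_total else 0.0)
            for name, total in self.totals.items()
        }


def copy_blocks(src, dst, block_ids):
    """Hypothetical stand-in for a GPU<->CPU KV block copy: copies whole blocks by index."""
    for b in block_ids:
        dst[b][:] = src[b]


if __name__ == "__main__":
    timer = BreakdownTimer()
    num_blocks, block_bytes = 64, 16 * 1024  # assumed sizes, for illustration only
    gpu_cache = [bytearray(b"\x01" * block_bytes) for _ in range(num_blocks)]
    cpu_cache = [bytearray(block_bytes) for _ in range(num_blocks)]

    with timer.phase("swap_out"):
        copy_blocks(gpu_cache, cpu_cache, range(0, 32))
    with timer.phase("swap_in"):
        copy_blocks(cpu_cache, gpu_cache, range(0, 32))

    for name, (secs, frac) in timer.report().items():
        print(f"{name}: {secs * 1e3:.3f} ms ({frac:.1%})")
```

Keeping both a running total and a call count per phase makes it possible to report the average cost per copy as well as the share of measured time spent on KV block transfers.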

Next Week

  • Run more tests on the vLLM build that supports a CPU KV cache.
  • Try to optimize the current implementation.