obsidian/phd/weekly-report/24/241229.md

Objective

  • Serverless KVCache caching

Key Results

  • Implement the workload-aware policy in vLLM
  • Profile the workload-aware policy [3/10]

Last Week

  • Implemented a priority-based evictor (with priorities calculated by our policy) for both the GPU and CPU sides.
  • Tested our policy with a relatively small cache memory: it achieved a 30% cache hit ratio and a 10% performance improvement, which shows the policy is effective when cache memory is limited. With larger cache memory, however, the policy still needs fine-tuning.
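A minimal sketch of how a priority-based evictor like the one above can be structured, assuming that each cached KV block receives a score from the policy and that the block with the lowest score is evicted first. The class `PriorityEvictor` and its methods are hypothetical; they do not follow vLLM's evictor interface, and the scores are placeholders for whatever the workload-aware policy computes. One instance could manage the GPU-side blocks and another the CPU-side blocks.

```python
import heapq
import itertools


class PriorityEvictor:
    """Hypothetical evictor: evicts the evictable block with the lowest priority.

    Priorities are supplied externally (e.g. by a workload-aware policy).
    Updates and removals use lazy deletion: stale heap entries are skipped
    when they reach the top of the heap.
    """

    def __init__(self) -> None:
        self._heap: list[tuple[float, int, int]] = []  # (priority, seq, block_id)
        self._live: dict[int, int] = {}  # block_id -> seq of its current heap entry
        self._seq = itertools.count()  # tie-breaker: earlier insert evicted first

    def add(self, block_id: int, priority: float) -> None:
        # Adding an existing block updates its priority; the old entry becomes stale.
        seq = next(self._seq)
        self._live[block_id] = seq
        heapq.heappush(self._heap, (priority, seq, block_id))

    def remove(self, block_id: int) -> None:
        # E.g. on a cache hit: the block is in use again and no longer evictable.
        self._live.pop(block_id, None)

    def evict(self) -> int:
        # Pop until we find an entry that is still the block's current one.
        while self._heap:
            _, seq, block_id = heapq.heappop(self._heap)
            if self._live.get(block_id) == seq:
                del self._live[block_id]
                return block_id
        raise ValueError("no evictable blocks")

    def __len__(self) -> int:
        return len(self._live)


if __name__ == "__main__":
    evictor = PriorityEvictor()
    evictor.add(block_id=0, priority=0.9)
    evictor.add(block_id=1, priority=0.2)
    evictor.add(block_id=2, priority=0.5)
    evictor.remove(1)  # block 1 was hit and is in use again
    print(evictor.evict())  # block 2 has the lowest remaining priority
```

The heap keeps eviction at O(log n), and lazy deletion avoids searching the heap when the policy re-scores a block or a block is reused.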

Next Week

  • Improve our policy for larger cache memory.
  • Analyze the new trace.