obsidian/phd/weekly-report/25/250427.md

Objective

  • Serverless KVCache caching

Key Result

  • Refine cache policy implementation
  • Implement and test our workload-aware cache policy in vLLM
  • Write graduation thesis

Last Week

  • Refined the cache policy to account for the cost of keeping cache entries in memory, which gave about a 1% to 2% hit-rate improvement with 1k+1k cache blocks.
  • Implemented the PDF-based workload-aware (WA) cache policy in vLLM and profiled LRU vs. WA on Qwen2-7B; WA reduced QTTFT by 25%.
  • Finished the first draft of the graduation thesis.
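
For reference, the hit-rate comparison above uses LRU as the baseline. The following is a minimal sketch of how an LRU hit rate can be measured by replaying a trace of block IDs through a fixed number of cache blocks. It is not the vLLM implementation or the workload-aware policy; the function name `lru_hit_rate` and the trace format (a plain list of block IDs) are assumptions made for illustration only.

```python
from collections import OrderedDict


def lru_hit_rate(trace, capacity):
    """Replay block IDs through an LRU cache and return the hit rate.

    `trace` is a hypothetical list of hashable block IDs; `capacity` is the
    number of cache blocks available.
    """
    if capacity <= 0 or not trace:
        return 0.0

    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for block_id in trace:
        if block_id in cache:
            hits += 1
            cache.move_to_end(block_id)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block_id] = True
    return hits / len(trace)
```

A different policy can be compared against this baseline by replaying the same trace with the same capacity and changing only the eviction choice.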

Next Week

  • Run full tests of the different cache policies across different models.