obsidian/phd/weekly-report/24/241222.md

Objective

  • Serverless KVCache

Key Results

  • Implement the workload-aware policy in vLLM [8/10]
  • Profile the workload-aware policy [3/10]
  • Supplement the analysis of workload differences in the Qwen trace

Last Week

  • Added a new design point to the cache policy so that it considers cache memory size and predicted reuse distance together. To support this, added a new monitor that tracks each workload's reuse time interval and average number of tokens.
  • Set up an offline (i.e., best-case) scheduling policy, then profiled the default policy, our workload-aware policy, and the offline policy to show the performance difference as a CDF of TTFT.
  • Implemented a cache block source tracker in vLLM to show where KVCache reuse comes from. It shows that 90% of KVCache reuse comes from multi-turn chat.
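
A minimal sketch of the monitor idea from the first item above, assuming one simple way to use it: track each workload's mean reuse interval and mean token count, then, when the cache is over capacity, evict the blocks whose predicted next use is furthest away. All names here (`WorkloadStats`, `ReuseMonitor`, `choose_victims`) are hypothetical and do not correspond to vLLM APIs. The ranking rule is an illustrative assumption, not the policy actually implemented.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class WorkloadStats:
    """Running statistics for one workload type (hypothetical structure)."""
    reuse_count: int = 0
    total_reuse_interval: float = 0.0  # sum of seconds between reuses
    request_count: int = 0
    total_tokens: int = 0

    @property
    def mean_reuse_interval(self) -> float:
        # With no observed reuse, assume the workload is never reused.
        if self.reuse_count == 0:
            return float("inf")
        return self.total_reuse_interval / self.reuse_count

    @property
    def mean_tokens(self) -> float:
        if self.request_count == 0:
            return 0.0
        return self.total_tokens / self.request_count


class ReuseMonitor:
    """Collects per-workload reuse intervals and token counts."""

    def __init__(self) -> None:
        self.stats = defaultdict(WorkloadStats)

    def record_request(self, workload: str, num_tokens: int) -> None:
        s = self.stats[workload]
        s.request_count += 1
        s.total_tokens += num_tokens

    def record_reuse(self, workload: str, interval_s: float) -> None:
        s = self.stats[workload]
        s.reuse_count += 1
        s.total_reuse_interval += interval_s

    def predicted_next_use(self, workload: str, last_access_s: float) -> float:
        # Predicted reuse time = last access + mean interval for the workload.
        return last_access_s + self.stats[workload].mean_reuse_interval


def choose_victims(blocks, monitor, capacity_blocks):
    """Pick block ids to evict so that at most capacity_blocks remain.

    blocks is a list of (block_id, workload, last_access_s) tuples.
    Blocks with the furthest predicted next use are evicted first.
    """
    overflow = len(blocks) - capacity_blocks
    if overflow <= 0:
        return []
    ranked = sorted(
        blocks,
        key=lambda b: monitor.predicted_next_use(b[1], b[2]),
        reverse=True,
    )
    return [block_id for block_id, _, _ in ranked[:overflow]]
```

In this sketch the cache size only determines how many blocks must be evicted, and the mean token count is tracked but not used in the ranking. A real policy could, for example, use it to convert a memory budget into a per-workload block budget.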

Next Week

  • Improve the performance of our policy.
  • Plot some formal figures.