Initial commit: obsidian to gitea
phd/weekly-report/24/241222.md
Objective
- Serverless KVCache caching

Key Results
- Implement the workload-aware policy in vLLM [8/10]
- Profile the workload-aware policy [3/10]
- Supplement the analysis with the workload differences in the Qwen trace

Last Week
- Add a new design point to the cache policy so that it considers cache memory size and predicted reuse distance together. To support this, add a new monitor that tracks each workload's reuse time interval and average number of tokens.
- Set up an offline (i.e., best-case) scheduling policy, then profile the default policy, our workload-aware policy, and the offline policy, comparing their TTFT CDFs to show the performance gap.
- Implement a cache block source tracker in vLLM to show where KVCache reuse comes from. The results show that 90% of KVCache reuse comes from multi-turn chats.
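This sketch shows one way to combine cache memory size with predicted reuse distance. It is not the policy's actual formula. It assumes the per-workload monitor keeps exponential moving averages (EMAs) of the reuse time interval and the token count. It also assumes a workload's next reuse is predicted as its last access plus its average interval, and that the eviction score is predicted reuse distance multiplied by cached tokens. `WorkloadMonitor`, `eviction_score`, and the smoothing factor `alpha` are hypothetical names, not vLLM APIs.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class WorkloadStats:
    """Running statistics for one workload (hypothetical structure)."""
    avg_reuse_interval: Optional[float] = None  # EMA of time between two accesses
    avg_tokens: Optional[float] = None          # EMA of tokens per request
    last_access: Optional[float] = None         # timestamp of the latest access


class WorkloadMonitor:
    """Tracks each workload's reuse time interval and average number of tokens."""

    def __init__(self, alpha: float = 0.2) -> None:
        self.alpha = alpha  # EMA smoothing factor (assumed value, not tuned)
        self.stats: Dict[str, WorkloadStats] = {}

    def _ema(self, old: Optional[float], new: float) -> float:
        return new if old is None else self.alpha * new + (1 - self.alpha) * old

    def record(self, workload: str, now: float, num_tokens: int) -> None:
        """Call once per request of `workload` arriving at time `now`."""
        s = self.stats.setdefault(workload, WorkloadStats())
        if s.last_access is not None:
            s.avg_reuse_interval = self._ema(s.avg_reuse_interval, now - s.last_access)
        s.avg_tokens = self._ema(s.avg_tokens, num_tokens)
        s.last_access = now

    def predicted_reuse_distance(self, workload: str, now: float) -> float:
        """Predicted time until the workload's cached KV blocks are reused."""
        s = self.stats.get(workload)
        if s is None or s.avg_reuse_interval is None:
            return float("inf")  # no reuse observed yet: treat as far in the future
        return max(0.0, s.last_access + s.avg_reuse_interval - now)


def eviction_score(monitor: WorkloadMonitor, workload: str, now: float,
                   cached_tokens: int) -> float:
    """Higher score = better victim: reuse is far away and the entry is large."""
    return monitor.predicted_reuse_distance(workload, now) * cached_tokens


# Toy usage: "chat" returns every 10 time units, "batch" every 100.
monitor = WorkloadMonitor()
monitor.record("batch", 0.0, 2000)
monitor.record("chat", 90.0, 100)
monitor.record("chat", 100.0, 100)
monitor.record("batch", 100.0, 2000)
chat_score = eviction_score(monitor, "chat", 102.0, cached_tokens=100)
batch_score = eviction_score(monitor, "batch", 102.0, cached_tokens=2000)
```

In this toy run, the batch entry is the better victim. Its reuse is predicted 98 time units away and it holds 20 times more tokens, while the chat entry's reuse is only 8 time units away. One caveat: a workload that stops arriving keeps a predicted distance of 0 once it is overdue, so a real policy would also need to age such entries out.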
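One common way to build an offline (best-case) baseline for a cache is Belady's MIN algorithm. It sees the whole future access trace and, on a miss with a full cache, evicts the block whose next access is furthest away. For equal-size blocks this minimizes misses. The sketch below assumes the offline policy works like this, which the report does not confirm. Fewer misses is also only a proxy for lower TTFT. The function names and the toy trace are hypothetical, and the sketch compares MIN against plain LRU as an online reference point.

```python
from collections import OrderedDict
from typing import Dict, Hashable, List, Sequence


def belady_hits(trace: Sequence[Hashable], capacity: int) -> int:
    """Hits of Belady's MIN: on a miss with a full cache, evict the block
    whose next access is furthest in the future (needs the whole trace)."""
    if capacity <= 0:
        return 0
    # next_use[i] = index of the next access to trace[i] after position i.
    next_use: List[float] = [float("inf")] * len(trace)
    seen: Dict[Hashable, int] = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = seen.get(trace[i], float("inf"))
        seen[trace[i]] = i

    cache: Dict[Hashable, float] = {}  # block -> index of its next access
    hits = 0
    for i, block in enumerate(trace):
        if block in cache:
            hits += 1
        elif len(cache) >= capacity:
            victim = max(cache, key=cache.__getitem__)  # furthest next use
            del cache[victim]
        cache[block] = next_use[i]  # insert or refresh the next-use index
    return hits


def lru_hits(trace: Sequence[Hashable], capacity: int) -> int:
    """Hits of an LRU cache on the same trace, as an online comparison point."""
    if capacity <= 0:
        return 0
    cache: "OrderedDict[Hashable, None]" = OrderedDict()
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used
            cache[block] = None
    return hits


# Toy trace of KV block IDs (made up for illustration).
trace = ["a", "b", "c", "a", "b", "d", "a", "b", "c"]
```

With a capacity of 2, MIN gets 2 hits on this trace while LRU gets none. With enough capacity to hold every block, both reach the same 5 hits. That upper bound is what the offline curve represents in a comparison.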
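An empirical CDF of per-request TTFT makes the comparison between the default, workload-aware, and offline policies concrete. This minimal sketch assumes the TTFT samples for each policy have already been collected in seconds. The policy keys and the sample values are invented placeholders, not measurements.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple


def empirical_cdf(samples: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return sorted TTFTs and the fraction of requests at or below each one."""
    xs = sorted(samples)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]


def fraction_within(samples: Sequence[float], threshold: float) -> float:
    """Read one point off the CDF: share of requests with TTFT <= threshold."""
    xs = sorted(samples)
    return bisect_right(xs, threshold) / len(xs)


# Toy TTFT samples in seconds, invented for illustration (not measurements).
ttft: Dict[str, List[float]] = {
    "default": [0.8, 1.2, 0.5, 2.0],
    "workload_aware": [0.4, 0.9, 0.3, 1.1],
    "offline": [0.3, 0.6, 0.2, 0.9],
}
curves = {name: empirical_cdf(s) for name, s in ttft.items()}
within_1s = {name: fraction_within(s, 1.0) for name, s in ttft.items()}
```

To draw the figure, pass each `(xs, ys)` pair in `curves` to `matplotlib.pyplot.step(xs, ys, where="post")`. This produces the staircase shape of an empirical CDF, with the better policy's curve sitting further to the left.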
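A cache block source tracker works by remembering which conversation first computed each KV block. Every later hit is then classified by whether it comes from the same conversation (a multi-turn hit) or a different one (for example, a shared system prompt). This sketch assumes blocks are keyed by a content hash, similar in spirit to vLLM's automatic prefix caching, and that every request carries a conversation ID. `BlockSourceTracker`, its methods, and the two category names are hypothetical, not vLLM APIs.

```python
from collections import Counter
from typing import Dict, Iterable


class BlockSourceTracker:
    """Attributes each prefix-cache hit to the conversation that computed the block."""

    def __init__(self) -> None:
        self.creator: Dict[int, str] = {}  # block hash -> conversation that filled it
        self.hits: Counter = Counter()     # hit counts per source category

    def on_request(self, conversation_id: str, block_hashes: Iterable[int]) -> None:
        """Call with the hashes of the KV blocks a new request's prompt maps to."""
        for h in block_hashes:
            owner = self.creator.get(h)
            if owner is None:
                # Miss: this request computes the block and becomes its source.
                self.creator[h] = conversation_id
            elif owner == conversation_id:
                self.hits["multi_turn"] += 1  # reused by a later turn of the same chat
            else:
                self.hits["cross_conversation"] += 1  # e.g. a shared system prompt

    def on_evict(self, block_hash: int) -> None:
        """Forget a block once the cache evicts it, so later reuse counts as a miss."""
        self.creator.pop(block_hash, None)

    def breakdown(self) -> Dict[str, float]:
        """Fraction of all hits per source category."""
        total = sum(self.hits.values())
        return {k: v / total for k, v in self.hits.items()} if total else {}


tracker = BlockSourceTracker()
tracker.on_request("conv1", [1, 2])     # turn 1: system-prompt block 1, user block 2
tracker.on_request("conv1", [1, 2, 3])  # turn 2 of the same chat reuses blocks 1 and 2
tracker.on_request("conv2", [1, 4])     # a different chat shares system-prompt block 1
```

One design choice here is that a block keeps its first creator as its source until it is evicted. A shared prefix is therefore always counted as cross-conversation reuse for everyone except the conversation that computed it first.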

Next Week
- Improve the performance of our workload-aware policy.
- Plot publication-quality figures.