
Gahow Wang a57afa86b4 Initial commit: obsidian to gitea

2026-05-07 15:04:41 +08:00

Objective

Serverless KVCache caching

Key Results

Implement the workload-aware policy in vLLM [8/10]
Profile the workload-aware policy [3/10]
Characterize the differences between workloads in the Qwen trace

Last Week

Add a new design point to the cache policy so that it considers cache memory size and predicted reuse distance together. To support this, add a new monitor that tracks each workload's reuse time interval and average number of tokens.
Set up an offline (i.e., best-case) scheduling policy as a reference. Profile the default policy, our workload-aware policy, and the offline policy, and compare their performance as CDFs of TTFT.
Implement a cache block source tracker in vLLM to show where KVCache reuse comes from. Show that 90% of KVCache reuse comes from multi-turn chat.
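One way to picture the first item is a per-workload monitor paired with an eviction score. The monitor records reuse intervals and token counts. The score ranks cached entries by predicted reuse distance multiplied by memory footprint, so large entries that will not be reused soon are evicted first. This is a minimal sketch under stated assumptions. `WorkloadStats`, `WorkloadMonitor`, the mean-interval predictor, and the `interval x tokens` score are all hypothetical and are not the actual vLLM change.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class WorkloadStats:
    """Running statistics for one workload (hypothetical monitor, not a vLLM API)."""
    last_access: dict = field(default_factory=dict)  # prefix id -> last access time
    reuse_interval_sum: float = 0.0
    reuse_count: int = 0
    token_sum: int = 0
    request_count: int = 0

    def record(self, prefix_id: str, now: float, num_tokens: int) -> None:
        # A repeated prefix id counts as one reuse; its interval is the gap since the previous access.
        if prefix_id in self.last_access:
            self.reuse_interval_sum += now - self.last_access[prefix_id]
            self.reuse_count += 1
        self.last_access[prefix_id] = now
        self.token_sum += num_tokens
        self.request_count += 1


class WorkloadMonitor:
    """Hypothetical monitor keyed by workload name."""

    def __init__(self) -> None:
        self.stats = defaultdict(WorkloadStats)

    def record(self, workload: str, prefix_id: str, now: float, num_tokens: int) -> None:
        self.stats[workload].record(prefix_id, now, num_tokens)

    def eviction_score(self, workload: str) -> float:
        """Higher score means a better eviction candidate (assumed score: interval x tokens)."""
        s = self.stats[workload]
        if s.reuse_count == 0:
            return float("inf")  # no reuse observed yet: evict first
        predicted_reuse_distance = s.reuse_interval_sum / s.reuse_count
        avg_tokens = s.token_sum / s.request_count  # proxy for cache memory footprint
        return predicted_reuse_distance * avg_tokens

    def pick_victim(self, candidates: list[tuple[str, str]]) -> str:
        """candidates: (entry_id, workload) pairs; returns the entry_id to evict."""
        return max(candidates, key=lambda c: self.eviction_score(c[1]))[0]
```

The plain mean of reuse intervals is the simplest predictor. An exponential moving average would react faster if a workload's reuse pattern drifts during the trace.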
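The note does not say exactly what the offline policy is. Assume it is Belady's MIN: on a miss, evict the cached block whose next access lies furthest in the future, which requires knowing the whole trace in advance. Under that assumption, a small simulator can give the upper bound on hits against which the default and workload-aware policies are measured. An LRU simulator is included for contrast, on the further assumption that the default behaves roughly like LRU. Both functions are illustrative sketches that count block hits on an ID trace. They do not measure TTFT, which still requires replaying the trace in vLLM.

```python
from collections import OrderedDict


def belady_hits(trace: list[str], capacity: int) -> int:
    """Hits under Belady's MIN with demand fetch: on a miss with a full cache,
    evict the cached block whose next access is furthest in the future."""
    if capacity <= 0:
        return 0
    # next_use[i] = index of the next access to trace[i] after position i (inf if none).
    next_use = [float("inf")] * len(trace)
    seen: dict[str, int] = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = seen.get(trace[i], float("inf"))
        seen[trace[i]] = i
    cache: dict[str, float] = {}  # block -> index of its next access
    hits = 0
    for i, block in enumerate(trace):
        if block in cache:
            hits += 1
        elif len(cache) >= capacity:
            victim = max(cache, key=cache.get)  # furthest next use
            del cache[victim]
        cache[block] = next_use[i]  # refresh the next-use index
    return hits


def lru_hits(trace: list[str], capacity: int) -> int:
    """Hits under least-recently-used eviction, for comparison."""
    if capacity <= 0:
        return 0
    cache = OrderedDict()
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # drop the least recently used block
            cache[block] = None
    return hits
```

For example, on the trace `A B C A B` with two cache slots, `belady_hits` returns 1 while `lru_hits` returns 0. That gap is the kind of headroom the offline reference is meant to expose.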
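A block source tracker can be sketched as follows. Record which session first computed each cached block. Then, on every prefix-cache hit, label the reuse by comparing the reusing session with that owner. This is a hypothetical sketch, not the tracker added to vLLM. The hook names `on_block_created` and `on_block_hit` and the labels `multi_turn` / `cross_session` are assumptions standing in for whatever categories the real tracker uses.

```python
from collections import Counter


class BlockSourceTracker:
    """Attribute KV cache block hits to where the reused block came from (hypothetical sketch)."""

    def __init__(self) -> None:
        self.block_owner: dict[int, str] = {}  # block hash -> session that first computed it
        self.hits = Counter()                  # source label -> number of block hits

    def on_block_created(self, block_hash: int, session_id: str) -> None:
        # Keep the first producer; later recomputation does not change attribution.
        self.block_owner.setdefault(block_hash, session_id)

    def on_block_hit(self, block_hash: int, session_id: str) -> None:
        owner = self.block_owner.get(block_hash)
        if owner is None:
            label = "unknown"
        elif owner == session_id:
            label = "multi_turn"     # reused by a later turn of the same conversation
        else:
            label = "cross_session"  # e.g. a shared system prompt
        self.hits[label] += 1

    def reuse_share(self) -> dict[str, float]:
        """Fraction of all block hits per source label."""
        total = sum(self.hits.values())
        return {label: n / total for label, n in self.hits.items()} if total else {}
```

With these hooks, a statement such as "90% of reuse comes from multi-turn chat" corresponds to `reuse_share()["multi_turn"]` being 0.9 over the trace.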

Next Week

Improve the performance of our policy.
Plot some formal figures.