Objective
- Serverless KVCache

Key Results
- Refine the cache policy implementation
- Implement and test our workload-aware cache policy in vLLM
- Write the graduation thesis

Last Week
- Refined the cache policy to account for the _cost_ of keeping cache blocks in memory, gaining roughly a 1%-2% hit-rate improvement under the 1k+1k cache-block setting.
- Implemented the PDF-based workload-aware cache policy in vLLM and profiled LRU vs. WA under Qwen2-7B, obtaining a 25% QTTFT reduction.
- Finished the first draft of the graduation thesis.

Next Week
- Run full tests of the different cache policies across different models.
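The cost-aware eviction idea from Last Week can be sketched roughly as follows. This is a minimal illustrative toy, not the actual vLLM implementation: the class name, the per-block reuse probability input, and the value-per-cost scoring formula are all assumptions made for the sketch.

```python
class CostAwareCache:
    """Toy cost-aware cache: evicts the block whose estimated value
    (reuse probability) per unit of memory-holding cost is lowest.
    The scoring formula is an illustrative assumption, not the
    policy from this report."""

    def __init__(self, capacity):
        self.capacity = capacity          # max number of resident blocks
        self.blocks = {}                  # key -> (reuse_prob, size)

    def put(self, key, reuse_prob, size=1):
        # evict until there is room for the new block
        while self.blocks and len(self.blocks) >= self.capacity:
            self.evict()
        self.blocks[key] = (reuse_prob, size)

    def score(self, key):
        reuse_prob, size = self.blocks[key]
        # value per unit of memory cost: a large block with a low
        # reuse probability is expensive to keep, so it scores low
        return reuse_prob / size

    def evict(self):
        victim = min(self.blocks, key=self.score)
        del self.blocks[victim]
        return victim
```

For example, with `capacity=2`, inserting blocks with reuse probabilities 0.9, 0.1, and 0.5 evicts the 0.1 block, whereas plain LRU would have evicted the oldest (0.9) block.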