Objective
- Customize vLLM (Ali version) with new features

Key Results
- Test the modified vLLM, which supports a CPU KV cache
- Profile the modified vLLM and break down its time on synthetic data and a real Qwen trace

Last Week
- Merged the vLLM changes that add CPU KV cache support, then used synthetic data and a real Qwen trace to measure performance and find bugs.
- Added breakdown measurement support on the vLLM server side to measure the time spent copying KV blocks.

Next Week
- Run more tests on the vLLM build that supports the CPU KV cache.
- Try to optimize the current implementation.
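
For reference, the server-side breakdown of KV block copy time mentioned under Last Week could be sketched roughly as below. This is only an illustration under stated assumptions, not the code in the modified vLLM: `Breakdown`, `phase`, and `copy_blocks` are hypothetical names, and the "caches" are plain Python dicts standing in for CPU and GPU block storage. For real asynchronous GPU copies, wall-clock timing like this would only be meaningful if the device is synchronized before the timer stops.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class Breakdown:
    """Accumulates wall-clock time per named phase (hypothetical helper, not a vLLM API)."""

    def __init__(self):
        self.total_s = defaultdict(float)  # phase name -> accumulated seconds
        self.calls = defaultdict(int)      # phase name -> number of timed regions

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the timed region raises.
            self.total_s[name] += time.perf_counter() - start
            self.calls[name] += 1

    def fractions(self):
        """Share of total measured time spent in each phase."""
        whole = sum(self.total_s.values())
        return {k: (v / whole if whole > 0 else 0.0) for k, v in self.total_s.items()}


def copy_blocks(src, dst, block_ids):
    # Stand-in for a KV block copy: each "block" is a plain list of floats.
    for i in block_ids:
        dst[i] = list(src[i])


bd = Breakdown()
cpu_cache = {i: [0.0] * 1024 for i in range(64)}  # assumed: 64 blocks in the CPU cache
gpu_cache = {}

with bd.phase("kv_copy"):
    copy_blocks(cpu_cache, gpu_cache, range(64))
with bd.phase("compute"):
    sum(sum(block) for block in gpu_cache.values())  # placeholder for model work

for name, share in bd.fractions().items():
    print(f"{name}: {bd.total_s[name] * 1e3:.3f} ms ({share:.1%}), calls={bd.calls[name]}")
```

Accumulating per-phase totals, rather than logging each copy, keeps the overhead small and makes it easy to compare the copy share between the synthetic and Qwen-trace runs.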