Objectives
- Analyze the Qwen trace
- Customize vLLM (Alibaba version) with new features

Key Results
- Tokenize the Qwen trace with Qwen-agent and some other tools [60%]
- Modify vLLM to support a configurable number of KV cache blocks
- Profile open-source datasets with different numbers of cache blocks

Last Week
- Used Qwen-agent to process workloads that include files, which gives more precise token lengths for these workloads.
- Modified vLLM's cache manager to support a specified number of KV cache blocks, then measured how the KV cache hit rate changes with the block count across different workloads.

Next Week
- Tokenize the full Qwen trace, especially the multimodal (image) workloads, and run the measurements on these traces.
- Profile the KV cache hit rate on the actual trace and compare it with other open-source traces to identify the differences.
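
The hit-rate-versus-block-count measurement can also be approximated offline on tokenized requests, without running the server. The sketch below is not vLLM's cache manager. It assumes a block size of 16 tokens, LRU eviction, and prefix-chained block hashing, and the names `block_keys`, `hit_rate`, and `sweep` are hypothetical helpers introduced only for illustration.

```python
from collections import OrderedDict
from typing import List, Sequence, Tuple

BLOCK_SIZE = 16  # tokens per KV cache block (assumed value, not taken from the trace setup)


def block_keys(tokens: Sequence[int], block_size: int = BLOCK_SIZE) -> List[int]:
    """Return one key per full block; each key depends on the entire prefix before it."""
    keys: List[int] = []
    prev = None
    full_len = len(tokens) - len(tokens) % block_size  # ignore the trailing partial block
    for start in range(0, full_len, block_size):
        prev = hash((prev, tuple(tokens[start:start + block_size])))
        keys.append(prev)
    return keys


def hit_rate(requests: Sequence[Sequence[int]], num_blocks: int,
             block_size: int = BLOCK_SIZE) -> float:
    """Replay requests against an LRU cache of `num_blocks` blocks; return the block hit rate."""
    cache: "OrderedDict[int, None]" = OrderedDict()  # oldest entry first
    hits = total = 0
    for tokens in requests:
        keys = block_keys(tokens, block_size)
        total += len(keys)
        matching = True  # only a contiguous cached prefix counts as a hit
        for key in keys:
            if matching and key in cache:
                hits += 1
                cache.move_to_end(key)
            else:
                matching = False
                cache[key] = None
                cache.move_to_end(key)
                if len(cache) > num_blocks:
                    cache.popitem(last=False)  # evict the least recently used block
    return hits / total if total else 0.0


def sweep(requests: Sequence[Sequence[int]],
          block_counts: Sequence[int]) -> List[Tuple[int, float]]:
    """Compute the hit rate for each candidate cache size."""
    return [(n, hit_rate(requests, n)) for n in block_counts]


if __name__ == "__main__":
    # Two toy requests sharing a 32-token prefix (synthetic data, not from the Qwen trace).
    shared = list(range(32))
    toy = [shared + [100] * 16, shared + [200] * 16]
    for n, rate in sweep(toy, [1, 2, 3, 8]):
        print(f"blocks={n} hit_rate={rate:.3f}")
```

Replaying the same tokenized trace through `sweep` with several block counts gives a rough curve to compare against the numbers measured in the modified vLLM. Differences are expected, since the real scheduler, eviction policy, and block size may not match these assumptions.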