Objectives
- Analyze the Qwen trace
- Customize vLLM (Ali version) with new features

Key Results
- Tokenize the Qwen trace with Qwen-agent and other tools
- Profile the Qwen trace with different cache blocks

Last Week
- Used Qwen-agent to process all workloads in the Qwen trace, producing a precise token stream that simulates the actual online environment.
- Measured performance and KV cache hit rate for different cache blocks in a one-hour run on the real Qwen trace.

Next Week
- Check the tokenization results from the Qwen trace; they may need to be modified.
- Measure KV cache performance with CPU memory.
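
The hit-rate measurement listed under Last Week could be approximated offline with a small simulator like the one below. This is only a sketch under stated assumptions, not the vLLM implementation: it assumes block-granular prefix caching with LRU eviction, and the names `simulate_hit_rate`, `block_size`, and `num_blocks` are hypothetical. Since "different cache blocks" could mean either block size or block count, both are left as parameters.

```python
from collections import OrderedDict
from typing import Iterable, Sequence


def simulate_hit_rate(
    requests: Iterable[Sequence[int]],
    block_size: int,
    num_blocks: int,
) -> float:
    """Replay tokenized requests against a toy block-level prefix cache.

    Assumptions (not taken from the report): only full blocks are cached,
    eviction is LRU, and a block counts as a hit only if every earlier
    block of the same request was also a hit (prefix reuse).
    """
    cache: "OrderedDict[int, None]" = OrderedDict()  # block key -> None, in LRU order
    hits = 0
    total = 0
    for tokens in requests:
        parent = None          # key of the previous block, chains the prefix
        prefix_alive = True    # becomes False after the first miss in a request
        for start in range(0, len(tokens) - block_size + 1, block_size):
            block = tuple(tokens[start:start + block_size])
            key = hash((parent, block))  # identifies the block plus its whole prefix
            parent = key
            total += 1
            if prefix_alive and key in cache:
                hits += 1
            else:
                prefix_alive = False
                cache[key] = None
            cache.move_to_end(key)       # mark as most recently used
            if len(cache) > num_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Synthetic token IDs for illustration only, not data from the Qwen trace.
    toy_requests = [
        [1, 2, 3, 4, 5, 6, 7, 8],
        [1, 2, 3, 4, 9, 9, 9, 9],
    ]
    for n in (1, 8):
        rate = simulate_hit_rate(toy_requests, block_size=4, num_blocks=n)
        print(f"num_blocks={n}: hit rate {rate:.2f}")
```

Sweeping `num_blocks` (or `block_size`) over the tokenized trace would give one hit-rate value per configuration, which can then be compared against the numbers measured on the real system.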