Objectives
- Analyze the Qwen trace
- Customize vLLM (Ali version) with new features

Key Results
- Tokenize the Qwen trace with Qwen-agent and other tools [60%]
- Modify vLLM to support a configurable number of KV cache blocks
- Profile open-source datasets with different numbers of cache blocks

Last Week
- Used Qwen-agent to process workloads that include files, obtaining more precise token lengths for these workloads.
- Modified vLLM's cache manager to support a specified number of KV cache blocks, then measured how the KV cache hit rate varies with block count across different workloads.
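
For reference, the hit-rate-versus-block-count measurement can be approximated offline with a small simulation. This is a minimal sketch, not vLLM's actual cache manager: it assumes block-level prefix caching with LRU eviction, a block size of 16 tokens, and that only full blocks are cached. The names BLOCK_SIZE, block_keys, hit_rate, and sweep are hypothetical, and the toy requests are made up for illustration rather than taken from any trace.

```python
from collections import OrderedDict

BLOCK_SIZE = 16  # tokens per KV cache block (assumed value)


def block_keys(tokens, block_size=BLOCK_SIZE):
    """Return one key per full block; each key depends on the whole prefix."""
    keys = []
    prefix = ()
    for start in range(0, len(tokens) - block_size + 1, block_size):
        # Chain the previous key so identical blocks under different
        # prefixes do not collide.
        prefix = hash((prefix, tuple(tokens[start:start + block_size])))
        keys.append(prefix)
    return keys


def hit_rate(requests, num_blocks, block_size=BLOCK_SIZE):
    """Simulate an LRU block cache and return the fraction of blocks reused."""
    cache = OrderedDict()  # block key -> None, oldest entry first
    hits = total = 0
    for tokens in requests:
        matching = True  # only a contiguous cached prefix counts as a hit
        for key in block_keys(tokens, block_size):
            total += 1
            if matching and key in cache:
                hits += 1
                cache.move_to_end(key)
                continue
            matching = False
            cache[key] = None
            cache.move_to_end(key)
            if len(cache) > num_blocks:
                cache.popitem(last=False)  # evict least recently used block
    return hits / total if total else 0.0


def sweep(requests, block_counts):
    """Return {num_blocks: hit_rate} for each cache size in block_counts."""
    return {n: hit_rate(requests, n) for n in block_counts}


# Toy token-id sequences (illustrative only): two identical requests and
# one that shares the first 32 tokens.
toy_requests = [
    list(range(64)),
    list(range(64)),
    list(range(32)) + list(range(100, 132)),
]

if __name__ == "__main__":
    for n, rate in sweep(toy_requests, [1, 2, 4, 8]).items():
        print(f"blocks={n:>2}  hit_rate={rate:.2f}")
```

Replaying tokenized trace requests through sweep in arrival order gives a rough hit-rate trend per workload, which can then be checked against the numbers measured on the modified vLLM.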

Next Week
- Tokenize the full Qwen trace, especially multimodal (image) workloads, and run measurements on these traces.
- Profile the KV cache hit rate on the actual trace and compare it with open-source traces to identify differences.