obsidian/phd/weekly-report/24/241110.md

Objectives
- Analyze the Qwen trace
- Customize vLLM (Ali version) with new features

Key Results
- Tokenize Qwen trace with Qwen-agent and some other tools
- Profile Qwen trace with different cache blocks

Last Week
- Used Qwen-agent to process all workloads in the Qwen trace and obtain a precise token stream that simulates the actual online environment.
- Measured performance and KV cache hit rate for different cache blocks by replaying the real Qwen trace for one hour.
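
The hit-rate measurement above can be approximated offline before running the full system. The following is a minimal sketch, not the profiling code actually used: it assumes "different cache blocks" means different cache capacities (number of KV blocks), models the cache as a simple LRU over prefix-chained block hashes, and uses toy token sequences in place of the tokenized Qwen trace. The names `simulate_hit_rate`, `block_size`, and `num_blocks` are hypothetical.

```python
from collections import OrderedDict


def simulate_hit_rate(requests, block_size, num_blocks):
    """Replay token-id sequences through an LRU cache of prefix blocks.

    Each full block is keyed by a hash chained over all preceding blocks,
    so a block only matches when the entire prefix before it is identical.
    Returns the fraction of full blocks served from the cache.
    """
    cache = OrderedDict()  # key: chained prefix hash, ordered oldest to newest
    hits = 0
    total = 0
    for tokens in requests:
        prev_key = 0
        prefix_ok = True  # reuse stops at the first missing block
        # Only full blocks are considered; a trailing partial block is ignored.
        for start in range(0, len(tokens) - block_size + 1, block_size):
            block = tuple(tokens[start:start + block_size])
            key = hash((prev_key, block))
            prev_key = key
            total += 1
            if key in cache:
                cache.move_to_end(key)  # refresh LRU position
                if prefix_ok:
                    hits += 1
            else:
                prefix_ok = False
                cache[key] = True
                if len(cache) > num_blocks:
                    cache.popitem(last=False)  # evict least recently used
    return hits / total if total else 0.0


# Toy stand-in for a tokenized trace (NOT real Qwen trace data).
toy_requests = [
    [1, 2, 3, 4, 5, 6, 7, 8],
    [1, 2, 3, 4, 5, 6, 7, 8],
    [1, 2, 3, 4, 9, 9, 9, 9],
]

# Sweep the assumed capacity knob and print the simulated hit rate.
for capacity in (1, 2, 16):
    rate = simulate_hit_rate(toy_requests, block_size=4, num_blocks=capacity)
    print(f"num_blocks={capacity}: hit rate {rate:.2f}")
```

Replacing `toy_requests` with the token streams produced by the tokenization step would give a rough upper bound to compare against the measured numbers; the real system's eviction policy and block hashing may differ from this LRU model.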

Next Week
- Check the tokenization results for the Qwen trace; they may need to be modified.
- Measure KV cache performance with CPU memory.