Objectives
- Analyze the Qwen trace
- Customize vLLM (Ali version) with new features
Key Results
- Tokenize the Qwen trace with Qwen-agent and some other tools
- Profile the Qwen trace with different cache blocks
Last Week
- Used Qwen-agent to process all workloads in the Qwen trace and obtain a precise token stream that simulates the actual online environment.
- Measured performance and KV cache hit rate for different cache blocks by running the real Qwen trace for one hour.
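
For reference, the kind of hit-rate measurement described above can be sketched offline. This is only a rough illustration, not the vLLM (Ali version) implementation: it assumes block-level prefix caching with chained block hashes and LRU eviction, and the names `block_keys`, `simulate_hit_rate`, `block_size`, and `num_blocks` are hypothetical. The toy requests stand in for tokenized trace entries.

```python
from collections import OrderedDict
from typing import Iterable, List, Sequence


def block_keys(tokens: Sequence[int], block_size: int) -> List[int]:
    """Return one key per full block. Each key chains the previous key,
    so it identifies the whole prefix up to and including that block.
    (Assumption: hash collisions are ignored in this sketch.)"""
    keys = []
    prev = 0
    for start in range(0, len(tokens) - block_size + 1, block_size):
        prev = hash((prev, tuple(tokens[start:start + block_size])))
        keys.append(prev)
    return keys


def simulate_hit_rate(requests: Iterable[Sequence[int]],
                      block_size: int, num_blocks: int) -> float:
    """Replay tokenized requests against an LRU cache of `num_blocks`
    blocks and return the fraction of tokens served from cache."""
    cache = OrderedDict()  # block key -> None, ordered oldest to newest
    hit_tokens = 0
    total_tokens = 0
    for tokens in requests:
        total_tokens += len(tokens)
        prefix_intact = True  # a block only counts as a hit if its whole prefix hit
        for key in block_keys(tokens, block_size):
            if prefix_intact and key in cache:
                hit_tokens += block_size
            else:
                prefix_intact = False
                cache[key] = None
            cache.move_to_end(key)  # mark as most recently used
            if len(cache) > num_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return hit_tokens / total_tokens if total_tokens else 0.0


if __name__ == "__main__":
    # Toy token streams sharing a 4-token prefix (made-up data, not from the trace).
    toy_requests = [[1, 2, 3, 4, 5, 6, 7, 8], [1, 2, 3, 4, 9, 9, 9, 9]]
    for n in (1, 2, 8):
        rate = simulate_hit_rate(toy_requests, block_size=4, num_blocks=n)
        print(f"num_blocks={n}: hit rate={rate:.2f}")
```

Sweeping `num_blocks` (or `block_size`) over the tokenized trace gives a quick estimate to compare against the numbers measured on the real system.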
Next Week
- Check the tokenization results for the Qwen trace; they may need to be modified.
- Measure KV cache performance with CPU memory.