Objectives
- Analyze the Qwen trace
- Customize vLLM (Alibaba version) with new features

Key Results
- Tokenize the Qwen trace with Qwen-Agent and other tools [60%]
- Modify vLLM to support a configurable number of KV cache blocks
- Profile open-source datasets with different numbers of cache blocks

Last Week
- Used Qwen-Agent to process workloads that include files, yielding more precise token lengths for these workloads.
- Modified vLLM's cache manager to support a specified number of KV cache blocks, then measured how the KV cache hit rate changes with the block count across different workloads.

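A minimal sketch of how hit rate versus block count could be estimated offline, assuming an LRU eviction policy and prefix-chained block keys. This is a standalone simulation, not vLLM's cache manager; `block_keys`, `simulate_hit_rate`, and the default block size of 16 are hypothetical names and values chosen for illustration.

```python
from collections import OrderedDict


def block_keys(token_ids, block_size):
    """Return one key per full block; each key depends on the whole prefix."""
    keys = []
    prev = 0
    for start in range(0, len(token_ids) - block_size + 1, block_size):
        block = tuple(token_ids[start:start + block_size])
        prev = hash((prev, block))  # chain the previous key into this one
        keys.append(prev)
    return keys


def simulate_hit_rate(requests, num_blocks, block_size=16):
    """Replay tokenized requests against an LRU cache holding num_blocks blocks."""
    cache = OrderedDict()  # key -> True, ordered from least to most recently used
    hits = 0
    total = 0
    for token_ids in requests:
        prefix_matches = True
        for key in block_keys(token_ids, block_size):
            total += 1
            if prefix_matches and key in cache:
                hits += 1
                cache.move_to_end(key)
            else:
                # After the first miss, the rest of the request is recomputed.
                prefix_matches = False
                cache[key] = True
                cache.move_to_end(key)
                if len(cache) > num_blocks:
                    cache.popitem(last=False)  # evict the least recently used block
    return hits / total if total else 0.0
```

Sweeping `num_blocks` over a range of values and plotting the returned ratio per workload gives the hit-rate trend; partial trailing blocks are ignored in this sketch.
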
Next Week
- Tokenize the entire Qwen trace, especially the multimodal (image) workloads, and run measurements on these traces.
- Profile the KV cache hit rate on the actual trace and compare it with other open-source traces to identify differences.