Objective

- Customize vLLM (Ali version) with new features.

Key Results

- Test the modified vLLM that supports a CPU KV cache.
- Profile the modified vLLM and break down its running time on synthetic data and a real Qwen trace.

Last Week

- Merged the vLLM changes that add CPU KV cache support, then used synthetic data and a real Qwen trace to measure performance and find bugs.
- Added breakdown measurement support on the vLLM server side to measure the time spent copying KV blocks.

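The server-side breakdown measurement could look roughly like the sketch below. It is a minimal illustration in plain Python, not the actual vLLM change: `BreakdownTimer`, `copy_blocks`, and the phase name `kv_block_copy` are hypothetical names, and byte buffers stand in for KV blocks.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class BreakdownTimer:
    """Accumulates wall-clock time per named phase (hypothetical helper, not a vLLM API)."""

    def __init__(self):
        self.total_s = defaultdict(float)
        self.count = defaultdict(int)

    @contextmanager
    def phase(self, name):
        # Time everything executed inside the `with` block, even if it raises.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.total_s[name] += time.perf_counter() - start
            self.count[name] += 1

    def report(self):
        # Per-phase call count, total time, and mean time in milliseconds.
        return {
            name: {
                "calls": self.count[name],
                "total_ms": self.total_s[name] * 1e3,
                "mean_ms": self.total_s[name] * 1e3 / self.count[name],
            }
            for name in self.total_s
        }


def copy_blocks(src, dst, block_ids):
    """Stand-in for a KV block copy: copies the selected blocks from src to dst."""
    for block_id in block_ids:
        dst[block_id] = bytes(src[block_id])


timer = BreakdownTimer()
gpu_blocks = {i: bytearray(4096) for i in range(8)}  # placeholder for device-side blocks
cpu_blocks = {}                                      # placeholder for the CPU KV cache

for _ in range(3):
    with timer.phase("kv_block_copy"):
        copy_blocks(gpu_blocks, cpu_blocks, block_ids=[0, 1, 2])

print(timer.report())
```

For real asynchronous GPU copies, the device would need to be synchronized before the timer stops (for example with `torch.cuda.synchronize()` in PyTorch); otherwise the measurement mostly captures launch overhead rather than the copy itself.
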
Next Week

- Run more tests on the vLLM build that supports a CPU KV cache.
- Try to optimize the current implementation.