Objective
- Customize vLLM (Ali version) with new features
Key Results
- Test the modified vLLM, which supports a CPU KV cache
- Profile the modified vLLM and break down its time on synthetic data and a real Qwen trace
Last Week
- Merged the vLLM version that supports a CPU KV cache, then used synthetic data and a real Qwen trace to measure performance and find bugs.
- Added breakdown measurement support on the vLLM server side to measure the time spent copying KV blocks.
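
The breakdown measurement could look roughly like the sketch below. It is only an illustration, not the code that was merged: `BreakdownTimer`, the stage name `kv_block_copy`, and the byte buffers standing in for KV blocks are all hypothetical, and nothing here is a vLLM API.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class BreakdownTimer:
    """Accumulates wall-clock time per named stage (hypothetical helper, not a vLLM API)."""

    def __init__(self):
        self.total_s = defaultdict(float)  # stage name -> accumulated seconds
        self.calls = defaultdict(int)      # stage name -> number of measured calls

    @contextmanager
    def measure(self, stage):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the measured block raises.
            self.total_s[stage] += time.perf_counter() - start
            self.calls[stage] += 1

    def report(self):
        # Per-stage call count and total time in milliseconds.
        return {
            stage: {"calls": self.calls[stage], "total_ms": self.total_s[stage] * 1e3}
            for stage in self.total_s
        }


timer = BreakdownTimer()
blocks = [bytearray(4096) for _ in range(8)]  # stand-in for KV blocks (assumption)

with timer.measure("kv_block_copy"):
    copied = [bytes(block) for block in blocks]  # stand-in for the real block copy

print(timer.report())
```

If the real copy is launched asynchronously on a GPU stream, the device would need to be synchronized (for example with `torch.cuda.synchronize()`) before the clock is read; otherwise the timer captures only the launch cost.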
Next Week
- Run more tests on the vLLM version that supports a CPU KV cache.
- Try to optimize the current implementation.