obsidian/phd/weekly-report/24/241117.md

Objective
- Customize vLLM (Ali version) with new features
Key Results
- Test the modified vLLM that supports a CPU KV cache
- Profile the modified vLLM and break down its time on synthetic data and a real Qwen trace
Last Week
- Merged the vLLM branch that supports a CPU KV cache, then ran it on synthetic data and a real Qwen trace to measure performance and find bugs.
- Added server-side breakdown measurement to vLLM to record the time spent copying KV blocks.
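
A minimal sketch of what such a breakdown measurement could look like, written in plain Python. Everything here is an assumption for illustration: `StageTimer`, `copy_blocks`, and the stage names are hypothetical and are not part of vLLM or of the modified branch; the "caches" are plain lists standing in for CPU and GPU memory.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named stage (hypothetical helper, not a vLLM API)."""

    def __init__(self):
        self.total_s = defaultdict(float)  # stage name -> accumulated seconds
        self.calls = defaultdict(int)      # stage name -> number of measurements

    @contextmanager
    def measure(self, stage):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the measured code raises.
            self.total_s[stage] += time.perf_counter() - start
            self.calls[stage] += 1

    def breakdown(self):
        """Return per-stage call count, total seconds, and share of all measured time."""
        grand_total = sum(self.total_s.values())
        return {
            stage: {
                "calls": self.calls[stage],
                "total_s": total,
                "share": total / grand_total if grand_total > 0 else 0.0,
            }
            for stage, total in self.total_s.items()
        }


def copy_blocks(src, dst, block_ids):
    """Stand-in for a KV block copy: copies the selected blocks from src to dst."""
    for block_id in block_ids:
        dst[block_id] = bytes(src[block_id])


timer = StageTimer()
cpu_cache = [bytes(4096) for _ in range(64)]  # 64 dummy blocks of 4 KiB each
gpu_cache = [b""] * 64                        # plain list standing in for device memory

for _ in range(10):
    with timer.measure("kv_block_copy"):
        copy_blocks(cpu_cache, gpu_cache, range(64))
    with timer.measure("other"):
        sum(range(1000))  # placeholder for the rest of a serving step

report = timer.breakdown()
for stage, stats in report.items():
    print(f"{stage}: {stats['calls']} calls, {stats['total_s']:.6f} s, {stats['share']:.1%}")
```

If the real copies are asynchronous GPU transfers, a host-side timer like this would likely need a synchronization point (for example `torch.cuda.synchronize()`) before each clock read, otherwise it may only measure the time to enqueue the copy.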
Next Week
- Run more tests on the vLLM build that supports a CPU KV cache.
- Try to optimize the current implementation.