obsidian/phd/weekly-report/24/241117.md

Objective
- Customize vLLM (Ali ver.) with new features

Key Results
- Test the modified vLLM that supports CPU KV cache
- Profile the modified vLLM and break down its execution time on synthetic data and a real Qwen trace

Last Week
- Merged the vLLM version that supports CPU KV cache, then used synthetic data and a real Qwen trace to measure performance and find bugs.
- Added server-side breakdown measurement to vLLM to time the copying of KV blocks.
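
A minimal sketch of what such a per-stage breakdown could look like, assuming a plain wall-clock timer. `BreakdownTimer`, `copy_blocks`, and the stage names are hypothetical and are not part of vLLM; the byte copy only stands in for the real KV block transfer.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class BreakdownTimer:
    """Accumulates wall-clock time per named stage (hypothetical helper, not a vLLM API)."""

    def __init__(self):
        self.total_s = defaultdict(float)
        self.calls = defaultdict(int)

    @contextmanager
    def measure(self, stage):
        # Time the enclosed block and add it to the running total for this stage.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.total_s[stage] += time.perf_counter() - start
            self.calls[stage] += 1

    def report(self):
        # Return calls, total seconds, and share of measured time per stage.
        grand_total = sum(self.total_s.values())
        return {
            stage: {
                "calls": self.calls[stage],
                "total_s": total,
                "share": total / grand_total if grand_total > 0 else 0.0,
            }
            for stage, total in self.total_s.items()
        }


def copy_blocks(src, dst, block_ids):
    # Stand-in for the real KV block copy: duplicates each block's bytes.
    for block_id in block_ids:
        dst[block_id] = bytes(src[block_id])


timer = BreakdownTimer()
cpu_cache = {i: bytearray(4096) for i in range(8)}  # assumed block size, illustration only
gpu_cache = {}

with timer.measure("kv_block_copy"):
    copy_blocks(cpu_cache, gpu_cache, range(8))
with timer.measure("model_forward"):
    time.sleep(0.001)  # placeholder for the rest of the step

breakdown = timer.report()
for stage, stats in breakdown.items():
    print(f"{stage}: {stats['total_s'] * 1e3:.3f} ms over {stats['calls']} call(s)")
```

Note that host/device copies are usually asynchronous, so a wall-clock timer like this only reflects the copy time if the device is synchronized (for example with `torch.cuda.synchronize()`) before each clock read.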

Next Week
- Run more tests on the vLLM version that supports CPU KV cache.
- Try to optimize the current implementation.