obsidian/phd/weekly-report/24/241229.md

Objective
- Serverless KVCache caching
Key Results
- Implement the workload-aware policy in vLLM
- Profile the workload-aware policy [3/10]
Last Week
- Implemented a priority-based evictor (priorities are calculated by our policy) for both the GPU and CPU sides.
- Tested our policy with a relatively small cache memory and got a 30% cache hit ratio and a 10% performance improvement. This shows that our policy is useful when cache memory is limited. For larger cache memory, our policy still needs some fine-tuning.
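
A minimal sketch of what a priority-based evictor could look like, for reference. This is not the vLLM evictor interface and not our actual implementation: the class name `PriorityEvictor`, its methods, and the float priorities are hypothetical, and the priority is assumed to be computed elsewhere by the policy. The block with the lowest priority is evicted first, and ties are broken by insertion order.

```python
import heapq
from itertools import count


class PriorityEvictor:
    """Hypothetical sketch: evict the lowest-priority free block first."""

    def __init__(self):
        self._heap = []      # entries: (priority, seq, block_id)
        self._live = {}      # block_id -> seq of its current heap entry
        self._seq = count()  # tie-breaker: older entries are evicted first

    def add(self, block_id, priority):
        """Register a free block, or update the priority of a known one."""
        seq = next(self._seq)
        self._live[block_id] = seq
        heapq.heappush(self._heap, (priority, seq, block_id))

    def remove(self, block_id):
        """Block was reused (cache hit); its heap entry becomes stale."""
        self._live.pop(block_id, None)

    def evict(self):
        """Pop and return the live block with the lowest priority."""
        while self._heap:
            _, seq, block_id = heapq.heappop(self._heap)
            # Skip stale entries left behind by remove() or a priority update.
            if self._live.get(block_id) == seq:
                del self._live[block_id]
                return block_id
        raise ValueError("no evictable blocks")

    def __len__(self):
        return len(self._live)


# One instance per side (assumption: GPU and CPU pools are managed separately).
gpu_evictor = PriorityEvictor()
gpu_evictor.add("a", 0.9)
gpu_evictor.add("b", 0.1)
gpu_evictor.add("c", 0.5)
gpu_evictor.remove("b")        # "b" got a cache hit and is in use again
victim = gpu_evictor.evict()   # "c": lowest priority among the remaining blocks
```

Stale heap entries are skipped lazily at eviction time, so `add` and `remove` stay cheap on the hot path; the trade-off is that the heap can temporarily hold more entries than there are evictable blocks.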
Next Week
- Improve our policy for larger cache memory.
- Analyze the new trace.