Objective
- Serverless KVCache cache

Key Results
- Implement the workload-aware policy in vLLM
- Profile the workload-aware policy [3/10]

Last Week
- Implemented a priority-based evictor (with priorities calculated by our policy) for both the GPU and CPU sides.
- Tested our policy with a relatively small cache memory: it achieved a 30% cache hit ratio and a 10% performance improvement, which shows the policy is useful when cache memory is limited. With larger cache memory, the policy still needs some fine-tuning.
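
A minimal sketch of what a priority-based evictor like the one above could look like, assuming each cached block gets a numeric priority from the policy and that the lowest-priority block is evicted first. The class name `PriorityEvictor`, its methods, and the example scores are hypothetical: they are not taken from vLLM or from our implementation. One instance per tier (GPU and CPU) is also an assumption.

```python
import heapq
import itertools
from typing import Dict, List, Tuple


class PriorityEvictor:
    """Hypothetical sketch: evicts the block with the lowest priority first."""

    def __init__(self) -> None:
        # Heap entries are (priority, insertion sequence, block_id).
        self._heap: List[Tuple[float, int, int]] = []
        # block_id -> (priority, sequence) of its only valid heap entry.
        self._live: Dict[int, Tuple[float, int]] = {}
        self._counter = itertools.count()

    def add(self, block_id: int, priority: float) -> None:
        """Insert a block, or re-score it if it is already tracked."""
        seq = next(self._counter)
        self._live[block_id] = (priority, seq)
        heapq.heappush(self._heap, (priority, seq, block_id))

    def remove(self, block_id: int) -> None:
        """Stop tracking a block, e.g. when a cache hit puts it back in use."""
        self._live.pop(block_id, None)

    def evict(self) -> int:
        """Return the id of the lowest-priority block and stop tracking it."""
        while self._heap:
            priority, seq, block_id = heapq.heappop(self._heap)
            # Skip stale entries left behind by re-scoring or removal.
            if self._live.get(block_id) == (priority, seq):
                del self._live[block_id]
                return block_id
        raise ValueError("no evictable blocks")

    def __len__(self) -> int:
        return len(self._live)


# Assumption: one evictor per memory tier.
gpu_evictor = PriorityEvictor()
cpu_evictor = PriorityEvictor()

gpu_evictor.add(block_id=1, priority=0.9)
gpu_evictor.add(block_id=2, priority=0.2)
gpu_evictor.add(block_id=3, priority=0.5)
gpu_evictor.add(block_id=2, priority=0.7)  # the policy re-scores block 2
victim = gpu_evictor.evict()               # block 3 now has the lowest priority
```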

Next Week
- Improve our policy for larger cache memory.
- Analyze the new trace.