Objective
- Serverless KVCache cache

Key Results
- Test a workload-aware KVCache scheduler
- Implement the workload-aware policy in vLLM

Last Week
- Designed a workload-aware scheduling policy in the simulator and profiled the KVCache reuse rate
- Implemented the designed policy in vLLM

Next Week
- Profile the real-world performance of the new policy in vLLM and improve it based on the results