Objective
- Serverless KVCache
Key Results
- Test a workload-aware KVCache scheduler
- Implement the workload-aware policy in vLLM
Last Week
- Designed a workload-aware scheduling policy in the simulator and profiled the KVCache reuse rate.
- Implemented the designed policy in vLLM.
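
For reference, a minimal sketch of how a KVCache reuse rate can be measured in a simulator. This is not the workload-aware policy from this note, which is not described here. It assumes a plain LRU block cache as a baseline, and the function name `simulate_reuse_rate`, the `block_size` of 16 tokens, and the request format (lists of token ids) are all hypothetical.

```python
from collections import OrderedDict


def simulate_reuse_rate(requests, capacity_blocks, block_size=16):
    """Return the fraction of full token blocks served from a prefix cache.

    Assumptions (illustrative only): LRU eviction, block-level prefix
    matching, and each request given as a list of token ids.
    """
    cache = OrderedDict()  # prefix (tuple of tokens) -> None, in LRU order
    reused = 0
    total = 0
    for tokens in requests:
        n_blocks = len(tokens) // block_size  # only full blocks are cached
        hit_chain = True  # a block is reusable only if all earlier blocks hit
        for i in range(n_blocks):
            key = tuple(tokens[: (i + 1) * block_size])
            total += 1
            if hit_chain and key in cache:
                cache.move_to_end(key)  # refresh recency on a hit
                reused += 1
            else:
                hit_chain = False
                cache[key] = None
                cache.move_to_end(key)
                while len(cache) > capacity_blocks:
                    cache.popitem(last=False)  # evict least recently used
    return reused / total if total else 0.0


if __name__ == "__main__":
    shared = list(range(16))
    reqs = [
        list(range(32)),
        list(range(32)),
        shared + list(range(100, 116)),
    ]
    print(simulate_reuse_rate(reqs, capacity_blocks=8))
```

Swapping the eviction step for a different policy and comparing the returned rate on the same trace gives a like-for-like comparison against this LRU baseline.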
Next Week
- Profile the real-world performance of the new policy in vLLM and make improvements based on the results.