Objective
- Workload-centric KV cache scheduling
- XPURemoting adaptation for PhOS
Key Results
- Define the Good KVCache hit rate under different conditions [6/10]
- Prove the interference between different workloads in current vLLM
- Modify XPURemoting to support PhOS (v1)
Last Week
- Survey different KVCache scheduling algorithms and summarize what they have in common, as a basis for defining the Good KVCache hit rate.
- Profile the ali trace in vLLM and group its entries to prove interference between workloads.
- Adapt XPURemoting to support the current PhOS API, and fully test the implementation against PhOS's open-source examples. MR for XPURemoting and e80bf94 for PhOS.
Next Week
- Finish the definition of the Good KVCache hit rate.