Objective
- Workload-centric KV cache scheduling
- XPURemoting adaptation for PhOS

Key Results
- Refactor the vLLM benchmark tools to get more precise metrics
- Simulate different token lengths and hit rates to quantify the effect of hit rate
- Modify XPURemoting to support the new architecture

Last Week
- Implemented a unified vLLM benchmark tool that produces more precise metrics and provides a unified request builder.
- Measured the effect of cache hit rate and tried to define a hit rate that yields a real performance improvement.
- Merged XPURemoting with the new features and support for PhOS.

Next Week
- Define a `good hit rate` for KV cache scheduling.
- Finish the XPURemoting adaptation.
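
The hit-rate simulation listed under Key Results could be sketched roughly as follows. This is only an illustration, not the benchmark tool itself: the `Request` class, `build_requests`, and `nominal_hit_rate` are hypothetical names, prompts are modeled as lists of random token IDs, and the hit rate is modeled as the fraction of each prompt taken up by a prefix shared across all requests. A real prefix cache typically matches at block granularity and misses on the first (cold) request, so the measured hit rate would likely be somewhat lower than the nominal one.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    """Hypothetical request shape for illustration (not a vLLM type)."""
    prompt_token_ids: List[int]
    shared_prefix_len: int  # tokens that are expected to hit the cache


def build_requests(num_requests: int, prompt_len: int, hit_rate: float,
                   vocab_size: int = 32000, seed: int = 0) -> List[Request]:
    """Build prompts whose first `hit_rate` fraction is a common prefix."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    rng = random.Random(seed)
    prefix_len = round(prompt_len * hit_rate)
    # One prefix shared by every request; this is the cacheable part.
    shared_prefix = [rng.randrange(vocab_size) for _ in range(prefix_len)]
    requests = []
    for _ in range(num_requests):
        # A fresh random suffix per request; this part is assumed to miss.
        suffix = [rng.randrange(vocab_size)
                  for _ in range(prompt_len - prefix_len)]
        requests.append(Request(shared_prefix + suffix, prefix_len))
    return requests


def nominal_hit_rate(requests: List[Request]) -> float:
    """Fraction of all prompt tokens that belong to the shared prefix."""
    total = sum(len(r.prompt_token_ids) for r in requests)
    shared = sum(r.shared_prefix_len for r in requests)
    return shared / total if total else 0.0
```

Sweeping `prompt_len` and `hit_rate` over a grid and recording latency and throughput for each combination is one way to look for the point at which a higher hit rate starts to pay off.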