[ServeGen: Workload Characterization and Generation of Large Language Model Serving in Production](https://arxiv.org/pdf/2505.09999)

Prioritize evicting from the M queue

![[projects/kvcachecache/Dev.figs/250414-000021.png]]

|            | S3FIFO   |
| ---------- | -------- |
| 1kGPU1kCPU | 0.095005 |
| 1kGPU2kCPU | 0.136413 |
| 1kGPU4kCPU | 0.213832 |

Prioritize evicting from the S queue

![[projects/kvcachecache/Dev.figs/250414-000021-1.png]]

|            | S3FIFO   |
| ---------- | -------- |
| 1kGPU1kCPU | 0.095005 |
| 1kGPU2kCPU | 0.136413 |
| 1kGPU4kCPU | 0.213832 |
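
The two settings above differ only in which queue is tried first when the cache is full. A minimal sketch of that switch follows, assuming a simplified S3-FIFO-style cache with a small queue (S) and a main queue (M). The class `TwoQueueFIFO`, the `prefer` parameter, and the `hit_ratio` helper are hypothetical names for illustration, not the code that produced the tables. The sketch omits the ghost queue and promotes from S after a single reuse, so it is not a full S3-FIFO, and the ratio it computes is not claimed to be the metric reported above.

```python
from collections import deque


class TwoQueueFIFO:
    """Simplified S3-FIFO-style cache: small queue S plus main queue M.

    Assumptions (not from the notes): no ghost queue, promotion from S
    after one reuse, and a `prefer` switch selecting the queue to evict
    from first.
    """

    def __init__(self, capacity, prefer="S"):
        if prefer not in ("S", "M"):
            raise ValueError("prefer must be 'S' or 'M'")
        self.capacity = capacity
        self.prefer = prefer
        self.small = deque()  # S queue, oldest key on the left
        self.main = deque()   # M queue, oldest key on the left
        self.freq = {}        # key -> reuse counter, capped at 3

    def access(self, key):
        """Return True on a hit; on a miss, insert the key and return False."""
        if key in self.freq:
            self.freq[key] = min(self.freq[key] + 1, 3)
            return True
        # One eviction step may only move or re-queue a key, so loop
        # until a slot is actually free.
        while len(self.freq) >= self.capacity:
            self._evict_step()
        self.small.append(key)
        self.freq[key] = 0
        return False

    def _evict_step(self):
        first, second = (
            (self.small, self.main) if self.prefer == "S"
            else (self.main, self.small)
        )
        queue = first if first else second  # fall back if preferred queue is empty
        key = queue.popleft()
        if queue is self.small:
            if self.freq[key] > 0:
                # Reused while in S: promote to M with a reset counter.
                self.freq[key] = 0
                self.main.append(key)
            else:
                del self.freq[key]
        else:
            if self.freq[key] > 0:
                # Reused while in M: give it another pass through the queue.
                self.freq[key] -= 1
                self.main.append(key)
            else:
                del self.freq[key]


def hit_ratio(trace, capacity, prefer):
    """Replay a list of keys and return the fraction of accesses that hit."""
    cache = TwoQueueFIFO(capacity, prefer)
    hits = sum(cache.access(key) for key in trace)
    return hits / len(trace)
```

Calling `hit_ratio(trace, capacity, "M")` and `hit_ratio(trace, capacity, "S")` on the same trace gives a side-by-side comparison of the two eviction preferences under these simplified rules.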