[ServeGen: Workload Characterization and Generation of Large Language Model Serving in Production](https://arxiv.org/pdf/2505.09999)

Evict from the M queue first

![[projects/kvcachecache/Dev.figs/250414-000021.png]]

| Config     | S3FIFO   |
|------------|----------|
| 1kGPU1kCPU | 0.095005 |
| 1kGPU2kCPU | 0.136413 |
| 1kGPU4kCPU | 0.213832 |

Evict from the S queue first

![[projects/kvcachecache/Dev.figs/250414-000021-1.png]]

| Config     | S3FIFO   |
|------------|----------|
| 1kGPU1kCPU | 0.095005 |
| 1kGPU2kCPU | 0.136413 |
| 1kGPU4kCPU | 0.213832 |
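
The note does not define what "evict from the M queue first" and "evict from the S queue first" mean. One plausible reading, assumed here, is that the setting chooses which queue the eviction routine tries first. The simplified S3-FIFO below is a minimal sketch of that reading. `S3FIFO`, its `prefer` knob, and `hit_ratio` are hypothetical names that do not come from the project. The 10% small-queue ratio, the frequency cap of 3, promotion on any re-access, and the `capacity`-key bound on the ghost queue are also assumptions. With `prefer="S"`, the sketch follows the usual S3-FIFO rule: evict from S while S holds at least `small_ratio` of capacity. With `prefer="M"`, it evicts from M whenever M is non-empty and touches S only once M is empty.

```python
from collections import OrderedDict


class S3FIFO:
    """Simplified S3-FIFO: small FIFO (S), main FIFO (M), ghost FIFO (G).

    Hypothetical `prefer` knob (an assumption, not from the note):
      "S": usual S3-FIFO rule, evict from S while it holds >= small_ratio
           of capacity, otherwise from M.
      "M": evict from M whenever it is non-empty, fall back to S otherwise.
    """

    def __init__(self, capacity, small_ratio=0.1, prefer="S", max_freq=3):
        if capacity < 1:
            raise ValueError("capacity must be >= 1")
        if prefer not in ("S", "M"):
            raise ValueError("prefer must be 'S' or 'M'")
        self.capacity = capacity
        self.small_cap = max(1, int(capacity * small_ratio))
        self.prefer = prefer
        self.max_freq = max_freq
        # Each OrderedDict is a FIFO: new keys go to the end, eviction pops
        # from the front. Values are small saturating access counters.
        self.small = OrderedDict()
        self.main = OrderedDict()
        self.ghost = OrderedDict()  # keys only; bounded to `capacity` entries

    def __len__(self):
        return len(self.small) + len(self.main)

    def access(self, key):
        """Return True on a hit; on a miss, make room and insert the key."""
        for queue in (self.small, self.main):
            if key in queue:
                queue[key] = min(queue[key] + 1, self.max_freq)
                return True
        while len(self) >= self.capacity:
            self._evict()
        if key in self.ghost:
            # Recently evicted from S: treat as proven reuse, insert into M.
            del self.ghost[key]
            self.main[key] = 0
        else:
            self.small[key] = 0
        return False

    def _evict(self):
        if self.prefer == "M":
            use_small = not self.main
        else:
            use_small = len(self.small) >= self.small_cap or not self.main
        if use_small:
            self._evict_small()
        else:
            self._evict_main()

    def _evict_small(self):
        while self.small:
            key, freq = self.small.popitem(last=False)
            if freq > 0:
                # Re-accessed while in S: promote to M (threshold is a choice).
                self.main[key] = 0
            else:
                # Not re-accessed: drop the data, remember the key in G.
                self.ghost[key] = None
                if len(self.ghost) > self.capacity:
                    self.ghost.popitem(last=False)
                return

    def _evict_main(self):
        while self.main:
            key, freq = self.main.popitem(last=False)
            if freq > 0:
                # FIFO reinsertion: another round with a decremented counter.
                self.main[key] = freq - 1
            else:
                return


def hit_ratio(trace, capacity, prefer):
    """Replay a list of keys and return hits / accesses."""
    cache = S3FIFO(capacity, prefer=prefer)
    hits = sum(cache.access(key) for key in trace)
    return hits / len(trace) if trace else 0.0
```

For example, `hit_ratio(trace, 1000, "M")` and `hit_ratio(trace, 1000, "S")` replay the same key list under the two orders at a single capacity. The note does not say whether the `S3FIFO` column in the tables above reports a hit ratio. In this sketch, `prefer="M"` promotes keys out of S only when M runs empty. As a result, objects already in M are the first ones to be reinserted or dropped.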