obsidian/241229.md at main - obsidian - Local Gitea

Gahow Wang a57afa86b4 Initial commit: obsidian to gitea

2026-05-07 15:04:41 +08:00

Objective

Serverless KVCache

Key Results

Implement the workload-aware policy in vLLM
Profile the workload-aware policy [3/10]

Last Week

Implemented a priority-based evictor (priorities are calculated by our policy) for both the GPU and CPU sides.
Tested our policy under relatively small cache memory and obtained a 30% cache hit ratio and a 10% performance improvement. This shows that our policy is useful when cache memory is limited. For larger cache memory, the policy still needs some fine-tuning.
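
For reference, the idea of a priority-based evictor can be sketched as below. This is only an illustration under assumptions, not the code implemented in vLLM: the class name `PriorityEvictor`, its methods, and the integer block ids are hypothetical, and the priority values are supplied by the caller because this note does not spell out how the policy computes them. The sketch keeps a min-heap so that the block with the lowest priority is evicted first, and uses lazy deletion to handle priority updates.

```python
import heapq
import itertools
from typing import Dict, List, Tuple


class PriorityEvictor:
    """Evicts the cached block with the lowest priority first.

    Hypothetical sketch, not vLLM's evictor interface. The priority of
    each block is assumed to come from an external policy.
    """

    def __init__(self) -> None:
        # Heap entries are (priority, insertion sequence, block_id).
        self._heap: List[Tuple[float, int, int]] = []
        # Latest valid (priority, sequence) per block; older heap entries are stale.
        self._live: Dict[int, Tuple[float, int]] = {}
        self._seq = itertools.count()

    def add(self, block_id: int, priority: float) -> None:
        """Insert a block, or update the priority of an existing one."""
        seq = next(self._seq)
        self._live[block_id] = (priority, seq)
        heapq.heappush(self._heap, (priority, seq, block_id))

    def remove(self, block_id: int) -> None:
        """Drop a block from the eviction candidates (e.g. when it is reused)."""
        self._live.pop(block_id, None)

    def evict(self) -> int:
        """Pop and return the id of the lowest-priority live block."""
        while self._heap:
            priority, seq, block_id = heapq.heappop(self._heap)
            # Skip entries that were removed or superseded by a later add().
            if self._live.get(block_id) == (priority, seq):
                del self._live[block_id]
                return block_id
        raise ValueError("no evictable blocks")

    def __len__(self) -> int:
        return len(self._live)


# One instance per memory tier, mirroring the GPU-side and CPU-side evictors.
gpu_evictor = PriorityEvictor()
cpu_evictor = PriorityEvictor()
```

With this layout, `add` and `evict` both cost O(log n) in the number of heap entries, and blocks with equal priority are evicted in insertion order.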

Next Week

Improve our policy for larger cache memory.
Analyze the new trace.