obsidian/241117.md at main - obsidian - Local Gitea

Gahow Wang a57afa86b4 Initial commit: obsidian to gitea

2026-05-07 15:04:41 +08:00

Objective

Customize vLLM (Ali version) with new features

Key Results

Test the modified vLLM, which supports a CPU KV cache
Profile the modified vLLM and break down its time on synthetic data and a real Qwen trace

Last Week

Merged the vLLM changes that support a CPU KV cache, then used synthetic data and a real Qwen trace to measure performance and find bugs.
Added server-side breakdown measurement to vLLM to time the copying of KV blocks.
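
The breakdown measurement could look roughly like the sketch below. It is only an illustration of the idea, not the code that was added to vLLM: `BreakdownTimer`, `copy_blocks`, and the phase names are hypothetical, and the block copy is simulated with plain Python byte buffers rather than real GPU/CPU tensors.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class BreakdownTimer:
    """Accumulates wall-clock time per named phase (hypothetical helper, not a vLLM API)."""

    def __init__(self):
        self.total_s = defaultdict(float)
        self.calls = defaultdict(int)

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the timed code raises.
            self.total_s[name] += time.perf_counter() - start
            self.calls[name] += 1

    def report(self):
        # Share of each phase relative to the sum of all measured phases.
        grand = sum(self.total_s.values()) or 1.0
        return {
            name: {"calls": self.calls[name], "total_s": t, "share": t / grand}
            for name, t in self.total_s.items()
        }


def copy_blocks(src, dst, block_ids):
    """Stand-in for a KV block copy: copies the selected blocks from src to dst."""
    for i in block_ids:
        dst[i] = bytes(src[i])


timer = BreakdownTimer()
cpu_cache = [bytearray(4096) for _ in range(64)]  # simulated CPU-side KV blocks
gpu_cache = [b"" for _ in range(64)]              # simulated destination blocks

for step in range(10):
    with timer.phase("kv_block_copy"):
        # Assumption: with real asynchronous device copies, the device would
        # have to be synchronized before the timer stops, or the measured
        # time would only cover the launch of the copy.
        copy_blocks(cpu_cache, gpu_cache, range(64))
    with timer.phase("other"):
        sum(range(1000))  # placeholder for the rest of the server step

stats = timer.report()
for name, row in stats.items():
    print(f"{name}: calls={row['calls']} total={row['total_s']:.6f}s share={row['share']:.1%}")
```

Keeping a per-phase call count alongside the total makes it easier to tell whether copy time is dominated by a few large transfers or by many small ones.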

Next Week

Run more tests on the vLLM build that supports a CPU KV cache.
Try to optimize the current implementation.