obsidian/projects/agentic-kvcache/related-works.md

LAPS: A Length-Aware-Prefill LLM Serving System https://arxiv.org/abs/2601.11589

Understanding Bottlenecks for Efficiently Serving LLM Inference With KV Offloading https://arxiv.org/abs/2601.19910

DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference https://arxiv.org/abs/2602.21548

DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving https://www.usenix.org/system/files/osdi24-zhong-yinmin.pdf

Not All Prefills Are Equal: PPD Disaggregation for Multi-turn LLM Serving https://arxiv.org/pdf/2603.13358

Efficient Multi-round LLM Inference over Disaggregated Serving https://arxiv.org/pdf/2602.14516v1

Roadmap: SGLang Distributed KVCache System For Agentic Workload https://github.com/sgl-project/sglang/issues/21846

RFC: P/D Prefill compute optimizations with bi-directional KV cache transfers between P and D nodes https://github.com/vllm-project/vllm/issues/32733