Initial commit: obsidian to gitea
phd/weekly-report/24/241027.md

Objectives
- Analysis of QWen trace
- Customize vLLM (Ali ver) with new features
- Port XPURemoting to PhOS

Key Results
- Enhance the QWen trace's workload separation
- Get the vLLM KVCache hit rate for different open-source workloads
- Build a unified docker image for XPURemoting and PhOS

Last Week
- Get a unified workload taxonomy for the QWen trace on both the Web and App ends.
- Run vLLM (Ali ver) and start customizing it to add features (e.g. KVCache hit rate for different workloads).
- Build a new docker image that satisfies PhOS's base-image requirement together with the XPURemoting environment (statically linked PyTorch 1.13.1).

Next Week
- Customize vLLM to support new features such as KVCache schedule policy comparison.
phd/weekly-report/24/241103.md

Objectives
- Analysis of QWen trace
- Customize vLLM (Ali ver) with new features

Key Results
- Tokenize the Qwen trace with Qwen-agent and other tools [60%]
- Modify vLLM to support different KV cache block numbers
- Profile open-source datasets with different cache block counts

Last Week
- Use Qwen-agent to handle workloads with files, getting a more precise token length for these workloads.
- Modify vLLM's cache manager to support a specific number of KVCache blocks, then measure how the KV cache hit rate trends with block number across workloads.

Next Week
- Tokenize the whole Qwen trace, especially multimodal (image) workloads, and run measurements with these traces.
- Profile the KVCache hit rate on the actual trace and compare it with other open-source traces to find differences.
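The hit-rate-versus-block-count measurement described above can also be approximated offline. A minimal sketch, assuming LRU eviction and vLLM-style prefix caching, where a block only matches if its entire preceding prefix matches (the block size, function names, and trace format here are illustrative assumptions, not the actual Ali-vLLM internals):

```python
from collections import OrderedDict

BLOCK_SIZE = 16  # tokens per KV cache block; a common vLLM default

def prefix_blocks(token_ids, block_size=BLOCK_SIZE):
    """Split a prompt's token ids into hashable full-block prefixes."""
    blocks = []
    usable = len(token_ids) - len(token_ids) % block_size
    for i in range(0, usable, block_size):
        # Identify each block by the whole prefix up to its end,
        # mirroring prefix-caching semantics.
        blocks.append(tuple(token_ids[:i + block_size]))
    return blocks

def lru_hit_rate(trace, num_blocks, block_size=BLOCK_SIZE):
    """Replay a trace of prompts against an LRU block cache of fixed size."""
    cache = OrderedDict()  # block id -> None, ordered by recency
    hits = total = 0
    for token_ids in trace:
        for blk in prefix_blocks(token_ids, block_size):
            total += 1
            if blk in cache:
                hits += 1
                cache.move_to_end(blk)
            else:
                cache[blk] = None
                if len(cache) > num_blocks:
                    cache.popitem(last=False)  # evict least recently used
    return hits / total if total else 0.0
```

Sweeping `num_blocks` over the same trace then yields the hit-rate-vs-block-number curve directly.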
phd/weekly-report/24/241110.md

Objectives
- Analysis of QWen trace
- Customize vLLM (Ali ver) with new features

Key Results
- Tokenize the Qwen trace with Qwen-agent and other tools
- Profile the Qwen trace with different cache block counts

Last Week
- Use Qwen-agent to handle all workloads in the Qwen trace and get a precise token stream that simulates the actual online environment.
- Measure the performance and KVCache hit rate for different cache block counts using the real Qwen trace, running for one hour.

Next Week
- Check the tokenization results from the Qwen trace and revise them if needed.
- Measure KV cache performance with CPU memory.
phd/weekly-report/24/241117.md

Objective
- Customize vLLM (Ali ver) with new features

Key Results
- Test the modified vLLM, which supports CPU KV cache
- Profile and break down the modified vLLM on synthetic data and the real Qwen trace

Last Week
- Merge the vLLM branch that supports CPU KV cache, and use synthetic data and the real Qwen trace to measure performance and find bugs.
- Add breakdown-measurement support on the vLLM server side to measure the time spent copying KV blocks.

Next Week
- Run more tests on the vLLM that supports CPU KV cache.
- Try to optimize the current implementation.
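The server-side breakdown measurement amounts to accumulating wall-clock time per named phase (e.g. KV block swap-in/swap-out copies). A small sketch of such a helper; the phase names and hook points are hypothetical, not the actual vLLM patch:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class Breakdown:
    """Accumulate wall-clock time and call counts per named phase."""
    def __init__(self):
        self.totals = defaultdict(float)  # phase -> total seconds
        self.counts = defaultdict(int)    # phase -> number of calls

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        return {name: (self.totals[name], self.counts[name])
                for name in self.totals}
```

Wrapping each block-copy call site in `with breakdown.phase("swap_in"): ...` then gives per-phase totals without restructuring the surrounding code.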
phd/weekly-report/24/241124.md

Objective
- Workload-centric KV cache scheduling
- XPURemoting adaptation for PhOS

Key Results
- Refactor the vLLM benchmark tools to get more precise metrics
- Simulate different token lengths and hit rates to quantify the hit rate's effect
- Modify XPURemoting to support the new architecture

Last Week
- Implement a unified vLLM benchmark tool that yields more precise metric results and provides a unified request builder.
- Measure the effect of cache hit rate and try to define a good hit rate for real performance improvement.
- Merge XPURemoting with the new features and support for PhOS.

Next Week
- Define a `good hit rate` for KV cache scheduling.
- Finish the XPURemoting adaptation.
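The token-length/hit-rate simulation can be captured with a first-order model in which cached prefix tokens skip prefill compute, and a `good hit rate` falls out as the smallest hit rate meeting a TTFT target. A sketch under that assumption (all constants, names, and the linear-prefill approximation are illustrative; real attention cost is superlinear):

```python
def estimated_ttft(prompt_tokens, hit_rate,
                   prefill_tok_per_s=8000.0, overhead_s=0.02):
    """First-order TTFT model: cached prefix tokens skip prefill compute."""
    uncached = prompt_tokens * (1.0 - hit_rate)
    return overhead_s + uncached / prefill_tok_per_s

def min_good_hit_rate(prompt_tokens, slo_s, **kw):
    """Smallest hit rate (on a 1% grid) whose estimated TTFT meets the SLO."""
    for hr in (i / 100 for i in range(101)):
        if estimated_ttft(prompt_tokens, hr, **kw) <= slo_s:
            return hr
    return None  # SLO unreachable even with a 100% hit rate
```

The same sweep, run per workload class, shows how the "good" threshold shifts with prompt length.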
phd/weekly-report/24/241201.md

Objective
- Workload-centric KV cache scheduling
- XPURemoting adaptation for PhOS

Key Results
- Define the Good KVCache hit rate under different conditions [6/10]
- Prove the interference between different workloads in current vLLM
- Modify XPURemoting to support PhOS (v1)

Last Week
- Survey different KVCache schedule algorithms and summarize their common ground toward a definition of the Good KVCache hit rate.
- Profile the Ali trace in vLLM and group the workloads to demonstrate interference.
- Adapt XPURemoting to support the current PhOS API, and fully test the implementation on PhOS's open-source examples: [MR](https://ipads.se.sjtu.edu.cn:1312/scaleaisys/xpuremoting/-/merge_requests/25) for XPURemoting and [e80bf94](https://github.com/Gahow/PhoenixOS/commit/e80bf94075fcd6f53c97406dadfbe7f13fc16092) for PhOS.

Next Week
- Finish the definition of the Good KVCache hit rate.
phd/weekly-report/24/241208.md

Objectives
- Serverless KVCache caching
- PhOS profiling

Key Results
- Implement a workload-aware KVCache scheduler [3/10]
- Provide test apps for PhOS

Last Week
- Implement a simulator for the KVCache scheduler to quickly test different policies.
- Prepare and give a paper-sharing talk at Ali.
- Provide StableDiffusion single-GPU training, Llama2-13b multi-GPU training, and Llama2-70b multi-GPU inference scripts for PhOS profiling.

Next Week
- Implement a solution to reduce the KVCache memory requirement.
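A scheduler simulator of the kind described reduces to replaying block accesses against a fixed-capacity cache with a pluggable priority function, so policies can be swapped in one line. A minimal sketch (the trace format and names are assumptions):

```python
def simulate(trace, capacity, priority_fn):
    """Replay (timestamp, block_id) accesses against a cache of `capacity`
    blocks; on overflow, evict the block with the lowest priority score."""
    cache = {}  # block_id -> last access time
    hits = 0
    for t, blk in trace:
        if blk in cache:
            hits += 1
        elif len(cache) >= capacity:
            victim = min(cache, key=lambda b: priority_fn(b, cache[b], t))
            del cache[victim]
        cache[blk] = t  # insert, or refresh recency on a hit
    return hits / len(trace)

def lru(block_id, last_seen, now):
    """LRU falls out as 'priority = last access time'."""
    return last_seen
```

A workload-aware policy plugs in a different `priority_fn` (e.g. predicted reuse) and is compared against `lru` on the same trace.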
phd/weekly-report/24/241215.md

Objective
- Serverless KVCache caching

Key Results
- Test the workload-aware KVCache scheduler
- Implement the workload-aware policy in vLLM

Last Week
- Design a workload-aware schedule policy in the simulator and profile the KVCache reuse rate.
- Implement the designed policy in vLLM.

Next Week
- Profile the real performance of the new policy in vLLM and make some enhancements.
phd/weekly-report/24/241222.md

Objective
- Serverless KVCache caching

Key Results
- Implement the workload-aware policy in vLLM [8/10]
- Profile the workload-aware policy [3/10]
- Quantify the workload differences in the Qwen trace

Last Week
- Add a new design point to the cache policy, making it consider cache memory size and predicted reuse distance together. To do this, add a new monitor for each workload's reuse time interval and average number of tokens.
- Set up an offline (i.e. best-case) scheduling policy, then profile the default policy, our workload-aware policy, and the offline policy to show the performance difference in the CDF of TTFT.
- Implement a cache-block source tracker in vLLM to show where KVCache reuse comes from. It proves that 90% of KVCache reuse comes from multi-turn chat.

Next Week
- Improve the performance of our policy.
- Plot some formal figures.
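The design point of weighing cache memory size against predicted reuse distance can be expressed as a single keep-priority score per cached entry. A hypothetical sketch (the formula and parameter names are illustrative, not the actual policy): entries expected to be reused soon score high, and large entries are penalized so that priority roughly reflects expected hits per unit of cache memory.

```python
def cache_priority(avg_reuse_interval_s, num_tokens, now_s, last_used_s):
    """Workload-aware keep-priority (hypothetical scoring formula).

    avg_reuse_interval_s comes from the per-workload monitor of reuse
    time intervals; num_tokens stands in for the entry's memory size.
    """
    # Predicted time until the next reuse; overdue entries clamp to a
    # small floor, since they may be reused at any moment.
    expected_wait = max(avg_reuse_interval_s - (now_s - last_used_s), 1e-3)
    # Higher score = keep longer: soon-to-be-reused and small wins.
    return 1.0 / (expected_wait * num_tokens)
```

On eviction the entry with the lowest score goes first, which is exactly the shape of `priority_fn` a policy simulator or evictor consumes.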
phd/weekly-report/24/241229.md

Objective
- Serverless KVCache caching

Key Results
- Implement the workload-aware policy in vLLM
- Profile the workload-aware policy [3/10]

Last Week
- Implement a priority-based evictor (with priorities calculated by our policy) for both the GPU and CPU sides.
- Test our policy under relatively small cache memory, getting a 30% cache hit ratio and a 10% performance improvement. This shows our policy suits limited cache memory; for larger cache memory, it still needs some fine-tuning.

Next Week
- Improve our policy for larger cache memory.
- Analyze the new trace.
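A priority-based evictor like the one described can be sketched with a lazy min-heap, the standard trick for supporting priority updates in O(log n) by skipping stale entries on pop (the structure and names here are assumptions, not the actual vLLM patch; the same class would back both the GPU-side and CPU-side evictors):

```python
import heapq

class PriorityEvictor:
    """Min-heap evictor: evict() returns the lowest-priority block."""
    def __init__(self):
        self.heap = []     # (priority, block_id), may hold stale entries
        self.current = {}  # block_id -> latest priority

    def update(self, block_id, priority):
        # Push instead of re-heapifying; stale entries are skipped lazily.
        self.current[block_id] = priority
        heapq.heappush(self.heap, (priority, block_id))

    def evict(self):
        while self.heap:
            priority, block_id = heapq.heappop(self.heap)
            # Accept only if this entry still reflects the latest priority.
            if self.current.get(block_id) == priority:
                del self.current[block_id]
                return block_id
        return None  # nothing evictable
```

Updating a block's priority on every access keeps eviction order consistent with the policy without ever scanning the whole cache.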