obsidian/sync.md at 8036c9016c25ef801bc279902f76583bb052c820

Gahow Wang a57afa86b4 Initial commit: obsidian to gitea

2026-05-07 15:04:41 +08:00

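| Bucket | Requests | SLA | Instances | Mean TTFT under estimated_ttft | Hit-ratio ceiling (unlimited cache space) |
| --- | --- | --- | --- | --- | --- |
| 0-32k | 637,142 | <= 5s | 64 | 0.502s | 59.63% |
| 32-85k | 99,735 | <= 10s | 48 | 2.801s | 82.35% |
| 85-128k | 23,624 | <= 15s | 16 | 9.669s | 84.25% |
| 128k+ | 3,226 | <= 20s | 6 | 9.572s | 82.99% |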
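
To make the bucket sizes easier to compare, here is a minimal Python sketch that derives each bucket's share of requests, its requests per instance, and how far the mean TTFT sits below the SLA. The figures are copied from the table above; the dictionary layout, field names, and the `summarize` helper are illustrative assumptions, not part of the original tooling.

```python
# Figures copied from the table above. Field names are hypothetical.
BUCKETS = {
    "0-32k": {"requests": 637_142, "sla_s": 5.0, "instances": 64, "mean_ttft_s": 0.502},
    "32-85k": {"requests": 99_735, "sla_s": 10.0, "instances": 48, "mean_ttft_s": 2.801},
    "85-128k": {"requests": 23_624, "sla_s": 15.0, "instances": 16, "mean_ttft_s": 9.669},
    "128k+": {"requests": 3_226, "sla_s": 20.0, "instances": 6, "mean_ttft_s": 9.572},
}


def summarize(buckets):
    """Return per-bucket request share, requests per instance, and SLA headroom."""
    total = sum(b["requests"] for b in buckets.values())
    return {
        name: {
            "request_share": b["requests"] / total,
            "requests_per_instance": b["requests"] / b["instances"],
            # Headroom of the MEAN only; it says nothing about tail latency.
            "sla_headroom_s": b["sla_s"] - b["mean_ttft_s"],
        }
        for name, b in buckets.items()
    }


if __name__ == "__main__":
    for name, row in summarize(BUCKETS).items():
        print(
            f"{name:>8}: share={row['request_share']:.2%} "
            f"req/instance={row['requests_per_instance']:.0f} "
            f"headroom={row['sla_headroom_s']:.3f}s"
        )
```

The headroom here is computed from the mean TTFT only, so a positive value does not by itself show that every request meets the SLA.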

| Bucket | Best routing policy | Best policy: TTFT / Hit / Gap | cache_score: TTFT / Hit / Gap | cache_score_strong: TTFT / Hit / Gap |
| --- | --- | --- | --- | --- |
| 0-32k | cache_affinity_weak_rend | 0.488s / 56.11% / 3.52pp | 0.536s / 54.45% / 5.18pp | 0.813s / 56.97% / 2.66pp |
| 32-85k | estimated_ttft | 2.801s / 76.70% / 5.66pp | 3.766s / 77.52% / 4.83pp | 5.193s / 78.00% / 4.35pp |
| 85-128k | cache_affinity_weak_rend | 9.289s / 77.12% / 7.13pp | 9.408s / 77.07% / 7.18pp | 11.906s / 76.87% / 7.38pp |
| 128k+ | estimated_ttft | 9.572s / 74.44% / 8.54pp | 10.630s / 74.56% / 8.42pp | 11.481s / 74.39% / 8.59pp |
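
The Gap column appears to be the unlimited-space ceiling minus the achieved hit ratio, in percentage points. The sketch below recomputes it for the best policy in each bucket and checks it against the reported value. The `gap_pp` helper and the 0.02pp tolerance are assumptions for illustration; small differences would be expected if the reported gaps were computed from unrounded hit ratios.

```python
# (ceiling %, best-policy hit ratio %, reported gap in pp), copied from the tables above.
ROWS = {
    "0-32k": (59.63, 56.11, 3.52),
    "32-85k": (82.35, 76.70, 5.66),
    "85-128k": (84.25, 77.12, 7.13),
    "128k+": (82.99, 74.44, 8.54),
}


def gap_pp(ceiling_pct: float, hit_pct: float) -> float:
    """Gap to the unlimited-space ceiling, in percentage points (assumed definition)."""
    return ceiling_pct - hit_pct


def check_rows(rows, tolerance_pp: float = 0.02):
    """Return the buckets whose recomputed gap differs from the reported one."""
    return [
        name
        for name, (ceiling, hit, reported) in rows.items()
        if abs(gap_pp(ceiling, hit) - reported) > tolerance_pp
    ]


if __name__ == "__main__":
    for name, (ceiling, hit, reported) in ROWS.items():
        print(f"{name:>8}: recomputed={gap_pp(ceiling, hit):.2f}pp reported={reported:.2f}pp")
    print("mismatches:", check_rows(ROWS))
```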

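cache_score_strong has no advantage on Qwen3. It achieves a slightly higher hit ratio only in the 0-32k and 32-85k buckets, and pays for it with worse TTFT; in 85-128k and 128k+ it has no hit-ratio advantage either, and its TTFT is still worse. In other words, chasing the cache more aggressively does not deliver consistent gains on Qwen3.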

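cache_score is the more stable of the two. It has better TTFT than cache_score_strong in all four buckets; its hit ratio is close to that of cache_score_strong, and even higher in the two long buckets. If the choice is only between cache_score and cache_score_strong, cache_score should be preferred on Qwen3.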
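
The comparison between the two cache-scoring policies can be read straight off the table; the following sketch computes the per-bucket deltas (cache_score_strong minus cache_score). The data is copied from the table above, while the `compare` helper and its output layout are illustrative assumptions.

```python
# (mean TTFT in seconds, hit ratio in %), copied from the table above.
CACHE_SCORE = {
    "0-32k": (0.536, 54.45),
    "32-85k": (3.766, 77.52),
    "85-128k": (9.408, 77.07),
    "128k+": (10.630, 74.56),
}
CACHE_SCORE_STRONG = {
    "0-32k": (0.813, 56.97),
    "32-85k": (5.193, 78.00),
    "85-128k": (11.906, 76.87),
    "128k+": (11.481, 74.39),
}


def compare(base, other):
    """Per bucket: (TTFT delta in s, hit-ratio delta in pp), computed as other - base."""
    return {
        bucket: (
            round(other[bucket][0] - base[bucket][0], 3),
            round(other[bucket][1] - base[bucket][1], 2),
        )
        for bucket in base
    }


if __name__ == "__main__":
    for bucket, (d_ttft, d_hit) in compare(CACHE_SCORE, CACHE_SCORE_STRONG).items():
        print(f"{bucket:>8}: TTFT {d_ttft:+.3f}s, hit ratio {d_hit:+.2f}pp")
```

A positive TTFT delta means cache_score_strong is slower; a negative hit-ratio delta means it also hits the cache less often.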

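No single policy is best across the board. cache_affinity_weak_rend is best for 0-32k and 85-128k, while estimated_ttft is best for 32-85k and 128k+. This shows that on Qwen3 no single policy dominates every length range, so bucketing requests and routing each bucket differently is worthwhile.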
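
A minimal sketch of what bucketed routing could look like: map a prompt length to its bucket, then look up the policy that performed best for that bucket in the table above. The bucket boundaries assume "k" means 1024 tokens and that upper bounds are exclusive; neither is stated in the source. The function names are hypothetical, and the policy names are used only as labels, not as calls into any real router.

```python
import bisect

# ASSUMPTION: "k" = 1024 tokens and each upper bound is exclusive.
K = 1024
UPPER_BOUNDS = [32 * K, 85 * K, 128 * K]
BUCKET_NAMES = ["0-32k", "32-85k", "85-128k", "128k+"]

# Best policy per bucket, copied from the table above.
BEST_POLICY = {
    "0-32k": "cache_affinity_weak_rend",
    "32-85k": "estimated_ttft",
    "85-128k": "cache_affinity_weak_rend",
    "128k+": "estimated_ttft",
}


def bucket_for(prompt_tokens: int) -> str:
    """Return the length bucket for a prompt of the given token count."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # bisect_right sends a prompt sitting exactly on a boundary to the longer bucket.
    return BUCKET_NAMES[bisect.bisect_right(UPPER_BOUNDS, prompt_tokens)]


def policy_for(prompt_tokens: int) -> str:
    """Return the name of the routing policy to use for this prompt length."""
    return BEST_POLICY[bucket_for(prompt_tokens)]


if __name__ == "__main__":
    for tokens in (1_000, 50 * K, 100 * K, 200_000):
        print(tokens, bucket_for(tokens), policy_for(tokens))
```

Keeping the mapping in a plain table makes it easy to swap a bucket's policy when the measurements change, without touching the lookup logic.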

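The gaps suggest that the main problem is not eviction, but the workload ceiling itself and the online placement policy. In 0-32k the ceiling is too low (59.63%): however well online routing is tuned, the hit ratio can only hover around 60%. The medium and long buckets have high ceilings (82% to 84%), yet the current best online policy still trails the unlimited-space ceiling by 5.7pp to 8.5pp, so there is still routing/placement headroom, just not along the cache_score_strong path.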
