obsidian/projects/agentic-kvcache/sync.md

| Bucket  | Requests | SLA    | Instances | Mean TTFT (estimated_ttft) | Infinite-space ceiling (hit ratio) |
| ------- | -------: | -----: | --------: | -------------------------: | ---------------------------------: |
| 0-32k   |  637,142 |  <= 5s |        64 |                     0.502s |                             59.63% |
| 32-85k  |   99,735 | <= 10s |        48 |                     2.801s |                             82.35% |
| 85-128k |   23,624 | <= 15s |        16 |                     9.669s |                             84.25% |
| 128k+   |    3,226 | <= 20s |         6 |                     9.572s |                             82.99% |

| Bucket  | Best routing             | Best TTFT / Hit / Gap    | cache_score (TTFT / Hit / Gap) | cache_score_strong (TTFT / Hit / Gap) |
| ------- | ------------------------ | ------------------------ | ------------------------------ | ------------------------------------- |
| 0-32k   | cache_affinity_weak_rend | 0.488s / 56.11% / 3.52pp | 0.536s / 54.45% / 5.18pp       | 0.813s / 56.97% / 2.66pp              |
| 32-85k  | estimated_ttft           | 2.801s / 76.70% / 5.66pp | 3.766s / 77.52% / 4.83pp       | 5.193s / 78.00% / 4.35pp              |
| 85-128k | cache_affinity_weak_rend | 9.289s / 77.12% / 7.13pp | 9.408s / 77.07% / 7.18pp       | 11.906s / 76.87% / 7.38pp             |
| 128k+   | estimated_ttft           | 9.572s / 74.44% / 8.54pp | 10.630s / 74.56% / 8.42pp      | 11.481s / 74.39% / 8.59pp             |

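cache_score_strong does not come out ahead on Qwen3. It reaches the highest hit ratio only in the 0-32k and 32-85k buckets (56.97% and 78.00%), and it pays for that with worse TTFT (0.813s and 5.193s, against 0.488s and 2.801s for the best policy). In the 85-128k and 128k+ buckets it has no hit-ratio advantage at all, and its TTFT is still worse. In other words, chasing the cache more aggressively does not buy a consistent gain on Qwen3.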

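cache_score is the more stable of the two. It has a better TTFT than cache_score_strong in all four buckets. Its hit ratio stays close to cache_score_strong's (about 2.5pp lower at worst, in 0-32k) and is even slightly higher in the two long buckets, 85-128k and 128k+. If the choice is only between cache_score and cache_score_strong, prefer cache_score on Qwen3.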

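Across all policies, the best choice by TTFT is not uniform. cache_affinity_weak_rend is best for 0-32k and 85-128k, while estimated_ttft is best for 32-85k and 128k+. On Qwen3, then, no single policy dominates every length range, and routing differently per bucket is worthwhile.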
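
A minimal sketch of what per-bucket routing could look like, using the best-by-TTFT policy names from the table above. The function names `bucket_for` and `policy_for` are hypothetical, as is the choice to put a boundary value such as exactly 32k into the upper bucket; the note does not specify either.

```python
from bisect import bisect_right

# Bucket boundaries in the same "k" units as the tables.
# Assumption: a length exactly on a boundary falls into the upper bucket.
BUCKET_EDGES_K = [32, 85, 128]
BUCKET_NAMES = ["0-32k", "32-85k", "85-128k", "128k+"]

# Best-by-TTFT policy per bucket, copied from the results table.
BEST_POLICY = {
    "0-32k": "cache_affinity_weak_rend",
    "32-85k": "estimated_ttft",
    "85-128k": "cache_affinity_weak_rend",
    "128k+": "estimated_ttft",
}


def bucket_for(length_k: float) -> str:
    """Map a request length (in k) to its bucket name."""
    return BUCKET_NAMES[bisect_right(BUCKET_EDGES_K, length_k)]


def policy_for(length_k: float) -> str:
    """Pick the routing policy that was best for this request's bucket."""
    return BEST_POLICY[bucket_for(length_k)]


if __name__ == "__main__":
    for length_k in (10, 50, 100, 200):
        print(length_k, bucket_for(length_k), policy_for(length_k))
```

A static lookup like this only encodes the offline result. Whether the same assignment holds under a different traffic mix would need to be re-measured.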

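Judging by the gap, the main problem is not eviction but the workload ceiling itself and the online placement policy. The 0-32k ceiling is too low (59.63%): however much online routing improves, the hit ratio can only hover around 60%. The mid and long buckets have high ceilings (82% to 84%), yet the current best online policy still trails the infinite-space ceiling by 5.7pp to 8.5pp. That leaves routing/placement headroom, just not along the cache_score_strong path.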
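
The gap figures can be re-derived as ceiling minus hit ratio, in percentage points. The sketch below does so for the best-by-TTFT policy in each bucket, with values copied from the two tables. The helper name `gap_pp` is hypothetical. Because the inputs are already rounded to two decimals, a recomputed gap may differ from the table by 0.01pp.

```python
# (infinite-space ceiling %, hit ratio % of the best-by-TTFT policy) per bucket,
# copied from the tables in this note.
RESULTS = {
    "0-32k": (59.63, 56.11),
    "32-85k": (82.35, 76.70),
    "85-128k": (84.25, 77.12),
    "128k+": (82.99, 74.44),
}


def gap_pp(ceiling: float, hit: float) -> float:
    """Gap to the infinite-space ceiling, in percentage points."""
    return round(ceiling - hit, 2)


gaps = {bucket: gap_pp(c, h) for bucket, (c, h) in RESULTS.items()}

if __name__ == "__main__":
    for bucket, gap in gaps.items():
        ceiling, hit = RESULTS[bucket]
        print(f"{bucket}: ceiling {ceiling:.2f}%, hit {hit:.2f}%, gap {gap:.2f}pp")
```

Read this way, the 0-32k bucket has the smallest gap but also the lowest ceiling, while the three longer buckets combine ceilings above 82% with gaps of roughly 5.7pp to 8.5pp.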