analysis/characterization/window_1_results.md is the headline write-up for Window 1: workload characterization (KV per request, real reuse decomposition, APC theoretical ceilings), B3 5-policy sweep with per-policy interpretation, B2 same-vs-different-worker interference microbench with causal reading, and an explicit list of what Window 1 does *not* answer (deferred to B4 SRR sweep + B5 attribution).

Under window_1_results/:
- 5 raw result JSONs from the B3 sweep, the B2 microbench, the APC upper bound, and the KV footprint
- per-policy hotspot_index.json snapshots so render_window1_figures.py can plot per-worker TTFT p90 distributions
- 8 PNG figures (figures/) covering the headline claims

Three takeaways the figures pin down:
1) intra-session reuse dominates (93.2%), so session-affinity routing is the right primary lever
2) unified hybrid affinity hits 79.4% APC (97% of the 79.6% intra-session ceiling) AND cuts TTFT p90 from lmetric's 15.6s to 7.24s
3) B2 different-worker control sits at idx ≈ 1.0 across 32× prefill-size variation; same-worker TTFT idx scales 2.15× -> 218×, which is the cleanest causal evidence for same-worker prefill-decode interference

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
{
  "hotspot_index_ttft_p90": 2.237981740718548,
  "per_worker_latency_p90_s": {
    "http://127.0.0.1:8000": 34.71445541951107,
    "http://127.0.0.1:8001": 21.922988962882666,
    "http://127.0.0.1:8002": 23.936190764518685,
    "http://127.0.0.1:8003": 26.22220957049285,
    "http://127.0.0.1:8004": 40.318757307820505,
    "http://127.0.0.1:8005": 12.26559703698149,
    "http://127.0.0.1:8006": 27.904838753980588,
    "http://127.0.0.1:8007": 18.430557113309625
  },
  "per_worker_ttft_p90_s": {
    "http://127.0.0.1:8000": 28.18261351052206,
    "http://127.0.0.1:8001": 13.147308969072796,
    "http://127.0.0.1:8002": 13.818959677941162,
    "http://127.0.0.1:8003": 14.003642184572524,
    "http://127.0.0.1:8004": 31.339895512629305,
    "http://127.0.0.1:8005": 7.870992770011071,
    "http://127.0.0.1:8006": 14.149156623415186,
    "http://127.0.0.1:8007": 11.777357225219024
  },
  "status": "supported"
}
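The snapshot does not state how `hotspot_index_ttft_p90` is derived. The stored value is numerically consistent with the hottest worker's TTFT p90 divided by the upper median across the 8 workers (31.34 s on port 8004 over 14.00 s on port 8003). The sketch below checks that reading. The formula is inferred from the numbers and is an assumption, not the confirmed definition, and `hotspot_index` is a hypothetical helper name.

```python
import statistics

# Per-worker TTFT p90 (seconds), copied from the snapshot above.
PER_WORKER_TTFT_P90_S = {
    "http://127.0.0.1:8000": 28.18261351052206,
    "http://127.0.0.1:8001": 13.147308969072796,
    "http://127.0.0.1:8002": 13.818959677941162,
    "http://127.0.0.1:8003": 14.003642184572524,
    "http://127.0.0.1:8004": 31.339895512629305,
    "http://127.0.0.1:8005": 7.870992770011071,
    "http://127.0.0.1:8006": 14.149156623415186,
    "http://127.0.0.1:8007": 11.777357225219024,
}

# Value stored under "hotspot_index_ttft_p90" in the same snapshot.
REPORTED_INDEX = 2.237981740718548


def hotspot_index(per_worker: dict[str, float]) -> float:
    """Assumed definition: worst worker divided by the upper median.

    statistics.median_high returns the larger of the two middle values
    when the number of workers is even, so no averaging takes place.
    """
    values = list(per_worker.values())
    return max(values) / statistics.median_high(values)


index = hotspot_index(PER_WORKER_TTFT_P90_S)
hottest = max(PER_WORKER_TTFT_P90_S, key=PER_WORKER_TTFT_P90_S.get)
print(f"hotspot index = {index:.6f} (reported {REPORTED_INDEX:.6f})")
print(f"hottest worker = {hottest}")
```

Under this reading, an index of about 2.24 means the slowest worker's TTFT p90 is a little more than twice that of a typical worker. An interpolated median (the mean of the two middle values) gives a slightly different ratio that does not match the stored value.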