Commit Graph

4 Commits

Author SHA1 Message Date
22c4aa58e4 f2b: replace top-1/5/10% bars with full CDF; align all docs to replay-trace numbers
The previous f2b_session_skew.png was a 3-bar chart (top 1/5/10%) computed
from the production trace summary (the trace itself is not present locally,
only its precomputed JSON). The new figure is a continuous CDF of cumulative
input-token mass vs session rank percentile, generated directly from the
replay trace traces/w600_r0.0015_st30.jsonl so any percentile is readable.

Headline numbers update accordingly:
  replay trace (n=274 sessions): top 1% = 24.3%, top 5% = 61.9%, top 10% = 75.8%
  production trace (n=1.3M):     top 1% = 46.5%, top 5% = 66.5%, top 10% = 74.6%

Both show extreme skew well above the y=x uniform reference; the replay
trace is less extreme at top-1% because n=274 makes that bucket only
~3 sessions. We standardize §2/§3 narrative on the replay-trace numbers
so motivation matches §5 evaluation; production numbers kept as a side
note for context.

- scripts/plot_session_skew_cdf.py: reproducible figure generator
- MEETING.md / PAPER_OUTLINE.md: update narrative + caption
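
A minimal sketch of the CDF computation described above, not the actual
contents of scripts/plot_session_skew_cdf.py. It assumes per-session input
tokens have already been aggregated into a dict; the JSONL field names are
not given in this message, so loading is left out. The names
session_skew_cdf and top_share, and the rounding rule for the top-k bucket,
are hypothetical.

```python
def session_skew_cdf(tokens_per_session):
    """Return (rank_pct, cum_share) with sessions sorted heaviest first.

    rank_pct[i]  : percentile rank of the (i+1)-th heaviest session
    cum_share[i] : % of all input tokens held by the i+1 heaviest sessions
    """
    totals = sorted(tokens_per_session.values(), reverse=True)
    grand_total = sum(totals)
    if not totals or grand_total == 0:
        raise ValueError("need at least one session with non-zero tokens")

    rank_pct, cum_share = [], []
    running = 0
    for i, tokens in enumerate(totals, start=1):
        running += tokens
        rank_pct.append(100.0 * i / len(totals))
        cum_share.append(100.0 * running / grand_total)
    return rank_pct, cum_share


def top_share(tokens_per_session, pct):
    """Share (%) of input tokens held by the top `pct` percent of sessions.

    Assumption: the bucket size is round(n * pct / 100), with a floor of one
    session, so n=274 and pct=1 gives a 3-session bucket.
    """
    totals = sorted(tokens_per_session.values(), reverse=True)
    k = max(1, round(len(totals) * pct / 100))
    return 100.0 * sum(totals[:k]) / sum(totals)


if __name__ == "__main__":
    # Toy data, not taken from any trace.
    demo = {"a": 70, "b": 20, "c": 10}
    print(session_skew_cdf(demo))
    print(top_share(demo, 34))
```

Plotting cum_share against rank_pct alongside the y=x line gives the uniform
reference mentioned above.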

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 10:37:22 +08:00
020a5c79a7 §3.3 reframe: hot pin failure is uniformly-slow workers, not max/median ratio
User pointed out the apparent paradox: in fig_b3_per_worker_ttft_p90, unified
has hotspot index 3.67 while sticky has 2.73, yet unified e2e p90 is roughly
half of sticky's. Resolution: hotspot index (max/median) is a *ratio* and
misleading on its own. Per-worker absolute TTFT p90:

  sticky : median 20.3s, max 55.4s -> system e2e p90 34.6s
  unified: median 10.3s, max 37.7s -> system e2e p90 18.0s

Mechanism: the top 1% of sessions own 46.5% of input-token mass, and there
are more hot sessions than instances (8), so sticky's hash binding gives
*every* worker its own hot session and even the median worker is slow.
Unified's LMetric fallback re-routes cold/new sessions away from
hot-affinity instances, keeping 7 of 8 workers fast. System p90 is
dominated by the majority of requests landing on fast workers, hence the
roughly 2x e2e gap (34.6s vs 18.0s).

Changes:
- Replace §3.3 figure with figs/f4c_per_worker_ttft.png (per-worker bars)
  instead of figs/f4c_apc_vs_hotspot_tradeoff.png (the ratio scatter)
- §3.3 narrative in PAPER_OUTLINE.md and MEETING.md rewritten around
  absolute median + max + system e2e p90 instead of hotspot ratio
- Add a §3.3 sub-finding: "hot pin failure must be measured with
  per-worker absolute latency, not normalized ratio"
- Keep the scatter as supplementary for §5 multi-policy summary
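
The ratio-versus-absolute point can be checked with a few lines of
arithmetic. This is an illustrative sketch using only the rounded
median/max values quoted above; hotspot_index is a hypothetical helper
name, and the per-worker data behind the figure is not reproduced here.

```python
def hotspot_index(median_s, max_s):
    """Hotspot index as defined above: max per-worker TTFT p90 / median."""
    return max_s / median_s


# (median, max) per-worker TTFT p90 in seconds, copied from this message.
policies = {
    "sticky": (20.3, 55.4),
    "unified": (10.3, 37.7),
}

if __name__ == "__main__":
    for name, (median_s, max_s) in policies.items():
        ratio = hotspot_index(median_s, max_s)
        print(f"{name:8s} median={median_s:5.1f}s max={max_s:5.1f}s "
              f"ratio={ratio:.2f}")
```

From these rounded inputs the ratios come out at about 2.73 for sticky and
3.66 for unified (the 3.67 quoted above presumably comes from unrounded
data). Unified has the worse ratio while being faster on both median and
max, which is why the ratio alone misleads.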

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 10:10:23 +08:00
df0ee5a02b Use PNG for KV memory wall figure; switch outline to inline image embeds
- Convert figs/f4b_pdsep_kv_wall.pdf to PNG via pdftoppm @ 150 DPI so
  MEETING.md and PAPER_OUTLINE.md render the figure inline on GitHub /
  any standard markdown viewer (PDF ![]() embeds don't render).
- PAPER_OUTLINE.md F2, F4, F6: switch from backtick code references to
  proper ![]() image embeds so the doc is actually viewable as a deck.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 09:13:26 +08:00
52cdb80367 EAR outline: copy reusable figures, mark migration sections deferred
- replayer/replay.py: emit trace_span_s and amplification in summary
  (Phase 1 of the wall-clock amplification measurement plan; needed for
  §2.3 dispatch coupling empirical closure)
- figs/: 8 reusable figures copied from analysis/ with paper-spec names
  (f2a/b/c workload, f4a/b/c/d failure modes, f6 e2e partial)
- PAPER_OUTLINE.md: real figure paths, explicit TBD markers for
  custom drawings and pending data; new "Validation Status" table at top
  and reorganized "Work Plan" splitting can-do-now vs migration-deferred
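
A hedged sketch of what the two new summary fields could look like. This
message does not define amplification; the sketch assumes it is replay
wall-clock duration divided by the span of the original trace timestamps.
The function name, its arguments, and the wall_clock_s key are
hypothetical and not taken from replayer/replay.py.

```python
def replay_summary(trace_timestamps_s, wall_start_s, wall_end_s):
    """Build a summary dict with trace_span_s and amplification.

    Assumption: amplification = wall-clock replay time / trace span, so a
    value above 1.0 means the replay ran slower than the recorded trace.
    """
    if not trace_timestamps_s:
        raise ValueError("trace has no requests")

    trace_span_s = max(trace_timestamps_s) - min(trace_timestamps_s)
    wall_clock_s = wall_end_s - wall_start_s
    # A single-request trace has zero span; leave amplification undefined.
    amplification = wall_clock_s / trace_span_s if trace_span_s > 0 else None

    return {
        "trace_span_s": trace_span_s,
        "wall_clock_s": wall_clock_s,
        "amplification": amplification,
    }


if __name__ == "__main__":
    # Toy numbers: a 30 s trace replayed in 45 s of wall-clock time.
    print(replay_summary([0.0, 10.0, 30.0], 100.0, 145.0))
```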

Migration validation deferred per user: 4 prior attempts (6b255fa,
e991960/5772149, cc6e562, 4c583f2) were reverted due to transfer
overhead; pending re-test on top of connector_tax DR-fix.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 01:44:13 +08:00