0881942cf326586ad23dd1bb4d7982567d85adf4
After the B3 audit bug fixes (joined_analysis hotspot median + b3_analyze percentile interp), regenerate b3_policy_comparison.json and the per-policy hotspot_index.json from the same raw run on dash0 and re-render the three affected figures (apc-vs-hotspot, latency-bars, per-worker TTFT).

Key number changes in window_1_results.md:
- hotspot_index magnitudes corrected (all five policies; lmetric smallest delta at +0.7%, sticky largest at +16.1%)
- "capped reduces hotspot 13%" -> "~10% (2.253 -> 2.020)"
- TTFT/E2E/TPOT percentiles shift by <1% from floor->interp (unified TTFT p90 7.24 -> 7.35 s)

Restructured "Caveats" into "Limitations (read this before quoting B3 numbers)":
1. Agentic dispatch coupling is by design: promoted from caveat to top-level methodology framing, tied to agentic_dispatch_coupling.md
2. B3 interference_index is binary (not size-graded): added
3. Hot-sweep cache contamination (<1%): kept
4. Unified interference unrecoverable: kept with explicit warning not to read unified's failure attribution as causal
5. w600 is a sample, not full trace: kept
6. Reuse decomposition is per-token in expectation: added

current_results/characterization_claim_matrix.md updates:
- The "heavy-tail not sole cause" claim now cites the corrected ~10% drop with the median bug noted
- New supported claim: "B3 saturated-replay latency gaps include an agentic dispatch-coupling feedback term, which is intentional and matches production"; cited against agentic_dispatch_coupling.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
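
For readers checking the numbers above, here is a minimal sketch of the two ideas behind them: a floor-index percentile versus a linearly interpolated one, and the relative drop behind the "~10%" figure. The helper names (percentile_floor, percentile_interp, relative_drop) and the sample latencies are hypothetical. The exact floor convention that b3_analyze used before the fix is not shown in this commit, so the floor variant is an assumption for illustration only.

```python
import math


def percentile_floor(values, q):
    """Floor-index percentile (assumed pre-fix behaviour): pick one sample."""
    s = sorted(values)
    idx = math.floor(q / 100 * (len(s) - 1))
    return s[idx]


def percentile_interp(values, q):
    """Percentile with linear interpolation between the two nearest ranks."""
    s = sorted(values)
    pos = q / 100 * (len(s) - 1)       # fractional rank in [0, n-1]
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)       # clamp at the last sample
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def relative_drop(before, after):
    """Fractional reduction from `before` to `after`."""
    return (before - after) / before


# Made-up latencies in seconds, NOT data from the B3 run.
sample = [1.0, 2.0, 3.0, 4.0, 10.0]
print(percentile_floor(sample, 90))             # snaps down to a sample value
print(round(percentile_interp(sample, 90), 3))  # lands between two samples

# Hotspot values quoted in the commit message: 2.253 -> 2.020.
print(f"{relative_drop(2.253, 2.020):.1%}")
```

On tail-heavy samples the interpolated value is at or above the floor value, which is the same direction as the unified TTFT p90 change quoted above.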