Prior cross-machine comparison (commit 1e86285) was invalid: dash0
baseline used warm instances with residual KV cache, inflating TTFT
by 2x. Evidence: inst_7 APC=68.3% impossible from 25 cold-start
requests; WARM TTFT p90=3.3s vs fresh=0.26s.

Fair same-machine comparison (both fresh restart on dash0):
  Baseline:    TTFT50=1.075 TPOT90=0.076 E2E50=5.075 OK=198/200
  Elastic P2P: TTFT50=1.018 TPOT90=0.085 E2E50=6.977 OK=195/200
Elastic is WORSE due to Mooncake kv_both memory overhead.

Changes:
- REPORT.md: rewrite §3-4 with corrected results, add §3.5 errata
- pd_separation_analysis.md: update elastic TL;DR with correct numbers
- cache_aware_proxy.py: fix double-decrement bugs in offload path,
  add 120s prefill timeout with co-located fallback (HEAVY_COLO_FALLBACK)
- bench.sh: standardized experiment harness with guaranteed GPU cleanup
  and fresh-state verification (nvidia-smi check before start)
- run_elastic_stability_test.sh: two-phase elastic vs baseline test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

# Milestone Report: Elastic P2P vs PD-Combined Baseline

**Date**: 2026-05-22
**Author**: Gahow Wang
**Status**: Phase 1 complete: baseline + elastic validated, system-level analysis done

---

## 1. Research Question

For agentic LLM workloads (long input, short output, high KV cache reuse), is prefill-decode (PD) disaggregation beneficial? If full PD separation hurts (see `analysis/pd_separation_analysis.md`), can **selective** disaggregation of only heavy requests improve serving latency while preserving KV cache locality?

## 2. Experimental Setup

### 2.1 Hardware

| Resource | Spec |
|----------|------|
| Machine | dash0 / dash1 (identical config) |
| GPU | 8× NVIDIA H20 96GB HBM, NVLink |
| Network | 4× ConnectX-7 200Gbps RDMA |
| Storage | cpfs shared storage across machines |

### 2.2 Software

| Component | Version | Notes |
|-----------|---------|-------|
| vLLM | 0.18.1 (source in `third_party/vllm/`) | Patched scheduler assert (see `patches/`) |
| Mooncake | 0.3.10 | RDMA-based KV transfer between instances |
| Python | 3.x managed by `uv` | `.venv/` at project root |
| Model | `Qwen3-Coder-30B-A3B-Instruct` | MoE, 128 experts, top-8, 3B active params |
| Model path | `~/models/Qwen/Qwen3-Coder-30B-A3B-Instruct` | Same on dash0 and dash1 |

### 2.3 Workload Trace

| Property | Value |
|----------|-------|
| Source | GLM-5.1 Agentic Coder, production cluster, 2h window |
| Raw trace | `~/ali-trace/trace-glm5.1-formatted/051315-051317.jsonl` on dash0 |
| Total requests | 2,114,220 |
| Avg input tokens | 33,600 (p50=20k, p90=88k) |
| Avg output tokens | 445 (p50=80) |
| I/O ratio | 75.6× aggregate |
| Prefill token share | 98% |
| KV reuse (intra-session) | 91% of reusable blocks |
| Theoretical max APC | 71% (infinite cache, single instance) |
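
The "aggregate" I/O ratio is presumably total input tokens divided by total output tokens, not the mean of per-request ratios, so long prompts dominate it. As a cross-check, the rounded means give 33,600 / 445 ≈ 75.5, which matches the reported 75.6× up to rounding. The sketch below shows the computation. It assumes hypothetical per-request fields `input_tokens` and `output_tokens`; the real trace schema may use different names.

```python
# Hypothetical sketch: "input_tokens" / "output_tokens" are assumed field
# names, not necessarily the trace's actual schema.
def aggregate_io_stats(requests):
    """Return (aggregate input/output ratio, prefill token share)."""
    total_in = sum(r["input_tokens"] for r in requests)
    total_out = sum(r["output_tokens"] for r in requests)
    # Ratio of totals, not the mean of per-request ratios: long prompts dominate.
    io_ratio = total_in / total_out
    # Fraction of all processed tokens that are prompt (prefill) tokens.
    prefill_share = total_in / (total_in + total_out)
    return io_ratio, prefill_share


if __name__ == "__main__":
    demo = [
        {"input_tokens": 20_000, "output_tokens": 80},
        {"input_tokens": 88_000, "output_tokens": 1_200},
    ]
    ratio, share = aggregate_io_stats(demo)
    print(f"I/O ratio {ratio:.1f}x, prefill share {share:.1%}")
```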

**Sampled trace for benchmarks**: `traces/sampled_1000req_seed42.jsonl` (1000 requests, seed=42, preserving session structure). For 200-request ablations, pass `--request-limit 200` to the replayer.

### 2.4 Two Configurations Compared

#### Baseline: PD-Combined (8× TP=1 DP=8)

```
8 independent vLLM instances, 1 GPU each, no Mooncake.
All instances do both prefill and decode.
Global scheduler (cache_aware_proxy.py --combined) handles:
  - Session-sticky routing (multi-turn → same instance)
  - Load-aware override (if pinned instance > 2× avg load, redirect)
  - Cache-hit scoring (prefer instance with matching prefix blocks)
```

Launch:
```bash
# On dash0:
for i in $(seq 0 7); do
  MASTER_PORT=$((29500+i)) CUDA_VISIBLE_DEVICES=$i \
  vllm serve ~/models/Qwen/Qwen3-Coder-30B-A3B-Instruct \
    --port $((8000+i)) --tensor-parallel-size 1 \
    --enable-prefix-caching --enforce-eager \
    --gpu-memory-utilization 0.9 --max-model-len 200000 \
    > /tmp/ab_base_$i.log 2>&1 &
done

python scripts/cache_aware_proxy.py \
  --combined http://127.0.0.1:800{0..7} --port 9090
```
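
The three scheduler rules above can be combined into a single routing function. This is a minimal illustration under stated assumptions and does not reproduce the logic of `scripts/cache_aware_proxy.py`. Load is assumed to be ongoing tokens. The cache bonus counts matching prefix blocks and uses a hypothetical `cache_weight`. `Instance` and `route` are illustrative names.

```python
# Hypothetical sketch of the baseline routing rules; Instance, route and
# cache_weight are illustrative names, not the proxy's real API.
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    load: int = 0                                     # assumed: ongoing tokens
    cached_blocks: set = field(default_factory=set)   # prefix-block hashes held


def route(instances, session_pin, prefix_blocks, override_factor=2.0, cache_weight=1.0):
    """Choose an instance for one request.

    session_pin:   name of the instance this session used last, or None.
    prefix_blocks: block hashes of the request's prompt prefix.
    """
    by_name = {inst.name: inst for inst in instances}
    avg_load = sum(inst.load for inst in instances) / len(instances)

    # 1. Session-sticky: keep the session where its KV cache lives,
    #    unless that instance carries more than override_factor x average load.
    pinned = by_name.get(session_pin)
    if pinned is not None and pinned.load <= override_factor * avg_load:
        return pinned

    # 2. Load-aware fallback with cache-hit scoring: lowest
    #    (load - weight * matching blocks) wins; ties go to the first instance.
    wanted = set(prefix_blocks)

    def score(inst):
        return inst.load - cache_weight * len(inst.cached_blocks & wanted)

    return min(instances, key=score)
```

With the 2× override, a session moves only when its instance is clearly overloaded, so one cold prefill is traded for better balance. The cache bonus then steers the moved session toward any instance that already holds part of its prefix.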

#### Elastic P2P Offload (8× TP=1 kv_both + selective offload)

```
8 independent vLLM instances, 1 GPU each, all kv_role=kv_both (Mooncake).
Same global scheduler, plus elastic offload logic:
  - Proxy classifies each request: WARM (<5k new), MEDIUM (5-20k), HEAVY (>20k)
  - WARM/MEDIUM: co-located on session-sticky instance (no KV transfer)
  - HEAVY: prefill on a different instance (P), KV via Mooncake RDMA,
    decode on session-sticky instance (D)
  - Cap: max 4 concurrent offloads (MAX_OFFLOAD_INFLIGHT)
  - P instance selection: round-robin with overload skip
```

Launch:
```bash
# Canonical §3.1 runs used dash0 (the invalidated initial A/B ran on dash1).
# Alternatively, use scripts/launch_elastic_p2p.sh.
for i in $(seq 0 7); do
  VLLM_MOONCAKE_BOOTSTRAP_PORT=$((8998+i)) \
  MASTER_PORT=$((29500+i)) CUDA_VISIBLE_DEVICES=$i \
  vllm serve ~/models/Qwen/Qwen3-Coder-30B-A3B-Instruct \
    --port $((8000+i)) --tensor-parallel-size 1 \
    --enable-prefix-caching --enforce-eager \
    --gpu-memory-utilization 0.9 --max-model-len 200000 \
    --kv-transfer-config '{"kv_connector":"MooncakeConnector","kv_role":"kv_both"}' \
    > /tmp/ab_elastic_$i.log 2>&1 &
  sleep 2  # stagger to avoid NCCL port collision
done

# Wait for the Mooncake bootstrap servers (ports 8998-9005)
for bp in $(seq 8998 9005); do
  until curl -s localhost:$bp/query > /dev/null 2>&1; do sleep 2; done
done

python scripts/cache_aware_proxy.py \
  --combined http://127.0.0.1:800{0..7} \
  --bootstrap-ports 8998,8999,9000,9001,9002,9003,9004,9005 \
  --offload --heavy-threshold 20000 --port 9090
```
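
The offload decision described above can be sketched as follows. The thresholds mirror the text: 5k and 20k new tokens, and at most 4 in-flight offloads. `OffloadPlanner`, `overload_limit` and the `load` mapping are hypothetical. The `release()` guard expresses the invariant that the commit's double-decrement fix presumably restores: each successful offload decrements the in-flight counter exactly once.

```python
# Hypothetical sketch of the elastic offload decision. Thresholds mirror the
# text (5k / 20k new tokens, at most 4 in-flight offloads); OffloadPlanner,
# overload_limit and the load mapping are illustrative assumptions.
import itertools

WARM_MAX = 5_000             # < 5k new tokens  -> WARM
HEAVY_MIN = 20_000           # > 20k new tokens -> HEAVY (--heavy-threshold)
MAX_OFFLOAD_INFLIGHT = 4


def classify(new_tokens):
    if new_tokens < WARM_MAX:
        return "WARM"
    if new_tokens <= HEAVY_MIN:
        return "MEDIUM"
    return "HEAVY"


class OffloadPlanner:
    def __init__(self, instances, overload_limit):
        self.instances = list(instances)
        self.overload_limit = overload_limit        # assumed per-instance load cap
        self._rr = itertools.cycle(range(len(self.instances)))
        self.inflight = 0

    def plan(self, new_tokens, decode_inst, load):
        """Return (prefill_instance, decode_instance); equal means co-located."""
        if classify(new_tokens) != "HEAVY" or self.inflight >= MAX_OFFLOAD_INFLIGHT:
            return decode_inst, decode_inst
        # Round-robin over P candidates, skipping D itself and overloaded instances.
        for _ in range(len(self.instances)):
            cand = self.instances[next(self._rr)]
            if cand != decode_inst and load[cand] < self.overload_limit:
                self.inflight += 1                  # caller must release() exactly once
                return cand, decode_inst
        return decode_inst, decode_inst             # no eligible P: stay co-located

    def release(self):
        if self.inflight <= 0:
            raise RuntimeError("release() without a matching offload")
        self.inflight -= 1
```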

### 2.5 Benchmark Parameters

| Parameter | Value |
|-----------|-------|
| Requests | 200 (from sampled 1000-req trace, `--request-limit 200`) |
| Time scale | 20× (compresses the 2h trace into ~6 min) |
| Max inflight sessions | 8 |
| Request timeout | 600s |
| vLLM flags | `--enforce-eager --enable-prefix-caching --max-model-len 200000` |
| GPU memory util | 0.9 |
| Fresh restart | Both configs started from cold (no warm cache) |

### 2.6 Reproducing the Benchmark

```bash
# Activate environment
cd ~/agentic-kv && source .venv/bin/activate

TAG=my_experiment   # output directory name under outputs/ (placeholder)
PREFIX=ab_base      # vLLM log prefix, e.g. ab_base or ab_elastic (see §5.2)
mkdir -p outputs/$TAG

# Generate the sampled trace if it does not exist yet
[ -f traces/sampled_1000req_seed42.jsonl ] || python scripts/sample_trace.py \
  --input ~/ali-trace/trace-glm5.1-formatted/051315-051317.jsonl \
  --output traces/sampled_1000req_seed42.jsonl \
  --target-requests 1000 --seed 42

# Start GPU monitoring in the background
bash scripts/gpu_monitor.sh > outputs/$TAG/gpu_util.csv &
GPU_MON_PID=$!

# Run replayer against proxy
python -m replayer \
  --trace traces/sampled_1000req_seed42.jsonl \
  --output outputs/$TAG/metrics.jsonl \
  --endpoint http://localhost:9090 \
  --time-scale 20 --max-inflight-sessions 8 \
  --request-limit 200 -v

# Stop GPU monitoring
kill $GPU_MON_PID

# Collect proxy breakdown (elastic only)
curl -s http://localhost:9090/breakdown > outputs/$TAG/breakdown.json

# Collect APC from vLLM logs
for i in $(seq 0 7); do
  grep "Prefix cache hit rate\|External prefix cache hit rate" /tmp/${PREFIX}_$i.log | tail -2
done
```
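
The grep/tail step can also be done in Python, which yields one number pair per instance instead of raw lines. This is a sketch that assumes the log lines contain `Prefix cache hit rate: <x>%` and `External prefix cache hit rate: <y>%`, the strings the grep above matches. Like `tail`, it keeps the last value reported. The function names are illustrative.

```python
# Sketch: read the latest prefix-cache hit rates from each instance log.
# Assumes log lines contain "Prefix cache hit rate: <x>%" and, when a KV
# connector is configured, "External prefix cache hit rate: <y>%"; adjust
# the patterns if your vLLM build formats them differently.
import re
from pathlib import Path

# Case-sensitive: the external metric is spelled with a lowercase "prefix".
LOCAL_RE = re.compile(r"Prefix cache hit rate: ([\d.]+)%")
EXTERNAL_RE = re.compile(r"External prefix cache hit rate: ([\d.]+)%")


def last_hit_rates(lines):
    """Return (local, external) hit rates in percent; None if never logged."""
    local = external = None
    for line in lines:
        if m := LOCAL_RE.search(line):
            local = float(m.group(1))
        if m := EXTERNAL_RE.search(line):
            external = float(m.group(1))
    return local, external


def collect(prefix, n=8, log_dir="/tmp"):
    """Map instance index -> (local, external) for <log_dir>/<prefix>_<i>.log."""
    rates = {}
    for i in range(n):
        path = Path(log_dir) / f"{prefix}_{i}.log"
        if path.exists():
            rates[i] = last_hit_rates(path.read_text(errors="replace").splitlines())
    return rates
```

For example, `collect("ab_base")` returns a dict keyed by instance index. Instances whose log is missing are skipped, not reported as zero.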

## 3. Results

> **Errata (2026-05-22)**: The initial cross-machine A/B (dash0 baseline vs dash1 elastic) reported a 44% E2E p50 reduction for elastic. Post-hoc analysis revealed that the dash0 baseline instances had **not been freshly restarted**: residual KV cache from prior experiments inflated baseline TTFT by ~2×. All results below come from verified fresh-restart experiments on the same machine (see §3.5).

### 3.1 Fair Comparison (all fresh-restart, same machine dash0, 200 req)

| Config | OK/N | TTFT p50 | TTFT p90 | TPOT p90 | E2E p50 |
|--------|------|----------|----------|----------|---------|
| **Baseline (no Mooncake)** | **198/200** | **1.075s** | **9.384s** | **0.076s** | **5.075s** |
| LMetric routing | 198/200 | 1.099s | 9.392s | 0.073s | 5.205s |
| Elastic P2P (kv_both) | 195/200 | 1.018s | 11.312s | 0.085s | 6.977s |
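
The columns of this table can be recomputed from a run's `metrics.jsonl`. The sketch below assumes hypothetical field names (`ok`, `ttft`, `tpot`, `e2e`, in seconds); check them against `RequestMetrics` in `replayer/metrics.py`. It also assumes that failed requests are excluded from the latency percentiles. The report does not state its percentile convention, so the sketch uses linear interpolation (numpy's default).

```python
# Sketch: recompute headline metrics from a replayer metrics.jsonl file.
# Assumptions (check replayer/metrics.py): one JSON object per line with a
# boolean "ok" and per-request "ttft", "tpot", "e2e" in seconds; failed
# requests are excluded from the latency percentiles.
import json


def percentile(values, q):
    """Linear-interpolation percentile (numpy's default convention)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of empty sequence")
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def summarize(path):
    with open(path) as f:
        rows = [json.loads(line) for line in f if line.strip()]
    ok = [r for r in rows if r.get("ok")]
    summary = {"ok": len(ok), "n": len(rows)}
    for key, qs in (("ttft", (50, 90)), ("tpot", (90,)), ("e2e", (50,))):
        vals = [r[key] for r in ok if r.get(key) is not None]
        for q in qs:
            summary[f"{key}_p{q}"] = percentile(vals, q) if vals else None
    return summary
```

With about 200 requests, nearest-rank and interpolated p90 can differ noticeably in the long HEAVY tail. Comparisons between configurations should therefore use one convention throughout.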

### 3.2 Per-Class Breakdown

The baseline breakdown splits HEAVY at 50k tokens. The elastic breakdown reports all >20k requests as a single HEAVY class, so the HEAVY rows are not directly comparable.

**Baseline (fresh):**

| Class | Count | % | TTFT p50 | TTFT p90 | TPOT p90 |
|-------|-------|---|----------|----------|----------|
| WARM (<5k) | 46 | 23% | 0.137s | 0.262s | 0.061s |
| MEDIUM (5-20k) | 50 | 25% | 0.921s | 1.846s | 0.079s |
| HEAVY (20-50k) | 64 | 32% | 2.660s | 6.278s | 0.076s |
| HEAVY (>50k) | 38 | 19% | 9.587s | 30.415s | 0.102s |

**Elastic P2P (fresh):**

| Class | Count | % | TTFT p50 | TTFT p90 | TPOT p90 |
|-------|-------|---|----------|----------|----------|
| WARM (<5k) | 46 | 23% | 0.142s | 0.279s | 0.072s |
| MEDIUM (5-20k) | 50 | 25% | 0.766s | 1.814s | 0.197s |
| HEAVY (>20k) | 99 | 51% | 6.390s | 22.668s | 0.085s |

### 3.3 Success Rate

| Config | OK | Total | Rate | Failure mode |
|--------|-----|-------|------|-------------|
| Baseline | 198 | 200 | 99.0% | RemoteProtocolError (replayer-side) |
| Elastic P2P | 195 | 200 | 97.5% | 2× RemoteProtocolError + 3× ReadTimeout on >60k |

Elastic's 3 extra errors are D-side KV pull failures. Prefill succeeded on P and the KV was pushed via Mooncake, but D never produced a first token because the decode scheduler could not allocate KV cache space. The 120s prefill-timeout fallback to co-located execution never triggered. It only bounds the prefill phase on P, which completed, so these requests ran until the 600s request timeout.
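
The failure mode above shows why the fallback never fired: it bounds only the prefill leg. The sketch below illustrates that control flow. It assumes hypothetical callables (`prefill_remote`, `decode_remote`, `run_colocated`) standing in for the proxy's HTTP calls, and it assumes the offload slot is released right after prefill. The sketch also shows one way to keep the in-flight counter balanced, since the commit mentions fixing double-decrement bugs in this path. The slot is acquired once and released once in a `finally` block.

```python
# Hypothetical sketch of a bounded remote prefill with co-located fallback.
# prefill_remote / decode_remote / run_colocated are stand-ins for the proxy's
# HTTP calls, and releasing the slot right after prefill is an assumed design.
import asyncio

PREFILL_TIMEOUT_S = 120.0


class OffloadSlots:
    """Counter for in-flight offloads (MAX_OFFLOAD_INFLIGHT)."""

    def __init__(self, cap=4):
        self.cap, self.inflight = cap, 0

    def try_acquire(self):
        if self.inflight >= self.cap:
            return False
        self.inflight += 1
        return True

    def release(self):
        if self.inflight <= 0:
            raise RuntimeError("release() without matching acquire()")
        self.inflight -= 1


async def serve_heavy(req, slots, prefill_remote, decode_remote, run_colocated,
                      timeout=PREFILL_TIMEOUT_S):
    if not slots.try_acquire():
        return await run_colocated(req)          # cap reached: stay co-located
    timed_out = False
    try:
        kv = await asyncio.wait_for(prefill_remote(req), timeout)
    except asyncio.TimeoutError:
        timed_out = True                         # wait_for cancels the pending call
    finally:
        slots.release()                          # single release on every path
    if timed_out:
        return await run_colocated(req)
    # Only prefill is bounded here: a decode that stalls on D (the failure in
    # the Success Rate section) still waits for the client's own timeout.
    return await decode_remote(req, kv)
```

Covering the D-side stall would need a second deadline on time-to-first-token after the KV handoff. That is a design extension, not something the current proxy is known to do.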

### 3.4 Routing Policy: Linear vs LMetric (OSDI'26)

LMetric (`score = P_tokens × BS`) vs linear (`score = ongoing_tokens - α·cache_hit`). Both fresh-restart, same trace.

| Policy | TTFT p50 | TTFT p90 | TPOT p90 | E2E p50 | Delta E2E |
|--------|----------|----------|----------|---------|-----------|
| Linear | 1.086s | 9.432s | 0.077s | 5.423s | ref |
| LMetric | 1.099s | 9.392s | 0.073s | 5.205s | **-4.0%** |

LMetric provides a modest improvement through better load balancing, and routing-policy headroom is limited for this workload. The effect may also be within run-to-run noise. If the §3.1 baseline also used linear routing, it measured E2E p50 = 5.075s, about 6% below this linear run.
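
The two scoring rules quoted above can be sketched as follows, with the lowest score winning. The field meanings are assumptions, not definitions confirmed by the report:

- `p_tokens` is taken as the prefill tokens the request would still need on that instance after its cache hit.
- `batch_size` is taken as the instance's running batch size (BS).
- `alpha` is a hypothetical weight.

Check `scripts/cache_aware_proxy.py` for the real definitions.

```python
# Sketch of the two scoring rules quoted above; the lowest score wins.
# Interpretations are assumptions: p_tokens = prefill tokens this request
# would still need on the instance, batch_size = the instance's running
# batch (BS), cache_hit = prefix tokens already cached there, alpha = a
# hypothetical weight.
from dataclasses import dataclass


@dataclass
class InstState:
    name: str
    ongoing_tokens: int
    p_tokens: int
    batch_size: int
    cache_hit: int


def linear_score(s, alpha=1.0):
    # Additive trade-off: current load minus a bonus for cached prefix tokens.
    return s.ongoing_tokens - alpha * s.cache_hit


def lmetric_score(s):
    # Multiplicative: penalized only when there is prefill work AND a busy batch.
    return s.p_tokens * s.batch_size


def pick(states, policy="linear"):
    score = lmetric_score if policy == "lmetric" else linear_score
    return min(states, key=score).name
```

The multiplicative form needs no tuned weight. The linear form depends on choosing α to balance token load against cache hits.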

### 3.5 Errata: Why Prior Cross-Machine A/B Was Invalid

The initial comparison (commit `1e86285`) reported:
```
Baseline (dash0): TTFT50=2.383  E2E50=10.232   ← WRONG (warm instances)
Elastic  (dash1): TTFT50=1.315  E2E50=5.708
Delta:            -45%          -44%           ← INVALID
```

**Evidence that prior baseline was not fresh:**
1. `inst_7` APC = 68.3%: impossible from ~25 cold-start requests per instance (200 requests / 8 instances; max ~25%)
2. WARM TTFT p90 = 3.327s (fresh = 0.262s, 12.7× gap): indicates KV cache memory pressure from prior experiments
3. HEAVY TPOT p90 = 0.154s (fresh = 0.076s, 2.0× gap): heavy prefill-decode interference from a full KV cache

The elastic numbers on dash1 were genuinely fresh. The apparent "improvement" came from comparing a fresh elastic run against a degraded baseline.
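
The commit describes `bench.sh` as verifying a fresh state with an nvidia-smi check before start. The same guard can be sketched in Python as follows, though the real `bench.sh` logic may differ. The `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` options are standard. The 1024 MiB threshold is an assumed allowance for driver and context overhead. The function names are illustrative.

```python
# Sketch of a pre-run "fresh state" check in the spirit of bench.sh's
# nvidia-smi verification (the actual bench.sh logic may differ). The
# 1024 MiB threshold is an assumed allowance for driver/context overhead.
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=index,memory.used",
         "--format=csv,noheader,nounits"]


def busy_gpus(csv_text, max_used_mib=1024):
    """Return indices of GPUs whose used memory (MiB) exceeds max_used_mib."""
    busy = []
    for line in csv_text.strip().splitlines():
        idx, used = (part.strip() for part in line.split(","))
        if int(used) > max_used_mib:
            busy.append(int(idx))
    return busy


def assert_fresh(max_used_mib=1024):
    """Raise if any GPU still holds memory, e.g. from stale vLLM instances."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    busy = busy_gpus(out, max_used_mib)
    if busy:
        raise RuntimeError(f"GPUs not free (stale vLLM instances?): {busy}")
```

A free-GPU check proves only that the previous instances are gone. The fresh-state guarantee comes from the kill-then-restart sequence it gates.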

## 4. System-Level Analysis

### 4.1 Elastic P2P Does Not Improve Single-Machine Performance

Under fair comparison (same machine, both fresh):

| Metric | Baseline | Elastic | Delta |
|--------|----------|---------|-------|
| TTFT p50 | 1.075s | 1.018s | -5.3% |
| TTFT p90 | 9.384s | 11.312s | +20.5% |
| TPOT p90 | 0.076s | 0.085s | +11.8% |
| E2E p50 | 5.075s | 6.977s | +37.5% |
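
The Delta column is the relative change (elastic - baseline) / baseline. Recomputing it from the table values is a quick guard against transcription slips. TPOT is reported to the millisecond, so its ratio is only known to within about ±1.5 percentage points (10.5% to 13.2%).

```python
# Relative change of elastic vs baseline, as used in the Delta column:
# (elastic - baseline) / baseline, expressed in percent.
def rel_delta(baseline, elastic):
    return (elastic - baseline) / baseline * 100


rows = {
    "TTFT p50": (1.075, 1.018),
    "TTFT p90": (9.384, 11.312),
    "TPOT p90": (0.076, 0.085),
    "E2E p50": (5.075, 6.977),
}
for metric, (base, elastic) in rows.items():
    print(f"{metric}: {rel_delta(base, elastic):+.1f}%")
```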
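
Elastic is **worse** on every metric except TTFT p50, and it also has a lower success rate (97.5% vs 99.0%). Root causes:

**1. Mooncake kv_both memory overhead**

Each instance with `kv_role=kv_both` maintains RDMA buffers and a Mooncake bootstrap server. We attribute the regression mainly to this always-on overhead reducing the GPU memory available for KV cache. Such a reduction would affect ALL requests, including WARM/MEDIUM requests that never use P2P transfer, through more cache eviction and higher TPOT. The overhead has not yet been quantified (see Open problem 2 in §8).

Evidence: MEDIUM TPOT p90 = 0.197s (elastic) vs 0.079s (baseline), **2.5× worse** even though MEDIUM requests do not use P2P at all. Offloaded HEAVY prefills also run on these same instances in their P role, so prefill interference may contribute alongside the memory overhead.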
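
One way to start quantifying the overhead is to compare the KV cache capacity each configuration reports at startup. This sketch assumes the instance logs contain a line like `GPU KV cache size: 1,234,567 tokens`, the format printed by vLLM's V1 engine; verify it against the 0.18.1 logs. The baseline and elastic logs in §5.2 live on different machines, so copy them into one directory first. The helper names are illustrative.

```python
# Sketch: compare startup KV cache capacity between two sets of instance logs.
# Assumes each log contains a line like "GPU KV cache size: 1,234,567 tokens"
# (the format printed by vLLM's V1 engine; verify against your 0.18.1 logs).
import re
from pathlib import Path

KV_SIZE_RE = re.compile(r"GPU KV cache size: ([\d,]+) tokens")


def kv_capacity_tokens(text):
    """Return the last reported KV cache capacity in tokens, or None."""
    matches = KV_SIZE_RE.findall(text)
    return int(matches[-1].replace(",", "")) if matches else None


def capacity_loss_pct(base_prefix, elastic_prefix, n=8, log_dir="/tmp"):
    """Per-instance % of KV capacity lost under kv_both (positive = smaller)."""
    result = {}
    for i in range(n):
        base_log = Path(log_dir) / f"{base_prefix}_{i}.log"
        elastic_log = Path(log_dir) / f"{elastic_prefix}_{i}.log"
        if not (base_log.exists() and elastic_log.exists()):
            continue
        base = kv_capacity_tokens(base_log.read_text(errors="replace"))
        elastic = kv_capacity_tokens(elastic_log.read_text(errors="replace"))
        if base and elastic:
            result[i] = (base - elastic) / base * 100
    return result
```

A near-zero difference would suggest looking beyond GPU memory for the TPOT regression, for example at prefill interference from offloaded HEAVY requests.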

**2. D-side KV pull failures**

3 HEAVY requests completed prefill on the P instance, but the D instance never produced a first token. Its KV cache was too full to allocate space for the transferred blocks, so these requests hit the 600s request timeout.

**3. P2P overhead without proportional benefit**

The P2P path adds a prefill queue wait on P (p50=6.3s), the KV transfer, and decode start on D (p50=0.8s). When the D instance is not under heavy prefill load, which is the case on fresh instances, co-located execution is faster.
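
The trade-off in point 3 can be written as a back-of-the-envelope comparison of the heavy request's own time to first token on each path. This is a simplification. `should_offload` and its inputs are hypothetical estimates that the proxy would have to supply. The comparison also ignores the benefit elastic offload actually targets, protecting other sessions' decode on D from prefill interference, so it understates when offload pays off.

```python
# Back-of-the-envelope offload rule for one HEAVY request: offload only when
# the P2P path is expected to reach the first token sooner. All inputs are
# estimates; the function is hypothetical, not part of cache_aware_proxy.py.
def should_offload(p_queue_s, kv_transfer_s, d_start_s, colo_wait_s, prefill_s):
    """Compare estimated time-to-first-token of both paths.

    p_queue_s:     expected queueing delay on the chosen P instance
    kv_transfer_s: expected Mooncake RDMA transfer time for the prompt's KV
    d_start_s:     expected delay for D to admit the request and emit a token
    colo_wait_s:   expected queueing delay on the session's own instance
    prefill_s:     prefill compute time (assumed similar on P and D)
    """
    p2p = p_queue_s + prefill_s + kv_transfer_s + d_start_s
    colocated = colo_wait_s + prefill_s
    return p2p < colocated
```

Plugging in the measured medians (6.3s P queue, 0.8s D start), offload wins only when the co-located queue is longer than roughly 7s plus the transfer time. On fresh instances it usually is not.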

### 4.2 When Elastic P2P Could Help

Elastic P2P is designed for the scenario where decode on the D instance is disrupted by co-located heavy prefill. On fresh instances with 200 requests, this contention is moderate. The benefit may emerge under:
- Higher sustained load (e.g., 1000+ requests, as in the full sampled trace)
- Longer experiment duration (KV cache fills up, eviction pressure increases)
- Multi-machine deployment (P on a different node, no memory competition)

## 5. Data & Log Locations

### 5.1 Experiment Outputs (on respective machines)

| Directory | Machine | Config | Notes |
|-----------|---------|--------|-------|
| `outputs/ab_baseline/` | dash0 | Combined 8× TP=1 | ~~Initial A/B~~ (INVALIDATED: warm instances) |
| `outputs/ab_elastic/` | dash1 | Elastic P2P cap=4 | ~~Initial A/B~~ (INVALIDATED) |
| `outputs/baseline_stability_fresh/` | dash0 | Combined 8× fresh | **Canonical baseline** (§3.1) |
| `outputs/elastic_stability_*/` | dash0 | Elastic P2P kv_both fresh | **Canonical elastic** (§3.1) |
| `outputs/ab_linear/` | dash0 | Linear policy, 200 req | §3.4 routing policy comparison |
| `outputs/ab_lmetric/` | dash0 | LMetric policy, 200 req | §3.4 routing policy comparison |
| `outputs/gpu_ab_combined/` | local | Combined 8× TP=1 | Earlier run, has gpu_util.csv |
| `outputs/gpu_ab_pdsep/` | local | PD-Sep 4P+4D | Earlier run, has gpu_util.csv |
| `outputs/exp2_combined_tp1_dp8/` | local | Combined 8× TP=1 | 1000 req, cache-aware |
| `outputs/exp3_pd_sep_tp1_mooncake/` | local | PD-Sep 4P+4D Mooncake | 1000 req |

### 5.2 vLLM Instance Logs

| Path pattern | Machine | Config |
|-------------|---------|--------|
| `/tmp/ab_base_$i.log` | dash0 | Baseline instances 0-7 |
| `/tmp/ab_elastic_$i.log` | dash1 | Elastic instances 0-7 |
| `/tmp/lmetric_ab_inst_$i.log` | dash0 | Linear policy instances 0-7 (§3.4) |
| `/tmp/lmetric_inst_$i.log` | dash0 | LMetric policy instances 0-7 (§3.4) |

Logs contain `Prefix cache hit rate` and `External prefix cache hit rate` lines for APC extraction.

### 5.3 Trace Data

| Path | Machine | Description |
|------|---------|-------------|
| `~/ali-trace/trace-glm5.1-formatted/051315-051317.jsonl` | dash0 | Full 2h production trace (2.1M requests) |
| `traces/sampled_1000req_seed42.jsonl` | all | Sampled 1000 requests (gitignored, regenerate with `sample_trace.py`) |

### 5.4 Analysis Documents

| File | Content |
|------|---------|
| `analysis/pd_separation_analysis.md` | Main report: PD-Sep vs Combined + Elastic P2P (§5) |
| `analysis/elastic_offload_design.md` | Elastic P2P design rationale |
| `analysis/kv_lifecycle_design.md` | KV cache eviction policy analysis |
| `analysis/adaptive_prefill_offload_design.md` | Initial adaptive offload design (superseded by elastic) |

## 6. Repository Structure

```
agentic-kv/
├── analysis/                          # Research reports and design docs
│   ├── pd_separation_analysis.md      # Main comprehensive report
│   ├── elastic_offload_design.md      # Elastic P2P design
│   ├── kv_lifecycle_design.md         # Cache eviction analysis
│   └── ...
├── replayer/                          # Trace replay framework
│   ├── __main__.py                    # CLI entry: python -m replayer
│   ├── replay.py                      # Async replayer (session-aware, SSE streaming)
│   ├── trace.py                       # TraceRequest dataclass, session/hash_id handling
│   └── metrics.py                     # RequestMetrics, crash-safe JSONL sink
├── scripts/
│   ├── cache_aware_proxy.py           # Global scheduler (combined + PD-sep + elastic offload)
│   ├── sample_trace.py                # Cluster-to-machine trace sampler
│   ├── launch_vllm.sh                 # Launch combined TP=8
│   ├── launch_pd_mooncake.sh          # Launch PD-Sep with Mooncake
│   ├── launch_elastic_p2p.sh          # Launch elastic P2P (8× kv_both + offload proxy)
│   ├── bench.sh                       # Standardized single-experiment harness (fresh-state check)
│   ├── run_elastic_stability_test.sh  # Two-phase elastic vs baseline test
│   ├── run_lmetric_ab.sh              # A/B: linear vs lmetric routing policy
│   ├── run_experiments.sh             # Full experiment matrix (combined/PD-sep)
│   ├── run_benchmark.sh               # Single benchmark run
│   ├── gpu_monitor.sh                 # GPU utilization sampler (5s CSV)
│   ├── compute_roofline.py            # Prefill/decode roofline analysis
│   ├── analyze_*.py                   # Various analysis scripts
│   └── compare_*.py                   # Experiment comparison scripts
├── patches/
│   ├── 0001-fix-kv-transfer-abort-race.patch
│   └── README.md
├── third_party/vllm/                  # vLLM 0.18.1 source (with patch applied)
├── outputs/                           # Experiment results (gitignored)
├── traces/                            # Sampled traces (gitignored)
├── TODO.md                            # Original research goals
└── REPORT.md                          # This milestone report
```

## 7. Key Scripts Reference

| Script | What it does | Key flags |
|--------|-------------|-----------|
| `scripts/cache_aware_proxy.py` | Global scheduler + elastic offload proxy | `--combined`, `--offload`, `--policy {linear,lmetric}`, `--heavy-threshold`, `--bootstrap-ports` |
| `scripts/run_lmetric_ab.sh` | A/B: linear vs lmetric routing policy | Runs both experiments with fresh restart |
| `scripts/run_elastic_stability_test.sh` | Elastic vs baseline with full isolation | Fresh start/stop per experiment |
| `scripts/bench.sh` | Standard single-experiment harness | `--tag`, `--mode {baseline,elastic}` |
| `scripts/sample_trace.py` | Sample complete sessions from cluster trace | `--target-requests`, `--seed` |
| `python -m replayer` | Replay trace against vLLM endpoint | `--time-scale`, `--max-inflight-sessions`, `--request-limit` |
| `scripts/gpu_monitor.sh` | Sample nvidia-smi to CSV | Pipe to `outputs/<tag>/gpu_util.csv` |
| `scripts/launch_elastic_p2p.sh` | Launch all 8 kv_both instances + offload proxy | `HEAVY_THRESHOLD`, `MAX_OFFLOAD` env vars |

## 8. Conclusions & Next Steps

### Established findings:
1. Full PD separation is **net negative** for single-machine agentic workloads (KV cache memory wall)
2. Cache-aware session-sticky routing is the **dominant optimization** (+24pp APC, -60% TTFT vs round-robin)
3. **Elastic P2P offload does NOT improve single-machine performance**: Mooncake kv_both memory overhead (+12% TPOT p90, +37.5% E2E p50) outweighs the prefill isolation benefit under moderate load (200 req)
4. LMetric (OSDI'26) provides a modest **E2E -4%** over linear routing; routing policy headroom is limited
5. **Experimental methodology matters**: warm vs fresh instances cause a 2× TTFT difference, so all comparisons must use a verified fresh restart

### Lessons learned:
- Prior cross-machine A/B (commit `1e86285`) was invalid: the warm baseline was inflated by 2× due to residual KV cache state
- `kv_role=kv_both` has non-trivial always-on overhead even when P2P transfer is not used
- Experiment isolation (kill all → verify GPU free → fresh start) is critical for reproducibility

### Open problems:
1. Elastic P2P may help under **sustained high load** (KV cache pressure makes co-located interference worse); needs a 1000-req experiment
2. Mooncake kv_both memory overhead quantification and potential lazy initialization
3. Multi-machine elastic (P on different node, no memory competition)
4. Router state accuracy: proxy shadow state vs vLLM-internal exact state (TODO: vLLM → Redis → router)
5. Adopt `scripts/bench.sh` (standardized harness with fresh-state verification) for all future experiments to prevent warm-instance mistakes

---

*Updated 2026-05-22. Prior elastic A/B results (commit `1e86285`) invalidated; see §3.5 errata.*