eagle3: tree drafting with top-2 siblings — speedup_e2e = 1.17× 🎉
Implements the full tree speculative drafting loop using the
copy_kv_position primitive from the previous commit.
Tree structure per round (4 verify tokens):
  tokens:    [pending_prev, d0_top1, d0_top2, d1_chain_from_top1]
  positions: [P, P+1, P+1, P+2]
  tree_mask: row0=[1000] row1=[1100] row2=[1010] row3=[1101]
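The layout above follows from the tree's parent pointers: each row attends to itself and its ancestors, and each position is P plus the node's depth. A minimal sketch, assuming a hypothetical helper build_tree_inputs and a parent array that are illustrative only and not part of this commit:

```python
def build_tree_inputs(parents, base_pos):
    """Derive the attention mask and positions for a small draft tree.

    parents[i] is the index of node i's parent, or -1 for the root.
    (Hypothetical helper for illustration; not the commit's code.)
    """
    n = len(parents)
    mask = [[0] * n for _ in range(n)]
    positions = []
    for i in range(n):
        depth = 0
        j = i
        # Walk up to the root, marking the node itself and every ancestor.
        while j != -1:
            mask[i][j] = 1
            j = parents[j]
            depth += 1
        # The root sits at base_pos; each level below adds one.
        positions.append(base_pos + depth - 1)
    return mask, positions


# Node order: [pending_prev, d0_top1, d0_top2, d1_chain_from_top1]
PARENTS = [-1, 0, 0, 1]
mask, positions = build_tree_inputs(PARENTS, base_pos=100)  # P = 100 (example value)
for row in mask:
    print("".join(str(bit) for bit in row))
print(positions)
```

The two siblings share position P+1 because they are alternatives for the same slot in the sequence; only the mask tells them apart.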
Acceptance logic:
- d0_top1 matches target → check d1 chain → commit 2 or 3 tokens.
- d0_top2 matches target → copy_kv_position(P+2→P+1) + commit 2.
- Neither → commit pending_prev only.
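The three cases above can be sketched as a small decision function. This is an illustration under assumptions, not the commit's implementation: "matches target" is taken to mean equality with the target's greedy token, the argument names are hypothetical, and copy_kv_position is modelled as a callback taking (src, dst).

```python
def accept_tree(d0_top1, d0_top2, d1, target_after_prev, target_after_top1,
                P, copy_kv_position):
    """Return how many tokens to commit for one verify round.

    target_after_prev: target's token following pending_prev (assumed greedy).
    target_after_top1: target's token following d0_top1 (assumed greedy).
    copy_kv_position:  callback (src, dst); the signature is an assumption.
    """
    if d0_top1 == target_after_prev:
        # Top-1 sibling accepted: pending_prev + d0_top1, plus d1 if it matches.
        return 3 if d1 == target_after_top1 else 2
    if d0_top2 == target_after_prev:
        # Top-2 sibling accepted: move its KV entry from P+2 into P+1.
        copy_kv_position(P + 2, P + 1)
        return 2
    # Neither sibling matched: only pending_prev is committed.
    return 1


copies = []
record_copy = lambda src, dst: copies.append((src, dst))

n_full = accept_tree(11, 22, 33, 11, 33, P=100, copy_kv_position=record_copy)
n_top1 = accept_tree(11, 22, 33, 11, 99, P=100, copy_kv_position=record_copy)
n_top2 = accept_tree(11, 22, 33, 22, 99, P=100, copy_kv_position=record_copy)
n_none = accept_tree(11, 22, 33, 77, 99, P=100, copy_kv_position=record_copy)
```

Only the top-2 branch triggers a KV copy in this sketch, since that is the only copy the acceptance logic above describes.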
50 prompts × 64 tokens on dash5 (Qwen3-8B + AngelSlim EAGLE3):
acceptance_rate = 14.1% (vs 11.3% non-tree γ=2)
target_steps = 2231 (vs 2432 non-tree)
baseline_tpot_ms = 12.51, spec_tpot_ms = 10.68
speedup_e2e = 1.17× (vs 1.10× non-tree)
The top-2 sibling adds ~2.8 points of absolute acceptance (11.3% → 14.1%),
which translates to ~7 points of additional speedup (1.10× → 1.17×).
The copy_kv_position cost is negligible (<6μs).
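The headline numbers can be cross-checked with a few lines of arithmetic. This sketch assumes speedup_e2e is baseline_tpot_ms / spec_tpot_ms, which the reported values are consistent with but which is not stated explicitly here:

```python
# Figures copied from the benchmark summary above.
baseline_tpot_ms = 12.51
spec_tpot_ms = 10.68
acc_tree, acc_chain = 14.1, 11.3        # acceptance rate, percent
steps_tree, steps_chain = 2231, 2432    # target forward steps

# Assumed definition: end-to-end speedup as the ratio of per-token latencies.
speedup = baseline_tpot_ms / spec_tpot_ms

# Absolute acceptance gain, in percentage points.
acc_gain_points = acc_tree - acc_chain

# Fraction of target steps saved by the tree path relative to the chain path.
step_reduction = 1 - steps_tree / steps_chain

# Speedup gain relative to the non-tree path (1.10x -> 1.17x).
relative_gain = 1.17 / 1.10 - 1

print(f"speedup_e2e    = {speedup:.2f}x")
print(f"acceptance     = +{acc_gain_points:.1f} points")
print(f"target steps   = -{step_reduction:.1%}")
print(f"vs non-tree    = +{relative_gain:.1%} relative")
```

Measured relative to the non-tree path, the gain is a little over 6%; measured as a difference in speedup factors, it is 7 points.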
CLI: bench-eagle3 --tree enables the tree path.