Gahow Wang 14925154a3 eagle3: γ≥2 recursive drafting + batched verify with hooks
Adds infrastructure for γ≥2 EAGLE speculative decoding:

qwen3.rs:
- New forward_verify_paged_decode_attention_with_hidden: same as the
  existing verify but also captures target hidden states at 3 hook
  layers, one per verify position. Needed to seed next round's EAGLE.

eagle3.rs:
- step is split into step (unchanged public API), step_with_aux (also
  returns the final hidden state), and step_recursive (takes fused_h
  directly, skipping the fc + 3-hidden combine). This mirrors the EAGLE3
  paper: γ=1 uses target hooks + fc; γ≥2 uses the previous EAGLE aux as
  fused_h for subsequent drafts, approximating the target hidden state.
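
The recursive drafting flow described above can be sketched as follows. This is a minimal illustration, not the code in eagle3.rs: the DraftHead trait, its method signatures, and ToyHead are hypothetical stand-ins, and only the names step_with_aux and step_recursive come from this commit.

```rust
/// Hypothetical stand-in for the EAGLE draft head; the real signatures in
/// eagle3.rs are not shown in this commit message and may differ.
trait DraftHead {
    /// First draft: consumes target hidden state, returns (token, aux hidden).
    fn step_with_aux(&mut self, token: u32, target_hidden: &[f32]) -> (u32, Vec<f32>);
    /// Later drafts: consume the previous aux hidden directly as fused_h.
    fn step_recursive(&mut self, token: u32, fused_h: &[f32]) -> (u32, Vec<f32>);
}

/// Produce `gamma` draft tokens: one seeded from the target's hidden state,
/// the rest seeded from the draft head's own previous aux output.
fn draft_gamma<H: DraftHead>(
    head: &mut H,
    prev_token: u32,
    target_hidden: &[f32],
    gamma: usize,
) -> Vec<u32> {
    let mut drafts = Vec::with_capacity(gamma);
    if gamma == 0 {
        return drafts;
    }
    let (mut tok, mut aux) = head.step_with_aux(prev_token, target_hidden);
    drafts.push(tok);
    for _ in 1..gamma {
        let (next_tok, next_aux) = head.step_recursive(tok, &aux);
        tok = next_tok;
        aux = next_aux;
        drafts.push(tok);
    }
    drafts
}

/// Toy head used only to exercise the control flow: each step emits
/// `token + 1` and bumps the first hidden value by 1.0.
#[derive(Default)]
struct ToyHead {
    recursive_calls: usize,
}

impl DraftHead for ToyHead {
    fn step_with_aux(&mut self, token: u32, target_hidden: &[f32]) -> (u32, Vec<f32>) {
        (token + 1, vec![target_hidden[0] + 1.0])
    }
    fn step_recursive(&mut self, token: u32, fused_h: &[f32]) -> (u32, Vec<f32>) {
        self.recursive_calls += 1;
        (token + 1, vec![fused_h[0] + 1.0])
    }
}
```

With gamma = 3 the toy head is called once through step_with_aux and twice through step_recursive, which is the call pattern the split is meant to enable.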

bench-eagle3.rs:
- New run_eagle_gamma_multi function with --gamma CLI (default 2).
- Per round: run γ recursive EAGLE drafts, verify [prev_token, d0..d_{γ-1}]
  in one target forward, accept the longest matching prefix, then obtain
  the correction token via one more target decode.
- max_seqs bumped to 16 in the paged cache so verify can batch up to
  16 rows.
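
The "accept longest prefix" step can be sketched as below, assuming greedy (argmax) verification. The function name and the target_preds layout are illustrative assumptions, not taken from bench-eagle3.rs: target_preds[i] is taken to be the target's prediction after consuming the i-th verify input (prev_token for i = 0, d_{i-1} otherwise).

```rust
/// Count how many leading draft tokens agree with the target's greedy
/// predictions. Acceptance stops at the first mismatch, so a later
/// coincidental match is never counted.
fn accepted_prefix_len(drafts: &[u32], target_preds: &[u32]) -> usize {
    drafts
        .iter()
        .zip(target_preds.iter())
        .take_while(|(d, t)| d == t)
        .count()
}
```

For example, drafts [5, 7, 9] against predictions [5, 7, 1, 4] accept two tokens. The sketch only counts accepted drafts; per the description above, the correction token in this commit comes from a separate target decode.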

γ=2 test result (5 prompts × 32 tokens, dash5):
  matched=false — sequences diverge
  acceptance_rate = 29.8% at γ=2 (~1.1 tokens accepted per draft)
  speedup_e2e = 0.52x (SLOWER than baseline)

The divergence appears to come from the verify pass re-writing
prev_token's K/V at position round_pos-1. In principle, row 0 of
matmul_batched_gemv should be bit-exact with the seed decode's
launch_gemv_bf16, yet the output sequences diverge. Investigation is
pending; other likely culprits are the correction decode step and the
seed_hooks position offset.
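
One way to test the bit-exactness assumption is to dump the same row from both paths and compare raw bit patterns rather than using a tolerance. The helper below is a hypothetical debugging sketch, not part of this commit; it assumes both buffers have already been copied to the host and widened to f32.

```rust
/// Return the index of the first element whose bit pattern differs, or None
/// if the buffers are bit-identical. Comparing via to_bits distinguishes
/// 0.0 from -0.0 and treats identical NaN payloads as equal, unlike `==`.
/// A length mismatch is reported at the shorter length.
fn first_bit_mismatch(a: &[f32], b: &[f32]) -> Option<usize> {
    if a.len() != b.len() {
        return Some(a.len().min(b.len()));
    }
    a.iter()
        .zip(b)
        .position(|(x, y)| x.to_bits() != y.to_bits())
}
```

Applying it to the K/V row written by the verify pass and the row written by the seed decode would show whether the rewrite itself changes any bits, or whether the divergence starts elsewhere.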

The γ=1 path from the previous commit still works correctly
(matched=true, acceptance 20%, speedup 0.95x). The γ≥2 path is
scaffolded but not yet correct; the next step is to debug the
verify-write path, then measure real speedup.
2026-07-01 18:01:55 +08:00