xserv

Gahow Wang 68b55fa1e6 eagle3: γ=1 speculative bench + first end-to-end measurement

bench-eagle3.rs runs the full loop: prefill → for each output token, one
EAGLE draft + one target decode with hidden state hook. Measures
acceptance rate and speedup vs pure target decode.
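
The loop described above can be sketched roughly as follows. This is a
minimal illustration, not the code in bench-eagle3.rs: `run_gamma1`,
`SpecStats`, and the closure signatures are hypothetical, and the way
prefill and drafts are counted here is an assumption that may differ
from the bench. It requires n_out >= 1.

```rust
/// Counters for one γ=1 speculative run (hypothetical type).
#[derive(Debug, Default, PartialEq)]
pub struct SpecStats {
    pub drafted: usize,
    pub accepted: usize,
    pub target_steps: usize,
}

/// Sketch of a γ=1 loop.
/// `target` maps the tokens so far to (greedy next token, hidden state).
/// `draft` maps (hidden state, last token) to a guess for the next token.
pub fn run_gamma1<T, D>(
    prompt: &[u32],
    n_out: usize,
    mut target: T,
    mut draft: D,
) -> (Vec<u32>, SpecStats)
where
    T: FnMut(&[u32]) -> (u32, Vec<f32>),
    D: FnMut(&[f32], u32) -> u32,
{
    let mut seq = prompt.to_vec();
    let mut stats = SpecStats::default();

    // Prefill: the target yields the first output token and its hidden state.
    // Assumption: this is counted as one target step.
    let (mut tok, mut hidden) = target(&seq[..]);
    stats.target_steps += 1;
    seq.push(tok);

    while seq.len() - prompt.len() < n_out {
        // One draft guess per remaining output token.
        let guess = draft(&hidden[..], tok);
        stats.drafted += 1;

        // One target decode per output token, whether or not the guess hits.
        let (next, h) = target(&seq[..]);
        stats.target_steps += 1;
        if guess == next {
            stats.accepted += 1;
        }

        // The emitted token is always the target's, so output equals baseline.
        seq.push(next);
        tok = next;
        hidden = h;
    }
    (seq[prompt.len()..].to_vec(), stats)
}
```

Because the emitted token always comes from the target, the output
matches pure target decode by construction; only the acceptance
counter depends on the draft.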

First numbers on dash5 (10 prompts × 32 tokens, γ=1):
  matched=true (10/10)
  acceptance_rate=1.3% (4/300)  ← should be ~60-70% per EAGLE3 paper
  speedup_e2e=0.95×             ← below 1 because γ=1 does 1 target
                                  decode per output token regardless of
                                  acceptance
  target_steps=320 for 320 tokens
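
The arithmetic behind these numbers can be sketched as below. This
assumes a simplified cost model in which each output token costs one
target decode plus one draft step and nothing else; the function names
and the per-step timings are hypothetical, not taken from the bench.

```rust
/// Fraction of drafted tokens that matched the target's greedy token.
pub fn acceptance_rate(accepted: usize, drafted: usize) -> f64 {
    if drafted == 0 {
        0.0
    } else {
        accepted as f64 / drafted as f64
    }
}

/// Speedup vs pure target decode under the assumed γ=1 cost model:
/// baseline pays t_target per token, speculative pays t_target + t_draft.
/// This is below 1 for any positive draft cost, independent of acceptance.
pub fn gamma1_speedup(t_target_ms: f64, t_draft_ms: f64) -> f64 {
    t_target_ms / (t_target_ms + t_draft_ms)
}

/// Draft cost as a fraction of target cost implied by a measured speedup,
/// under the same assumed cost model: t_draft / t_target = 1/speedup - 1.
pub fn implied_draft_overhead(speedup: f64) -> f64 {
    1.0 / speedup - 1.0
}
```

Under this model, acceptance_rate(4, 300) is about 0.013, and a
speedup of 0.95 corresponds to a draft step costing roughly 5% of a
target step.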

Positive: the plumbing is correct — target/EAGLE both run without error,
output sequences match baseline, all shapes/dtypes check out. The
earlier sanity check showed that EAGLE's top-5 contains thematically
plausible tokens (Paris/Tokyo/Madrid for "capital of France is").

Negative: 1.3% acceptance means EAGLE's top-1 draft almost never
matches the target's greedy top-1. Possible root causes to investigate:
1. Token/hook pairing convention. Paper uses (h_that_produced_t_i, t_i)
   → predicts t_{i+1}. My bench does the same, but the earlier sanity
   check suggested the pairing might be off by one.
2. Missing "training-time test" projection: EAGLE was trained to feed
   its own previous output as fused_h for the next step (γ>1 chaining).
   Currently we always use target hooks, which is what pairings A/B do
   for γ=1, but this may not be aligned with training-time behavior.
3. Hook site: I capture x AFTER the residual+MLP. Paper may want x
   BEFORE, or the "hidden_states" as used by the final norm+lm_head.
   The tensor I capture is the same one that feeds the final norm in
   the target forward, so what I have is the post-residual state;
   this still needs confirming against the reference Python impl.
4. Weight loading: transposes assume [in,out] → [out,in]. Need to
   validate at least one output layer's shape against expected.
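
For cause 1, one cheap diagnostic is to re-score the logged draft
tokens against the target tokens at a few shifts and see whether the
match count jumps at a non-zero offset. The sketch below assumes both
sequences were logged as plain token-id slices; `matches_at_offset`
and `scan_offsets` are hypothetical helpers, not part of the bench.

```rust
/// Counts positions i where drafts[i] == targets[i + offset].
/// Positions whose shifted index falls outside `targets` are skipped.
pub fn matches_at_offset(drafts: &[u32], targets: &[u32], offset: isize) -> usize {
    drafts
        .iter()
        .enumerate()
        .filter(|(i, d)| {
            let j = *i as isize + offset;
            j >= 0 && (j as usize) < targets.len() && targets[j as usize] == **d
        })
        .count()
}

/// Returns (offset, match count) for every offset in -max_shift..=max_shift.
pub fn scan_offsets(drafts: &[u32], targets: &[u32], max_shift: isize) -> Vec<(isize, usize)> {
    (-max_shift..=max_shift)
        .map(|off| (off, matches_at_offset(drafts, targets, off)))
        .collect()
}
```

If the count at offset +1 or -1 is far higher than at 0, the pairing
is likely shifted by one position; if every offset stays near zero,
causes 2 to 4 become the more likely suspects.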

Next step (deferred to another session): download AngelSlim reference
inference code, run same prompt through it, compare intermediate
activations at each stage to isolate the discrepancy.
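
A stage-by-stage comparison could look roughly like this. It is a
sketch under the assumption that activations from both sides have
already been dumped as flat f32 vectors in the same order; the helper
names and the stage labels are hypothetical, and nothing here reflects
the actual AngelSlim code.

```rust
/// Largest element-wise absolute difference, or None on length mismatch.
pub fn max_abs_diff(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    Some(a.iter().zip(b).map(|(x, y)| (x - y).abs()).fold(0.0f32, f32::max))
}

/// Cosine similarity, or None on length mismatch or a zero-norm input.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        None
    } else {
        Some(dot / (na * nb))
    }
}

/// Name of the first stage whose activations differ by more than `tol`.
/// A length mismatch or a NaN difference also counts as divergent.
pub fn first_divergent_stage<'a>(
    stages: &[(&'a str, Vec<f32>, Vec<f32>)],
    tol: f32,
) -> Option<&'a str> {
    stages
        .iter()
        .find(|(_, ours, reference)| match max_abs_diff(ours, reference) {
            Some(d) => !(d <= tol),
            None => true,
        })
        .map(|(name, _, _)| *name)
}
```

Walking the stages in forward order means the first name returned is
the earliest point where the two implementations disagree, which
narrows the search to that stage's inputs and weights.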

2026-07-01 17:32:53 +08:00

xserv-cuda

style: format Rust workspace

2026-06-18 18:11:58 +08:00

xserv-distributed

style: format Rust workspace

2026-06-18 18:11:58 +08:00

xserv-kernels

speculative: batched-GEMV kernel for verify path (Phase 24 step 1)

2026-07-01 16:13:37 +08:00

xserv-model

eagle3: γ=1 speculative bench + first end-to-end measurement