Files
xserv/crates
Gahow Wang e04a8ffb18 speculative: EAGLE3 draft head implementation (Phase 25 step 1)
- eagle3.rs: Eagle3Head struct loads AngelSlim/Qwen3-8B_eagle3 safetensors,
  runs a single draft step via fc(concat(h_low, h_mid, h_high)) +
  concat(input_norm(emb), hidden_norm(fused_h)) → 1 midlayer → norm →
  lm_head → argmax in draft_vocab(32000) → d2t → target_vocab.
- qwen3.rs: new decode_core_with_hidden method that mirrors decode_core
  but captures hidden states at 3 configurable layer indices (default
  [11, 23, 35] for the 36-layer Qwen3-8B). Also expose embed_tokens_tensor
  and (in eagle3) map_draft_to_target as public accessors.
- loader.rs: make_tensor now pub(crate) so eagle3 can reuse it.
- bin/check-eagle3.rs: sanity binary that loads target + EAGLE, runs one
  prefill + one decode + one EAGLE step, prints the top-5 EAGLE predictions.
  Verified on dash5 with prompt "The capital of France is":
    target says: " Paris" then "."
    EAGLE top-5: "," / " Paris" / " Madrid" / "." / " Berlin"
  Weights load correctly, d2t mapping works, hidden state hooks are the
  right shape ([1, 4096]), and EAGLE produces thematically-relevant tokens.

The top-1 pick "," doesn't match target's "." at this position, but
that's expected: this test uses hidden states from a single decode step
with no recursive chaining. A full speculative loop still needs the
γ≥2 verify + accept path wired up (next step).
2026-07-01 17:23:22 +08:00
..
2026-06-18 18:11:58 +08:00
2026-06-18 18:11:58 +08:00