- eagle3.rs: Eagle3Head struct loads the AngelSlim/Qwen3-8B_eagle3 safetensors
and runs a single draft step: fused_h = fc(concat(h_low, h_mid, h_high)),
then concat(input_norm(emb), hidden_norm(fused_h)) → 1 midlayer → norm →
lm_head → argmax over draft_vocab (32000) → d2t → target_vocab id.
- qwen3.rs: new decode_core_with_hidden method that mirrors decode_core
but captures hidden states at 3 configurable layer indices (default
[11, 23, 35] for the 36-layer Qwen3-8B). Also exposes embed_tokens_tensor
as a public accessor; eagle3 likewise exposes map_draft_to_target.
- loader.rs: make_tensor now pub(crate) so eagle3 can reuse it.
- bin/check-eagle3.rs: sanity binary that loads target + EAGLE, runs one
prefill + one decode + one EAGLE step, and prints the top-5 EAGLE predictions.
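
The tail of the draft step (top-k over the draft-vocab logits, then d2t into
the target vocab) can be sketched roughly as below. This is a minimal
standalone illustration, not the code in eagle3.rs: `top_k` and
`draft_to_target_id` are hypothetical helpers, the 4-entry vocab is toy data,
and it assumes d2t stores a per-entry offset (target_id = draft_id +
d2t[draft_id]). If the checkpoint stores absolute ids instead, the addition
becomes a plain lookup.

```rust
/// Hypothetical helper: indices of the `k` largest logits, best first.
fn top_k(logits: &[f32], k: usize) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..logits.len()).collect();
    // total_cmp is a total order on f32, so NaN logits cannot panic the sort.
    idx.sort_by(|&a, &b| logits[b].total_cmp(&logits[a]));
    idx.truncate(k);
    idx
}

/// Hypothetical stand-in for the d2t step.
/// ASSUMPTION: d2t holds offsets, i.e. target_id = draft_id + d2t[draft_id].
fn draft_to_target_id(d2t: &[i64], draft_id: usize) -> usize {
    (draft_id as i64 + d2t[draft_id]) as usize
}

fn main() {
    // Toy data: a 4-entry draft vocab mapped into a larger target vocab.
    let d2t: [i64; 4] = [0, 3, 3, 10];
    let logits = [0.1_f32, 2.5, -1.0, 1.7];

    let top = top_k(&logits, 2);
    let target_ids: Vec<usize> = top
        .iter()
        .map(|&d| draft_to_target_id(&d2t, d))
        .collect();

    println!("{:?} -> {:?}", top, target_ids); // [1, 3] -> [4, 13]
}
```
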
Verified on dash5 with prompt "The capital of France is":
target says: " Paris" then "."
EAGLE top-5: "," / " Paris" / " Madrid" / "." / " Berlin"
Weights load correctly, the d2t mapping works, the hidden state hooks return
the right shape ([1, 4096]), and EAGLE produces thematically relevant tokens.
The top-1 pick "," doesn't match the target's "." at this position, but
that's expected: this test uses hidden states from a single decode step
with no recursive chaining. A full speculative loop still needs the
γ≥2 verify + accept path wired up (next step).
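
For reference, one common shape for the missing verify + accept step is greedy
prefix acceptance. The sketch below shows that rule and nothing more: it is
not implemented in this change, `accept_greedy` is a hypothetical name, and
the token ids are toy values. It assumes the target model is run once over the
γ drafted positions and yields its argmax at each of them plus one extra
position.

```rust
/// Hypothetical greedy acceptance rule.
/// `draft`  : the γ tokens proposed by the draft head.
/// `target` : the target model's argmax at the same γ positions, plus one
///            extra position (γ + 1 entries in total).
fn accept_greedy(draft: &[u32], target: &[u32]) -> Vec<u32> {
    assert_eq!(target.len(), draft.len() + 1);

    // Length of the longest prefix on which draft and target agree.
    let n = draft
        .iter()
        .zip(target)
        .take_while(|(d, t)| d == t)
        .count();

    // Keep the agreed prefix, then the target's own token at position n:
    // a correction on mismatch, or a bonus token if every draft matched.
    let mut out = draft[..n].to_vec();
    out.push(target[n]);
    out
}

fn main() {
    // Mismatch at position 2: two drafts accepted, then the target's token.
    println!("{:?}", accept_greedy(&[5, 7, 9], &[5, 7, 2, 4])); // [5, 7, 2]
    // Every draft accepted: the extra target token is appended as a bonus.
    println!("{:?}", accept_greedy(&[5, 7], &[5, 7, 8])); // [5, 7, 8]
}
```

Under this rule every verify pass emits at least one token, so output matches
plain greedy decoding of the target; the draft head only changes how many
tokens each target forward pass yields.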