eagle3: fix EAGLE_HOOK_LAYERS to [2, 18, 33] for Qwen3-8B
The initial [11, 23, 35] (equally spaced) guess was wrong: EAGLE3 heads are trained against specific target layer indices, and using different ones at inference gives wrong outputs.

Correct values come from vLLM speculators' training config for Qwen3-8B:
  https://github.com/vllm-project/speculators/blob/main/examples/train/dflash_qwen3_8b_sharegpt_online_5k.sh
which pins target_layer_ids to "2 18 33".

Re-running check-eagle3 with the fix produces coherent top-5 for "The capital of France is":
  Old ([11,23,35]): "," / " Paris" / " Madrid" / "." / " Berlin"
  New ([2,18,33]):  " Paris" / " Tokyo" / " Madrid" / "," / "."

Top-1 still differs from target's next token, but that's because EAGLE compares (state_that_produced_prev, prev_token) → next, and the exact pairing convention may need one more offset check when integrated into the full speculative loop.
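To make the (state_that_produced_prev, prev_token) → next pairing concrete, here is a minimal sketch of one possible reading of it. It assumes the hidden state at position i is the one that produced tokens[i + 1], so the draft head sees (state i, tokens[i + 1]) and is expected to predict tokens[i + 2]. `DraftExample` and `pair_examples` are hypothetical names for illustration and are not part of this crate; the actual offset still needs the check mentioned above.

```rust
/// One draft-head example under the assumed pairing convention.
/// All names here are hypothetical and exist only for this sketch.
#[derive(Debug, PartialEq)]
pub struct DraftExample {
    /// Position of the target hidden state fed to the draft head.
    pub state_pos: usize,
    /// Token that the state at `state_pos` produced (assumed: tokens[state_pos + 1]).
    pub prev_token: u32,
    /// Token the draft head should predict (assumed: tokens[state_pos + 2]).
    pub next_token: u32,
}

/// Builds (state, prev_token) -> next_token examples from a token sequence.
/// Sequences shorter than three tokens yield no examples.
pub fn pair_examples(tokens: &[u32]) -> Vec<DraftExample> {
    let n = tokens.len().saturating_sub(2);
    (0..n)
        .map(|i| DraftExample {
            state_pos: i,
            prev_token: tokens[i + 1],
            next_token: tokens[i + 2],
        })
        .collect()
}
```

If the top-1 mismatch comes from an off-by-one in this pairing, shifting which token is attached to each state (for example tokens[i] instead of tokens[i + 1]) is the kind of change the offset check would compare.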
@@ -15,7 +15,11 @@ use std::path::Path;
 use xserv_kernels::*;
 use xserv_tensor::{DType, Device, Tensor};

-pub const EAGLE_HOOK_LAYERS: [usize; 3] = [11, 23, 35];
+/// Target layers to hook for EAGLE3 auxiliary hidden states, for Qwen3-8B
+/// (36 layers). Value comes from AngelSlim/vLLM speculators training config
+/// `dflash_qwen3_8b_sharegpt_online_5k.sh` which specifies target_layer_ids
+/// = "2 18 33". Must match training-time selection or EAGLE outputs are wrong.
+pub const EAGLE_HOOK_LAYERS: [usize; 3] = [2, 18, 33];
 const DRAFT_VOCAB_SIZE: usize = 32000;

 fn matmul_2d(a: &Tensor, b: &Tensor) -> Tensor {
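
Since a mismatched hook list fails silently (the draft head still emits tokens, just poor ones), a cheap guard around the constant may be worth having. The following is a sketch only, not code from this commit: `NUM_TARGET_LAYERS`, `hook_layers_valid`, and `collect_aux_hidden` are hypothetical names, plain `Vec<f32>` stands in for the crate's `Tensor`, and it assumes index i means the output of layer i (0-based). Whether the hook should capture a layer's output or its input is itself a convention that has to match training.

```rust
/// Assumed layer count of the target model (the doc comment above states
/// 36 layers for Qwen3-8B). Hypothetical constant for this sketch.
const NUM_TARGET_LAYERS: usize = 36;

pub const EAGLE_HOOK_LAYERS: [usize; 3] = [2, 18, 33];

/// Returns true if every index is in range and the list is strictly increasing.
const fn hook_layers_valid(layers: &[usize], num_layers: usize) -> bool {
    let mut i = 0;
    while i < layers.len() {
        if layers[i] >= num_layers {
            return false;
        }
        if i > 0 && layers[i] <= layers[i - 1] {
            return false;
        }
        i += 1;
    }
    true
}

// Compile-time guard: the build fails if the constant is out of range or unsorted.
const _: () = assert!(hook_layers_valid(&EAGLE_HOOK_LAYERS, NUM_TARGET_LAYERS));

/// Runs a stand-in layer stack and concatenates the hidden states captured
/// after each hooked layer (assumption: hook = layer output, not layer input).
fn collect_aux_hidden(
    input: &[f32],
    layer_fn: impl Fn(usize, &[f32]) -> Vec<f32>,
) -> Vec<f32> {
    let mut hidden = input.to_vec();
    let mut aux = Vec::with_capacity(EAGLE_HOOK_LAYERS.len() * input.len());
    for layer in 0..NUM_TARGET_LAYERS {
        hidden = layer_fn(layer, &hidden);
        if EAGLE_HOOK_LAYERS.contains(&layer) {
            aux.extend_from_slice(&hidden);
        }
    }
    aux
}
```

The range check only catches structural mistakes. It cannot tell [11, 23, 35] from [2, 18, 33], since both are valid for a 36-layer model, so the training config remains the source of truth for the actual values.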