xserv

Files

Gahow Wang 8f11d6e5cd eagle3: fix EAGLE_HOOK_LAYERS to [2, 18, 33] for Qwen3-8B

The initial equally spaced guess of [11, 23, 35] was wrong: EAGLE3 heads
are trained against specific target layer indices, and hooking different
layers at inference gives wrong outputs. The correct values come from the
vLLM speculators training config for Qwen3-8B:

  https://github.com/vllm-project/speculators/blob/main/examples/train/dflash_qwen3_8b_sharegpt_online_5k.sh

which pins target_layer_ids to "2 18 33". Re-running check-eagle3 with
the fix produces coherent top-5 for "The capital of France is":

  Old ([11,23,35]): "," / " Paris" / " Madrid" / "." / " Berlin"
  New ([2,18,33]):  " Paris" / " Tokyo" / " Madrid" / "," / "."
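
A minimal sketch of what pinning the hook layers means in practice,
assuming the EAGLE3 head consumes the hidden states of the three hooked
target layers concatenated in order. gather_hook_features and the
per-layer Vec<f32> layout are hypothetical, chosen for illustration, and
are not taken from the xserv code; only the [2, 18, 33] values come from
this commit. Whether the indices are 0-based layer outputs is also an
assumption here.

```rust
/// Target-model layer indices whose hidden states feed the EAGLE3 head.
/// Values from this commit (Qwen3-8B); the indexing base is assumed.
const EAGLE_HOOK_LAYERS: [usize; 3] = [2, 18, 33];

/// Hypothetical helper: given one hidden-state vector per target layer
/// (for a single token position), concatenate the hooked layers in the
/// order listed in `hooks`. Returns None if a hook index is out of range.
fn gather_hook_features(per_layer: &[Vec<f32>], hooks: &[usize]) -> Option<Vec<f32>> {
    let mut fused = Vec::new();
    for &layer in hooks {
        // `get` avoids a panic when the model has fewer layers than expected.
        fused.extend_from_slice(per_layer.get(layer)?);
    }
    Some(fused)
}
```

Returning None for an out-of-range index makes a mismatch between the
hook list and the loaded model visible early instead of silently reading
the wrong layer.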

Top-1 still differs from the target's next token, but that is because
EAGLE maps (state_that_produced_prev, prev_token) → next, and the exact
pairing convention may need one more offset check when integrated into
the full speculative loop.
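
One reading of the (state_that_produced_prev, prev_token) → next pairing,
sketched with plain token indices. It assumes the hidden state at
position i is the one that produced tokens[i + 1], so the draft head sees
that state together with tokens[i + 1] and should predict tokens[i + 2].
DraftExample and build_pairs are hypothetical names, and the offset shown
is exactly the part the commit message flags as still needing a check.

```rust
/// One draft-head example under the assumed pairing convention.
#[derive(Debug, PartialEq)]
struct DraftExample {
    /// Position whose target hidden state is fed to the draft head.
    state_pos: usize,
    /// Token that this state produced: tokens[state_pos + 1].
    prev_token: u32,
    /// Token the draft head should predict: tokens[state_pos + 2].
    next_token: u32,
}

/// Slide a window of three tokens over the sequence; window i yields the
/// pairing (state at i, tokens[i + 1]) -> tokens[i + 2].
fn build_pairs(tokens: &[u32]) -> Vec<DraftExample> {
    tokens
        .windows(3)
        .enumerate()
        .map(|(i, w)| DraftExample {
            state_pos: i,
            prev_token: w[1],
            next_token: w[2],
        })
        .collect()
}
```

An off-by-one in state_pos here would pair each token with a state from
the wrong position, which is one way a top-1 mismatch could arise even
with the correct hook layers.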

2026-07-01 17:29:00 +08:00

xserv-cuda

style: format Rust workspace

2026-06-18 18:11:58 +08:00

xserv-distributed

style: format Rust workspace

2026-06-18 18:11:58 +08:00

xserv-kernels

speculative: batched-GEMV kernel for verify path (Phase 24 step 1)

2026-07-01 16:13:37 +08:00

xserv-model

eagle3: fix EAGLE_HOOK_LAYERS to [2, 18, 33] for Qwen3-8B