xserv

Files

Gahow Wang f5ec10c2c3 xserv-cli: expose sampling params and greedy repetition penalty

The interactive REPL previously always called sample_greedy_last on both
the paged and legacy KV paths, so temperature/top-k/top-p and the
repetition penalty added in the sampling module were unreachable from
the CLI.

- flag() helper parses --max-tokens / --temperature / --top-k / --top-p
  / --rep-penalty / --rep-window (defaults preserve prior behavior:
  temperature 0, top-p 1, penalty 1, window 512).
- pick_next() dispatches to sample_greedy_penalized only when
  temperature==0 and rep_penalty>1, otherwise to sample().
- The Qwen3/GPT-2 paths and the GptOss paged path share the same
  sampler, and each feeds the rolling history window used for the
  penalty.
- Prompt input now unescapes literal "\n" so multi-turn prompts can be
  typed on one line.
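
The changes listed above can be sketched roughly as follows. This is a
minimal illustration and not the code from this commit: the signatures of
flag() and the history helper are guesses, choose_sampler() only models
the branch that pick_next() is described as taking, and the penalty
formula (divide positive logits, multiply negative ones) is an assumed
CTRL-style rule that the commit message does not spell out.

```rust
use std::collections::{HashSet, VecDeque};
use std::str::FromStr;

/// Hypothetical flag helper: value following `name`, else `default`.
/// A missing or unparsable value falls back to the default.
fn flag<T: FromStr>(args: &[String], name: &str, default: T) -> T {
    args.iter()
        .position(|a| a == name)
        .and_then(|i| args.get(i + 1))
        .and_then(|v| v.parse().ok())
        .unwrap_or(default)
}

#[derive(Debug, PartialEq)]
enum Sampler {
    GreedyPenalized, // stands in for sample_greedy_penalized
    General,         // stands in for sample()
}

/// Dispatch rule from the commit message: the penalized greedy path is
/// taken only when temperature == 0 and rep_penalty > 1.
fn choose_sampler(temperature: f32, rep_penalty: f32) -> Sampler {
    if temperature == 0.0 && rep_penalty > 1.0 {
        Sampler::GreedyPenalized
    } else {
        Sampler::General
    }
}

/// Greedy argmax after penalizing tokens seen in the history window.
/// Assumed rule: each distinct token is penalized once.
fn greedy_penalized(logits: &[f32], history: &VecDeque<u32>, penalty: f32) -> u32 {
    let mut adjusted = logits.to_vec();
    let mut seen = HashSet::new();
    for &tok in history {
        if seen.insert(tok) {
            if let Some(l) = adjusted.get_mut(tok as usize) {
                *l = if *l > 0.0 { *l / penalty } else { *l * penalty };
            }
        }
    }
    let mut best = 0usize;
    for (i, &l) in adjusted.iter().enumerate() {
        if l > adjusted[best] {
            best = i;
        }
    }
    best as u32
}

/// Rolling window: keep at most `window` most recent tokens.
fn push_history(history: &mut VecDeque<u32>, tok: u32, window: usize) {
    if window == 0 {
        return;
    }
    while history.len() >= window {
        history.pop_front();
    }
    history.push_back(tok);
}

/// Turn a typed backslash-n into a real newline.
fn unescape_newlines(s: &str) -> String {
    s.replace("\\n", "\n")
}
```

With the defaults quoted above (temperature 0, penalty 1), choose_sampler()
returns General, which is consistent with the claim that default behavior
is unchanged as long as sample() is greedy at temperature 0.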

2026-07-01 14:16:31 +08:00

xserv-cuda

style: format Rust workspace

2026-06-18 18:11:58 +08:00

xserv-distributed

style: format Rust workspace

2026-06-18 18:11:58 +08:00

xserv-kernels

cuda: deterministic BF16 gemv + paged attention reductions

2026-07-01 14:16:28 +08:00

xserv-model

xserv-cli: expose sampling params and greedy repetition penalty