xserv

Files

Gahow Wang 377a04b81f tokenizer: read pre-tokenizer regex from tokenizer.json

Parse the model's `pre_tokenizer` section to extract its Split regex
instead of hardcoding the GPT-2 pattern.  The gpt-oss-20b model uses
a GPT-4-style regex that produces different word boundaries, causing a
1-token prompt mismatch vs HuggingFace (136 → 135 tokens, now aligned).

Unsupported lookahead `(?!\S)` is stripped — it only affects trailing
whitespace edge cases.  Falls back to the old GPT-2/Qwen heuristic if
the model regex fails to compile or is absent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-05-31 13:22:35 +08:00

xserv-cuda

cuda: add cached_trim() to release pooled GPU buffers

2026-05-30 12:50:04 +08:00

xserv-distributed

distributed: NCCL P2P primitives (PpContext + send/recv)

2026-05-29 18:45:42 +08:00

xserv-kernels

kernels: flash attention with gpt-oss sinks + sliding window

2026-05-31 00:56:10 +08:00

xserv-model

bench-gpt-oss: teacher-forced diagnostics + --prompt flag