Parse the model's `pre_tokenizer` section to extract its Split regex
instead of hardcoding the GPT-2 pattern. The gpt-oss-20b model uses
a GPT-4-style regex that produces different word boundaries, causing a
1-token prompt mismatch vs HuggingFace (136 → 135 tokens, now aligned).
Unsupported lookahead `(?!\S)` is stripped — it only affects trailing
whitespace edge cases. Falls back to the old GPT-2/Qwen heuristic if
the model regex fails to compile or is absent.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>