The hand-rolled gpt-oss system message dropped the canonical harmony
structure (identity / knowledge cutoff / current date / Reasoning level),
putting the model out of distribution — greedy decoding then flipped into
garbage or analysis loops on ~half of single-turn requests. Emit the
canonical system message (matching the model's chat_template.jinja
build_system_message macro) with Reasoning: low, plus a today_ymd() date
helper.
Also:
- Default the repetition penalty to off (1.0). Penalizing the harmony
control tokens (<|channel|>/<|message|>/<|start|>) that must repeat to
open the final channel made gpt-oss stop right after analysis, emitting
nothing.
- Suppress the literal "assistant" role header emitted between the
analysis and final channels (only print in the final channel, moe only;
non-moe Qwen3 stays in Normal and prints as before).
Verified on dash5 (TP=2): single-turn "capital of France" is now stable
across runs with a clean final answer; Qwen3 chat unaffected.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>