partial_cmp().unwrap() in the top-k / top-p sort and softmax paths would
panic the engine thread on a single NaN logit. The greedy argmax path
is already NaN-safe. Add a one-pass NaN → -inf sweep on the extracted
last_row before temperature scaling, which is equivalent to masking the
token and keeps the sampler deterministic. Warn once when triggered so
the underlying numeric bug isn't silently hidden.