Files
xserv/crates
Gahow Wang 0c6135aea3 bench-gpt-oss: teacher-forced diagnostics + --prompt flag
Add --prompt to override the fixed prompt, and two teacher-forced
diagnostics: --forced runs prefill over prompt+oracle ids and reports
per-position top-1 agreement; --forced-decode walks the oracle trajectory
through the decode path with per-position agreement bucketed by position,
to localize long-context decode divergence from the reference.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 00:56:46 +08:00
..