// Workbook-local CUDA sketch for online softmax.
//
// TODO(student):
//   1. Choose how one block owns one row or row tile.
//   2. Keep running_max and running_sum across column tiles.
//   3. Update the recurrence carefully for numerical stability.
//   4. Normalize the final row.
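
// ---------------------------------------------------------------------------
// Illustrative sketch only: one possible way to fill in the TODO items above,
// not a reference solution. Assumptions (none of these come from the workbook):
//   - input and output are row-major float arrays of shape rows x cols;
//   - one block owns one row, and each pass of blockDim.x columns acts as a
//     column tile;
//   - the kernel is launched with exactly kThreads threads per block
//     (a power of two), and inputs are finite;
//   - the names kThreads, merge_state, and online_softmax_rows are hypothetical.
// Recurrence used when folding a new value x into a state (m, s):
//   m_new = max(m, x)
//   s_new = s * exp(m - m_new) + exp(x - m_new)
// ---------------------------------------------------------------------------
#include <cuda_runtime.h>
#include <math.h>

constexpr int kThreads = 256;  // assumed block size; must be a power of two

// Merge a partial state (m_other, s_other) into (m, s). Both sums are
// rescaled to the shared maximum before they are added.
__device__ inline void merge_state(float& m, float& s,
                                   float m_other, float s_other) {
    const float m_new = fmaxf(m, m_other);
    if (m_new == -INFINITY) {
        // Both states are empty; skip exp(-inf - (-inf)), which would be NaN.
        m = m_new;
        s = 0.0f;
        return;
    }
    s = s * expf(m - m_new) + s_other * expf(m_other - m_new);
    m = m_new;
}

__global__ void online_softmax_rows(const float* __restrict__ in,
                                    float* __restrict__ out,
                                    int rows, int cols) {
    __shared__ float sh_max[kThreads];
    __shared__ float sh_sum[kThreads];

    const int row = blockIdx.x;
    if (row >= rows) return;  // uniform across the block, so no barrier hazard

    const float* x = in + static_cast<size_t>(row) * cols;
    float* y = out + static_cast<size_t>(row) * cols;

    // Steps 2 and 3: each thread keeps its own running_max / running_sum
    // while striding across the column tiles of this row.
    float running_max = -INFINITY;
    float running_sum = 0.0f;
    for (int c = threadIdx.x; c < cols; c += blockDim.x) {
        merge_state(running_max, running_sum, x[c], 1.0f);
    }

    // Combine the per-thread states with a shared-memory tree reduction.
    sh_max[threadIdx.x] = running_max;
    sh_sum[threadIdx.x] = running_sum;
    __syncthreads();
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride) {
            merge_state(sh_max[threadIdx.x], sh_sum[threadIdx.x],
                        sh_max[threadIdx.x + stride],
                        sh_sum[threadIdx.x + stride]);
        }
        __syncthreads();
    }

    // Step 4: normalize the row with the final max and sum.
    const float row_max = sh_max[0];
    const float row_sum = sh_sum[0];
    for (int c = threadIdx.x; c < cols; c += blockDim.x) {
        y[c] = expf(x[c] - row_max) / row_sum;
    }
}

// Possible launch, assuming d_in and d_out are device pointers that the
// caller has already allocated and filled:
//   online_softmax_rows<<<rows, kThreads>>>(d_in, d_out, rows, cols);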