// Workbook-local CUDA sketch for online softmax.
//
// TODO(student):
// 1. Choose how one block owns one row or row tile.
// 2. Keep running_max and running_sum across column tiles.
// 3. Update the recurrence carefully for numerical stability.
// 4. Normalize the final row.
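//
// Reference sketch (not the workbook's solution): one possible serial version
// of steps 2 to 4 above, intended only as something a block-level kernel could
// be checked against. It does not address step 1, because it walks a single
// row with one thread. Everything named here is an assumption made for
// illustration: WORKBOOK_HD, TILE, online_softmax_row_ref,
// online_softmax_rows_ref, and the row-major float layout. Inputs are assumed
// finite; a row of all -inf values would produce NaN.
#include <math.h>

#ifdef __CUDACC__
#define WORKBOOK_HD __host__ __device__
#else
#define WORKBOOK_HD  // plain C++ build: no execution-space qualifiers
#endif

// Assumed column-tile width, kept tiny so short rows still span several tiles.
constexpr int TILE = 4;

WORKBOOK_HD inline void online_softmax_row_ref(const float* x, float* out,
                                               int n_cols) {
    float running_max = -INFINITY;  // max over all columns seen so far
    float running_sum = 0.0f;       // sum of exp(x[j] - running_max) so far

    for (int start = 0; start < n_cols; start += TILE) {
        const int end = (start + TILE < n_cols) ? start + TILE : n_cols;

        // Largest value in this column tile.
        float tile_max = x[start];
        for (int j = start + 1; j < end; ++j) {
            tile_max = fmaxf(tile_max, x[j]);
        }
        const float new_max = fmaxf(running_max, tile_max);

        // Re-express the old sum relative to new_max. The exponent is never
        // positive, so this cannot overflow. On the first tile the factor is
        // exp(-inf) = 0 and the sum is already 0.
        running_sum *= expf(running_max - new_max);

        // Add this tile's terms, also relative to new_max.
        for (int j = start; j < end; ++j) {
            running_sum += expf(x[j] - new_max);
        }
        running_max = new_max;
    }

    // Step 4: normalize the row with the final max and sum.
    for (int j = 0; j < n_cols; ++j) {
        out[j] = expf(x[j] - running_max) / running_sum;
    }
}

#ifdef __CUDACC__
// Baseline launcher for comparison only: one thread per row, which is simpler
// (and slower) than the one-block-per-row design that step 1 asks for.
__global__ void online_softmax_rows_ref(const float* x, float* out,
                                        int n_rows, int n_cols) {
    const int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n_rows) {
        const long long offset = static_cast<long long>(row) * n_cols;
        online_softmax_row_ref(x + offset, out + offset, n_cols);
    }
}
#endif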