// Workbook-local CUDA sketch for tiled matmul.
//
// TODO(student):
// 1. Choose a block tile size, for example 16x16 or 32x32.
// 2. Load one A tile and one B tile into shared memory.
// 3. Synchronize.
// 4. Accumulate partial products.
// 5. Synchronize before loading the next tile.
// 6. Store the final C element or tile.
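//
// One possible way to fill in the six steps, offered as a minimal sketch and
// not as a reference solution. Assumptions that do not come from the original
// sketch: row-major float matrices, C (M x N) = A (M x K) * B (K x N), a 16x16
// tile with one thread per C element, and the hypothetical names TILE,
// tiled_matmul, and main. CUDA error checking is omitted for brevity.
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int TILE = 16;  // Step 1: block tile size (assumed 16x16).

__global__ void tiled_matmul(const float* A, const float* B, float* C,
                             int M, int N, int K) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;  // Row of C owned by this thread.
    int col = blockIdx.x * TILE + threadIdx.x;  // Column of C owned by this thread.
    float acc = 0.0f;

    for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
        int a_col = t * TILE + threadIdx.x;
        int b_row = t * TILE + threadIdx.y;
        // Step 2: load one A tile and one B tile, zero-padding out-of-range entries.
        As[threadIdx.y][threadIdx.x] = (row < M && a_col < K) ? A[row * K + a_col] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (b_row < K && col < N) ? B[b_row * N + col] : 0.0f;
        __syncthreads();  // Step 3: both tiles are fully loaded before any reads.

        // Step 4: accumulate this tile's partial products.
        for (int k = 0; k < TILE; ++k) {
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        }
        __syncthreads();  // Step 5: all reads finish before the next load overwrites.
    }

    // Step 6: store the final C element (guarded for partial edge tiles).
    if (row < M && col < N) C[row * N + col] = acc;
}

int main() {
    const int M = 37, N = 29, K = 45;  // Deliberately not multiples of TILE.
    std::vector<float> hA(M * K), hB(K * N), hC(M * N);
    for (int i = 0; i < M * K; ++i) hA[i] = static_cast<float>(i % 7) - 3.0f;
    for (int i = 0; i < K * N; ++i) hB[i] = static_cast<float>(i % 5) - 2.0f;

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc((void**)&dA, hA.size() * sizeof(float));
    cudaMalloc((void**)&dB, hB.size() * sizeof(float));
    cudaMalloc((void**)&dC, hC.size() * sizeof(float));
    cudaMemcpy(dA, hA.data(), hA.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), hB.size() * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((N + TILE - 1) / TILE, (M + TILE - 1) / TILE);
    tiled_matmul<<<grid, block>>>(dA, dB, dC, M, N, K);
    cudaMemcpy(hC.data(), dC, hC.size() * sizeof(float), cudaMemcpyDeviceToHost);

    // Compare against a plain CPU triple loop.
    float max_err = 0.0f;
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            float ref = 0.0f;
            for (int k = 0; k < K; ++k) ref += hA[i * K + k] * hB[k * N + j];
            max_err = std::fmax(max_err, std::fabs(ref - hC[i * N + j]));
        }
    }
    std::printf("max abs error: %g\n", max_err);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return max_err < 1e-3f ? 0 : 1;
}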