// Workbook-local CUDA sketch for tiled matmul.
//
// TODO(student):
//   1. Choose a block tile size, for example 16x16 or 32x32.
//   2. Load one A tile and one B tile into shared memory.
//   3. Synchronize.
//   4. Accumulate partial products.
//   5. Synchronize before loading the next tile.
//   6. Store the final C element or tile.
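
// One possible way to fill in the six steps listed in the TODO. This is a
// minimal sketch, not the workbook's reference solution.
// Assumptions (none of these come from the original notes):
//   - matrices are row-major float arrays: A is MxK, B is KxN, C is MxN;
//   - the tile is 16x16 and each thread computes one element of C;
//   - the names TILE, tiledMatmulSketch, and launchTiledMatmulSketch are
//     hypothetical and chosen only for illustration.
#include <cuda_runtime.h>

constexpr int TILE = 16;  // Step 1: assumed tile size; 32 is the other option named above.

__global__ void tiledMatmulSketch(const float* A, const float* B, float* C,
                                  int M, int N, int K) {
    // One tile of A and one tile of B, shared by all threads in the block.
    __shared__ float tileA[TILE][TILE];
    __shared__ float tileB[TILE][TILE];

    const int row = blockIdx.y * TILE + threadIdx.y;  // row of C for this thread
    const int col = blockIdx.x * TILE + threadIdx.x;  // column of C for this thread
    float acc = 0.0f;

    const int numTiles = (K + TILE - 1) / TILE;  // ceil(K / TILE)
    for (int t = 0; t < numTiles; ++t) {
        const int aCol = t * TILE + threadIdx.x;
        const int bRow = t * TILE + threadIdx.y;

        // Step 2: each thread loads one element of each tile. Out-of-range
        // elements are padded with zero so edge tiles need no special case.
        tileA[threadIdx.y][threadIdx.x] =
            (row < M && aCol < K) ? A[row * K + aCol] : 0.0f;
        tileB[threadIdx.y][threadIdx.x] =
            (bRow < K && col < N) ? B[bRow * N + col] : 0.0f;

        // Step 3: wait until both tiles are fully loaded.
        __syncthreads();

        // Step 4: accumulate this tile's partial dot product.
        for (int k = 0; k < TILE; ++k) {
            acc += tileA[threadIdx.y][k] * tileB[k][threadIdx.x];
        }

        // Step 5: wait until every thread is done reading the tiles before
        // the next iteration overwrites them.
        __syncthreads();
    }

    // Step 6: store the result; threads outside C write nothing.
    if (row < M && col < N) {
        C[row * N + col] = acc;
    }
}

// Hypothetical host-side launcher. dA, dB, and dC must be device pointers.
void launchTiledMatmulSketch(const float* dA, const float* dB, float* dC,
                             int M, int N, int K) {
    const dim3 block(TILE, TILE);
    const dim3 grid((N + TILE - 1) / TILE, (M + TILE - 1) / TILE);
    tiledMatmulSketch<<<grid, block>>>(dA, dB, dC, M, N, K);
}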