// Workbook-local CUDA sketch for vector add.
//
// The repository-level implementation lives in kernels/cuda/src/vector_add.cu.
// Read this side by side with the Triton version.

// TODO(student):
// 1. Compute global_idx from blockIdx.x, blockDim.x, and threadIdx.x.
// 2. Guard the tail with if (global_idx < numel).
// 3. Load x[global_idx] and y[global_idx].
// 4. Store the sum.
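//
// One possible completion of the steps above, as a minimal sketch. This is not
// the repository implementation in kernels/cuda/src/vector_add.cu. The kernel
// name, the float element type, the `out` parameter, and the int type of
// `numel` are assumptions made for illustration.
__global__ void vector_add_sketch(const float* x, const float* y,
                                  float* out, int numel) {
    // 1. One thread per element: the block's starting offset plus the
    //    thread's position inside the block.
    int global_idx = blockIdx.x * blockDim.x + threadIdx.x;

    // 2. The last block may extend past the end of the arrays, so guard the tail.
    if (global_idx < numel) {
        // 3. Load both inputs, then 4. store their sum.
        out[global_idx] = x[global_idx] + y[global_idx];
    }
}

// Hypothetical launch, assuming x, y, and out are device pointers and 256
// threads per block. The grid size is rounded up so every element is covered,
// which is why the tail guard in step 2 is needed:
//   int threads = 256;
//   int blocks = (numel + threads - 1) / threads;
//   vector_add_sketch<<<blocks, threads>>>(x, y, out, numel);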