// Workbook-local CUDA sketch for vector add.
//
// The repository-level implementation lives in kernels/cuda/src/vector_add.cu.
// Read this side by side with the Triton version.

// TODO(student):
// 1. Compute global_idx from blockIdx.x, blockDim.x, and threadIdx.x.
// 2. Guard the tail with if (global_idx < numel).
// 3. Load x[global_idx] and y[global_idx].
// 4. Store the sum.
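
// One possible way to fill in the four steps above, offered as a sketch and
// not as the repository's implementation. Assumptions not stated in this file:
// the element type is float, the launch is one-dimensional, and the names
// vector_add_sketch, out, and launch_vector_add_sketch are hypothetical.
#include <cuda_runtime.h>

__global__ void vector_add_sketch(const float* x, const float* y, float* out, int numel) {
    // 1. One element per thread: block offset plus the thread's position in its block.
    int global_idx = blockIdx.x * blockDim.x + threadIdx.x;

    // 2. The last block may extend past numel, so guard the tail.
    if (global_idx < numel) {
        // 3. Load both operands, then 4. store the sum.
        out[global_idx] = x[global_idx] + y[global_idx];
    }
}

// Hypothetical host-side launcher. x, y, and out must be device pointers.
inline cudaError_t launch_vector_add_sketch(const float* x, const float* y, float* out, int numel) {
    if (numel <= 0) return cudaSuccess;  // nothing to do, and a zero-block launch is invalid
    const int threads_per_block = 256;   // assumed block size, tune as needed
    // Ceiling division so the final partial block is still launched.
    const int blocks = (numel + threads_per_block - 1) / threads_per_block;
    vector_add_sketch<<<blocks, threads_per_block>>>(x, y, out, numel);
    return cudaGetLastError();  // reports launch-configuration errors
}

// In the Triton version, the tail guard in step 2 is typically expressed as a
// mask on the load and store instead of an if statement.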