// Workbook-local CUDA sketch for vector add.
//
// The repository-level implementation lives in kernels/cuda/src/vector_add.cu.
// Read this side by side with the Triton version.

// TODO(student):
// 1. Compute global_idx from blockIdx.x, blockDim.x, and threadIdx.x.
// 2. Guard the tail with if (global_idx < numel); the last block may extend past the end.
// 3. Load x[global_idx] and y[global_idx].
// 4. Store the sum to the output at global_idx.
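
// One possible way to fill in the four steps above. This is a hedged sketch,
// not the repository implementation in kernels/cuda/src/vector_add.cu.
// Assumptions made for illustration only: the kernel name vector_add_sketch,
// the float element type, the output pointer `out`, and numel fitting in int.
__global__ void vector_add_sketch(const float* x, const float* y, float* out, int numel) {
    // Step 1: one thread per element, with blocks laid out along x only.
    const int global_idx = blockIdx.x * blockDim.x + threadIdx.x;
    // Step 2: threads that fall past the end of the arrays do nothing.
    if (global_idx < numel) {
        // Steps 3 and 4: load both operands and store their sum.
        out[global_idx] = x[global_idx] + y[global_idx];
    }
}

// Hypothetical launch, assuming x, y, and out are device pointers:
//   const int threads = 256;                             // assumed block size
//   const int blocks = (numel + threads - 1) / threads;  // round up to cover the tail
//   vector_add_sketch<<<blocks, threads>>>(x, y, out, numel);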