Initial project scaffold

2026-04-10 13:22:19 +00:00
commit 7fa69b1354
94 changed files with 3964 additions and 0 deletions
--- a/tasks/05_flash_attention_fwd/cuda_skeleton.cu
+++ b/tasks/05_flash_attention_fwd/cuda_skeleton.cu
@@ -0,0 +1,14 @@
+// Workbook-local CUDA sketch for FlashAttention forward.
+//
+// Map this against the Triton sketch:
+// - Triton program_id for query tile -> CUDA block ownership
+// - Triton block pointer loads        -> CUDA cooperative global-to-shared loads
+// - Triton masks                      -> explicit edge and causal checks
+// - Triton implicit block math        -> thread/block index arithmetic
+
+// TODO(student):
+// 1. Assign a block to one batch/head/query tile.
+// 2. Load a Q tile and loop over K/V tiles.
+// 3. Compute score tiles and apply causal masking.
+// 4. Update the online softmax state (running row max and row sum).
+// 5. Accumulate the output tile.
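
Reviewer sketch (not part of the commit): steps 2 to 5 of the TODO list for a single query row, written as a host/device function so it can serve as a reference when checking a real kernel. Everything here is an assumption for illustration: the name attend_row_online, the TILE template parameter, the row-major [seq_len, head_dim] layout, and the requirement that seq_len >= 1 and 0 <= q_idx. A real kernel would assign a whole Q tile to a block (step 1) and stage K/V tiles in shared memory; this version only shows the per-row arithmetic.

```cpp
#include <math.h>

// Let the same source build with a plain C++ compiler as well as nvcc.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical reference: attention output for ONE query row, visiting keys
// in tiles of TILE and keeping online softmax state (m, l) between tiles.
// Assumes row-major k and v of shape [seq_len, head_dim], seq_len >= 1,
// and 0 <= q_idx when causal is true.
template <int TILE>
__host__ __device__ inline void attend_row_online(
    const float* q, const float* k, const float* v, float* out,
    int q_idx, int seq_len, int head_dim, float scale, bool causal)
{
    float m = -INFINITY;  // running max of the scores seen so far
    float l = 0.0f;       // running sum of exp(score - m)
    for (int d = 0; d < head_dim; ++d) out[d] = 0.0f;

    for (int k0 = 0; k0 < seq_len; k0 += TILE) {
        if (causal && k0 > q_idx) break;  // every key in this tile is masked

        // Step 3: score tile with explicit edge and causal checks.
        float s[TILE];
        float m_tile = -INFINITY;
        for (int j = 0; j < TILE; ++j) {
            int key = k0 + j;
            bool valid = key < seq_len && (!causal || key <= q_idx);
            float acc = 0.0f;
            if (valid) {
                for (int d = 0; d < head_dim; ++d)
                    acc += q[d] * k[key * head_dim + d];
            }
            s[j] = valid ? acc * scale : -INFINITY;
            m_tile = fmaxf(m_tile, s[j]);
        }

        // Step 4: rescale the old state to the new running max.
        // Key k0 is always valid here, so m_new is finite.
        float m_new = fmaxf(m, m_tile);
        float alpha = expf(m - m_new);  // equals 0 on the first tile
        l *= alpha;
        for (int d = 0; d < head_dim; ++d) out[d] *= alpha;

        // Step 5: accumulate this tile's contribution to the output row.
        for (int j = 0; j < TILE; ++j) {
            if (s[j] == -INFINITY) continue;  // masked or out of range
            float p = expf(s[j] - m_new);
            l += p;
            for (int d = 0; d < head_dim; ++d)
                out[d] += p * v[(k0 + j) * head_dim + d];
        }
        m = m_new;
    }

    // Final normalization by the softmax denominator.
    for (int d = 0; d < head_dim; ++d) out[d] /= l;
}
```

Comparing a kernel's output row against this function with the same inputs is one way to catch mistakes in the rescaling step, since the result should match a direct softmax over the unmasked keys regardless of TILE.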