Initial project scaffold

2026-04-10 13:22:19 +00:00
commit 7fa69b1354
94 changed files with 3964 additions and 0 deletions

@@ -0,0 +1,14 @@
// Workbook-local CUDA sketch for FlashAttention forward.
//
// Map this against the Triton sketch:
// - Triton program_id for query tile -> CUDA block ownership
// - Triton block pointer loads -> CUDA cooperative global-to-shared loads
// - Triton masks -> explicit edge and causal checks
// - Triton implicit block math -> thread/block index arithmetic
//
// TODO(student):
// 1. Assign a block to one batch/head/query tile.
// 2. Load a Q tile and loop over K/V tiles.
// 3. Compute score tiles and causal masking.
// 4. Update online softmax state.
// 5. Accumulate the output tile.
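
One way to pin down what the five TODO steps have to compute is a host-side reference in plain C++, which also compiles as host code inside a .cu file. The sketch below is not the workbook's solution and is not a kernel. The function name flash_attention_forward_reference, the row-major [batch_heads, seq_len, head_dim] float layout, the 1/sqrt(head_dim) scaling, and the default tile sizes are all assumptions made for illustration. The outer loops stand in for the block and thread ownership a real kernel would take from blockIdx and threadIdx.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical host-side reference (assumed name, layout, and tile sizes).
// q, k, v, out are row-major [batch_heads, seq_len, head_dim].
void flash_attention_forward_reference(const std::vector<float>& q,
                                       const std::vector<float>& k,
                                       const std::vector<float>& v,
                                       std::vector<float>& out,
                                       int batch_heads, int seq_len, int head_dim,
                                       bool causal, int tile_q = 2, int tile_k = 2) {
    const std::size_t total =
        static_cast<std::size_t>(batch_heads) * seq_len * head_dim;
    assert(q.size() == total && k.size() == total);
    assert(v.size() == total && out.size() == total);

    const float scale = 1.0f / std::sqrt(static_cast<float>(head_dim));  // assumed scaling
    const float neg_inf = -std::numeric_limits<float>::infinity();
    const int q_tiles = (seq_len + tile_q - 1) / tile_q;

    for (int bh = 0; bh < batch_heads; ++bh) {
        const std::size_t base = static_cast<std::size_t>(bh) * seq_len * head_dim;
        // Step 1: each (bh, qt) pair is the work one CUDA block would own.
        for (int qt = 0; qt < q_tiles; ++qt) {
            const int q_end = std::min(seq_len, (qt + 1) * tile_q);  // edge check
            // Each query row is the work one thread (or warp) would own.
            for (int i = qt * tile_q; i < q_end; ++i) {
                const float* q_row = &q[base + static_cast<std::size_t>(i) * head_dim];
                float running_max = neg_inf;  // online softmax state
                float running_sum = 0.0f;
                std::vector<float> acc(head_dim, 0.0f);

                // Step 2: loop over K/V tiles.
                for (int k0 = 0; k0 < seq_len; k0 += tile_k) {
                    if (causal && k0 > i) break;  // every key in this tile is in the future
                    const int k_end = std::min(seq_len, k0 + tile_k);  // edge check

                    // Step 3: score tile with causal masking.
                    std::vector<float> s(k_end - k0);
                    float tile_max = neg_inf;
                    for (int j = k0; j < k_end; ++j) {
                        const float* k_row = &k[base + static_cast<std::size_t>(j) * head_dim];
                        float dot = 0.0f;
                        for (int d = 0; d < head_dim; ++d) dot += q_row[d] * k_row[d];
                        s[j - k0] = (causal && j > i) ? neg_inf : dot * scale;
                        tile_max = std::max(tile_max, s[j - k0]);
                    }

                    // Step 4: update the running max and rescale earlier partial results.
                    // new_max is finite here because key k0 is never masked.
                    const float new_max = std::max(running_max, tile_max);
                    const float rescale = std::exp(running_max - new_max);
                    running_sum *= rescale;
                    for (int d = 0; d < head_dim; ++d) acc[d] *= rescale;

                    // Step 5: accumulate this tile's contribution to the output row.
                    for (int j = k0; j < k_end; ++j) {
                        const float p = std::exp(s[j - k0] - new_max);  // 0 for masked keys
                        const float* v_row = &v[base + static_cast<std::size_t>(j) * head_dim];
                        running_sum += p;
                        for (int d = 0; d < head_dim; ++d) acc[d] += p * v_row[d];
                    }
                    running_max = new_max;
                }

                // Normalize once, after the last K/V tile.
                float* o_row = &out[base + static_cast<std::size_t>(i) * head_dim];
                for (int d = 0; d < head_dim; ++d) o_row[d] = acc[d] / running_sum;
            }
        }
    }
}
```

A kernel written for the TODO list could be checked against this reference by running both on the same small inputs and comparing outputs within a floating-point tolerance. Choosing a seq_len that is not a multiple of the tile sizes exercises the edge checks as well.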