Initial project scaffold
tasks/05_flash_attention_fwd/cuda_skeleton.cu
@@ -0,0 +1,14 @@
// Workbook-local CUDA sketch for FlashAttention forward.
//
// Map this against the Triton sketch:
// - Triton program_id for query tile -> CUDA block ownership
// - Triton block pointer loads -> CUDA cooperative global-to-shared loads
// - Triton masks -> explicit edge and causal checks
// - Triton implicit block math -> thread/block index arithmetic
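The first mapping above (Triton `program_id` → CUDA block ownership) is just a flat-index decomposition. A minimal host-side C++ sketch of that arithmetic, with hypothetical tile counts standing in for whatever the workbook's kernel config actually uses:

```cpp
// Hypothetical geometry -- placeholders, not values from the workbook.
constexpr int kNumHeads = 8;
constexpr int kQueryTilesPerHead = 4;

struct TileOwner { int batch, head, q_tile; };

// Mirrors what a kernel would do with blockIdx.x: peel off the query
// tile, then the head, leaving the batch (Triton's program_id analog).
TileOwner owner_of_block(int block_id) {
    TileOwner t;
    t.q_tile = block_id % kQueryTilesPerHead;
    block_id /= kQueryTilesPerHead;
    t.head  = block_id % kNumHeads;
    t.batch = block_id / kNumHeads;
    return t;
}
```

Launching `batches * kNumHeads * kQueryTilesPerHead` blocks and applying this decomposition inside the kernel gives each block exactly one (batch, head, query-tile) triple to own.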
// TODO(student):
// 1. Assign a block to one batch/head/query tile.
// 2. Load a Q tile and loop over K/V tiles.
// 3. Compute score tiles and apply causal masking.
// 4. Update online softmax state.
// 5. Accumulate the output tile.
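Steps 4 and 5 hinge on the online-softmax rescaling trick: when a new K/V tile raises the running max, previously accumulated sums must be scaled down before the new contributions are added. A host-side C++ sketch of that state update, reduced to a single scalar value per query (the real kernel keeps one such state per query row, with vector accumulators, in registers or shared memory -- an assumption, not the workbook's actual layout):

```cpp
#include <algorithm>
#include <cmath>

// Online softmax state for one query row, 1-D toy value.
struct OnlineSoftmax {
    float m = -INFINITY;  // running max of scores seen so far
    float l = 0.0f;       // running sum of exp(score - m)
    float acc = 0.0f;     // running softmax-weighted sum of values

    // Fold in one (score, value) pair, rescaling the old state when
    // the running max changes -- the core of FlashAttention's pass.
    void update(float score, float value) {
        float m_new = std::max(m, score);
        float scale = std::exp(m - m_new);   // shrink old contributions
        float p     = std::exp(score - m_new);
        l   = l * scale + p;
        acc = acc * scale + p * value;
        m   = m_new;
    }

    float output() const { return acc / l; }  // final normalized tile
};
```

Because each update only rescales by `exp(m - m_new)`, the state can be folded over K/V tiles in any single pass without ever materializing the full score row, and the final division by `l` (step 5's epilogue) recovers the exact softmax-weighted output.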