# Triton Vs CUDA
## Concept Mapping Table
| Triton concept | CUDA concept | What to notice |
| --- | --- | --- |
| `tl.program_id(axis=0)` | `blockIdx.x` and block ownership | Both assign a chunk of logical work to a block-scale unit |
| `tl.arange(0, BLOCK)` | `threadIdx.x` or manual lane-local offsets | Triton expresses vectors of indices directly |
| masked `tl.load` / `tl.store` | explicit `if (idx < n)` checks | Same boundary problem, different syntax |
| blocked tensor operations | thread/block decomposition plus loops | Triton lifts index sets into tensor expressions |
| pointer arithmetic in element units | byte-addressed pointer math and indexing | CUDA makes layout mechanics more visible |
| implicit vectorized math | manual scalar or vector intrinsics | Triton often reads like array algebra |
| autotuned launch parameters | manual block-size tuning | Both still depend on the memory hierarchy |
| block pointers and tile views | shared memory tiles and cooperative loads | The same reuse idea shows up with different APIs |
| reduction combinators | warp/block reductions | Same algorithmic structure, different implementation burden |
| masks and predicates | control flow and bounds checks | Divergence and predication still matter |
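The first three rows of the table can be made concrete with a pure-Python sketch of the index arithmetic on each side. The sizes (`n`, `BLOCK`) and the helper names are illustrative assumptions, not anything Triton or CUDA mandates:

```python
# Pure-Python model of the first three table rows.
n = 10        # total number of elements (illustrative)
BLOCK = 4     # elements owned by one Triton program / one CUDA block

def triton_style(pid):
    # tl.program_id(axis=0) -> pid; tl.arange(0, BLOCK) produces a whole
    # vector of lane offsets at once, and the mask replaces an if-check
    # inside masked tl.load / tl.store.
    offsets = [pid * BLOCK + i for i in range(BLOCK)]
    mask = [off < n for off in offsets]
    return offsets, mask

def cuda_style(block_idx):
    # One scalar index per thread: blockIdx.x * blockDim.x + threadIdx.x,
    # guarded by an explicit `if (idx < n)` bounds check.
    pairs = []
    for tid in range(BLOCK):
        idx = block_idx * BLOCK + tid
        pairs.append((idx, idx < n))
    return pairs

offsets, mask = triton_style(pid=2)
pairs = cuda_style(block_idx=2)
print(offsets, mask)   # [8, 9, 10, 11] [True, True, False, False]
print(pairs)           # [(8, True), (9, True), (10, False), (11, False)]
```

Both decompositions cover exactly the same logical work; Triton states it as one vector expression per program, CUDA as one scalar per thread.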
## How To Compare Side By Side
1. Start from the reference PyTorch function and identify the mathematical operator.
2. In the Triton version, ask what one program instance owns.
3. In the CUDA version, ask what one block and one thread own.
4. Match the memory reads and writes, not just the variable names.
5. Write down where reduction state lives in each version.
6. For tiled code, identify when data moves from global memory to on-chip storage.
7. Only then compare performance.

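Step 5 is the one most often skipped, so here is a CPU-only sketch of tracking where reduction state lives for a blocked sum. The data and block size are illustrative assumptions:

```python
# Reference operator (step 1): total = sum(data).
data = list(range(1, 101))   # illustrative input
BLOCK = 32                   # tile owned by one program / one block

# Stage 1: each "program" (Triton) or "block" (CUDA) reduces its own
# tile to one partial. In Triton this is a tl.sum over a masked tile;
# in CUDA it is a warp/block reduction in registers or shared memory.
# Slicing past the end handles the ragged tail, like a mask would.
partials = [sum(data[start:start + BLOCK])
            for start in range(0, len(data), BLOCK)]

# Stage 2: combine the partials. In practice this is a second kernel
# launch or atomic adds; here the state lives in one Python list.
total = sum(partials)
print(len(partials), total)   # 4 5050
```

Writing down the two stages explicitly makes it easy to see that the Triton and CUDA versions share the same algorithmic structure and differ only in who carries the partial-sum state.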
## Rule Of Thumb
Triton usually compresses the "how" so you can focus on the blocked tensor math. CUDA exposes the "how" directly, which is why it is valuable to study both.