tasks/07_profiling/profile_examples.md
# Profiling Examples

## Nsight Compute

```bash
./tools/profile_ncu.sh python bench/bench_vector_add.py --device cuda --mode triton
./tools/profile_ncu.sh python bench/bench_softmax.py --device cuda --mode torch
```

## Nsight Systems

```bash
./tools/profile_nsys.sh python bench/bench_matmul.py --device cuda --mode triton
./tools/profile_nsys.sh python bench/bench_attention.py --device cuda --mode torch
```

## First Things To Inspect

- median runtime from the benchmark harness
- whether warmup iterations were excluded from the timed runs
- whether kernels overlap or serialize
- whether achieved memory throughput is near the device's practical bandwidth ceiling
- whether a kernel is short enough that launch overhead is a significant share of its runtime
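Several of these checks can be made concrete in a few lines of Python. The sketch below is not the repository's benchmark harness, whose internals are not shown here. `time_runs`, `median_ms`, and `vector_add_bandwidth_gbs` are hypothetical helpers.

The sketch does three things:

- discards warmup calls
- reports the median of the timed calls
- converts a vector-add time into effective bandwidth, assuming the kernel reads two input arrays and writes one output array of the same length

For GPU work, pass `torch.cuda.synchronize` as `sync` so the clock stops only after queued kernels have finished.

```python
import statistics
import time
from typing import Callable, List


def time_runs(
    fn: Callable[[], object],
    warmup: int = 3,
    iters: int = 20,
    sync: Callable[[], None] = lambda: None,
) -> List[float]:
    """Call fn `warmup` times untimed, then `iters` timed times.

    Returns the wall-clock duration of each timed call in seconds.
    """
    # Warmup calls absorb one-time costs (e.g. JIT compilation, lazy
    # initialization, cache population) and are never recorded.
    for _ in range(warmup):
        fn()
    sync()  # drain any queued work before the first timed run

    samples: List[float] = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        sync()  # for GPU work, wait for the kernel to finish before stopping the clock
        samples.append(time.perf_counter() - start)
    return samples


def median_ms(samples: List[float]) -> float:
    """Median of per-run durations (seconds), converted to milliseconds."""
    return statistics.median(samples) * 1e3


def vector_add_bandwidth_gbs(n: int, bytes_per_elem: int, seconds: float) -> float:
    """Effective bandwidth of c = a + b in GB/s (1 GB = 1e9 bytes).

    Assumes two length-n inputs are read and one length-n output is written,
    i.e. 3 * n * bytes_per_elem bytes are moved to or from memory.
    """
    bytes_moved = 3 * n * bytes_per_elem
    return bytes_moved / seconds / 1e9


if __name__ == "__main__":
    # CPU-only demo so the sketch runs anywhere; swap in a GPU workload and
    # sync=torch.cuda.synchronize to time CUDA kernels.
    data = list(range(100_000))
    samples = time_runs(lambda: sum(data), warmup=3, iters=20)
    print(f"median: {median_ms(samples):.3f} ms over {len(samples)} runs")
    # 1M float32 elements in 0.1 ms -> prints "120.0 GB/s"
    print(f"{vector_add_bandwidth_gbs(1_000_000, 4, 1e-4):.1f} GB/s")
```

As a usage example, `time_runs(lambda: torch.add(a, b, out=c), sync=torch.cuda.synchronize)` would time a PyTorch vector add, assuming `a`, `b`, and `c` are CUDA tensors. Comparing the resulting bandwidth with the GPU's datasheet peak gives a rough sense of how close the kernel is to the memory ceiling.

Host-side timing with a sync after every call also includes launch and synchronization latency. For very short kernels, the median therefore mostly reflects that overhead. CUDA events (`torch.cuda.Event(enable_timing=True)`) are a common alternative for device-side timing.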