# Profiling Examples

## Nsight Compute

```bash
./tools/profile_ncu.sh python bench/bench_vector_add.py --device cuda --mode triton
./tools/profile_ncu.sh python bench/bench_softmax.py --device cuda --mode torch
```

## Nsight Systems

```bash
./tools/profile_nsys.sh python bench/bench_matmul.py --device cuda --mode triton
./tools/profile_nsys.sh python bench/bench_attention.py --device cuda --mode torch
```

## First Things To Inspect

- median runtime from the benchmark harness
- whether warmup was excluded
- whether kernels overlap or serialize
- whether memory throughput is near a practical ceiling
- whether a kernel launch is tiny enough that launch overhead matters
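
Several of the checks above can be approximated before opening a profiler report. The following is a minimal sketch, not the repository's benchmark harness: the helper names (`time_callable`, `achieved_bandwidth_gbs`, `is_launch_bound`), the default warmup and iteration counts, and the 10 µs launch-overhead figure are all assumptions chosen for illustration. It times a callable with warmup runs excluded, reports the median, converts a byte count into achieved bandwidth, and flags runtimes small enough that launch overhead probably matters.

```python
import statistics
import time


def time_callable(fn, warmup=5, iters=20, sync=None):
    """Return per-iteration times in milliseconds, excluding warmup runs.

    `sync` is an optional callable that blocks until queued device work
    has finished; without it, asynchronous launches would be under-timed.
    """
    for _ in range(warmup):
        fn()  # warmup: results are discarded
    if sync is not None:
        sync()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        samples.append((time.perf_counter() - start) * 1e3)
    return samples


def achieved_bandwidth_gbs(bytes_moved, runtime_ms):
    """Effective bandwidth in GB/s for a kernel that moved `bytes_moved` bytes."""
    return bytes_moved / (runtime_ms * 1e-3) / 1e9


def is_launch_bound(runtime_ms, launch_overhead_us=10.0, factor=5.0):
    """Heuristic: treat the run as launch-bound when its runtime is within
    `factor` times an assumed per-launch overhead (hypothetical default)."""
    return runtime_ms * 1e3 < factor * launch_overhead_us


if __name__ == "__main__":
    # CPU stand-in workload so the sketch runs without a GPU.
    n = 1 << 16
    a = list(range(n))
    b = list(range(n))
    samples = time_callable(lambda: [x + y for x, y in zip(a, b)])
    median_ms = statistics.median(samples)
    # An fp32 vector add reads two arrays and writes one: 3 * n * 4 bytes.
    bytes_moved = 3 * n * 4
    print(f"median: {median_ms:.3f} ms")
    print(f"effective bandwidth: {achieved_bandwidth_gbs(bytes_moved, median_ms):.2f} GB/s")
    print(f"launch-bound heuristic: {is_launch_bound(median_ms)}")
```

For GPU workloads in PyTorch, passing `sync=torch.cuda.synchronize` keeps asynchronous kernel launches from being under-timed. The bandwidth figure is only meaningful when compared against a ceiling measured on the same device, and the launch-overhead default should be replaced with a value measured on the target system.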