kernel-lab/tasks/07_profiling/profile_examples.md
2026-04-10 13:15:06 +00:00

Profiling Examples

Nsight Compute

```bash
./tools/profile_ncu.sh python bench/bench_vector_add.py --device cuda --mode triton
./tools/profile_ncu.sh python bench/bench_softmax.py --device cuda --mode torch
```
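By default, Nsight Compute profiles every kernel launch it sees, so warmup launches are collected along with the measured ones. The sketch below restricts collection to the measured loop. It assumes the benchmark uses PyTorch and that `ncu` is started with `--profile-from-start off`; whether `profile_ncu.sh` passes that flag is not shown here. The measured iterations are bracketed with `torch.cuda.profiler.start()`/`stop()`, which call `cudaProfilerStart`/`cudaProfilerStop`. The names `profiled_region` and `run_gated` are hypothetical, and the hooks become no-ops when torch or CUDA is unavailable.

```python
import contextlib


def _profiler_hooks():
    """Return (start, stop) callables; no-ops if torch or CUDA is unavailable."""
    try:
        import torch
        import torch.cuda.profiler as cuda_profiler
    except ImportError:
        return (lambda: None), (lambda: None)
    if not torch.cuda.is_available():
        return (lambda: None), (lambda: None)

    def stop():
        torch.cuda.synchronize()   # let queued kernels finish before stopping
        cuda_profiler.stop()       # calls cudaProfilerStop

    return cuda_profiler.start, stop  # start calls cudaProfilerStart


@contextlib.contextmanager
def profiled_region():
    """Enable profiler collection only inside the `with` block (assumes --profile-from-start off)."""
    start, stop = _profiler_hooks()
    start()
    try:
        yield
    finally:
        stop()


def run_gated(fn, warmup=3, iters=10):
    """Run warmup outside the profiled region, then the measured iterations inside it."""
    for _ in range(warmup):
        fn()
    with profiled_region():
        for _ in range(iters):
            fn()
```

Counting launches with `ncu`'s `--launch-skip`/`--launch-count` options is an alternative that needs no code change, at the cost of knowing how many kernels each warmup iteration launches.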

Nsight Systems

```bash
./tools/profile_nsys.sh python bench/bench_matmul.py --device cuda --mode triton
./tools/profile_nsys.sh python bench/bench_attention.py --device cuda --mode torch
```
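Nsight Systems shows host and device activity on one timeline, which is where overlap or serialization becomes visible. The sketch below assumes PyTorch. It wraps the warmup and measured loops in NVTX ranges through `torch.cuda.nvtx.range_push`/`range_pop`, so the two phases are labeled in the timeline. This only helps if the `nsys` run traces NVTX (e.g. `--trace=cuda,nvtx`); whether the wrapper does so is an assumption. `nvtx_range` and `bench_with_ranges` are hypothetical names, and without torch or CUDA the ranges do nothing.

```python
import contextlib


@contextlib.contextmanager
def nvtx_range(name):
    """Label the enclosed host-side work as an NVTX range; no-op without torch/CUDA."""
    try:
        import torch
        import torch.cuda.nvtx as nvtx
        enabled = torch.cuda.is_available()
    except ImportError:
        enabled = False
    if enabled:
        nvtx.range_push(name)
    try:
        yield
    finally:
        if enabled:
            nvtx.range_pop()


def bench_with_ranges(fn, warmup=3, iters=10):
    """Put the warmup and measured loops in separately named ranges."""
    with nvtx_range("warmup"):
        for _ in range(warmup):
            fn()
    with nvtx_range("measured"):
        for _ in range(iters):
            fn()
```

The ranges mark host-side time. Kernels launched inside a range may still be running on the GPU after it closes, so judge overlap from the GPU rows rather than from the range boundaries.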

First Things To Inspect

  • median runtime from the benchmark harness
  • whether warmup was excluded
  • whether kernels overlap or serialize
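  • whether achieved memory throughput is near a practical ceiling for the device (for example, the bandwidth a large device-to-device copy reaches)
  • whether a kernel is so short that launch overhead is a significant share of its runtime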
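Several of these checks reduce to a little arithmetic. The stdlib-only sketch below covers three of them:

- `median_runtime_s` discards warmup calls and reports the median of the timed calls.
- `vector_add_bytes` counts the memory traffic of `c = a + b`: two reads and one write per element.
- `achieved_bandwidth_gbs` converts bytes and seconds to GB/s, for comparison against a ceiling you measure yourself.

All function names are hypothetical. So is the 5 µs default in `launch_overhead_dominates`; measure the real launch overhead on your system, for example by timing an empty kernel.

```python
import statistics
import time


def median_runtime_s(fn, warmup=5, iters=50, sync=lambda: None):
    """Median seconds per call with warmup excluded.

    `sync` must block until queued device work finishes (for CUDA, pass
    torch.cuda.synchronize); the default no-op is only correct for CPU work.
    """
    for _ in range(warmup):       # compilation, caching, clock ramp-up land here
        fn()
    sync()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        sync()                    # without this, only the async launch is timed
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)


def vector_add_bytes(n, itemsize=4):
    """Traffic for c = a + b: read a and b, write c (3 * n elements)."""
    return 3 * n * itemsize


def achieved_bandwidth_gbs(bytes_moved, seconds):
    """Achieved throughput in GB/s (1e9 bytes per second)."""
    return bytes_moved / seconds / 1e9


def fraction_of_ceiling(achieved_gbs, ceiling_gbs):
    """Achieved / practical ceiling; values near 1.0 mean little bandwidth headroom."""
    return achieved_gbs / ceiling_gbs


def launch_overhead_dominates(kernel_s, overhead_s=5e-6, ratio=1.0):
    """True if kernel time is within `ratio` x an assumed per-launch overhead."""
    return kernel_s <= ratio * overhead_s
```

For GPU work, pass `sync=torch.cuda.synchronize`; without it, the loop times only the asynchronous launch. As a worked figure, 2**24 float32 elements move 3 × 2**24 × 4 = 201,326,592 bytes. At a hypothetical 0.25 ms median, that is about 805 GB/s.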