Initial project scaffold

2026-04-10 13:22:19 +00:00
commit 7fa69b1354
94 changed files with 3964 additions and 0 deletions


@@ -0,0 +1,23 @@
# Profiling Examples

## Nsight Compute

```bash
./tools/profile_ncu.sh python bench/bench_vector_add.py --device cuda --mode triton
./tools/profile_ncu.sh python bench/bench_softmax.py --device cuda --mode torch
```
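
Nsight Compute reports achieved memory throughput per kernel. For a memory-bound kernel such as vector add, you can also get a rough cross-check from the harness's median runtime. This sketch assumes the kernel computes `c = a + b` on fp32 tensors and counts only the ideal traffic (read `a` and `b`, write `c` once). The peak bandwidth is a number you supply from the device's spec sheet or a copy benchmark. The function names are hypothetical and are not part of this repo.

```python
def vector_add_bytes(n: int, elem_bytes: int = 4) -> int:
    """Ideal DRAM traffic for c = a + b: read a and b, write c (hypothetical helper)."""
    return 3 * n * elem_bytes


def effective_bandwidth_gbs(bytes_moved: int, seconds: float) -> float:
    """Achieved bandwidth in GB/s, using 1 GB = 1e9 bytes."""
    return bytes_moved / seconds / 1e9


def fraction_of_peak(achieved_gbs: float, peak_gbs: float) -> float:
    """Share of an assumed peak bandwidth that the kernel reached."""
    return achieved_gbs / peak_gbs


if __name__ == "__main__":
    # Illustrative numbers only; substitute your own measurement and device peak.
    n = 1 << 24          # elements per input tensor (assumed)
    median_s = 250e-6    # median runtime from the benchmark harness (assumed)
    peak_gbs = 1000.0    # practical peak bandwidth for the device (assumed)
    achieved = effective_bandwidth_gbs(vector_add_bytes(n), median_s)
    print(f"{achieved:.1f} GB/s, {fraction_of_peak(achieved, peak_gbs):.0%} of assumed peak")
```

If the fraction is already high, further tuning of the kernel body is unlikely to pay off much. A low fraction is the cue to look at Nsight Compute's memory sections.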

## Nsight Systems

```bash
./tools/profile_nsys.sh python bench/bench_matmul.py --device cuda --mode triton
./tools/profile_nsys.sh python bench/bench_attention.py --device cuda --mode torch
```
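
Overlap, serialization, and idle gaps between launches all appear on the Nsight Systems timeline. Once you have read kernel start and end timestamps off the timeline (or exported them from the report), the check reduces to simple interval logic. This is a sketch only: `KernelSpan` and `summarize_timeline` are hypothetical names, and the input is a plain list, not any specific nsys export format.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class KernelSpan:
    """One kernel execution on the timeline (hypothetical input format)."""
    name: str
    start_ns: int
    end_ns: int


def summarize_timeline(spans: List[KernelSpan]) -> Tuple[bool, List[int]]:
    """Return (overlapped, gaps_ns).

    overlapped is True if any kernel starts before all earlier kernels have ended.
    gaps_ns holds the idle time between consecutive kernels that do not overlap.
    """
    ordered = sorted(spans, key=lambda s: s.start_ns)
    overlapped = False
    gaps: List[int] = []
    busy_until: Optional[int] = None  # latest end time seen so far
    for span in ordered:
        if busy_until is not None:
            if span.start_ns < busy_until:
                overlapped = True
            else:
                gaps.append(span.start_ns - busy_until)
        busy_until = span.end_ns if busy_until is None else max(busy_until, span.end_ns)
    return overlapped, gaps


if __name__ == "__main__":
    # Hypothetical timestamps in nanoseconds.
    spans = [
        KernelSpan("matmul", 0, 40_000),
        KernelSpan("softmax", 43_000, 48_000),
    ]
    overlapped, gaps = summarize_timeline(spans)
    print("overlapping" if overlapped else "serialized", gaps)
```

If the gaps are comparable to the kernel durations themselves, launch overhead (rather than the kernels) is probably a large share of the measured time.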
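
## First Things To Inspect

- median runtime reported by the benchmark harness (less sensitive to a few slow outliers than the mean)
- whether warmup iterations (e.g. Triton JIT compilation and autotuning) were excluded from the timed runs
- whether kernels overlap or serialize on the Nsight Systems timeline
- whether achieved memory throughput is near a practical ceiling for the device (Nsight Compute reports it per kernel)
- whether a kernel is short enough that launch overhead is a significant share of its runtime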
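
The first two checks depend on how the timing samples are collected. Below is a minimal timing loop that discards warmup calls and reports the median. It assumes nothing about this repo's actual harness: `bench` and its `sync` hook are hypothetical names. For CUDA work, pass something like `torch.cuda.synchronize` as `sync`, because kernel launches return before the kernel finishes. Without it, the loop would time the launches instead of the kernels.

```python
import statistics
import time
from typing import Callable, List, Optional


def bench(
    fn: Callable[[], object],
    warmup: int = 10,
    iters: int = 100,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the median seconds per call of fn, excluding warmup calls."""
    # Warmup absorbs one-time costs such as JIT compilation and autotuning.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # drain warmup work so it does not leak into the first sample

    samples: List[float] = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for asynchronous device work before stopping the clock
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    # CPU-only usage; for CUDA you might pass sync=torch.cuda.synchronize.
    median_s = bench(lambda: sum(range(10_000)), warmup=5, iters=50)
    print(f"median: {median_s * 1e6:.1f} us")
```

The median is a deliberate choice: a single preempted or throttled iteration can move the mean noticeably but barely shifts the median.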