Initial project scaffold

commit a4a6b1f1c8
Author: wjh
Date: 2026-04-10 13:15:06 +00:00
94 changed files with 3964 additions and 0 deletions


@@ -0,0 +1,2 @@
"""Environment sanity task."""


@@ -0,0 +1,13 @@
# Environment Checklist
- PyTorch imports successfully
- `torch.cuda.is_available()` is `True`
- At least one CUDA device is visible
- The GPU name matches the machine you expect to be using
- Device capability is printed and recorded
- Triton imports successfully, or you know why it does not
- `torch.version.cuda` is visible when using CUDA-enabled PyTorch
- `nvcc --version` works if you plan to build the CUDA extension
- `nvidia-smi` works if the driver stack is installed
If any item above fails, fix it before moving on to later tasks.
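The checklist above can be run as code. The sketch below is a minimal, hedged version: it only probes what the standard library can see (module availability and toolchain binaries on `PATH`), and guards every optional import so it never crashes in a broken environment. It is not the repo's `tools/check_env.py`, just an illustration of the same checks.

```python
import importlib.util
import shutil

# Module availability: torch/triton may be absent in a broken environment,
# so probe with find_spec() instead of importing unconditionally.
results = {}
results["torch importable"] = importlib.util.find_spec("torch") is not None
results["triton importable"] = importlib.util.find_spec("triton") is not None

# Toolchain binaries: nvcc is only needed if you build the CUDA extension;
# nvidia-smi indicates the driver stack is installed.
results["nvcc on PATH"] = shutil.which("nvcc") is not None
results["nvidia-smi on PATH"] = shutil.which("nvidia-smi") is not None

# CUDA availability is only meaningful once torch imports cleanly.
if results["torch importable"]:
    import torch
    results["torch.cuda.is_available()"] = torch.cuda.is_available()

for name, ok in results.items():
    print(f"[{'OK' if ok else 'FAIL'}] {name}")
```

Any `FAIL` line maps directly to one checklist item above; fix those in order before touching later tasks.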


@@ -0,0 +1,46 @@
# Task 00: Environment Sanity
## 1. Problem Statement
Confirm that your machine can see the GPU software stack needed for the rest of the lab.
## 2. Expected Input/Output Shapes
This task is informational rather than tensor-shaped. The outputs are environment facts:
- PyTorch version
- CUDA availability
- Triton import status
- GPU name
- Device capability
- Toolkit and driver hints when available
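Collecting those facts can be sketched in a few lines. This is an illustrative stand-in for `tools/print_device_info.py` (whose actual contents are not shown here): every lookup is guarded so the report still prints on a CPU-only or torch-less machine.

```python
# Gather the environment facts listed above into one dict, tolerating
# a machine where torch or triton is not installed.
facts = {}
try:
    import torch
    facts["pytorch version"] = torch.__version__
    facts["cuda available"] = torch.cuda.is_available()
    facts["cuda toolkit (torch.version.cuda)"] = torch.version.cuda  # None on CPU-only builds
    if torch.cuda.is_available():
        facts["gpu name"] = torch.cuda.get_device_name(0)
        facts["device capability"] = torch.cuda.get_device_capability(0)
except ImportError:
    facts["pytorch version"] = None  # torch not installed

try:
    import triton
    facts["triton version"] = triton.__version__
except ImportError:
    facts["triton version"] = None  # triton not installed, or incompatible stack

for key, value in facts.items():
    print(f"{key}: {value}")
```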
## 3. Performance Intuition
Do not benchmark anything yet. First confirm that the environment is what you think it is.
## 4. Memory Access Discussion
Not applicable yet. The point is to avoid debugging kernels when the real problem is a mismatched driver or toolkit.
## 5. What Triton Is Abstracting
Even importing Triton depends on a compatible Python, PyTorch, driver, and GPU stack.
## 6. What CUDA Makes Explicit
CUDA makes the toolkit and architecture targeting explicit. Keep that explicit throughout this repo.
## 7. Reflection Questions
- What exact GPU name does the system report?
- What device capability does PyTorch report?
- Does Triton import cleanly?
- Which part of the stack would you inspect first if CUDA is unavailable?
## 8. Implementation Checklist
- Run `python tools/check_env.py`
- Run `python tools/print_device_info.py`
- Write down the reported capability
- Set `KERNEL_LAB_CUDA_ARCH` explicitly if you need to change architecture targeting
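The last checklist item can be made concrete with a small resolution helper. The precedence shown here (environment variable, then the capability PyTorch detects, then a hard default) is an assumption about how `KERNEL_LAB_CUDA_ARCH` is meant to be consumed; the repo's build scripts may resolve it differently.

```python
import os

def resolve_cuda_arch(default="80"):
    """Pick a target compute capability for extension builds.

    Assumed precedence: KERNEL_LAB_CUDA_ARCH (a bare capability string
    such as "80" or "90"), else the capability of GPU 0 as reported by
    PyTorch, else the supplied default.
    """
    arch = os.environ.get("KERNEL_LAB_CUDA_ARCH")
    if arch:
        return arch
    try:
        import torch
        if torch.cuda.is_available():
            major, minor = torch.cuda.get_device_capability(0)
            return f"{major}{minor}"
    except ImportError:
        pass
    return default

print("targeting sm_" + resolve_cuda_arch())
```

Setting the variable explicitly, e.g. `KERNEL_LAB_CUDA_ARCH=90`, overrides whatever the machine reports, which is the point of recording the capability in the checklist above.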