New crate xtrain-model: a from-scratch decoder built entirely from the
autodiff op set.
- Config (tiny: dim=32, 2 layers, 2 heads, head_dim=16, ffn=64).
- TinyTransformer: embedding -> N x {pre-RMSNorm -> multi-head causal
attention (RoPE, additive causal mask, per-head SDPA) -> residual;
pre-RMSNorm -> SwiGLU MLP -> residual} -> final RMSNorm -> LM head.
x@W weight convention (engine GEMM is plain A@B); dim=n_heads*head_dim.
- params()/zero_grad-able leaves for the optimizer; param_to_host export.
- overfit test: char-level bring-up (embedded text -> vocab -> shifted
targets), minimal hand-written GD (p -= lr*grad) memorises one fixed
batch -> loss ~0 + greedy argmax matches targets. End-to-end fwd+bwd
correctness signal. Gated #![cfg(not(no_cuda))].
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
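
The overfit test described above can be illustrated with a minimal CPU-only sketch. It is not the crate's code: a bigram logit table stands in for TinyTransformer, the softmax cross-entropy gradient is written by hand instead of coming from autodiff, and `encode`, `softmax`, `argmax`, `mean_loss` and `overfit` are hypothetical names. It keeps the same shape as the test: char-level vocab, targets shifted by one position, a plain `p -= lr*grad` update, then a check that the loss is near zero and greedy argmax reproduces the targets.

```rust
// Hypothetical stand-in for the overfit test; nothing here is xtrain-model API.

/// Build a sorted, deduplicated char vocabulary and encode the text as ids.
fn encode(text: &str) -> (Vec<char>, Vec<usize>) {
    let mut vocab: Vec<char> = text.chars().collect();
    vocab.sort_unstable();
    vocab.dedup();
    let ids: Vec<usize> = text
        .chars()
        .map(|c| vocab.binary_search(&c).unwrap())
        .collect();
    (vocab, ids)
}

/// Numerically stable softmax over one row of logits.
fn softmax(row: &[f64]) -> Vec<f64> {
    let max = row.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = row.iter().map(|z| (z - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Greedy decoding: index of the largest logit.
fn argmax(row: &[f64]) -> usize {
    let mut best = 0;
    for (i, &v) in row.iter().enumerate() {
        if v > row[best] {
            best = i;
        }
    }
    best
}

/// Mean cross-entropy of the bigram table `w` on (input, target) pairs.
fn mean_loss(w: &[Vec<f64>], inputs: &[usize], targets: &[usize]) -> f64 {
    let total: f64 = inputs
        .iter()
        .zip(targets)
        .map(|(&x, &y)| -softmax(&w[x])[y].ln())
        .sum();
    total / inputs.len() as f64
}

/// Overfit one fixed batch with plain gradient descent (p -= lr * grad).
/// Assumes `ids` holds at least two tokens. Returns the final mean loss
/// and the greedy prediction for every input position.
fn overfit(ids: &[usize], vocab_size: usize, steps: usize, lr: f64) -> (f64, Vec<usize>) {
    let inputs = &ids[..ids.len() - 1]; // tokens 0..n-1
    let targets = &ids[1..]; // same tokens shifted left by one
    let n = inputs.len() as f64;
    let mut w = vec![vec![0.0f64; vocab_size]; vocab_size];
    for _ in 0..steps {
        // d(mean CE)/d(logits) = (softmax - onehot) / n, accumulated per row.
        let mut grad = vec![vec![0.0f64; vocab_size]; vocab_size];
        for (&x, &y) in inputs.iter().zip(targets) {
            let probs = softmax(&w[x]);
            for j in 0..vocab_size {
                let onehot = if j == y { 1.0 } else { 0.0 };
                grad[x][j] += (probs[j] - onehot) / n;
            }
        }
        for (w_row, g_row) in w.iter_mut().zip(&grad) {
            for (param, g) in w_row.iter_mut().zip(g_row) {
                *param -= lr * g;
            }
        }
    }
    let preds: Vec<usize> = inputs.iter().map(|&x| argmax(&w[x])).collect();
    (mean_loss(&w, inputs, targets), preds)
}
```

Calling `overfit(&ids, vocab.len(), 1000, 1.0)` on the ids of a string whose characters are all distinct (for example "xtrain") should drive the loss close to zero, because every input character then has exactly one successor to memorise. A bigram table cannot memorise a batch where the same character is followed by different targets; the real model avoids that limit by attending over context.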
27 lines
867 B
Rust
use std::env;
use std::path::Path;
use std::process::Command;

// Same per-crate convention as the other crates: this crate's tiny-transformer
// forward/backward calls GPU ops (via xtrain-autodiff / xtrain-tensor), so it
// gates GPU code + tests behind `not(no_cuda)`. cfg does not propagate across
// crates, so each crate re-detects nvcc. No CUDA is compiled here.
fn main() {
    println!("cargo:rustc-check-cfg=cfg(no_cuda)");

    let cuda_path = env::var("CUDA_HOME")
        .or_else(|_| env::var("CUDA_PATH"))
        .unwrap_or_else(|_| "/usr/local/cuda".to_string());

    if !nvcc_available(&cuda_path) {
        println!("cargo:rustc-cfg=no_cuda");
    }
}

fn nvcc_available(cuda_path: &str) -> bool {
    if Command::new("nvcc").arg("--version").output().is_ok() {
        return true;
    }
    Path::new(&format!("{cuda_path}/bin/nvcc")).exists()
}
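
One detail of the detection above: `output().is_ok()` only reports that the process could be spawned, not that it exited successfully, and the fallback only checks that a file exists at the expected path. A possible stricter variant, shown purely as a sketch (the names `runs_ok` and `nvcc_usable` are hypothetical and not part of this crate), also inspects the exit status:

```rust
use std::path::Path;
use std::process::Command;

/// True only if `program --version` can be spawned AND exits with success.
fn runs_ok(program: &Path) -> bool {
    Command::new(program)
        .arg("--version")
        .output()
        .map(|out| out.status.success())
        .unwrap_or(false)
}

/// Hypothetical stricter counterpart to `nvcc_available`: try `nvcc` on PATH
/// first, then `<cuda_path>/bin/nvcc`, and require that it actually runs.
fn nvcc_usable(cuda_path: &str) -> bool {
    runs_ok(Path::new("nvcc")) || runs_ok(&Path::new(cuda_path).join("bin").join("nvcc"))
}
```

The trade-off is that this variant rejects a toolkit whose `nvcc` binary is present but not runnable on the build host, whereas the existence check accepts it. Which behaviour is preferable depends on how the other crates in the workspace detect CUDA, since the comment in the build script says each crate repeats the same detection.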