Commit Graph

3 Commits

Author SHA1 Message Date
a83971fa25 phase 2: tensor abstraction layer
- DType enum (F32, F16, BF16) with TensorDType trait
- Shape utilities: contiguous_strides, broadcast_shape, broadcast_strides
- Storage with Arc reference counting (CPU Vec<u8> or GPU GpuBuffer)
- Device enum (Cpu, Cuda(id)) with to_device transfer
- Tensor type with strided layout: reshape, transpose, squeeze, unsqueeze
- contiguous() copies non-contiguous views to contiguous layout
- from_slice, zeros, ones constructors
- as_slice<T> for typed CPU read access, data_ptr for GPU kernel launch
- CPU↔GPU roundtrip verified
- All 27 tests pass (12 cuda + 4 shape + 11 tensor)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
phase2
2026-05-21 19:45:22 +08:00
c8f7bc0c3c phase 0+1: fix Rust 2024 edition compat + memory query
- unsafe extern "C" blocks (Rust 2024 requirement)
- unsafe blocks inside unsafe fn bodies
- Use cudaMemGetInfo for accurate GPU memory reporting
- Remove cc "cuda" feature (doesn't exist, built-in)
- All 12 tests pass on RTX 5090 (CC 12.0, 170 SMs, 32GB)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
phase0-1
2026-05-21 19:40:49 +08:00
9806b4db35 phase 0+1: project scaffold + xserv-cuda crate
- Cargo workspace with xserv-cuda crate
- CUDA FFI bindings (cudart: memory, stream, device, error)
- GpuBuffer RAII wrapper with H2D/D2H/D2D copy
- CudaStream wrapper with RAII Drop
- CachingAllocator with size-bucketed free lists
- PinnedBuffer for page-locked host memory
- Device info query via cudaDeviceGetAttribute
- Vector-add CUDA kernel smoke test
- Integration test suite (11 tests)
- build.rs: cc crate compiles .cu for SM 12.0
- sync-and-build.sh for remote build on dash5
- Roadmap doc (docs/00-roadmap.md) and Phase 0+1 design doc

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-21 18:40:22 +08:00