phase 2: tensor abstraction layer

- DType enum (F32, F16, BF16) with TensorDType trait
- Shape utilities: contiguous_strides, broadcast_shape, broadcast_strides
- Storage with Arc reference counting (CPU Vec<u8> or GPU GpuBuffer)
- Device enum (Cpu, Cuda(id)) with to_device transfer
- Tensor type with strided layout: reshape, transpose, squeeze, unsqueeze
- contiguous() copies non-contiguous views to contiguous layout
- from_slice, zeros, ones constructors
- as_slice<T> for typed CPU read access, data_ptr for GPU kernel launches
- CPU↔GPU roundtrip verified
- All 27 tests pass (12 cuda + 4 shape + 11 tensor)
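The shape utilities listed above can be sketched as follows. This assumes row-major (C-order) layout, strides counted in elements rather than bytes, and NumPy-style broadcasting rules. The signatures are hypothetical (for example, broadcast_shape returning Option<Vec<usize>> and broadcast_strides taking a target shape), so the crate's actual functions may differ.

```rust
/// Row-major (C-order) strides, in elements, for a contiguous tensor.
fn contiguous_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![1; shape.len()];
    // Each stride is the product of all dimension sizes to its right.
    for i in (0..shape.len().saturating_sub(1)).rev() {
        strides[i] = strides[i + 1] * shape[i + 1];
    }
    strides
}

/// NumPy-style broadcast of two shapes (hypothetical signature).
/// Shapes are aligned on the right; each pair of dims must match or one must be 1.
fn broadcast_shape(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    let ndim = a.len().max(b.len());
    let mut out = vec![0; ndim];
    for i in 0..ndim {
        // Walk from the trailing dim; missing leading dims behave like size 1.
        let da = if i < a.len() { a[a.len() - 1 - i] } else { 1 };
        let db = if i < b.len() { b[b.len() - 1 - i] } else { 1 };
        out[ndim - 1 - i] = match (da, db) {
            (x, y) if x == y => x,
            (1, y) => y,
            (x, 1) => x,
            _ => return None, // incompatible dims
        };
    }
    Some(out)
}

/// Strides that let a tensor of `shape`/`strides` be read as `target`
/// (hypothetical signature). Broadcast dims get stride 0, so every index
/// along them maps to the same element.
fn broadcast_strides(shape: &[usize], strides: &[usize], target: &[usize]) -> Option<Vec<usize>> {
    if shape.len() > target.len() || shape.len() != strides.len() {
        return None;
    }
    let offset = target.len() - shape.len();
    // New leading dims start at stride 0.
    let mut out = vec![0; target.len()];
    for (i, (&dim, &stride)) in shape.iter().zip(strides).enumerate() {
        let t = target[offset + i];
        out[offset + i] = if dim == t {
            stride
        } else if dim == 1 {
            0
        } else {
            return None;
        };
    }
    Some(out)
}
```

With stride 0 on broadcast dimensions, a [3, 1] tensor can be read as [2, 3, 4] without copying any data.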
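A minimal CPU-only sketch of the dtype tagging and Arc-shared storage described in the list above. CpuStorage, encode_le/decode_le, share_count and to_vec are hypothetical names, and the GPU variant (GpuBuffer) is omitted. Unlike the commit's as_slice<T>, which presumably borrows the bytes in place, this to_vec decodes into a fresh Vec to stay free of unsafe code. Only F32 is mapped to a Rust type, since stable Rust has no built-in 16-bit float types.

```rust
use std::sync::Arc;

/// Element types listed in the commit.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum DType {
    F32,
    F16,
    BF16,
}

impl DType {
    /// Bytes per element.
    fn size_in_bytes(self) -> usize {
        match self {
            DType::F32 => 4,
            DType::F16 | DType::BF16 => 2,
        }
    }
}

/// Hypothetical trait shape: ties a Rust element type to its DType tag
/// and converts it to/from little-endian bytes.
trait TensorDType: Copy {
    const DTYPE: DType;
    fn encode_le(self, out: &mut Vec<u8>);
    fn decode_le(bytes: &[u8]) -> Self;
}

impl TensorDType for f32 {
    const DTYPE: DType = DType::F32;
    fn encode_le(self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.to_le_bytes());
    }
    fn decode_le(bytes: &[u8]) -> Self {
        let mut raw = [0u8; 4];
        raw.copy_from_slice(bytes); // panics unless exactly 4 bytes
        f32::from_le_bytes(raw)
    }
}

/// CPU-only stand-in for Storage: untyped bytes behind an Arc, so clones
/// (and tensor views) share one buffer instead of copying it.
#[derive(Clone)]
struct CpuStorage {
    bytes: Arc<Vec<u8>>,
    dtype: DType,
}

impl CpuStorage {
    fn from_slice<T: TensorDType>(values: &[T]) -> Self {
        let mut bytes = Vec::with_capacity(values.len() * T::DTYPE.size_in_bytes());
        for &v in values {
            v.encode_le(&mut bytes);
        }
        CpuStorage { bytes: Arc::new(bytes), dtype: T::DTYPE }
    }

    /// Number of handles currently sharing this buffer.
    fn share_count(&self) -> usize {
        Arc::strong_count(&self.bytes)
    }

    /// Typed read: refuses a mismatched dtype, then decodes element by element.
    fn to_vec<T: TensorDType>(&self) -> Option<Vec<T>> {
        if self.dtype != T::DTYPE {
            return None;
        }
        let width = self.dtype.size_in_bytes();
        Some(self.bytes.chunks_exact(width).map(T::decode_le).collect())
    }
}
```

Cloning a CpuStorage only bumps the Arc count; the byte buffer is freed when the last handle drops.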
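The strided-layout operations in the list above (reshape, transpose, squeeze, unsqueeze, contiguous) can be illustrated with a simplified f32-only view. View, from_vec, is_contiguous, shares_buffer_with and to_vec are hypothetical; the commit's Tensor also tracks dtype and device. The sketch assumes reshape falls back to contiguous() for strided inputs, a common design that the commit message does not confirm.

```rust
use std::sync::Arc;

/// Row-major strides in elements.
fn contiguous_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![1; shape.len()];
    for i in (0..shape.len().saturating_sub(1)).rev() {
        strides[i] = strides[i + 1] * shape[i + 1];
    }
    strides
}

/// Hypothetical f32-only strided view over a shared buffer.
#[derive(Clone, Debug)]
struct View {
    data: Arc<Vec<f32>>,
    shape: Vec<usize>,
    strides: Vec<usize>, // element strides, one per dimension
    offset: usize,       // index of the first element in `data`
}

impl View {
    fn from_vec(values: Vec<f32>, shape: &[usize]) -> View {
        assert_eq!(values.len(), shape.iter().product::<usize>(), "shape/data size mismatch");
        View { data: Arc::new(values), shape: shape.to_vec(), strides: contiguous_strides(shape), offset: 0 }
    }

    /// Strict check: a size-1 dim with an unusual stride counts as
    /// non-contiguous, which at worst costs one unnecessary copy.
    fn is_contiguous(&self) -> bool {
        self.strides == contiguous_strides(&self.shape)
    }

    fn shares_buffer_with(&self, other: &View) -> bool {
        Arc::ptr_eq(&self.data, &other.data)
    }

    /// Swap two dims by swapping their sizes and strides; no data moves.
    fn transpose(&self, d0: usize, d1: usize) -> View {
        let mut v = self.clone(); // clones the Arc handle, not the buffer
        v.shape.swap(d0, d1);
        v.strides.swap(d0, d1);
        v
    }

    /// Insert a size-1 dim; its stride is chosen so contiguous views stay contiguous.
    fn unsqueeze(&self, dim: usize) -> View {
        let mut v = self.clone();
        let stride = if dim < v.shape.len() { v.shape[dim] * v.strides[dim] } else { 1 };
        v.shape.insert(dim, 1);
        v.strides.insert(dim, stride);
        v
    }

    /// Drop `dim` if it has size 1; otherwise return an unchanged view.
    fn squeeze(&self, dim: usize) -> View {
        let mut v = self.clone();
        if v.shape[dim] == 1 {
            v.shape.remove(dim);
            v.strides.remove(dim);
        }
        v
    }

    /// Copy a strided view into a fresh row-major buffer (no copy if already contiguous).
    fn contiguous(&self) -> View {
        if self.is_contiguous() {
            return self.clone();
        }
        let n: usize = self.shape.iter().product();
        let mut out = Vec::with_capacity(n);
        let mut idx = vec![0usize; self.shape.len()];
        for _ in 0..n {
            let pos: usize = idx.iter().zip(&self.strides).map(|(i, s)| i * s).sum();
            out.push(self.data[self.offset + pos]);
            // Advance the multi-index like an odometer, last dim fastest.
            for d in (0..idx.len()).rev() {
                idx[d] += 1;
                if idx[d] < self.shape[d] {
                    break;
                }
                idx[d] = 0;
            }
        }
        View::from_vec(out, &self.shape)
    }

    /// Reshape needs row-major data, so strided inputs are copied first.
    fn reshape(&self, shape: &[usize]) -> View {
        assert_eq!(shape.iter().product::<usize>(), self.shape.iter().product::<usize>());
        let base = self.contiguous();
        View { data: base.data, shape: shape.to_vec(), strides: contiguous_strides(shape), offset: base.offset }
    }

    fn to_vec(&self) -> Vec<f32> {
        let c = self.contiguous();
        let n: usize = c.shape.iter().product();
        c.data[c.offset..c.offset + n].to_vec()
    }
}
```

In this sketch transpose, squeeze and unsqueeze only rewrite shape and strides and share the Arc; contiguous() is the only operation that reads element data, and reshape copies only when its input is strided.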

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Gahow Wang

2026-05-21 19:45:22 +08:00

parent c8f7bc0c3c

commit a83971fa25

8 changed files with 654 additions and 0 deletions

Cargo.toml (1 line added)

@@ -2,6 +2,7 @@
 resolver = "2"
 members = [
     "crates/xserv-cuda",
+    "crates/xserv-tensor",
 ]

 [workspace.package]
