Add ncclSend/ncclRecv FFI and a PpContext that initializes a NCCL
communicator across P pipeline stages and hands the hidden state to
neighbour stages on the null stream. Mirrors TpContext; the collective
differs (point-to-point hand-off vs in-layer AllReduce).
tests/sendrecv.rs: 2-GPU stage0->stage1 send/recv smoke test.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>