Add ncclSend/ncclRecv FFI and PpContext for pipeline-stage hand-off

Add FFI bindings for ncclSend/ncclRecv and a PpContext that initializes
an NCCL communicator across P pipeline stages and hands the hidden state
to neighbouring stages on the null stream. PpContext mirrors TpContext;
only the communication pattern differs: a point-to-point hand-off
between stages instead of an in-layer AllReduce.

tests/sendrecv.rs: 2-GPU stage0 -> stage1 send/recv smoke test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>