Files
agentic-pd-hybrid/docs
Claude Code Agent 7216507773 feat(snapshot): D→P RDMA Phase 1b — GPU pointer path verified
Confirms snapshot_link works for cuda device pointers, not just host
memory. Sender on cuda:0 pushes to receiver on cuda:1 via RDMA over
mlx5_60. All 5 sizes (16K, 1M, 16M, 64M, 256M) pass SHA verification.

  16 KB     8.3 ms   0.016 Gbps  (cold openSegment)
  1 MB      0.10 ms  87.6 Gbps
  16 MB     0.84 ms  159 Gbps
  64 MB     2.52 ms  213 Gbps
  256 MB    8.54 ms  251 Gbps    (~60% NDR400 line rate)

For Inferact-scale sessions (~50K tokens × ~80 KB layer-per-token =
~4 GB), this projects D→P transfer time at ~130 ms — within the
"reseed-savings" envelope sketched in design doc §3.2.

Files:
  scripts/snapshot_link_receiver_gpu.py
  scripts/smoke_snapshot_link_gpu.py

Next: SGLang scheduler integration for D-side dump + P-side ingest.
2026-05-13 00:59:43 +08:00
..