Captures the architectural fix for the P-side alloc-failed problem
that killed every D→P sync attempt in E4-v4/v5. Designs a dedicated
GPU snapshot_buf with a slab allocator, decoupling reception from
kv_pool, and defers kv_pool alloc to finalize_ingest time when the
snapshot bytes are already in hand. ~365 LOC across controller,
io_struct, and agentic. The smoke test and E4-v6 are expected to show
the first non-zero D→P OK rate.
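
Rough sketch of the decoupling described above, not the actual
implementation. Assumptions: the class names, method signatures, the
fixed slab size, and the byte-counter stand-in for kv_pool are all
hypothetical, and a host bytearray stands in for the GPU buffer. Only
snapshot_buf, kv_pool, and finalize_ingest are names from this change.

```python
from typing import Dict, List, Optional, Tuple


class SlabAllocator:
    """Fixed-size slab allocator over one preallocated buffer (hypothetical)."""

    def __init__(self, num_slabs: int, slab_bytes: int) -> None:
        self.slab_bytes = slab_bytes
        # Stand-in for the dedicated snapshot_buf; assumed to live on GPU.
        self.buf = bytearray(num_slabs * slab_bytes)
        self.free: List[int] = list(range(num_slabs - 1, -1, -1))

    def alloc(self) -> Optional[int]:
        # Returns a slab index, or None when every slab is in use.
        return self.free.pop() if self.free else None

    def release(self, slab: int) -> None:
        self.free.append(slab)

    def view(self, slab: int) -> memoryview:
        start = slab * self.slab_bytes
        return memoryview(self.buf)[start:start + self.slab_bytes]


class SnapshotIngest:
    """Reception writes into slabs only; kv_pool is touched at finalize."""

    def __init__(self, slabs: SlabAllocator, kv_pool_free: int) -> None:
        self.slabs = slabs
        # Assumption: kv_pool modelled as a simple count of free bytes.
        self.kv_pool_free = kv_pool_free
        self.pending: Dict[str, Tuple[int, int]] = {}  # req_id -> (slab, nbytes)

    def receive(self, req_id: str, payload: bytes) -> bool:
        # Does not depend on kv_pool, so a full kv_pool cannot fail reception.
        if len(payload) > self.slabs.slab_bytes:
            return False
        slab = self.slabs.alloc()
        if slab is None:
            return False
        self.slabs.view(slab)[:len(payload)] = payload
        self.pending[req_id] = (slab, len(payload))
        return True

    def finalize_ingest(self, req_id: str) -> Optional[bytes]:
        slab, nbytes = self.pending[req_id]
        if nbytes > self.kv_pool_free:
            # Snapshot stays staged in its slab; the caller can retry later.
            return None
        self.kv_pool_free -= nbytes
        data = bytes(self.slabs.view(slab)[:nbytes])
        del self.pending[req_id]
        self.slabs.release(slab)
        return data
```

In this sketch a kv_pool shortfall at finalize time leaves the snapshot
staged rather than dropping it, so the transfer itself is never the
step that fails on kv_pool pressure.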