Files
agentic-pd-hybrid/third_party
Claude Code Agent 2dfe22ab20 refactor(snapshot): dedicated GPU snapshot_buf replaces kv_pool alloc
Implements the design in docs/SNAPSHOT_STORE_REFACTOR_ZH.md to fix
the alloc-failed death loop that killed D→P in E4-v4/v5 (167 sync
attempts, 0 OK because P's kv_pool was busy with its own prefill).

Mechanism change:
  OLD prepare_receive: token_to_kv_pool_allocator.alloc(N) — 90%+ failure
  NEW prepare_receive: SnapshotBufAllocator.alloc(slab_bytes) carves a
                       range from an 8 GB GPU buffer dedicated to
                       snapshot reception, decoupled from kv_pool

  OLD finalize_ingest: just radix.insert with pre-alloc'd slots
  NEW finalize_ingest: kv_pool.alloc NOW + GPU memcpy snapshot_buf →
                       k_buffer/v_buffer + radix.insert

Wire schema changed (clean break, no back-compat):
  PrepareReceiveReqOutput  swaps k/v_base_ptrs + slot_indices  for
                           snapshot_buf_base_ptr + k/v_layer_offsets +
                           num_tokens
  DumpReqInput             swaps target_k/v_base_ptrs + target_slot_indices
                           for target_snapshot_buf_base +
                           target_k/v_layer_offsets
  FinalizeIngestReqInput   drops slot_indices (P resolves at ingest)

Controller adds:
  SnapshotBufAllocator: first-fit free-list with 4 KB alignment
  ingest_snapshot_into_kvpool: GPU→GPU copy + radix insert

Configurable buffer size via SGLANG_SNAPSHOT_LINK_BUF_BYTES env
(default 8 GB, scales down to 1 GB if alloc fails).

Removed runtime leak-check accommodation since prepare_receive no
longer touches kv_pool.

Total: ~365 LOC including alloc helper; smoke-test verification next.
2026-05-13 14:18:23 +08:00
..