Files
agentic-kvc/microbench
Gahow Wang 4242bba034 Chunk-safe + concurrent layer-wise connector (per-step incremental shipping)
Scheduler tracks per-producer block_ids (accumulated from scheduler_output)
and emits per-step LWSendMeta with cumulative computed_tokens. Worker
lw_wait_for_save records a CUDA event per step and enqueues progress; the
sender-loop ship loop drains it, shipping only computed+dst-wanted+unshipped
blocks in order (correct under chunked prefill). Per-transfer state =
concurrent-safe. Keeps v1 single-transfer version as reference.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 17:15:54 +08:00
..