- known-issues.md: new "DDP-dropout wiring" Fixed entry (gap + fix +
regression test), with the meta-lesson that op-level and single-GPU unit tests
can miss launcher-level integration gaps: only the V9-PILOT end-to-end run on
the real launcher path exposed it.
- 17-dropout.md: annotate the DDP-combination note with the T18 wiring gap
and its T21 fix.
- evolution.md: T21 row (Infra) recording the fix + meta-lesson.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Counter-based (stateless) RNG → Bernoulli(keep=1-p) mask with inverted-dropout
scaling by 1/(1-p) in training, identity in eval. New autodiff `dropout` op (fwd
generates and applies the mask, bwd applies the SAME cached mask). Wired at the
two residual-path sites (attn / ffn outputs); attention-probs dropout is
deliberately skipped (fused SDPA doesn't materialise the probs). Documents the
RNG choice, the per-site deterministic seed (so T13 recompute reproduces the same
mask), the train/eval switch, p=0 bit-identity, and the acceptance gates.
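
A minimal NumPy sketch of the scheme described above, for illustration only.
The names `keep_mask`, `dropout_forward`, `dropout_backward`, the SplitMix64
mixer, and the (seed, site_id, step) keying are assumptions made for this
example; the RNG and op signature actually used in this change may differ.

```python
import numpy as np

_GAMMA = np.uint64(0x9E3779B97F4A7C15)  # SplitMix64 increment (assumed mixer)


def _mix64(x):
    """SplitMix64 finalizer, applied elementwise to a uint64 array."""
    x = x.astype(np.uint64)  # copy, so the in-place ops never touch the input
    x ^= x >> np.uint64(30)
    x *= np.uint64(0xBF58476D1CE4E5B9)  # wraps modulo 2**64
    x ^= x >> np.uint64(27)
    x *= np.uint64(0x94D049BB133111EB)
    x ^= x >> np.uint64(31)
    return x


def keep_mask(shape, p, seed, site_id, step):
    """Stateless Bernoulli(keep=1-p) mask: a pure function of its arguments."""
    key = np.array([seed], dtype=np.uint64)
    for v in (site_id, step):  # hypothetical keying: fold site and step into the key
        key = _mix64(key) ^ np.uint64(v)
    counter = np.arange(int(np.prod(shape)), dtype=np.uint64)
    bits = _mix64(counter * _GAMMA + _mix64(key))
    u = (bits >> np.uint64(11)).astype(np.float64) * 2.0**-53  # uniform in [0, 1)
    return (u >= p).reshape(shape)


def dropout_forward(x, p, training, seed, site_id, step):
    """Returns (output, cached mask). Eval and p=0 return x itself, untouched."""
    if not training or p == 0.0:
        return x, None
    mask = keep_mask(x.shape, p, seed, site_id, step)
    return x * mask / (1.0 - p), mask


def dropout_backward(grad_out, mask, p):
    """Applies the SAME mask and scale that the forward pass used."""
    if mask is None:
        return grad_out
    return grad_out * mask / (1.0 - p)
```

Because the mask depends only on (seed, site_id, step, element index), a
recompute pass that calls `keep_mask` again with the same arguments gets the
same mask without storing any RNG state.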
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>