Analogue of the ddp_dropout_is_live_and_p0_bit_identical test (T21, thread-per-
GPU) for the process-per-GPU launcher. Runs launch_processes twice on the same
corpus / init / config, with cfg.dropout as the ONLY difference. The value is
passed launcher→worker via a new XTRAIN_TEST_DROPOUT env var, because worker
re-execs cannot inherit argv changes. The test then reads rank 0's loss
trajectory from both runs and asserts GATE B: max |loss diff| > 1e-3.
The threshold sits ~4 orders of magnitude above this box's KI-5 cross-rank NCCL
noise floor (~1e-7), so it is an unambiguous "dropout mask is applied" signal,
not a noise measurement. Before the fix (cfg.dropout = ... missing in the
worker / launcher, exactly the gap the paired launcher commit closes), both
traces are bit-identical and this test FAILs.
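The GATE B check described above can be sketched roughly as follows. This is an
illustrative Python sketch, not the test's actual code: the function names and
the list-of-floats representation of rank 0's loss trajectory are assumptions;
only the 1e-3 threshold and the strict ">" comparison come from the text.

```python
# Hypothetical sketch of GATE B; names and data layout are assumptions.
GATE_B_THRESHOLD = 1e-3  # threshold stated in the description above


def max_abs_loss_diff(losses_a, losses_b):
    """Largest per-step absolute difference between two loss trajectories."""
    if not losses_a or len(losses_a) != len(losses_b):
        raise ValueError("trajectories must be non-empty and of equal length")
    return max(abs(a - b) for a, b in zip(losses_a, losses_b))


def gate_b_passes(losses_p0, losses_p02, threshold=GATE_B_THRESHOLD):
    """True when the two runs diverge by more than the threshold."""
    return max_abs_loss_diff(losses_p0, losses_p02) > threshold
```

Bit-identical traces give a max diff of exactly 0.0, so the gate fails, and a
diff at the ~1e-7 noise floor also stays well below the threshold.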
Also wires ENV_DROPOUT into the shared worker entry so the existing correctness
test's contract is unchanged (absent env → 0.0 → same synth run as before).
Separate p0/ and p02/ subdirs isolate the two invocations' dumps.
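The absent-env default (absent env → 0.0) can be sketched as below. This is a
hedged Python illustration only: it assumes ENV_DROPOUT is a constant holding
the variable name XTRAIN_TEST_DROPOUT, and the helper name and the range check
are hypothetical additions, not part of the described change.

```python
import os

# Assumption: ENV_DROPOUT names the XTRAIN_TEST_DROPOUT variable.
ENV_DROPOUT = "XTRAIN_TEST_DROPOUT"


def dropout_from_env(environ=None):
    """Return the dropout probability from the environment, or 0.0 if unset."""
    env = os.environ if environ is None else environ
    raw = env.get(ENV_DROPOUT)
    if raw is None:
        # Absent env keeps the pre-existing behaviour: no dropout.
        return 0.0
    p = float(raw)  # raises ValueError on malformed input
    if not 0.0 <= p < 1.0:
        # Hypothetical sanity check, not stated in the description.
        raise ValueError(f"{ENV_DROPOUT} must be in [0, 1), got {p}")
    return p
```

Defaulting to 0.0 rather than failing on a missing variable is what leaves the
existing correctness test untouched: it never sets the variable.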