Gahow Wang (gahow)
  • Joined on 2026-04-03
gahow pushed to main at gahow/aituner 2026-06-17 01:55:41 +00:00
8e58b4033d Note dash1 lacks LLM gateway access (naive-completion deferred to dash0)
gahow pushed to main at gahow/aituner 2026-06-17 01:52:56 +00:00
b779f6e56a Add dash1 naive-completion driver for the ablation
gahow pushed to main at gahow/aituner 2026-06-17 01:51:59 +00:00
e7d1b3ba01 Harness-vs-naive ablation result: harness steers to TP & converges; naive wanders
gahow pushed to main at gahow/xtrain 2026-06-17 01:50:38 +00:00
0150263055 perf: KI-3 fixed — dim1024 batch32 fits, mem 31.1→14.6GB, tok/s 39.7K→31.5K
gahow pushed to main at gahow/xtrain 2026-06-17 01:45:16 +00:00
69c5f07359 docs: Phase T13 — activation recompute
gahow pushed to main at gahow/xtrain 2026-06-17 01:44:03 +00:00
a12dcf18d0 docs: Phase T13 — activation recompute
f202351be5 model: per-block activation recompute (--recompute)
c396b39483 autodiff: checkpoint primitive (recompute-on-backward)
gahow pushed to main at gahow/xtrain 2026-06-16 19:55:53 +00:00
9c557f0609 docs: run v7 — FineWeb subset near-ceiling at dim768 (val 3.01)
gahow pushed to main at gahow/xtrain 2026-06-16 14:21:49 +00:00
b4bb426d48 docs: run v6 — FineWeb-edu graduation (val 3.07, new distribution)
gahow pushed to main at gahow/aituner 2026-06-16 12:59:48 +00:00
579dd86698 Ablation: --skip-baseline so loops climb from first proposal
gahow pushed to main at gahow/aituner 2026-06-16 12:30:42 +00:00
37342a5749 Add chained harness-vs-naive ablation driver (sequential runs + DONE marker)
gahow pushed to main at gahow/aituner 2026-06-16 12:29:31 +00:00
5965f4fbbc Ablation substrate: scale=0.5 + out=128 + 6 probes (TP1 measurable, tractable)
gahow pushed to main at gahow/aituner 2026-06-16 12:16:32 +00:00
a1cbab0e69 Document harness-vs-naive ablation: setup, substrate calibration, blocker
gahow pushed to main at gahow/aituner 2026-06-16 12:01:20 +00:00
0794efa249 Reduce ablation probe budget to 3 per trial for tractability
gahow pushed to main at gahow/aituner 2026-06-16 11:50:07 +00:00
d975e57bb5 Scale ablation early-stop caps to the compressed window (scale=0.2)
gahow pushed to main at gahow/aituner 2026-06-16 11:31:27 +00:00
a16016a876 Add harness vs naive ablation configs (27b, scale=0.2 substrate)
gahow pushed to main at gahow/xtrain 2026-06-16 11:30:54 +00:00
88bec270af docs: evolution overview — per-milestone changes across algorithm/arch/infra/dataset axes
gahow pushed to main at gahow/aituner 2026-06-16 11:16:43 +00:00
07f5d92e1d Add consolidated two-stop summary doc
f2ff0faebd Document Stop-B end-to-end on dense 27B: the improving climb + no-regression
4a64196a99 Add 27B Stop-B agentic-loop config (harness-driven, GPUs 2-7)
b17b213575 Tear down the engine on SIGTERM instead of orphaning it
93ce339d61 Document 27B TP sweep: per-GPU rises sharply with TP (dense), opposite of MoE
28 commits in this push (5 shown)
gahow pushed to main at gahow/xtrain 2026-06-16 11:04:47 +00:00
7e5ea9976b data: FineWeb-edu parquet->txt prep script (Scaling v6)
gahow pushed to feat/two-stop at gahow/aituner 2026-06-16 10:07:01 +00:00
f2ff0faebd Document Stop-B end-to-end on dense 27B: the improving climb + no-regression
gahow pushed to main at gahow/xtrain 2026-06-16 09:56:32 +00:00
579365f4a0 docs: run v5 — TinyStories saturation at dim768 (val 1.11)
8a1e29543b run: v5 archive + export (dim768, bf16, 5.33ep, val 1.11)