Gahow Wang (gahow)
0 Followers · 0 Following
Joined on 2026-04-03
Repositories: 17
Public Activity
gahow pushed to t16-grad-accum at gahow/xtrain (2026-06-17 15:49:18 +00:00)
    b06b553f99  test: drop unused Var import in grad_accum

gahow pushed to t16-grad-accum at gahow/xtrain (2026-06-17 15:45:48 +00:00)
    abe5ceb913  test: grad-accum equivalence + accum=1 bit-identity + DDP+accum
    7a03b0054a  train+ddp: micro-batch gradient accumulation (--accum-steps)
    d01fec6639  docs: Phase T16 — gradient accumulation design

gahow created branch t16-grad-accum in gahow/xtrain (2026-06-17 15:45:48 +00:00)

gahow pushed to t14-flash-attention at gahow/xtrain (2026-06-17 15:34:17 +00:00)
    9064ced4c2  docs: T14 flash-attention results + evolution/README rows

gahow pushed to t14-flash-attention at gahow/xtrain (2026-06-17 15:27:34 +00:00)
    d217f4fbd3  perf: spread flash bwd dK/dV atomics across all threads

gahow pushed to t14-flash-attention at gahow/xtrain (2026-06-17 15:24:58 +00:00)
    4d7b69f8d4  perf: cache softmax weights in shared mem (drop hd× redundant expf)

gahow pushed to t14-flash-attention at gahow/xtrain (2026-06-17 15:19:09 +00:00)
    9b05f4f93f  test: flash==composed bf16 uses robust mean/p99 metric (repo convention)

gahow pushed to t14-flash-attention at gahow/xtrain (2026-06-17 15:17:46 +00:00)
    c0f0b67510  test: eps=2e-3 for flash dQ/dK finite-diff (cuts f32 rounding term)

gahow pushed to t14-flash-attention at gahow/xtrain (2026-06-17 15:17:07 +00:00)
    80602099dc  test: scale Q/K in flash grad-check for well-conditioned grads

gahow pushed to t14-flash-attention at gahow/xtrain (2026-06-17 15:16:22 +00:00)
    f38beb0346  test: flash finite-diff grad-check uses single-tile clean regime

gahow pushed to t14-flash-attention at gahow/xtrain (2026-06-17 15:12:30 +00:00)
    01fb22d114  test: flash bwd vs composed bwd (sharper than finite-diff)

gahow pushed to t14-flash-attention at gahow/xtrain (2026-06-17 15:10:47 +00:00)
    5f3b81ac96  test+bins: flash grad-check, flash==composed, PyTorch parity, --flash flag
    0e20821633  autodiff+model: flash-attention op + --flash opt-in wiring
    326a6fadfe  cuda: fused flash-attention kernel (fwd + flash-style bwd)
    65a2264227  docs: Phase T14 — fused flash-attention design

gahow created branch t14-flash-attention in gahow/xtrain (2026-06-17 15:10:47 +00:00)

gahow pushed to feat/fig18-real-output-lca-substrate at gahow/aituner (2026-06-17 14:11:53 +00:00)
    a1b804f879  Ablation: search.high 0.25 -> 0.15 (skip wildly-infeasible top probes)

gahow pushed to feat/fig18-real-output-lca-substrate at gahow/aituner (2026-06-17 09:24:06 +00:00)
    0c23285f39  Fig18 substrate: real output_length + criterion-A time_scale + Stop-A drain deadline

gahow created branch feat/fig18-real-output-lca-substrate in gahow/aituner (2026-06-17 09:24:06 +00:00)

gahow pushed to main at gahow/xtrain (2026-06-17 08:17:27 +00:00)
    31cc2bf745  docs: capstone README — full-stack + scaling study (v0-v8) writeup

gahow pushed to main at gahow/xtrain (2026-06-17 07:12:05 +00:00)
    511f35d40c  docs: run v8 — dim1024 capacity helps (val 2.98)

gahow pushed to main at gahow/aituner (2026-06-17 05:03:27 +00:00)
    816765071f  Complete harness-vs-naive ablation: harness 3x faster + stops; naive nondeterministic

gahow pushed to main at gahow/aituner (2026-06-17 02:05:46 +00:00)
    97d2ddabb1  Ablation driver: force direct LLM connection (codex proxy is dash0-local)