Gahow Wang
gahow
0 Followers · 0 Following
Joined on 2026-04-03
Public Activity
gahow pushed to main at gahow/xserv · 2026-07-01 07:13:52 +00:00
fcf531a9b2 style: rustfmt server engine files
d96ee0766c server: sampling-param validation, finish_reason normalization, backpressure
ce10e4a998 sampling: NaN-safe sample() top-k/top-p path
5f060902f6 cuda: fix remaining int32-address and nondeterministic-reduction bugs
gahow pushed to main at gahow/aituner · 2026-07-01 06:43:31 +00:00
d8899c50ce Add interaction screening matrix generator
gahow pushed to main at gahow/aituner · 2026-07-01 06:34:09 +00:00
407c082e6e Add interaction screening matrix generator
gahow pushed to main at gahow/aituner · 2026-07-01 06:32:04 +00:00
49296359a7 Add interaction screening matrix generator
gahow pushed to main at gahow/aituner · 2026-07-01 06:28:35 +00:00
8a3c0d5f4c Add interaction screening matrix generator
gahow pushed to main at gahow/xserv · 2026-07-01 06:16:51 +00:00
a67753f516 softmax: cap block size at 512 threads
f5ec10c2c3 xserv-cli: expose sampling params and greedy repetition penalty
ce7229f4fe speculative: Qwen3 draft-model v0 with paged verify parity
5b350ee5f0 cuda: deterministic BF16 gemv + paged attention reductions
gahow pushed to main at gahow/xtrain · 2026-07-01 06:09:54 +00:00
6465a2d5ce test: T21-for-proc — clear ENV_DROPOUT across tests to sever ordering coupling
gahow pushed to main at gahow/xtrain · 2026-07-01 05:51:55 +00:00
33a1aee9ec test: T21-for-proc — dropout-live regression under process-per-GPU
86de6bfb51 distributed: T21-for-proc — wire --dropout into the process-per-GPU launcher
gahow pushed to main at gahow/xserv · 2026-07-01 05:48:29 +00:00
0314b4f3ac server: non-blocking stream send — stop one slow client stalling the batch
cfbd64d206 cuda: fix int32 overflow in MoE dense kernels; surface launch errors in release
gahow pushed to main at gahow/aituner · 2026-07-01 03:12:59 +00:00
46b477f48e Add initial config preflight review
gahow pushed to main at gahow/xtrain · 2026-06-30 15:03:35 +00:00
4379868f2d docs: M2d — ragged-batching lever, 9× measured, step bottleneck → rollout
0e82b2438e test: M2d — ragged-forward + batched-op equivalence gates + throughput bench
c2ebf62ae1 post-train: M2d — batch the GRPO training-side forwards (op + module + wiring)
gahow pushed to main at gahow/xtrain · 2026-06-30 09:39:11 +00:00
41d46208a6 docs: M2c — device KV cache + the bottleneck-shift finding
3a3425960c post-train: M2c — device-side KV cache (cat_seq), profile-first bottleneck shift
gahow pushed to main at gahow/xtrain · 2026-06-30 09:20:04 +00:00
0f76c0fdb0 docs: M2b — batched decode results (token-identical + ~1.7x rollout, device-cache next)
361c5290fa post-train: M4 — use M2b batched rollout in GRPO (~1.7× step)
2c9b58cb3b post-train: M2b — batched KV-cache decode (G-way, token-identical)
gahow pushed to main at gahow/xtrain · 2026-06-30 09:01:24 +00:00
096e45b845 docs: M4 — GRPO results (infra + memory/rollout walls + capability-wall negative result)
7fb3b32fd9 post-train: M4 — GRPO actor-learner loop + cached temperature rollout
aaa77082ef post-train: M4 — clipped_pg_loss + scale_rows (GRPO policy-gradient op)
gahow pushed to main at gahow/aituner · 2026-06-30 06:10:21 +00:00
1b8f5a3af1 Integrate descriptor runtime candidates into harness
gahow pushed to main at gahow/xtrain · 2026-06-30 04:38:08 +00:00
99090465bf docs: M3 — DPO results (infra correct, held-out correctness flat, over-optimization collapse)
2f827fd6d8 post-train: M3 — DPO pair-gen + training loop (verifiable arithmetic)
f3c764ce95 post-train: M3 — seq_logprob + dpo_loss autograd ops
gahow pushed to main at gahow/aituner · 2026-06-30 04:05:05 +00:00
adb5356c4b Add advisory harness attribution and descriptor planner MVP
gahow pushed to main at gahow/xtrain · 2026-06-30 04:01:11 +00:00
b39e6e7110 docs: M2a — KV-cache decode engine results (token-identical + length-dependent speedup)
eff26a0898 post-train: M2a — KV-cache incremental decode engine (token-identical)
c88e2ab88c post-train: M2 — decode primitives (rope_at + decode_attention)
gahow pushed to main at gahow/xtrain · 2026-06-30 03:13:29 +00:00
1574e21d89 post-train: M1 — verifiable-arith eval scorer + SFT format-baseline result
gahow pushed to main at gahow/xtrain · 2026-06-29 15:28:26 +00:00
cb64604496 post-train: M1 fix — enlarge arith key space + saturation guard