Gahow Wang gahow
  • Joined on 2026-04-03
gahow pushed to main at gahow/xserv 2026-07-01 07:13:52 +00:00
fcf531a9b2 style: rustfmt server engine files
d96ee0766c server: sampling-param validation, finish_reason normalization, backpressure
ce10e4a998 sampling: NaN-safe sample() top-k/top-p path
5f060902f6 cuda: fix remaining int32-address and nondeterministic-reduction bugs
gahow pushed to main at gahow/aituner 2026-07-01 06:43:31 +00:00
d8899c50ce Add interaction screening matrix generator
gahow pushed to main at gahow/aituner 2026-07-01 06:34:09 +00:00
407c082e6e Add interaction screening matrix generator
gahow pushed to main at gahow/aituner 2026-07-01 06:32:04 +00:00
49296359a7 Add interaction screening matrix generator
gahow pushed to main at gahow/aituner 2026-07-01 06:28:35 +00:00
8a3c0d5f4c Add interaction screening matrix generator
gahow pushed to main at gahow/xserv 2026-07-01 06:16:51 +00:00
a67753f516 softmax: cap block size at 512 threads
f5ec10c2c3 xserv-cli: expose sampling params and greedy repetition penalty
ce7229f4fe speculative: Qwen3 draft-model v0 with paged verify parity
5b350ee5f0 cuda: deterministic BF16 gemv + paged attention reductions
gahow pushed to main at gahow/xtrain 2026-07-01 06:09:54 +00:00
6465a2d5ce test: T21-for-proc — clear ENV_DROPOUT across tests to sever ordering coupling
gahow pushed to main at gahow/xtrain 2026-07-01 05:51:55 +00:00
33a1aee9ec test: T21-for-proc — dropout-live regression under process-per-GPU
86de6bfb51 distributed: T21-for-proc — wire --dropout into the process-per-GPU launcher
gahow pushed to main at gahow/xserv 2026-07-01 05:48:29 +00:00
0314b4f3ac server: non-blocking stream send — stop one slow client stalling the batch
cfbd64d206 cuda: fix int32 overflow in MoE dense kernels; surface launch errors in release
gahow pushed to main at gahow/aituner 2026-07-01 03:12:59 +00:00
46b477f48e Add initial config preflight review
gahow pushed to main at gahow/xtrain 2026-06-30 15:03:35 +00:00
4379868f2d docs: M2d — ragged-batching lever, 9× measured, step bottleneck → rollout
0e82b2438e test: M2d — ragged-forward + batched-op equivalence gates + throughput bench
c2ebf62ae1 post-train: M2d — batch the GRPO training-side forwards (op + module + wiring)
gahow pushed to main at gahow/xtrain 2026-06-30 09:39:11 +00:00
41d46208a6 docs: M2c — device KV cache + the bottleneck-shift finding
3a3425960c post-train: M2c — device-side KV cache (cat_seq), profile-first bottleneck shift
gahow pushed to main at gahow/xtrain 2026-06-30 09:20:04 +00:00
0f76c0fdb0 docs: M2b — batched decode results (token-identical + ~1.7x rollout, device-cache next)
361c5290fa post-train: M4 — use M2b batched rollout in GRPO (~1.7× step)
2c9b58cb3b post-train: M2b — batched KV-cache decode (G-way, token-identical)
gahow pushed to main at gahow/xtrain 2026-06-30 09:01:24 +00:00
096e45b845 docs: M4 — GRPO results (infra + memory/rollout walls + capability-wall negative result)
7fb3b32fd9 post-train: M4 — GRPO actor-learner loop + cached temperature rollout
aaa77082ef post-train: M4 — clipped_pg_loss + scale_rows (GRPO policy-gradient op)
gahow pushed to main at gahow/aituner 2026-06-30 06:10:21 +00:00
1b8f5a3af1 Integrate descriptor runtime candidates into harness
gahow pushed to main at gahow/xtrain 2026-06-30 04:38:08 +00:00
99090465bf docs: M3 — DPO results (infra correct, held-out correctness flat, over-optimization collapse)
2f827fd6d8 post-train: M3 — DPO pair-gen + training loop (verifiable arithmetic)
f3c764ce95 post-train: M3 — seq_logprob + dpo_loss autograd ops
gahow pushed to main at gahow/aituner 2026-06-30 04:05:05 +00:00
adb5356c4b Add advisory harness attribution and descriptor planner MVP
gahow pushed to main at gahow/xtrain 2026-06-30 04:01:11 +00:00
b39e6e7110 docs: M2a — KV-cache decode engine results (token-identical + length-dependent speedup)
eff26a0898 post-train: M2a — KV-cache incremental decode engine (token-identical)
c88e2ab88c post-train: M2 — decode primitives (rope_at + decode_attention)
gahow pushed to main at gahow/xtrain 2026-06-30 03:13:29 +00:00
1574e21d89 post-train: M1 — verifiable-arith eval scorer + SFT format-baseline result
gahow pushed to main at gahow/xtrain 2026-06-29 15:28:26 +00:00
cb64604496 post-train: M1 fix — enlarge arith key space + saturation guard