xtrain

gahow/xtrain

Fork 0

Commit Graph

Author	SHA1	Message	Date
Gahow Wang	7a1fba95b5	docs: v12 — 1.05B long-ctx base + chat-alpha SFT quality check - run 12: dim1664/22L true-GQA 1.05B base, seq1024, 6.765B FineWeb tokens, 81h on 8x5090. Fixed eval v1 @seq1024 = 2.7410 vs v11 2.7467 — a real but marginal gain; v11->v12 is a capacity-only step on fixed data, so the ~0.2% return confirms the 1B base is now data-limited. - run 13: three SFT stages from the v12 base (synthetic / anchor / real-mix-repair). The pipeline works and produces a chat-shaped model that follows the format and stops, but none of the variants is a stable high-quality chat model — bottleneck is SFT data quality + selection signal (val loss decouples from generation quality), not infra. - scripts/run_v12_phase.sh wrapper + chat_alpha_fixed_prompts.txt eval set. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-29 16:19:12 +08:00

Author

SHA1

Message

Date

Gahow Wang

7a1fba95b5

docs: v12 — 1.05B long-ctx base + chat-alpha SFT quality check

- run 12: dim1664/22L true-GQA 1.05B base, seq1024, 6.765B FineWeb tokens,
  81h on 8x5090. Fixed eval v1 @seq1024 = 2.7410 vs v11 2.7467 — a real but
  marginal gain; v11->v12 is a capacity-only step on fixed data, so the ~0.2%
  return confirms the 1B base is now data-limited.
- run 13: three SFT stages from the v12 base (synthetic / anchor /
  real-mix-repair). The pipeline works and produces a chat-shaped model that
  follows the format and stops, but none of the variants is a stable
  high-quality chat model — bottleneck is SFT data quality + selection signal
  (val loss decouples from generation quality), not infra.
- scripts/run_v12_phase.sh wrapper + chat_alpha_fixed_prompts.txt eval set.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-29 16:19:12 +08:00

1 Commits