quant/__init__.py at 3abc51e3e3c8bedd84c0505936d541496bf2aa73 - quant - Local Gitea

Gahow Wang 5e1c4a681d Add point-in-time S&P 500 backtest to expose survivorship bias

The existing framework fetches today's S&P 500 constituents from Wikipedia
and applies that list to the entire 10-year price history — classic
survivorship bias. Stocks that went bankrupt or were removed for poor
performance are absent, while today's winners (which may have been minor
names 10 years ago) are implicitly selected. This materially inflates
reported strategy returns.

New pipeline:
  - universe_history.py reconstructs per-ticker membership intervals by
    walking Wikipedia's "Selected changes" table backward from today.
  - research/fetch_historical.py downloads prices for all 848 tickers
    that were ever members. Yahoo returns ~675 of them; the ~170 fully
    delisted names are unavailable, so some survivorship bias remains.
  - research/pit_backtest.py masks prices to NaN outside membership
    windows so strategies naturally cannot select non-members.
  - research/strategies_plus.py adds RecoveryMomentumPlus (generalized
    Recovery+Momentum with configurable weighting / blend / regime hook)
    and an EnsembleStrategy.
  - research/optimize.py runs five experiments: bias drift, hyperparameter
    sweep (2016-2022 train / 2023-2026 test), SPY MA regime filter,
    weighting schemes, and an uncorrelated-config ensemble.
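
The interval reconstruction and NaN masking described above can be
sketched roughly as follows. This is a minimal illustration, not the code
in universe_history.py or research/pit_backtest.py: the function names,
the (date, added, removed) change format, and the half-open [start, end)
window convention are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def membership_intervals(current, changes):
    """Walk index changes newest-to-oldest and build membership windows.

    current: tickers that are members today (hypothetical input).
    changes: iterable of (date, added, removed) tuples (hypothetical format).
    Returns {ticker: [(start, end), ...]}; start=None means "joined before
    the earliest recorded change", end=None means "still a member".
    """
    open_end = {t: None for t in current}  # members at the cursor -> window end
    intervals = {}
    for date, added, removed in sorted(changes, key=lambda c: c[0], reverse=True):
        for t in added:
            # t joined on `date`, so its currently open window starts here.
            # Additions with no open window are skipped as inconsistent data.
            if t in open_end:
                intervals.setdefault(t, []).append((date, open_end.pop(t)))
        for t in removed:
            # t was a member up to (not including) `date`.
            open_end[t] = date
    for t, end in open_end.items():
        intervals.setdefault(t, []).append((None, end))
    return intervals


def mask_to_membership(prices, intervals):
    """Return prices with NaN wherever a ticker was not an index member."""
    member = pd.DataFrame(False, index=prices.index, columns=prices.columns)
    for t in prices.columns:
        for start, end in intervals.get(t, []):
            in_window = np.ones(len(prices.index), dtype=bool)
            if start is not None:
                in_window &= prices.index >= start
            if end is not None:
                in_window &= prices.index < end
            member.loc[in_window, t] = True
    return prices.where(member)


# Toy data: BBB replaces CCC on 2020-01-03; AAA is a member throughout.
prices = pd.DataFrame(
    1.0,
    index=pd.date_range("2020-01-01", periods=5),
    columns=["AAA", "BBB", "CCC"],
)
changes = [(pd.Timestamp("2020-01-03"), ["BBB"], ["CCC"])]
masked = mask_to_membership(prices, membership_intervals({"AAA", "BBB"}, changes))
print(masked)
```

Because non-member prices become NaN rather than being dropped, any
ranking step that ignores NaN cannot pick a ticker outside its window.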

Headline finding: the biased backtest reports 40.9% CAGR for
recovery_mom_top10 over 2016-2026; the point-in-time version reports
22.4% (vs 14.0% for SPY buy-and-hold). The true edge over SPY is ~8pp
of CAGR, not ~27pp.
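
The percentage-point edge quoted above is the strategy CAGR minus the SPY
CAGR. A small sketch of that arithmetic follows; the helper names are
hypothetical, and the 10-year horizon used for the growth multiples is an
assumption based on the 2016-2026 window.

```python
def edge_pp(strategy_cagr_pct, benchmark_cagr_pct):
    """Edge over the benchmark in percentage points of CAGR."""
    return round(strategy_cagr_pct - benchmark_cagr_pct, 1)


def growth_multiple(cagr_pct, years):
    """Terminal wealth per unit invested when compounding at cagr_pct."""
    return (1 + cagr_pct / 100) ** years


SPY, PIT, BIASED = 14.0, 22.4, 40.9  # CAGR figures from the commit message

print(edge_pp(PIT, SPY))     # 8.4
print(edge_pp(BIASED, SPY))  # 26.9

# Compounding makes the gap much larger in terminal-wealth terms
# (assumes a 10-year horizon).
for name, cagr in [("SPY", SPY), ("point-in-time", PIT), ("biased", BIASED)]:
    print(f"{name}: {growth_multiple(cagr, 10):.1f}x")
```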

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-04-17 16:26:02 +08:00

The file is empty.