Factor Attribution Design
Date: 2026-04-07
Repo: /Users/gahow/projects/quant
Goal
Add a factor attribution module that explains strategy returns using:
- Standard external US factors when available: MKT-RF, SMB, HML, RMW, CMA, RF
- Local price-derived extension factors: MOM, LOWVOL, RECOVERY
- Local proxy fallback factors for markets without standard external data
The module must integrate with the current backtest workflow, reuse existing strategy equity curves, cache downloaded factor data locally, and produce both terminal summaries and exportable tabular outputs.
Scope
In scope:
- New factor attribution module for research backtests
- US support using external standard factors plus local extension factors
- CN support using local proxy factors only
- CAPM, FF5, and FF5-plus-extension models
- CLI flags in main.py to enable attribution and export results
- Tests for parsing, factor construction, and regression behavior
Out of scope for this iteration:
- Intraday attribution
- Portfolio optimizer changes
- Live trader attribution in trader.py
- Notebook or plotting UI for attribution results
- External fundamental datasets beyond standard downloadable factor files
Existing Context
The repo already has:
- A vectorized backtest engine in main.py
- Strategy implementations that produce daily target weights
- Performance metrics in metrics.py
- Local daily price caches in data/us.csv, data/us_open.csv, data/cn.csv
Current "alpha" in trader.py simulate is only total return minus benchmark return. The new module adds regression-based alpha and factor exposure analysis.
Design Overview
Add a new module factor_attribution.py with four responsibilities:
- Load and cache factor datasets
- Build local extension and proxy factors from existing price data
- Run regression models against strategy daily returns
- Render summary tables and export detailed results
main.py remains the orchestration point. It will continue running backtests and benchmark normalization, then optionally invoke attribution on the resulting daily return series.
Module Structure
factor_attribution.py
Planned top-level responsibilities:
- load_external_us_factors(...)
  - Download Ken French daily factor files
  - Parse, normalize, convert percent to decimal
  - Cache to data/factors/
  - Fall back to cache when network fetch fails
- build_extension_factors(price_data, benchmark, market)
  - Build local daily factor return series for:
    - MOM
    - LOWVOL
    - RECOVERY
- build_proxy_core_factors(price_data, benchmark, market)
  - Used mainly for CN or when external factors are unavailable
  - Build daily proxy series for:
    - MKT
    - SMB_PROXY
    - HML_PROXY
    - RMW_PROXY
    - CMA_PROXY
- prepare_factor_models(...)
  - Merge standard factors and local factors
  - Produce factor matrices for:
    - capm
    - ff5
    - ff5plus
- run_factor_regression(strategy_returns, factor_frame, risk_free_col)
  - Fit OLS with intercept
  - Return alpha, annualized alpha, loadings, t-stats, p-values, R-squared, adjusted R-squared, residual volatility, date range, and observation count
- attribute_strategies(results_df, benchmark_series, price_data, market, model_selection)
  - Convert equity curves to returns
  - Run attribution for each strategy
  - Return structured summary and long-form loadings tables
- print_attribution_summary(...)
  - Render compact terminal output
- export_attribution(...)
  - Write CSV outputs
Data Sources
US Standard Factors
Preferred source:
- Ken French daily factor datasets for:
  - Fama-French 5 Factors daily
  - Momentum daily if separately required
Normalization rules:
- Convert index to pandas DatetimeIndex
- Convert values from percent to decimal returns
- Keep RF as decimal daily risk-free rate
Cache location:
- data/factors/ff5_us_daily.csv
- data/factors/mom_us_daily.csv
If the source format changes or download fails:
- Use the latest local cache if present
- Otherwise fall back to local proxy factors and mark the run as proxy_only
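The normalization and fallback rules above could be sketched roughly as follows. The helper names (parse_ff_csv, load_with_cache), the injected fetch callable, and the assumed layout of the raw text (a header line, then a YYYYMMDD date column followed by percent-valued factor columns) are illustrative assumptions, not the actual download format or the repo's API.

```python
import io
from pathlib import Path
from typing import Callable

import pandas as pd


def parse_ff_csv(text: str) -> pd.DataFrame:
    """Parse factor text whose rows look like '20240102,0.51,-0.12,...'.

    Assumes the first line is the header; a real download may carry extra
    preamble or footer lines that would have to be stripped first.
    """
    raw = pd.read_csv(io.StringIO(text), index_col=0)
    raw.index = pd.to_datetime(raw.index.astype(str), format="%Y%m%d")
    raw.index.name = "date"
    raw.columns = [c.strip() for c in raw.columns]
    return raw.astype(float) / 100.0  # percent -> decimal, RF included


def load_with_cache(fetch: Callable[[], str], cache_path: Path):
    """Return (factors, source); factors is None when proxy mode is needed."""
    try:
        factors = parse_ff_csv(fetch())
        cache_path.parent.mkdir(parents=True, exist_ok=True)
        factors.to_csv(cache_path)
        return factors, "download"
    except Exception:  # network failure or a changed source format
        if cache_path.exists():
            cached = pd.read_csv(cache_path, index_col=0, parse_dates=True)
            return cached, "cache"
        return None, "proxy_only"
```

Passing the fetch step in as a callable keeps the parser and the fallback logic testable without network access.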
Local Price Inputs
Reuse repo price caches:
- US: data/us.csv, data/us_open.csv
- CN: data/cn.csv
Only adjusted close prices are required for attribution factor construction.
Factor Definitions
Standard Factors
For US:
- MKT-RF, SMB, HML, RMW, CMA, RF from external factor data
Local Extension Factors
These are built from the same universe already used by the repo.
MOM
- Cross-sectional momentum long-short factor
- Rank stocks by 12-1 month return
- Long top quantile, short bottom quantile
- Equal weight within long and short legs
- Factor return is long return minus short return
LOWVOL
- Cross-sectional low-volatility factor
- Compute rolling volatility from daily returns
- Long lowest-vol quantile, short highest-vol quantile
- Equal weight within legs
RECOVERY
- Cross-sectional recovery factor
- Rank stocks by distance from rolling 63-day low
- Long strongest recovery names, short weakest recovery names
- Equal weight within legs
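As a rough illustration of the three ranking signals described above, the sketch below computes one score per stock per day from a wide price frame (dates as rows, tickers as columns). The function names are hypothetical, and the window lengths for MOM (252 and 21 trading days as stand-ins for 12 and 1 months) and for LOWVOL (63 days) are assumptions; only the 63-day low for RECOVERY comes from the definitions above.

```python
import pandas as pd


def momentum_signal(prices: pd.DataFrame, lookback: int = 252, skip: int = 21) -> pd.DataFrame:
    """12-1 month return: price one month ago over price twelve months ago."""
    return prices.shift(skip) / prices.shift(lookback) - 1.0


def lowvol_signal(prices: pd.DataFrame, window: int = 63) -> pd.DataFrame:
    """Negative rolling volatility, so a higher score means lower volatility."""
    daily_returns = prices.pct_change(fill_method=None)
    return -daily_returns.rolling(window).std()


def recovery_signal(prices: pd.DataFrame, window: int = 63) -> pd.DataFrame:
    """Distance above the rolling low, expressed as a fraction of that low."""
    rolling_low = prices.rolling(window).min()
    return prices / rolling_low - 1.0
```

Orienting every signal so that a higher score means "long" lets a single long-short builder handle all three factors.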
Proxy Core Factors
Used for CN by default and as fallback for US.
MKT
- Benchmark daily return if benchmark exists
- Otherwise equal-weight universe return
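The MKT fallback rule can be sketched in a few lines. The function name market_proxy is hypothetical, and treating the benchmark as a column of the price frame is an assumption.

```python
from typing import Optional

import pandas as pd


def market_proxy(prices: pd.DataFrame, benchmark: Optional[str] = None) -> pd.Series:
    """Benchmark daily return when the column exists, else equal-weight universe return."""
    returns = prices.pct_change(fill_method=None)
    if benchmark is not None and benchmark in returns.columns:
        return returns[benchmark].rename("MKT")
    # Row-wise mean skips NaNs, so names with missing prices drop out for that day.
    return returns.mean(axis=1).rename("MKT")
```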
SMB_PROXY
- Size proxy using inverse price level or market-cap proxy when only price data is available
- First iteration uses inverse price rank as a transparent proxy and explicitly labels it as proxy
HML_PROXY
- Value proxy using price-to-range or distance-to-trailing-low style signal
- This is not a true book-to-market factor and must be labeled proxy
RMW_PROXY
- Profitability proxy from return consistency and stability
CMA_PROXY
- Investment proxy from asset trend smoothness or expansion/contraction behavior inferred from price action
Proxy factors are included for model completeness, but the output must label them clearly as proxies rather than standard academic factors.
Factor Construction Rules
- All local factors use only information available up to date t to explain returns at t+1
- No future data leakage
- Factor series are daily return series, not ranks
- Long-short factors should be approximately dollar-neutral
- Missing values are allowed during warmup windows and dropped during model alignment
- Quantile counts should adapt to available universe size
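One way to satisfy these rules together is to lag the signal by one day before ranking, as in the minimal sketch below. The function name long_short_factor and the default 20% quantile are assumptions; the signal frame is expected to have the same shape as the price frame, with higher scores meaning "long".

```python
import numpy as np
import pandas as pd


def long_short_factor(prices: pd.DataFrame, signal: pd.DataFrame, quantile: float = 0.2) -> pd.Series:
    """Daily long-short return: signal known at t, return earned over t -> t+1."""
    returns = prices.pct_change(fill_method=None)
    lagged = signal.shift(1)  # row t+1 now holds the signal observed at t
    out = pd.Series(np.nan, index=prices.index, name="factor")
    for date in prices.index:
        scores = lagged.loc[date].dropna()
        day_ret = returns.loc[date].reindex(scores.index).dropna()
        scores = scores.loc[day_ret.index]
        # Leg size adapts to the available universe; at least one name per leg.
        k = max(1, int(len(scores) * quantile))
        if len(scores) < 2 * k:
            continue  # warmup or too few names: leave NaN
        ranked = scores.sort_values()
        short_leg, long_leg = ranked.index[:k], ranked.index[-k:]
        # Equal weights of +1/k and -1/k keep the book dollar-neutral.
        out.loc[date] = day_ret[long_leg].mean() - day_ret[short_leg].mean()
    return out
```

A simple leakage check follows from this structure: changing prices or signals on the last date should leave every earlier factor value untouched.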
Regression Models
CAPM
Model:
strategy_excess_return ~ alpha + (MKT-RF)
FF5
Model:
strategy_excess_return ~ alpha + MKT-RF + SMB + HML + RMW + CMA
FF5Plus
Model:
strategy_excess_return ~ alpha + MKT-RF + SMB + HML + RMW + CMA + MOM + LOWVOL + RECOVERY
Proxy Model
For markets without standard factors:
strategy_return ~ alpha + MKT + SMB_PROXY + HML_PROXY + RMW_PROXY + CMA_PROXY + MOM + LOWVOL + RECOVERY
The module should report which model family was actually used.
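All of the model families above share the same estimation step: ordinary least squares with an intercept, where the intercept is the daily alpha. The sketch below is one possible shape for that step, not the module's actual implementation. The function name ols_with_intercept is hypothetical, annualized alpha is assumed to be simple linear scaling by 252 trading days, and p-values use a normal approximation to the t distribution, which should be reasonable for long daily samples.

```python
import math

import numpy as np


def ols_with_intercept(y, X, names, periods_per_year=252):
    """Fit y = alpha + X @ beta + e and return fit statistics as a dict."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])  # first column is the intercept
    coef, _, rank, _ = np.linalg.lstsq(design, y, rcond=None)
    if rank < k + 1:
        raise np.linalg.LinAlgError("factor matrix is rank-deficient")

    resid = y - design @ coef
    dof = n - (k + 1)
    sigma2 = float(resid @ resid) / dof
    std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))
    t_stats = coef / std_err
    # Two-sided p-values from the normal approximation (assumption, see above).
    p_values = [math.erfc(abs(t) / math.sqrt(2.0)) for t in t_stats]
    r2 = 1.0 - float(resid @ resid) / float(np.sum((y - y.mean()) ** 2))

    return {
        "alpha_daily": coef[0],
        "alpha_ann": coef[0] * periods_per_year,  # simple linear scaling
        "alpha_t_stat": t_stats[0],
        "alpha_p_value": p_values[0],
        "betas": dict(zip(names, coef[1:])),
        "t_stats": dict(zip(names, t_stats[1:])),
        "p_values": dict(zip(names, p_values[1:])),
        "r_squared": r2,
        "adj_r_squared": 1.0 - (1.0 - r2) * (n - 1) / dof,
        "residual_vol_ann": math.sqrt(sigma2 * periods_per_year),
        "n_obs": n,
    }
```

The same function serves CAPM, FF5, FF5Plus, and the proxy model; only the columns passed in X and names change.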
Alignment Rules
- Convert all equity curves to daily returns
- Build factor frames at daily frequency
- Join strategy returns and factor returns on date intersection
- For standard factor models, subtract RF from strategy returns
- Keep benchmark return separately for active return diagnostics, but not as a replacement for MKT-RF in standard factor models
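The alignment rules might look roughly like this in pandas. The function name align_for_regression is hypothetical; passing risk_free_col=None is an assumed way to select the proxy path, where strategy returns are used without subtracting RF.

```python
from typing import Optional, Tuple

import pandas as pd


def align_for_regression(
    equity: pd.Series,
    factors: pd.DataFrame,
    risk_free_col: Optional[str] = "RF",
) -> Tuple[pd.Series, pd.DataFrame]:
    """Return (dependent variable, factor matrix) on the common date range."""
    strategy = equity.pct_change(fill_method=None).rename("strategy")
    # Inner join keeps only dates present on both sides; dropna removes warmup rows.
    joined = pd.concat([strategy, factors], axis=1, join="inner").dropna()
    if risk_free_col is not None:
        y = joined["strategy"] - joined[risk_free_col]  # excess return
        X = joined.drop(columns=["strategy", risk_free_col])
    else:
        y = joined["strategy"]
        X = joined.drop(columns=["strategy"])
    return y.rename("y"), X
```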
Output Schema
Summary Output
One row per strategy per model with fields including:
- strategy
- market
- model
- factor_source
- proxy_only
- start_date
- end_date
- n_obs
- alpha_daily
- alpha_ann
- alpha_t_stat
- alpha_p_value
- r_squared
- adj_r_squared
- residual_vol_ann
Selected factor loadings should also be flattened into summary columns when available:
- beta_mkt
- beta_smb
- beta_hml
- beta_rmw
- beta_cma
- beta_mom
- beta_lowvol
- beta_recovery
Loadings Output
Long-form table:
- strategy
- model
- factor
- beta
- t_stat
- p_value
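To make the two output shapes concrete, the sketch below turns one fitted model into a flattened summary row and a long-form loadings frame. The helper names are hypothetical, and the rule that maps factor names such as MKT-RF or SMB_PROXY onto beta_mkt and beta_smb is an assumption chosen to match the column names listed above.

```python
import pandas as pd


def summary_column(factor: str) -> str:
    """Map 'MKT-RF' or 'SMB_PROXY' to 'beta_mkt' / 'beta_smb' (assumed naming rule)."""
    base = factor.lower().replace("-rf", "").replace("_proxy", "")
    return "beta_" + base


def to_output_rows(strategy, model, fit_stats, betas, t_stats, p_values):
    """Return (summary row dict, long-form loadings DataFrame) for one fit."""
    summary = {"strategy": strategy, "model": model, **fit_stats}
    summary.update({summary_column(f): b for f, b in betas.items()})
    loadings = pd.DataFrame(
        {
            "strategy": strategy,
            "model": model,
            "factor": list(betas),
            "beta": [betas[f] for f in betas],
            "t_stat": [t_stats[f] for f in betas],
            "p_value": [p_values[f] for f in betas],
        }
    )
    return summary, loadings
```

The long-form table keeps the original factor labels, so proxy factors stay visibly marked even though the summary columns share names with the standard ones.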
CLI Changes
Add arguments to main.py:
- --attribution
- --attribution-model {capm,ff5,ff5plus,all}
- --attribution-export <dir>
Behavior:
- If --attribution is not set, current behavior is unchanged
- If set, attribution runs after backtest metrics are printed
- If export path is set, write:
  - summary.csv
  - loadings.csv
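With argparse, the three flags could be registered as in the sketch below. The helper name add_attribution_args is hypothetical, and the default of "all" for --attribution-model is an assumption, since no default is specified above.

```python
import argparse


def add_attribution_args(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Register the attribution flags on an existing parser."""
    parser.add_argument(
        "--attribution",
        action="store_true",
        help="run factor attribution after backtest metrics are printed",
    )
    parser.add_argument(
        "--attribution-model",
        choices=["capm", "ff5", "ff5plus", "all"],
        default="all",  # assumed default
        help="which factor model(s) to fit",
    )
    parser.add_argument(
        "--attribution-export",
        metavar="DIR",
        default=None,
        help="directory for summary.csv and loadings.csv",
    )
    return parser
```

Because --attribution is a store_true flag that defaults to False, existing invocations of main.py would keep their current behavior.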
Terminal Reporting
For each strategy and selected model, print a compact line containing:
- annualized alpha
- major factor loadings
- R-squared
- residual volatility
After the numeric table, print a short interpretation section:
- whether alpha remains after adding factors
- which factors explain most of the strategy
- whether the model fit is weak or strong
Interpretation should remain descriptive and avoid overclaiming statistical significance.
Error Handling
- External factor download failure:
  - Use cache if available
  - Otherwise downgrade to proxy mode
- Missing or short overlap window:
  - Skip that model and report insufficient data
- Singular matrix or severe multicollinearity:
  - Catch and report model failure or unstable fit
- Missing benchmark column:
  - Fall back to equal-weight universe market proxy where possible
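The overlap and collinearity cases can be screened before fitting. The sketch below is one possible pre-fit check; the function name, the status strings, and both thresholds (60 observations, a condition number of 1e4 on standardized factors) are assumptions that would need tuning against real runs.

```python
import numpy as np


def check_model_inputs(X, min_obs: int = 60, max_cond: float = 1e4) -> str:
    """Classify a factor matrix as ok, insufficient_data, singular, or unstable."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    if n < max(min_obs, k + 2):
        return "insufficient_data"
    design = np.column_stack([np.ones(n), X])
    if np.linalg.matrix_rank(design) < k + 1:
        return "singular"  # exact collinearity, including constant columns
    # Standardize so the condition number reflects collinearity rather than scale.
    standardized = (X - X.mean(axis=0)) / X.std(axis=0)
    if np.linalg.cond(standardized) > max_cond:
        return "unstable"
    return "ok"
```

A caller could skip the model on insufficient_data or singular, and still fit but flag the result on unstable.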
Testing Plan
Unit Tests
- External factor parser converts dates and percent units correctly
- Cache loader returns cached data on download failure
- Extension factor builders produce expected columns and no future leakage
- Regression on synthetic data recovers approximate known alpha and betas
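The synthetic-data regression test could take a shape like the following. It calls numpy.linalg.lstsq directly as a stand-in for run_factor_regression, whose return type is not fixed yet; the seed, sample size, and tolerances are illustrative assumptions.

```python
import numpy as np


def test_regression_recovers_known_parameters():
    rng = np.random.default_rng(42)
    n = 2000
    factors = rng.normal(0.0, 0.01, size=(n, 2))
    true_alpha, true_betas = 0.0004, np.array([0.9, -0.5])
    noise = rng.normal(0.0, 0.001, size=n)
    returns = true_alpha + factors @ true_betas + noise

    # Intercept column first, so coef[0] is the estimated daily alpha.
    design = np.column_stack([np.ones(n), factors])
    coef, *_ = np.linalg.lstsq(design, returns, rcond=None)

    assert abs(coef[0] - true_alpha) < 1e-4
    assert np.allclose(coef[1:], true_betas, atol=0.02)
```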
Integration Tests
- End-to-end attribution on a small deterministic equity and factor dataset
- CLI export produces expected files and columns
Regression Tests
- Fixed local US sample produces stable output shape and model naming
Implementation Notes
- Prefer numpy.linalg.lstsq or scipy OLS utilities already available in dependencies
- Keep implementation dependency-light
- Keep factor construction functions separate from regression code for testability
- Avoid changing existing strategy behavior
Risks
- Standard factor downloads may change source file formatting
- Proxy factor definitions for CN will be weaker than true academic factors
- Some strategy returns may be highly collinear with momentum-like factors, reducing interpretability
- Short or overlapping warmup windows can materially reduce sample size
Success Criteria
- A user can run backtests with --attribution and receive factor-based explanations of returns
- US runs use standard external factors when available
- CN runs still produce a clearly labeled proxy attribution report
- Outputs distinguish residual alpha from factor exposure
- The module is easy to extend with new factors later