# Factor Attribution Design

Date: 2026-04-07
Repo: `/Users/gahow/projects/quant`

## Goal

Add a factor attribution module that explains strategy returns using:

- Standard external US factors when available: `MKT-RF`, `SMB`, `HML`, `RMW`, `CMA`, `RF`
- Local price-derived extension factors: `MOM`, `LOWVOL`, `RECOVERY`
- Local proxy fallback factors for markets without standard external data

The module must integrate with the current backtest workflow, reuse existing strategy equity curves, cache downloaded factor data locally, and produce both terminal summaries and exportable tabular outputs.

## Scope

In scope:

- New factor attribution module for research backtests
- US support using external standard factors plus local extension factors
- CN support using local proxy factors only
- CAPM, FF5, and FF5-plus-extension models
- CLI flags in `main.py` to enable attribution and export results
- Tests for parsing, factor construction, and regression behavior

Out of scope for this iteration:

- Intraday attribution
- Portfolio optimizer changes
- Live trader attribution in `trader.py`
- Notebook or plotting UI for attribution results
- External fundamental datasets beyond standard downloadable factor files

## Existing Context

The repo already has:

- A vectorized backtest engine in `main.py`
- Strategy implementations that produce daily target weights
- Performance metrics in `metrics.py`
- Local daily price caches in `data/us.csv`, `data/us_open.csv`, `data/cn.csv`

The "alpha" currently reported by `trader.py simulate` is only total return minus benchmark return. The new module adds regression-based alpha and factor exposure analysis.

## Design Overview

Add a new module `factor_attribution.py` with four responsibilities:

1. Load and cache factor datasets
2. Build local extension and proxy factors from existing price data
3. Run regression models against strategy daily returns
4. Render summary tables and export detailed results

`main.py` remains the orchestration point.
It will continue running backtests and benchmark normalization, then optionally invoke attribution on the resulting daily return series.

## Module Structure

### `factor_attribution.py`

Planned top-level responsibilities:

- `load_external_us_factors(...)`
  - Download Ken French daily factor files
  - Parse, normalize, convert percent to decimal
  - Cache to `data/factors/`
  - Fall back to cache when network fetch fails
- `build_extension_factors(price_data, benchmark, market)`
  - Build local daily factor return series for:
    - `MOM`
    - `LOWVOL`
    - `RECOVERY`
- `build_proxy_core_factors(price_data, benchmark, market)`
  - Used mainly for CN or when external factors are unavailable
  - Build daily proxy series for:
    - `MKT`
    - `SMB_PROXY`
    - `HML_PROXY`
    - `RMW_PROXY`
    - `CMA_PROXY`
- `prepare_factor_models(...)`
  - Merge standard factors and local factors
  - Produce factor matrices for:
    - `capm`
    - `ff5`
    - `ff5plus`
- `run_factor_regression(strategy_returns, factor_frame, risk_free_col)`
  - Fit OLS with intercept
  - Return alpha, annualized alpha, loadings, t-stats, p-values, R-squared, adjusted R-squared, residual volatility, date range, and observation count
- `attribute_strategies(results_df, benchmark_series, price_data, market, model_selection)`
  - Convert equity curves to returns
  - Run attribution for each strategy
  - Return structured summary and long-form loadings tables
- `print_attribution_summary(...)`
  - Render compact terminal output
- `export_attribution(...)`
  - Write CSV outputs

## Data Sources

### US Standard Factors

Preferred source:

- Ken French daily factor datasets for:
  - Fama-French 5 Factors daily
  - Momentum daily if separately required

Normalization rules:

- Convert index to pandas `DatetimeIndex`
- Convert values from percent to decimal returns
- Keep `RF` as decimal daily risk-free rate

Cache location:

- `data/factors/ff5_us_daily.csv`
- `data/factors/mom_us_daily.csv`

If the source format changes or download fails:

- Use the latest local cache if present
- Otherwise
  fall back to local proxy factors and mark the run as `proxy_only`

### Local Price Inputs

Reuse repo price caches:

- US: `data/us.csv`, `data/us_open.csv`
- CN: `data/cn.csv`

Only adjusted close prices are required for attribution factor construction.

## Factor Definitions

### Standard Factors

For US:

- `MKT-RF`, `SMB`, `HML`, `RMW`, `CMA`, `RF` from external factor data

### Local Extension Factors

These are built from the same universe already used by the repo.

#### `MOM`

- Cross-sectional momentum long-short factor
- Rank stocks by 12-1 month return (the trailing 12-month return, skipping the most recent month)
- Long top quantile, short bottom quantile
- Equal weight within long and short legs
- Factor return is long return minus short return

#### `LOWVOL`

- Cross-sectional low-volatility factor
- Compute rolling volatility from daily returns
- Long lowest-vol quantile, short highest-vol quantile
- Equal weight within legs

#### `RECOVERY`

- Cross-sectional recovery factor
- Rank stocks by distance above their rolling 63-day low
- Long strongest recovery names (furthest above the low), short weakest recovery names
- Equal weight within legs

### Proxy Core Factors

Used for CN by default and as fallback for US.

#### `MKT`

- Benchmark daily return if a benchmark exists
- Otherwise equal-weight universe return

#### `SMB_PROXY`

- Size proxy using inverse price level or another market-cap proxy, since only price data is available
- First iteration uses inverse price rank as a transparent proxy and explicitly labels it as a proxy

#### `HML_PROXY`

- Value proxy using a price-to-range or distance-to-trailing-low style signal
- This is not a true book-to-market factor and must be labeled as a proxy

#### `RMW_PROXY`

- Profitability proxy from return consistency and stability

#### `CMA_PROXY`

- Investment proxy from trend smoothness or expansion/contraction behavior inferred from price action

Proxy factors are included for model completeness, but the output must label them clearly as proxies rather than standard academic factors.
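The three extension factors share one construction: rank the universe on a signal known at the close of day `t`, then take the equal-weight return of the top quantile minus the bottom quantile on day `t+1`. The sketch below assumes `prices` is a wide adjusted-close frame (dates by tickers). The function names, the 21/252 trading-day approximation of the 12-1 month window, the 63-day volatility window, and the 20% quantile are hypothetical choices for illustration, not settled parameters of `factor_attribution.py`.

```python
# Hypothetical sketch: helper names and window lengths are illustrative,
# not the planned factor_attribution.py API.
import numpy as np
import pandas as pd


def long_short_factor(signal: pd.DataFrame, returns: pd.DataFrame,
                      quantile: float = 0.2) -> pd.Series:
    """Equal-weight top-quantile minus bottom-quantile return.

    The signal known at the close of day t selects the legs whose return is
    realized on day t+1 (signal.shift(1) aligned with returns), so no
    same-day or future information enters the factor.
    """
    lagged = signal.shift(1)
    n = lagged.count(axis=1)                          # names with a usable signal
    k = (n * quantile).astype(int).clip(lower=1)      # names per leg adapts to universe size
    order = lagged.rank(axis=1, method="first")       # 1 = lowest signal; NaN stays NaN
    long_leg = order.gt(n - k, axis=0)                # top k names
    short_leg = order.le(k, axis=0)                   # bottom k names
    spread = returns.where(long_leg).mean(axis=1) - returns.where(short_leg).mean(axis=1)
    return spread.where(n >= 2)                       # need at least one name per leg


def sketch_extension_factors(prices: pd.DataFrame, quantile: float = 0.2,
                             vol_window: int = 63, low_window: int = 63) -> pd.DataFrame:
    """Daily MOM, LOWVOL and RECOVERY factor returns from wide adjusted closes."""
    returns = prices / prices.shift(1) - 1
    # 12-1 momentum: return from ~12 months ago to ~1 month ago (252 / 21 trading days assumed).
    mom = prices.shift(21) / prices.shift(252) - 1
    # Negate volatility so the "top" quantile holds the lowest-volatility names.
    lowvol = -returns.rolling(vol_window).std()
    # Distance above the rolling low; larger means a stronger recovery.
    recovery = prices / prices.rolling(low_window).min() - 1
    return pd.DataFrame({
        "MOM": long_short_factor(mom, returns, quantile),
        "LOWVOL": long_short_factor(lowvol, returns, quantile),
        "RECOVERY": long_short_factor(recovery, returns, quantile),
    })


if __name__ == "__main__":
    # Synthetic random-walk prices for 10 hypothetical tickers over 300 business days.
    rng = np.random.default_rng(0)
    idx = pd.bdate_range("2020-01-01", periods=300)
    prices = pd.DataFrame(100 * np.exp(np.cumsum(rng.normal(0, 0.01, (300, 10)), axis=0)),
                          index=idx, columns=[f"S{i}" for i in range(10)])
    print(sketch_extension_factors(prices).dropna().describe())
```

Because each leg is equally weighted and the factor is long minus short, the series is dollar-neutral by construction; the leading `NaN` rows are the warmup windows, which the alignment step drops.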
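Of the proxy core factors, `MKT` and `SMB_PROXY` are specified concretely enough to sketch directly. The version below assumes the same wide adjusted-close `prices` frame plus an optional benchmark price series. The helper names and the median split used for `SMB_PROXY` (instead of a tunable quantile) are hypothetical simplifications.

```python
# Hypothetical sketch: helper names and the median split are illustrative,
# not the planned factor_attribution.py API.
from typing import Optional

import pandas as pd


def daily_returns(prices):
    """Close-to-close simple returns; the first row is NaN."""
    return prices / prices.shift(1) - 1


def proxy_mkt(prices: pd.DataFrame, benchmark: Optional[pd.Series] = None) -> pd.Series:
    """MKT proxy: benchmark return if a benchmark is supplied, else the equal-weight universe return."""
    if benchmark is not None:
        return daily_returns(benchmark.reindex(prices.index))
    return daily_returns(prices).mean(axis=1)


def proxy_smb(prices: pd.DataFrame) -> pd.Series:
    """SMB_PROXY: equal-weight low-price names minus equal-weight high-price names.

    Inverse price stands in for small size (a transparent proxy, not true
    market cap). The split for day t uses only the close of day t-1.
    """
    returns = daily_returns(prices)
    inv_price_rank = (1.0 / prices.shift(1)).rank(axis=1, pct=True)
    small = returns.where(inv_price_rank > 0.5).mean(axis=1)   # cheaper half of the universe
    big = returns.where(inv_price_rank <= 0.5).mean(axis=1)    # pricier half of the universe
    return small - big
```

Because the ranking uses the previous close, this proxy follows the same `t` to `t+1` no-lookahead rule as the other local factors.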
## Factor Construction Rules

- All local factor signals use only information available through the close of date `t` and are used to explain returns on `t+1`
- No future data leakage
- Factor series are daily return series, not ranks
- Long-short factors should be approximately dollar-neutral
- Missing values are allowed during warmup windows and dropped during model alignment
- Quantile counts should adapt to available universe size

## Regression Models

### CAPM

Model:

- `strategy_excess_return ~ alpha + (MKT-RF)`

### FF5

Model:

- `strategy_excess_return ~ alpha + MKT-RF + SMB + HML + RMW + CMA`

### FF5Plus

Model:

- `strategy_excess_return ~ alpha + MKT-RF + SMB + HML + RMW + CMA + MOM + LOWVOL + RECOVERY`

### Proxy Model

For markets without standard factors:

- `strategy_return ~ alpha + MKT + SMB_PROXY + HML_PROXY + RMW_PROXY + CMA_PROXY + MOM + LOWVOL + RECOVERY`

The module should report which model family was actually used.

## Alignment Rules

- Convert all equity curves to daily returns
- Build factor frames at daily frequency
- Join strategy returns and factor returns on the intersection of their dates
- For standard factor models, subtract `RF` from strategy returns
- Keep benchmark return separately for active return diagnostics, but do not use it as a replacement for `MKT-RF` in standard factor models

## Output Schema

### Summary Output

One row per strategy per model with fields including:

- `strategy`
- `market`
- `model`
- `factor_source`
- `proxy_only`
- `start_date`
- `end_date`
- `n_obs`
- `alpha_daily`
- `alpha_ann`
- `alpha_t_stat`
- `alpha_p_value`
- `r_squared`
- `adj_r_squared`
- `residual_vol_ann`

Selected factor loadings should also be flattened into summary columns when available:

- `beta_mkt`
- `beta_smb`
- `beta_hml`
- `beta_rmw`
- `beta_cma`
- `beta_mom`
- `beta_lowvol`
- `beta_recovery`

### Loadings Output

Long-form table:

- `strategy`
- `model`
- `factor`
- `beta`
- `t_stat`
- `p_value`

## CLI Changes

Add arguments to `main.py`:

- `--attribution`
- `--attribution-model {capm,ff5,ff5plus,all}`
- `--attribution-export