Factor Attribution Design

Date: 2026-04-07
Repo: /Users/gahow/projects/quant

Goal

Add a factor attribution module that explains strategy returns using:

  • Standard external US factors when available: MKT-RF, SMB, HML, RMW, CMA, RF
  • Local price-derived extension factors: MOM, LOWVOL, RECOVERY
  • Local proxy fallback factors for markets without standard external data

The module must integrate with the current backtest workflow, reuse existing strategy equity curves, cache downloaded factor data locally, and produce both terminal summaries and exportable tabular outputs.

Scope

In scope:

  • New factor attribution module for research backtests
  • US support using external standard factors plus local extension factors
  • CN support using local proxy factors only
  • CAPM, FF5, and FF5-plus-extension models
  • CLI flags in main.py to enable attribution and export results
  • Tests for parsing, factor construction, and regression behavior

Out of scope for this iteration:

  • Intraday attribution
  • Portfolio optimizer changes
  • Live trader attribution in trader.py
  • Notebook or plotting UI for attribution results
  • External fundamental datasets beyond standard downloadable factor files

Existing Context

The repo already has:

  • A vectorized backtest engine in main.py
  • Strategy implementations that produce daily target weights
  • Performance metrics in metrics.py
  • Local daily price caches in data/us.csv, data/us_open.csv, data/cn.csv

The current "alpha" computed by simulate in trader.py is only total return minus benchmark return. The new module adds regression-based alpha and factor exposure analysis.

Design Overview

Add a new module factor_attribution.py with four responsibilities:

  1. Load and cache factor datasets
  2. Build local extension and proxy factors from existing price data
  3. Run regression models against strategy daily returns
  4. Render summary tables and export detailed results

main.py remains the orchestration point. It will continue running backtests and benchmark normalization, then optionally invoke attribution on the resulting daily return series.

Module Structure

factor_attribution.py

Planned top-level functions:

  • load_external_us_factors(...)

    • Download Ken French daily factor files
    • Parse, normalize, convert percent to decimal
    • Cache to data/factors/
    • Fall back to cache when network fetch fails
  • build_extension_factors(price_data, benchmark, market)

    • Build local daily factor return series for:
      • MOM
      • LOWVOL
      • RECOVERY
  • build_proxy_core_factors(price_data, benchmark, market)

    • Used mainly for CN or when external factors are unavailable
    • Build daily proxy series for:
      • MKT
      • SMB_PROXY
      • HML_PROXY
      • RMW_PROXY
      • CMA_PROXY
  • prepare_factor_models(...)

    • Merge standard factors and local factors
    • Produce factor matrices for:
      • capm
      • ff5
      • ff5plus
  • run_factor_regression(strategy_returns, factor_frame, risk_free_col)

    • Fit OLS with intercept
    • Return alpha, annualized alpha, loadings, t-stats, p-values, R-squared, adjusted R-squared, residual volatility, date range, and observation count
  • attribute_strategies(results_df, benchmark_series, price_data, market, model_selection)

    • Convert equity curves to returns
    • Run attribution for each strategy
    • Return structured summary and long-form loadings tables
  • print_attribution_summary(...)

    • Render compact terminal output
  • export_attribution(...)

    • Write CSV outputs

Data Sources

US Standard Factors

Preferred source:

  • Ken French daily factor datasets for:
    • Fama-French 5 Factors daily
    • Momentum daily if separately required

Normalization rules:

  • Convert index to pandas DatetimeIndex
  • Convert values from percent to decimal returns
  • Keep RF as decimal daily risk-free rate
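
The normalization rules above can be sketched as follows. This is a minimal illustration, not the planned parser: it assumes the descriptive header and footer lines of the downloaded file have already been stripped, that dates arrive as YYYYMMDD values, and that column names are upper-cased to match the names used in this document. The function name `normalize_factor_frame` and the sample rows are hypothetical.

```python
import io

import pandas as pd


def normalize_factor_frame(raw_csv_text: str) -> pd.DataFrame:
    """Parse a date-indexed CSV of percent returns into decimal returns."""
    frame = pd.read_csv(io.StringIO(raw_csv_text), index_col=0)
    # Assumption: the first column holds YYYYMMDD dates.
    frame.index = pd.to_datetime(frame.index.astype(str), format="%Y%m%d")
    frame.index.name = "date"
    # Upper-case the headers so "Mkt-RF" becomes "MKT-RF".
    frame.columns = [name.strip().upper() for name in frame.columns]
    # Percent to decimal: 0.50 (percent) becomes 0.005. RF is scaled the same way.
    return frame.astype(float) / 100.0


# Hypothetical two-row sample in the simplified layout assumed above.
sample = """date,Mkt-RF,SMB,HML,RMW,CMA,RF
20240102,-0.50,0.10,0.20,-0.05,0.03,0.02
20240103,1.00,-0.20,0.10,0.00,-0.01,0.02
"""
factors = normalize_factor_frame(sample)
```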

Cache location:

  • data/factors/ff5_us_daily.csv
  • data/factors/mom_us_daily.csv

If the source format changes or download fails:

  • Use the latest local cache if present
  • Otherwise fall back to local proxy factors and mark the run as proxy_only
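
One way to express this fallback chain is sketched below. The helper `load_with_cache` and its `fetch` callable are hypothetical; the real download logic is left out, and the broad `except Exception` is only for illustration and would likely be narrowed to network and parse errors.

```python
from pathlib import Path
from typing import Callable, Optional, Tuple

import pandas as pd


def load_with_cache(
    fetch: Callable[[], pd.DataFrame], cache_path: Path
) -> Tuple[Optional[pd.DataFrame], str]:
    """Return (frame, source) where source is 'external', 'cache', or 'proxy_only'."""
    try:
        frame = fetch()
        # Refresh the local cache after every successful download.
        cache_path.parent.mkdir(parents=True, exist_ok=True)
        frame.to_csv(cache_path)
        return frame, "external"
    except Exception:
        # Download or parsing failed: use the latest cache if one exists.
        if cache_path.exists():
            cached = pd.read_csv(cache_path, index_col=0, parse_dates=True)
            return cached, "cache"
        # No cache either: the caller should build proxy factors instead.
        return None, "proxy_only"
```

The returned source label could feed the factor_source and proxy_only fields of the summary output.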

Local Price Inputs

Reuse repo price caches:

  • US: data/us.csv, data/us_open.csv
  • CN: data/cn.csv

Only adjusted close prices are required for attribution factor construction.

Factor Definitions

Standard Factors

For US:

  • MKT-RF, SMB, HML, RMW, CMA, RF from external factor data

Local Extension Factors

These are built from the same universe already used by the repo.

MOM

  • Cross-sectional momentum long-short factor
  • Rank stocks by 12-1 month return
  • Long top quantile, short bottom quantile
  • Equal weight within long and short legs
  • Factor return is long return minus short return
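
A minimal sketch of the MOM construction, under stated assumptions: one month is approximated as 21 trading days and twelve months as 252, the quantile is fixed at 20%, and `prices` is a date-by-ticker frame of adjusted closes. The function names are hypothetical.

```python
import pandas as pd


def long_short_factor(
    signal: pd.DataFrame, returns: pd.DataFrame, quantile: float = 0.2
) -> pd.Series:
    """Equal-weight long top quantile, short bottom quantile, lagged one day."""
    ranks = signal.rank(axis=1, pct=True)
    long_mask = (ranks >= 1.0 - quantile).astype(float)
    short_mask = (ranks <= quantile).astype(float)
    # Equal weight within each leg; rows with no ranked names become NaN.
    long_w = long_mask.div(long_mask.sum(axis=1), axis=0)
    short_w = short_mask.div(short_mask.sum(axis=1), axis=0)
    # Weights formed with data through t are applied to the return at t+1.
    weights = (long_w - short_w).shift(1)
    return (weights * returns).sum(axis=1, min_count=1)


def momentum_factor(prices: pd.DataFrame, quantile: float = 0.2) -> pd.Series:
    """12-1 month momentum: return from t-252 to t-21, skipping the latest month."""
    signal = prices.shift(21) / prices.shift(252) - 1.0
    returns = prices.pct_change(fill_method=None)
    return long_short_factor(signal, returns, quantile).rename("MOM")
```

The series is NaN during the warmup window, which matches the rule that missing values are dropped during model alignment.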

LOWVOL

  • Cross-sectional low-volatility factor
  • Compute rolling volatility from daily returns
  • Long lowest-vol quantile, short highest-vol quantile
  • Equal weight within legs
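
The ranking signal for LOWVOL could look like the sketch below. The 63-day window is an assumption (the design does not fix one), and `lowvol_signal` is a hypothetical name. Volatility is negated so that a higher score always means "go long", which lets the same long-short leg logic serve every extension factor.

```python
import pandas as pd


def lowvol_signal(prices: pd.DataFrame, window: int = 63) -> pd.DataFrame:
    """Negated rolling volatility: the calmest names get the highest score."""
    returns = prices.pct_change(fill_method=None)
    rolling_vol = returns.rolling(window).std()
    # Negate so the lowest-volatility names rank at the top of the cross-section.
    return -rolling_vol
```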

RECOVERY

  • Cross-sectional recovery factor
  • Rank stocks by distance from rolling 63-day low
  • Long strongest recovery names, short weakest recovery names
  • Equal weight within legs
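
As a rough illustration of the RECOVERY signal, the distance from the rolling low can be written as a ratio. Measuring distance as a percentage above the low is an assumption; the design only says "distance from rolling 63-day low". The name `recovery_signal` is hypothetical.

```python
import pandas as pd


def recovery_signal(prices: pd.DataFrame, window: int = 63) -> pd.DataFrame:
    """Percentage distance of the current price above its rolling low."""
    rolling_low = prices.rolling(window).min()
    # 0.0 means the name sits at its low; 0.5 means it is 50% above it.
    return prices / rolling_low - 1.0
```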

Proxy Core Factors

Used for CN by default and as fallback for US.

MKT

  • Benchmark daily return if benchmark exists
  • Otherwise equal-weight universe return
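
The MKT fallback can be sketched in a few lines. This assumes the benchmark is identified by a column name in the price frame; the function name `market_proxy` is hypothetical.

```python
from typing import Optional

import pandas as pd


def market_proxy(prices: pd.DataFrame, benchmark: Optional[str] = None) -> pd.Series:
    """Benchmark daily return if present, else equal-weight universe return."""
    returns = prices.pct_change(fill_method=None)
    if benchmark is not None and benchmark in returns.columns:
        return returns[benchmark].rename("MKT")
    # Equal-weight average across whatever names have a return that day.
    return returns.mean(axis=1).rename("MKT")
```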

SMB_PROXY

  • Size proxy using inverse price level or market-cap proxy when only price data is available
  • First iteration uses inverse price rank as a transparent proxy and explicitly labels it as proxy

HML_PROXY

  • Value proxy using price-to-range or distance-to-trailing-low style signal
  • This is not a true book-to-market factor and must be labeled proxy

RMW_PROXY

  • Profitability proxy from return consistency and stability

CMA_PROXY

  • Investment proxy from asset trend smoothness or expansion/contraction behavior inferred from price action

Proxy factors are included for model completeness, but the output must label them clearly as proxies rather than standard academic factors.

Factor Construction Rules

  • Local factor signals use only information available through date t; the resulting positions earn the return realized at t+1
  • No future data leakage
  • Factor series are daily return series, not ranks
  • Long-short factors should be approximately dollar-neutral
  • Missing values are allowed during warmup windows and dropped during model alignment
  • Quantile counts should adapt to available universe size
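
Two of these rules, dollar neutrality and quantile counts that adapt to universe size, can be illustrated for a single date. This is a sketch under assumptions: the 20% target fraction and the one-name minimum are placeholders, and both function names are hypothetical.

```python
import pandas as pd


def leg_size(n_available: int, target_fraction: float = 0.2, min_names: int = 1) -> int:
    """Names per leg, shrinking gracefully for small universes."""
    if n_available < 2 * min_names:
        return 0  # not enough names to form both legs
    size = int(n_available * target_fraction)
    # Never let the two legs overlap, and keep at least min_names per leg.
    return max(min_names, min(size, n_available // 2))


def long_short_weights(signal_row: pd.Series, target_fraction: float = 0.2) -> pd.Series:
    """Equal-weight long/short weights for one date; legs sum to +1 and -1."""
    valid = signal_row.dropna().sort_values()
    k = leg_size(len(valid), target_fraction)
    weights = pd.Series(0.0, index=signal_row.index)
    if k == 0:
        return weights
    weights.loc[valid.index[-k:]] = 1.0 / k   # highest signals: long leg
    weights.loc[valid.index[:k]] = -1.0 / k   # lowest signals: short leg
    return weights
```

Because both legs hold the same number of equally weighted names, the weights sum to zero on every date where a factor can be formed.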

Regression Models

CAPM

Model:

  • strategy_excess_return ~ alpha + (MKT-RF)

FF5

Model:

  • strategy_excess_return ~ alpha + MKT-RF + SMB + HML + RMW + CMA

FF5Plus

Model:

  • strategy_excess_return ~ alpha + MKT-RF + SMB + HML + RMW + CMA + MOM + LOWVOL + RECOVERY

Proxy Model

For markets without standard factors:

  • strategy_return ~ alpha + MKT + SMB_PROXY + HML_PROXY + RMW_PROXY + CMA_PROXY + MOM + LOWVOL + RECOVERY

The module should report which model family was actually used.
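
The four model families can be captured as a plain mapping from model name to factor columns. This is a sketch only: the `MODEL_FACTORS` constant, the `"proxy"` key, and `resolve_model` are hypothetical names, while the factor lists are copied from the formulas above.

```python
from typing import Dict, Iterable, List

FF5 = ["MKT-RF", "SMB", "HML", "RMW", "CMA"]
EXTENSION = ["MOM", "LOWVOL", "RECOVERY"]

MODEL_FACTORS: Dict[str, List[str]] = {
    "capm": ["MKT-RF"],
    "ff5": FF5,
    "ff5plus": FF5 + EXTENSION,
    # Hypothetical key for markets without standard factors.
    "proxy": ["MKT", "SMB_PROXY", "HML_PROXY", "RMW_PROXY", "CMA_PROXY"] + EXTENSION,
}


def resolve_model(model: str, available_columns: Iterable[str]) -> List[str]:
    """Return the factor columns for a model, failing loudly if any are missing."""
    available = set(available_columns)
    wanted = MODEL_FACTORS[model]
    missing = [name for name in wanted if name not in available]
    if missing:
        raise KeyError(f"model {model!r} is missing factors: {missing}")
    return list(wanted)
```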

Alignment Rules

  • Convert all equity curves to daily returns
  • Build factor frames at daily frequency
  • Join strategy returns and factor returns on date intersection
  • For standard factor models, subtract RF from strategy returns
  • Keep the benchmark return separately for active-return diagnostics; do not substitute it for MKT-RF in standard factor models
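
A minimal sketch of the alignment step for one strategy, assuming the equity curve is a date-indexed Series and the factor frame carries an RF column for standard models. The function name `align_returns` is hypothetical. Passing `rf_col=None` covers the proxy model, which regresses raw strategy returns.

```python
from typing import Optional, Tuple

import pandas as pd


def align_returns(
    equity_curve: pd.Series, factors: pd.DataFrame, rf_col: Optional[str] = "RF"
) -> Tuple[pd.Series, pd.DataFrame]:
    """Return (y, X) on the date intersection, with RF subtracted when present."""
    strategy = equity_curve.pct_change(fill_method=None).rename("strategy")
    # Inner join keeps only dates present in both inputs; dropna removes warmup rows.
    joined = pd.concat([strategy, factors], axis=1, join="inner").dropna()
    y = joined["strategy"]
    X = joined.drop(columns="strategy")
    if rf_col is not None and rf_col in X.columns:
        y = y - X[rf_col]          # excess return for standard factor models
        X = X.drop(columns=rf_col)  # RF is not a regressor
    return y, X
```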

Output Schema

Summary Output

One row per strategy per model with fields including:

  • strategy
  • market
  • model
  • factor_source
  • proxy_only
  • start_date
  • end_date
  • n_obs
  • alpha_daily
  • alpha_ann
  • alpha_t_stat
  • alpha_p_value
  • r_squared
  • adj_r_squared
  • residual_vol_ann

Selected factor loadings should also be flattened into summary columns when available:

  • beta_mkt
  • beta_smb
  • beta_hml
  • beta_rmw
  • beta_cma
  • beta_mom
  • beta_lowvol
  • beta_recovery

Loadings Output

Long-form table:

  • strategy
  • model
  • factor
  • beta
  • t_stat
  • p_value

CLI Changes

Add arguments to main.py:

  • --attribution
  • --attribution-model {capm,ff5,ff5plus,all}
  • --attribution-export <dir>

Behavior:

  • If --attribution is not set, current behavior is unchanged
  • If set, attribution runs after backtest metrics are printed
  • If export path is set, write:
    • summary.csv
    • loadings.csv
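
The three flags could be registered with argparse roughly as follows. The helper name `add_attribution_args`, the help strings, and the default of `all` for the model are assumptions; the design does not state a default.

```python
import argparse


def add_attribution_args(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Register the attribution flags on an existing parser."""
    parser.add_argument(
        "--attribution",
        action="store_true",
        help="run factor attribution after backtest metrics are printed",
    )
    parser.add_argument(
        "--attribution-model",
        choices=["capm", "ff5", "ff5plus", "all"],
        default="all",  # assumed default, not specified in the design
        help="which regression model(s) to fit",
    )
    parser.add_argument(
        "--attribution-export",
        metavar="DIR",
        default=None,
        help="directory for summary.csv and loadings.csv",
    )
    return parser


parser = add_attribution_args(argparse.ArgumentParser())
args = parser.parse_args(["--attribution", "--attribution-model", "ff5"])
```

With no flags supplied, `args.attribution` is false, so existing behavior is untouched.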

Terminal Reporting

For each strategy and selected model, print a compact line containing:

  • annualized alpha
  • major factor loadings
  • R-squared
  • residual volatility

After the numeric table, print a short interpretation section:

  • whether alpha remains after adding factors
  • which factors explain most of the strategy
  • whether the model fit is weak or strong

Interpretation should remain descriptive and avoid overclaiming statistical significance.

Error Handling

  • External factor download failure:
    • Use cache if available
    • Otherwise downgrade to proxy mode
  • Missing or short overlap window:
    • Skip that model and report insufficient data
  • Singular matrix or severe multicollinearity:
    • Catch and report model failure or unstable fit
  • Missing benchmark column:
    • Fall back to equal-weight universe market proxy where possible

Testing Plan

Unit Tests

  • External factor parser converts dates and percent units correctly
  • Cache loader returns cached data on download failure
  • Extension factor builders produce expected columns and no future leakage
  • Regression on synthetic data recovers approximate known alpha and betas

Integration Tests

  • End-to-end attribution on a small deterministic equity and factor dataset
  • CLI export produces expected files and columns

Regression Tests

  • Fixed local US sample produces stable output shape and model naming

Implementation Notes

  • Prefer numpy.linalg.lstsq or scipy OLS utilities already available in dependencies
  • Keep implementation dependency-light
  • Keep factor construction functions separate from regression code for testability
  • Avoid changing existing strategy behavior
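
To make the numpy.linalg.lstsq option concrete, here is a dependency-light sketch of an OLS fit with intercept. Several choices are assumptions rather than part of the design: annualization uses 252 trading days with simple scaling, p-values use a normal approximation to the Student t distribution (reasonable for long daily samples, and avoidable by using scipy's t distribution instead), and the function name `ols_with_intercept` and the result keys are illustrative.

```python
import math

import numpy as np
import pandas as pd

TRADING_DAYS = 252  # assumed annualization constant


def ols_with_intercept(y: pd.Series, X: pd.DataFrame) -> dict:
    """Fit y = alpha + X @ beta by least squares and report basic diagnostics."""
    design = np.column_stack([np.ones(len(X)), X.to_numpy(dtype=float)])
    target = y.to_numpy(dtype=float)
    n_obs, n_params = design.shape
    coef, _, rank, _ = np.linalg.lstsq(design, target, rcond=None)
    if rank < n_params or n_obs <= n_params:
        raise ValueError("rank-deficient design or too few observations")
    resid = target - design @ coef
    dof = n_obs - n_params
    rss = float(resid @ resid)
    sigma2 = rss / dof
    # Classical (homoskedastic) standard errors.
    std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))
    t_stats = coef / std_err
    # Two-sided p-value under a normal approximation.
    p_values = [math.erfc(abs(t) / math.sqrt(2.0)) for t in t_stats]
    total_ss = float(((target - target.mean()) ** 2).sum())
    r_squared = 1.0 - rss / total_ss
    names = ["alpha"] + list(X.columns)
    return {
        "alpha_daily": float(coef[0]),
        "alpha_ann": float(coef[0]) * TRADING_DAYS,
        "betas": pd.Series(coef[1:], index=X.columns),
        "t_stats": pd.Series(t_stats, index=names),
        "p_values": pd.Series(p_values, index=names),
        "r_squared": r_squared,
        "adj_r_squared": 1.0 - (1.0 - r_squared) * (n_obs - 1) / dof,
        "residual_vol_ann": math.sqrt(sigma2 * TRADING_DAYS),
        "n_obs": n_obs,
    }
```

The rank check gives a simple hook for the "singular matrix" case in the error-handling section: the caller can catch the ValueError and report the model as failed.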

Risks

  • Standard factor downloads may change source file formatting
  • Proxy factor definitions for CN will be weaker than true academic factors
  • Some strategy returns may be highly collinear with momentum-like factors, reducing interpretability
  • Long warmup windows or a short overlap between strategy and factor dates can materially reduce sample size

Success Criteria

  • A user can run backtests with --attribution and receive factor-based explanations of returns
  • US runs use standard external factors when available
  • CN runs still produce a clearly labeled proxy attribution report
  • Outputs distinguish residual alpha from factor exposure
  • The module is easy to extend with new factors later