Add factor attribution design spec
376
docs/superpowers/specs/2026-04-07-factor-attribution-design.md
Normal file
@@ -0,0 +1,376 @@
# Factor Attribution Design

Date: 2026-04-07
Repo: `/Users/gahow/projects/quant`

## Goal

Add a factor attribution module that explains strategy returns using:

- Standard external US factors when available: `MKT-RF`, `SMB`, `HML`, `RMW`, `CMA`, `RF`
- Local price-derived extension factors: `MOM`, `LOWVOL`, `RECOVERY`
- Local proxy fallback factors for markets without standard external data

The module must integrate with the current backtest workflow, reuse existing strategy equity curves, cache downloaded factor data locally, and produce both terminal summaries and exportable tabular outputs.

## Scope

In scope:

- New factor attribution module for research backtests
- US support using external standard factors plus local extension factors
- CN support using local proxy factors only
- CAPM, FF5, and FF5-plus-extension models
- CLI flags in `main.py` to enable attribution and export results
- Tests for parsing, factor construction, and regression behavior

Out of scope for this iteration:

- Intraday attribution
- Portfolio optimizer changes
- Live trader attribution in `trader.py`
- Notebook or plotting UI for attribution results
- External fundamental datasets beyond standard downloadable factor files

## Existing Context

The repo already has:

- A vectorized backtest engine in `main.py`
- Strategy implementations that produce daily target weights
- Performance metrics in `metrics.py`
- Local daily price caches in `data/us.csv`, `data/us_open.csv`, `data/cn.csv`

Current "alpha" in `trader.py simulate` is only total return minus benchmark return. The new module adds regression-based alpha and factor exposure analysis.

## Design Overview

Add a new module `factor_attribution.py` with four responsibilities:

1. Load and cache factor datasets
2. Build local extension and proxy factors from existing price data
3. Run regression models against strategy daily returns
4. Render summary tables and export detailed results

`main.py` remains the orchestration point. It will continue running backtests and benchmark normalization, then optionally invoke attribution on the resulting daily return series.

## Module Structure

### `factor_attribution.py`

Planned top-level responsibilities:

- `load_external_us_factors(...)`
  - Download Ken French daily factor files
  - Parse, normalize, convert percent to decimal
  - Cache to `data/factors/`
  - Fall back to cache when network fetch fails

- `build_extension_factors(price_data, benchmark, market)`
  - Build local daily factor return series for:
    - `MOM`
    - `LOWVOL`
    - `RECOVERY`

- `build_proxy_core_factors(price_data, benchmark, market)`
  - Used mainly for CN or when external factors are unavailable
  - Build daily proxy series for:
    - `MKT`
    - `SMB_PROXY`
    - `HML_PROXY`
    - `RMW_PROXY`
    - `CMA_PROXY`

- `prepare_factor_models(...)`
  - Merge standard factors and local factors
  - Produce factor matrices for:
    - `capm`
    - `ff5`
    - `ff5plus`

- `run_factor_regression(strategy_returns, factor_frame, risk_free_col)`
  - Fit OLS with intercept
  - Return alpha, annualized alpha, loadings, t-stats, p-values, R-squared, adjusted R-squared, residual volatility, date range, and observation count

- `attribute_strategies(results_df, benchmark_series, price_data, market, model_selection)`
  - Convert equity curves to returns
  - Run attribution for each strategy
  - Return structured summary and long-form loadings tables

- `print_attribution_summary(...)`
  - Render compact terminal output

- `export_attribution(...)`
  - Write CSV outputs

## Data Sources

### US Standard Factors

Preferred source:

- Ken French daily factor datasets for:
  - Fama-French 5 Factors daily
  - Momentum daily if separately required

Normalization rules:

- Convert index to pandas `DatetimeIndex`
- Convert values from percent to decimal returns
- Keep `RF` as decimal daily risk-free rate

Cache location:

- `data/factors/ff5_us_daily.csv`
- `data/factors/mom_us_daily.csv`

If the source format changes or download fails:

- Use the latest local cache if present
- Otherwise fall back to local proxy factors and mark the run as `proxy_only`
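
The normalization and cache-fallback rules above could be sketched roughly as follows. This is only an illustration: `parse_french_daily` and `load_with_cache` are hypothetical helper names, the assumed raw layout (a free-text preamble, a header row that starts with a comma, then `YYYYMMDD` rows) should be checked against the actual download, and the sample values are made up.

```python
import io
import tempfile
from pathlib import Path

import pandas as pd


def parse_french_daily(text: str) -> pd.DataFrame:
    """Parse Ken French-style daily CSV text into decimal daily returns."""
    lines = text.splitlines()
    # Assumed layout: preamble text, then a header row that starts with a comma.
    start = next(i for i, ln in enumerate(lines) if ln.strip().startswith(","))
    # Keep only rows whose first field is an 8-digit YYYYMMDD date.
    rows = [ln for ln in lines[start + 1:]
            if len(ln.split(",")[0].strip()) == 8 and ln.split(",")[0].strip().isdigit()]
    frame = pd.read_csv(io.StringIO("\n".join([lines[start]] + rows)),
                        index_col=0, skipinitialspace=True)
    frame.index = pd.to_datetime(frame.index.astype(str), format="%Y%m%d")
    frame.index.name = "date"
    frame.columns = [c.strip().upper() for c in frame.columns]
    return frame.astype(float) / 100.0  # percent -> decimal


def load_with_cache(fetch_text, cache_path: Path) -> pd.DataFrame:
    """Try a fresh download; on any failure fall back to the local cache."""
    try:
        frame = parse_french_daily(fetch_text())
    except Exception:
        if cache_path.exists():
            return pd.read_csv(cache_path, index_col="date", parse_dates=True)
        raise  # no cache either: the caller may downgrade to proxy factors
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    frame.to_csv(cache_path)
    return frame


# Made-up sample text in the assumed layout (values are not real factor data).
SAMPLE = """Illustrative preamble line

,Mkt-RF,SMB,HML,RMW,CMA,RF
20240102,    1.00,   -0.50,    0.25,    0.10,   -0.10,    0.01
20240103,   -2.00,    0.40,   -0.30,    0.00,    0.20,    0.01

Illustrative footer line
"""


def failing_fetch():
    raise OSError("network unavailable")


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "factors" / "ff5_us_daily.csv"
    fresh = load_with_cache(lambda: SAMPLE, cache)   # parses and writes the cache
    cached = load_with_cache(failing_fetch, cache)   # download fails, cache is used
```

Passing the download step in as a callable keeps the parser testable without network access, which lines up with the unit tests planned for the parser and cache loader.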

### Local Price Inputs

Reuse repo price caches:

- US: `data/us.csv`, `data/us_open.csv`
- CN: `data/cn.csv`

Only adjusted close prices are required for attribution factor construction.

## Factor Definitions

### Standard Factors

For US:

- `MKT-RF`, `SMB`, `HML`, `RMW`, `CMA`, `RF` from external factor data

### Local Extension Factors

These are built from the same universe already used by the repo.

#### `MOM`

- Cross-sectional momentum long-short factor
- Rank stocks by 12-1 month return
- Long top quantile, short bottom quantile
- Equal weight within long and short legs
- Factor return is long return minus short return
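
A minimal sketch of the `MOM` construction above, assuming a wide price frame (dates by tickers). The helper names, the 252/21-day approximation of "12-1 months", and the default of five buckets are illustrative assumptions rather than settled parameters.

```python
import numpy as np
import pandas as pd


def momentum_signal(prices, lookback=252, skip=21):
    """12-1 style momentum: return from t-lookback to t-skip."""
    return prices.shift(skip) / prices.shift(lookback) - 1.0


def long_short_factor(signal, returns, quantiles=5):
    """Equal-weight long-short return: long the top bucket, short the bottom."""
    n_valid = signal.notna().sum(axis=1)
    # Leg size adapts to the universe; undefined when fewer than two names.
    leg = (n_valid // quantiles).clip(lower=1).where(n_valid >= 2)
    top = signal.rank(axis=1, ascending=False, method="first").le(leg, axis=0)
    bottom = signal.rank(axis=1, ascending=True, method="first").le(leg, axis=0)
    weights = top.astype(float).sub(bottom.astype(float)).div(leg, axis=0)
    # Weights formed at t earn the return realised at t+1.
    pnl = weights.shift(1) * returns
    return pnl.sum(axis=1, min_count=1)


# Deterministic toy universe: stock i grows by a constant 0.1% * (i + 1) per day.
days = np.arange(40)
growth = 0.001 * np.arange(1, 11)
prices = pd.DataFrame((1.0 + growth) ** days[:, None],
                      columns=[f"S{i}" for i in range(10)])

# Short windows keep the example small; a real run would use the defaults.
mom = long_short_factor(momentum_signal(prices, lookback=5, skip=1),
                        prices.pct_change())
```

In this toy universe the two fastest growers are always long and the two slowest always short, so after the warmup the factor return is constant at 0.0095 - 0.0015 = 0.008 per day.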

#### `LOWVOL`

- Cross-sectional low-volatility factor
- Compute rolling volatility from daily returns
- Long lowest-vol quantile, short highest-vol quantile
- Equal weight within legs

#### `RECOVERY`

- Cross-sectional recovery factor
- Rank stocks by distance from rolling 63-day low
- Long strongest recovery names, short weakest recovery names
- Equal weight within legs
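
The ranking signals behind `LOWVOL` and `RECOVERY` might look like the sketch below. The function names are hypothetical, and the 63-day volatility window is an assumption (the spec fixes 63 days only for the recovery low). Both signals are oriented so that a higher value means "go long".

```python
import pandas as pd


def lowvol_signal(prices, window=63):
    """Negated rolling volatility, so a higher value means a calmer stock."""
    return -prices.pct_change().rolling(window).std()


def recovery_signal(prices, window=63):
    """Fractional distance of the latest price above its rolling low."""
    return prices / prices.rolling(window).min() - 1.0


# Tiny made-up example with a 3-day window so the numbers can be checked by hand.
prices = pd.DataFrame({"A": [10.0, 8.0, 9.0, 12.0],
                       "B": [1.0, 2.0, 4.0, 8.0]})
rec = recovery_signal(prices, window=3)
vol = lowvol_signal(prices, window=3)
```

For stock `A` the 3-day low on the last two days is 8, so the recovery signal is 9/8 - 1 = 0.125 and then 12/8 - 1 = 0.5. Stock `B` doubles every day, so its return volatility is zero and it ranks above `A` on the low-volatility signal.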

### Proxy Core Factors

Used for CN by default and as fallback for US.

#### `MKT`

- Benchmark daily return if benchmark exists
- Otherwise equal-weight universe return

#### `SMB_PROXY`

- Size proxy using inverse price level or market-cap proxy when only price data is available
- First iteration uses inverse price rank as a transparent proxy and explicitly labels it as proxy

#### `HML_PROXY`

- Value proxy using price-to-range or distance-to-trailing-low style signal
- This is not a true book-to-market factor and must be labeled proxy

#### `RMW_PROXY`

- Profitability proxy from return consistency and stability

#### `CMA_PROXY`

- Investment proxy from asset trend smoothness or expansion/contraction behavior inferred from price action

Proxy factors are included for model completeness, but the output must label them clearly as proxies rather than standard academic factors.

## Factor Construction Rules

- All local factors use only information available up to date `t` to explain returns at `t+1`
- No future data leakage
- Factor series are daily return series, not ranks
- Long-short factors should be approximately dollar-neutral
- Missing values are allowed during warmup windows and dropped during model alignment
- Quantile counts should adapt to available universe size
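
One way to check the no-leakage rule above is to alter prices after a cutoff date and confirm that weights formed up to the cutoff do not change. The sketch below uses a toy median-split weighting and hypothetical helper names. The `lead` argument exists only to simulate a leaky builder.

```python
import numpy as np
import pandas as pd


def median_split_weights(prices, lookback=5, lead=0):
    """Toy dollar-neutral weights: long above-median trailing return, short below.

    `lead > 0` deliberately peeks at future prices to simulate a leak.
    """
    signal = prices.shift(-lead) / prices.shift(lookback) - 1.0
    median = signal.median(axis=1)
    longs = signal.gt(median, axis=0).astype(float)
    shorts = signal.lt(median, axis=0).astype(float)
    longs = longs.div(longs.sum(axis=1).replace(0, np.nan), axis=0)
    shorts = shorts.div(shorts.sum(axis=1).replace(0, np.nan), axis=0)
    return longs - shorts


def weights_ignore_future(builder, prices, cutoff, shock):
    """True if weights up to `cutoff` are unchanged when later prices are altered."""
    altered = prices.copy()
    altered.iloc[cutoff + 1:] = prices.iloc[cutoff + 1:].to_numpy() * shock
    before = builder(prices).iloc[: cutoff + 1]
    after = builder(altered).iloc[: cutoff + 1]
    return before.equals(after)


# Deterministic toy universe: stock i grows by a constant 0.1% * (i + 1) per day.
days = np.arange(40)
growth = 0.001 * np.arange(1, 7)
prices = pd.DataFrame((1.0 + growth) ** days[:, None], columns=list("ABCDEF"))
shock = np.linspace(2.0, 0.5, 6)  # per-stock shock that reverses the future ranking

honest_ok = weights_ignore_future(median_split_weights, prices, 20, shock)
leaky_ok = weights_ignore_future(
    lambda p: median_split_weights(p, lead=1), prices, 20, shock)

# Factor return: weights known at t-1 applied to the return realised at t.
weights = median_split_weights(prices)
factor = (weights.shift(1) * prices.pct_change()).sum(axis=1, min_count=1)
```

The shock has to change the cross-sectional ordering. A uniform rescaling of all future prices would leave ranks untouched, and a leaky builder would pass the check unnoticed.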

## Regression Models

### CAPM

Model:

- `strategy_excess_return ~ alpha + (MKT-RF)`

### FF5

Model:

- `strategy_excess_return ~ alpha + MKT-RF + SMB + HML + RMW + CMA`

### FF5Plus

Model:

- `strategy_excess_return ~ alpha + MKT-RF + SMB + HML + RMW + CMA + MOM + LOWVOL + RECOVERY`

### Proxy Model

For markets without standard factors:

- `strategy_return ~ alpha + MKT + SMB_PROXY + HML_PROXY + RMW_PROXY + CMA_PROXY + MOM + LOWVOL + RECOVERY`

The module should report which model family was actually used.
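
All four models share the same fitting step: OLS with an intercept. A dependency-light sketch using `numpy.linalg.lstsq` is shown below. The function name `run_ols`, the 252-day annualization, and the normal approximation for p-values are assumptions made for illustration. An exact Student-t p-value would need `scipy.stats`.

```python
import math

import numpy as np


def run_ols(y, X, names, periods_per_year=252):
    """OLS with intercept; returns alpha, loadings, and basic fit statistics."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n_obs, n_factors = X.shape
    design = np.column_stack([np.ones(n_obs), X])
    coef, _, rank, _ = np.linalg.lstsq(design, y, rcond=None)
    if rank < n_factors + 1:
        raise ValueError("design matrix is rank deficient")
    resid = y - design @ coef
    dof = n_obs - n_factors - 1
    sigma2 = float(resid @ resid) / dof
    std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))
    t_stats = coef / std_err
    # Large-sample normal approximation to the two-sided p-value (assumption).
    p_values = [math.erfc(abs(t) / math.sqrt(2.0)) for t in t_stats]
    r2 = 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())
    labels = ["alpha", *names]
    return {
        "alpha_daily": coef[0],
        "alpha_ann": coef[0] * periods_per_year,  # simple linear scaling
        "betas": dict(zip(names, coef[1:])),
        "t_stats": dict(zip(labels, t_stats)),
        "p_values": dict(zip(labels, p_values)),
        "r_squared": r2,
        "adj_r_squared": 1.0 - (1.0 - r2) * (n_obs - 1) / dof,
        "residual_vol_ann": math.sqrt(sigma2 * periods_per_year),
        "n_obs": n_obs,
    }


# Synthetic check: known alpha and betas plus small noise.
rng = np.random.default_rng(0)
factors = rng.normal(0.0, 0.01, size=(2000, 2))
noise = rng.normal(0.0, 0.001, size=2000)
excess = 0.0002 + factors @ np.array([1.2, -0.5]) + noise
fit = run_ols(excess, factors, ["MKT-RF", "SMB"])
```

The synthetic fit should recover loadings close to 1.2 and -0.5 and a daily alpha close to 0.0002, which is the same idea as the planned unit test on synthetic data.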

## Alignment Rules

- Convert all equity curves to daily returns
- Build factor frames at daily frequency
- Join strategy returns and factor returns on date intersection
- For standard factor models, subtract `RF` from strategy returns
- Keep benchmark return separately for active return diagnostics, but not as a replacement for `MKT-RF` in standard factor models
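
The alignment steps for a standard factor model can be sketched with pandas as below. `align_excess_returns` is a hypothetical helper, and the equity curve and factor values are made up for the example.

```python
import pandas as pd


def align_excess_returns(equity, factors, rf_col="RF"):
    """Turn an equity curve into excess returns aligned with the factor frame."""
    strategy = equity.pct_change().rename("strategy")
    # Inner join keeps only dates present on both sides; warmup NaNs are dropped.
    joined = pd.concat([strategy, factors], axis=1, join="inner").dropna()
    y = joined["strategy"] - joined[rf_col]
    X = joined.drop(columns=["strategy", rf_col])
    return y, X


# Made-up data: the equity curve gains 1% per day over five business days.
equity = pd.Series(
    [100.0, 101.0, 102.01, 103.0301, 104.060401],
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04",
                          "2024-01-05", "2024-01-08"]),
)
factors = pd.DataFrame(
    {"MKT-RF": [0.005, -0.002, 0.001, 0.003], "RF": [0.0001] * 4},
    index=pd.to_datetime(["2024-01-03", "2024-01-04", "2024-01-05", "2024-01-09"]),
)
y, X = align_excess_returns(equity, factors)
```

Only the three dates shared by both inputs survive the join, and each strategy return of 1% becomes an excess return of 0.99% after subtracting `RF`.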

## Output Schema

### Summary Output

One row per strategy per model with fields including:

- `strategy`
- `market`
- `model`
- `factor_source`
- `proxy_only`
- `start_date`
- `end_date`
- `n_obs`
- `alpha_daily`
- `alpha_ann`
- `alpha_t_stat`
- `alpha_p_value`
- `r_squared`
- `adj_r_squared`
- `residual_vol_ann`

Selected factor loadings should also be flattened into summary columns when available:

- `beta_mkt`
- `beta_smb`
- `beta_hml`
- `beta_rmw`
- `beta_cma`
- `beta_mom`
- `beta_lowvol`
- `beta_recovery`

### Loadings Output

Long-form table:

- `strategy`
- `model`
- `factor`
- `beta`
- `t_stat`
- `p_value`
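
The relationship between the long-form loadings table and the flattened `beta_*` summary columns could look like the sketch below. Both helper names are hypothetical, the numbers are made up, and the naming rule (lowercase, with `-RF` and `_PROXY` stripped) is an assumption about how factor names map to summary columns.

```python
import pandas as pd


def loadings_table(strategy, model, betas, t_stats, p_values):
    """Build the long-form loadings rows for one strategy and model."""
    factors = list(betas)
    return pd.DataFrame({
        "strategy": strategy,
        "model": model,
        "factor": factors,
        "beta": [betas[f] for f in factors],
        "t_stat": [t_stats[f] for f in factors],
        "p_value": [p_values[f] for f in factors],
    })


def flatten_betas(loadings):
    """Map long-form rows to summary columns such as `beta_mkt` (assumed naming)."""
    keys = (loadings["factor"].str.lower()
            .str.replace("-rf", "", regex=False)
            .str.replace("_proxy", "", regex=False))
    return dict(zip("beta_" + keys, loadings["beta"]))


# Made-up regression output for a single strategy.
table = loadings_table(
    "demo_strategy", "ff5",
    betas={"MKT-RF": 1.1, "SMB": -0.2},
    t_stats={"MKT-RF": 12.0, "SMB": -1.5},
    p_values={"MKT-RF": 0.0, "SMB": 0.13},
)
flat = flatten_betas(table)
```

Stripping `_PROXY` from the column name keeps the summary schema identical across markets. The `proxy_only` field then carries the information that the loading came from a proxy factor.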

## CLI Changes

Add arguments to `main.py`:

- `--attribution`
- `--attribution-model {capm,ff5,ff5plus,all}`
- `--attribution-export <dir>`

Behavior:

- If `--attribution` is not set, current behavior is unchanged
- If set, attribution runs after backtest metrics are printed
- If export path is set, write:
  - `summary.csv`
  - `loadings.csv`
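
With `argparse`, the three flags could be registered as in the sketch below. The helper name `add_attribution_args` and the default of `all` for `--attribution-model` are assumptions, since the spec does not state a default.

```python
import argparse


def add_attribution_args(parser):
    """Register the attribution flags on an existing argument parser."""
    parser.add_argument("--attribution", action="store_true",
                        help="run factor attribution after the backtest metrics")
    parser.add_argument("--attribution-model",
                        choices=["capm", "ff5", "ff5plus", "all"],
                        default="all",  # assumed default, not specified in the spec
                        help="which regression model(s) to fit")
    parser.add_argument("--attribution-export", metavar="DIR", default=None,
                        help="directory for summary.csv and loadings.csv")
    return parser


parser = add_attribution_args(argparse.ArgumentParser(prog="main.py"))
defaults = parser.parse_args([])
args = parser.parse_args(
    ["--attribution", "--attribution-model", "ff5", "--attribution-export", "out"])
```

Because `--attribution` is a `store_true` flag that defaults to `False`, existing invocations of `main.py` parse exactly as before.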

## Terminal Reporting

For each strategy and selected model, print a compact line containing:

- annualized alpha
- major factor loadings
- R-squared
- residual volatility

After the numeric table, print a short interpretation section:

- whether alpha remains after adding factors
- which factors explain most of the strategy
- whether the model fit is weak or strong

Interpretation should remain descriptive and avoid overclaiming statistical significance.

## Error Handling

- External factor download failure:
  - Use cache if available
  - Otherwise downgrade to proxy mode
- Missing or short overlap window:
  - Skip that model and report insufficient data
- Singular matrix or severe multicollinearity:
  - Catch and report model failure or unstable fit
- Missing benchmark column:
  - Fall back to equal-weight universe market proxy where possible
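
The short-overlap and multicollinearity cases could be screened before fitting, as in the sketch below. `check_design` is a hypothetical helper, and the thresholds (60 observations, condition number 1e8) are placeholder assumptions rather than values from this spec.

```python
import numpy as np


def check_design(X, min_obs=60, max_cond=1e8):
    """Classify a factor matrix as ok, insufficient_data, singular, or unstable."""
    X = np.asarray(X, dtype=float)
    n_obs, n_factors = X.shape
    if n_obs < max(min_obs, n_factors + 2):
        return "insufficient_data"
    spread = X.std(axis=0)
    if np.any(spread == 0):
        return "singular"  # a constant column duplicates the intercept
    # Standardize so the condition number reflects collinearity, not scale.
    Z = (X - X.mean(axis=0)) / spread
    if np.linalg.matrix_rank(Z) < n_factors:
        return "singular"
    if np.linalg.cond(Z) > max_cond:
        return "unstable"
    return "ok"


rng = np.random.default_rng(1)
base = rng.normal(0.0, 0.01, size=(250, 3))

status_ok = check_design(base)
status_short = check_design(base[:10])
# Exact duplicate of the first column: rank deficient.
status_dup = check_design(np.column_stack([base, base[:, 0]]))
# Near-duplicate column: full rank but very badly conditioned.
near = base[:, 0] * (1.0 + 1e-10 * rng.normal(size=250))
status_near = check_design(np.column_stack([base, near]))
```

Returning a status string instead of raising lets the caller skip one model and still report the others, which matches the "skip that model and report" behavior listed above.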

## Testing Plan

### Unit Tests

- External factor parser converts dates and percent units correctly
- Cache loader returns cached data on download failure
- Extension factor builders produce expected columns and no future leakage
- Regression on synthetic data recovers approximate known alpha and betas

### Integration Tests

- End-to-end attribution on a small deterministic equity and factor dataset
- CLI export produces expected files and columns

### Regression Tests

- Fixed local US sample produces stable output shape and model naming

## Implementation Notes

- Prefer `numpy.linalg.lstsq` or `scipy` OLS utilities already available in dependencies
- Keep implementation dependency-light
- Keep factor construction functions separate from regression code for testability
- Avoid changing existing strategy behavior

## Risks

- Standard factor downloads may change source file formatting
- Proxy factor definitions for CN will be weaker than true academic factors
- Some strategy returns may be highly collinear with momentum-like factors, reducing interpretability
- Short or overlapping warmup windows can materially reduce sample size

## Success Criteria

- A user can run backtests with `--attribution` and receive factor-based explanations of returns
- US runs use standard external factors when available
- CN runs still produce a clearly labeled proxy attribution report
- Outputs distinguish residual alpha from factor exposure
- The module is easy to extend with new factors later