(This post is thanks to Editor Amit Goyal)

The lead article in Volume 23, Issue 1 of the *Review of Finance* is “Which Factors?” by Kewei Hou, Haitao Mo, Chen Xue, and Lu Zhang.

A *factor model* proposes an explanation for why different stocks earn different returns. The most famous is the Capital Asset Pricing Model (CAPM), which argues that assets earn higher returns if they are more sensitive to movements in the broader market, because that sensitivity makes them riskier. Kewei, Haitao, Chen, and Lu’s paper compares the power of various factor models in explaining asset returns. The authors find that the best-performing models are *q*-factor models. These models start from the basic finance principle that companies should invest more when their cost of capital is lower – so you can back out a firm’s cost of capital (expected returns) from its investment decisions.

**Factor Models**

In 1993, Fama and French introduced their celebrated 3-factor model, which proposed that stock returns depend on their sensitivity not only to the market (as in the CAPM) but also to two additional factors based on size and valuation (measured by the book-to-market ratio). More than 25 years later, there is a resurgence of different factor models that aim to explain the numerous anomalies in cross-sectional asset pricing (see Harvey, Liu, and Zhu (2016) for a recent review of these anomalies). Some of the most prominent examples are:

| Model | Authors | # Factors | Factors |
| --- | --- | --- | --- |
| *q* | Hou, Xue, and Zhang (2015) | 4 | Market (MKT), size (Me), investment (I/A), and profitability (Roe) |
| *q*^{5} | Hou, Mo, Xue, and Zhang (2018) | 5 | Market (MKT), size (Me), investment (I/A), profitability (Roe), and expected growth (Eg) |
| FF5 | Fama and French (2015) | 5 | Market (MKT), size (SMB), value (HML), investment (CMA), and profitability (RMW) |
| FF6 | Fama and French (2018) | 6 | Market (MKT), size (SMB), value (HML), investment (CMA), profitability (RMW) or cash-based profitability (RMWc), and momentum (UMD) |
| SY | Stambaugh and Yuan (2017) | 4 | Market (MKT), size (SMB), management (MGMT), and performance (PERF) |
| BS | Barillas and Shanken (2018) | 6 | Market (MKT), size (SMB), value (HML^{m}), investment (I/A), profitability (Roe), and momentum (UMD) |
| DHS | Daniel, Hirshleifer, and Sun (2018) | 3 | Market (MKT), financing (FIN), and post-earnings-announcement-drift (PEAD) |

These studies differ not only in terms of which factors they include, but also how the factors are constructed. For instance:

- Fama and French construct the value factor (HML) using annually updated variables, while Barillas and Shanken use monthly updating (HML^{m}).
- The profitability factor is constructed using different variables in the Fama and French models (RMW) versus the *q*-models (Roe).
- Some studies use 30/70 cutoffs (e.g. when the size factor is calculated, large firms are defined as those above the 70th size percentile and small firms as those below the 30th), while others use 20/80 cutoffs, etc.
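To make the breakpoint choice concrete, here is a minimal sketch on simulated data – the firm characteristics and all numbers are hypothetical, not taken from any of the papers above – of how a long-short size factor's realization depends on the percentile cutoffs:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000  # hypothetical cross-section of firms

# Simulated market caps and one-month returns (purely illustrative).
size = rng.lognormal(mean=6.0, sigma=2.0, size=n)
ret = rng.normal(loc=0.01, scale=0.08, size=n)

def size_factor(size, ret, lo, hi):
    """Small-minus-big return with the given percentile breakpoints."""
    small_cut, big_cut = np.percentile(size, [lo, hi])
    small = ret[size <= small_cut].mean()   # equal-weighted small-firm leg
    big = ret[size >= big_cut].mean()       # equal-weighted big-firm leg
    return small - big

smb_3070 = size_factor(size, ret, 30, 70)  # 30/70 cutoffs (Fama-French style)
smb_2080 = size_factor(size, ret, 20, 80)  # 20/80 cutoffs (Stambaugh-Yuan style)
print(smb_3070, smb_2080)                  # generally two different numbers
```

Even on identical data, the two cutoff conventions produce different factor realizations, which is why the paper's effort to reconstruct factors on a consistent basis matters.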

**Choosing Between Models**

How do you demonstrate the effectiveness of your favorite factor model? One way is to show that it explains the cross-section of returns. This boils down to calculating alphas from time-series regressions on a set of test assets/portfolios – the so-called left-hand-side (LHS) approach to judging models. (The *alpha* is the intercept of the regression; it captures the extra return an asset earns over and above what the factor model predicts, and is thus a measure of the model’s error.) While most studies indeed use this approach, the results are sometimes hard to compare across factor models because different studies use different test portfolios. It is also well known (see, for example, Lewellen, Nagel, and Shanken (2010)) that the choice of test assets is not innocuous.
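As a concrete illustration of the LHS approach, the following sketch estimates an alpha as the intercept of a time-series regression. The data are simulated: the factor loadings and the 0.2%-per-month true alpha are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 600  # months of simulated data (hypothetical)

# Three simulated factor return series and one test asset whose
# true alpha is 0.2% per month (all parameters invented).
factors = rng.normal(loc=0.005, scale=0.04, size=(T, 3))
betas = np.array([1.0, 0.5, -0.3])
asset = 0.002 + factors @ betas + rng.normal(0, 0.02, T)

# Time-series regression of the asset's excess return on the factors.
X = np.column_stack([np.ones(T), factors])
coef, *_ = np.linalg.lstsq(X, asset, rcond=None)
alpha_hat = coef[0]  # intercept: the return the model fails to explain

resid = asset - X @ coef
se_alpha = np.sqrt(resid @ resid / (T - X.shape[1])
                   * np.linalg.inv(X.T @ X)[0, 0])
print(f"alpha = {alpha_hat:.4f} per month (t = {alpha_hat / se_alpha:.2f})")
```

In an LHS test, this regression is run for every test portfolio, and the model is judged on how close the estimated alphas are to zero.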

Recent work by Barillas and Shanken (2017, 2018) offers a simple and elegant way out of these difficulties. These authors propose a right-hand-side (RHS) approach that uses *spanning regressions* to judge whether individual factors add explanatory power. Loosely speaking, you regress a factor from a candidate model on all the factors in a benchmark model. If the intercept is non-zero, the candidate factor/model is useful – it adds to the explanatory power of the benchmark model. If the intercept is zero, the candidate factor provides no incremental information (see Gibbons, Ross, and Shanken (1989) for the original illustration of the idea). Thus, this RHS approach sidesteps the thorny issue of test assets, as it involves calculating only the alphas of the factors from one model on the factors from another model, and vice versa. This is the primary approach adopted in the current paper.
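A spanning regression can be sketched as follows, again on simulated data (the loadings and the 0.3%-per-month unspanned premium are invented for illustration): a candidate factor that is just a combination of the benchmark factors gets a near-zero intercept, while one carrying a premium the benchmark misses gets a significant alpha.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 600  # months (simulated)

# Hypothetical benchmark model with two factors.
bench = rng.normal(loc=0.004, scale=0.03, size=(T, 2))

# A redundant candidate: a combination of benchmark factors plus noise.
redundant = 0.8 * bench[:, 0] + 0.4 * bench[:, 1] + rng.normal(0, 0.01, T)
# A useful candidate: carries a 0.3%/month premium the benchmark misses.
useful = 0.5 * bench[:, 0] + 0.003 + rng.normal(0, 0.01, T)

def spanning_alpha(candidate, benchmark):
    """Intercept (and its t-stat) from regressing a candidate factor
    on the benchmark factors -- the RHS spanning regression."""
    T = len(candidate)
    X = np.column_stack([np.ones(T), benchmark])
    coef, *_ = np.linalg.lstsq(X, candidate, rcond=None)
    resid = candidate - X @ coef
    se = np.sqrt(resid @ resid / (T - X.shape[1])
                 * np.linalg.inv(X.T @ X)[0, 0])
    return coef[0], coef[0] / se

a_red, t_red = spanning_alpha(redundant, bench)  # alpha near zero
a_use, t_use = spanning_alpha(useful, bench)     # alpha near 0.003, large t
```

No test portfolios appear anywhere: the only inputs are the factor return series themselves.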

**The Results**

The authors find that the *q*- and *q*^{5}-models largely subsume the Fama and French (FF) models. From January 1967 to December 2016, the alphas of the value, investment, profitability, and momentum factors (HML, CMA, RMW, and UMD) in the FF-models relative to the *q*-model are economically small (at most 0.12% per month) and statistically insignificant, although the cash-based profitability factor (RMWc) from FF6 has a *q*-model alpha of 0.25% (*t* = 3.83). In contrast, the investment and profitability factors (I/A and Roe) have economically large (up to 0.80% per month) and strongly significant alphas when regressed on the FF-models. Thus, the *q*-model has significant explanatory power relative to the FF-models.

The Stambaugh and Yuan (SY) factors and the Daniel, Hirshleifer, and Sun (DHS) factors both have significant alphas relative to the *q*-model, so they add incremental value to it. At the same time, the *q*-model factors also have alphas relative to the SY model, and the investment (I/A) factor in the *q*-model has an alpha relative to the DHS model. Finally, the monthly-formed HML^{m} factor in the Barillas and Shanken (BS) model also has an alpha relative to the *q*-model.

The authors also reconstruct some of the factors from other studies in a more consistent way, for example by using NYSE breakpoints and the 30^{th}/70^{th} percentiles. They find that the performance of these reconstructed factors differs from that of the original factors. For example, the reconstructed performance factor (PERF) in the Stambaugh and Yuan model does not have an alpha relative to the *q*-model. Interestingly, the authors report that the reconstructed factors are highly correlated with the factors in the *q*-model.

**Ideas for Future Research**

The authors’ results are thought-provoking. They have done a very nice job of putting the factors from different models on an equal footing before comparing them. The idea of recreating some of the factors with a consistent construction method is also a good one. Future work could address other unresolved issues, some of which I mention below.

All methods of factor construction are ad hoc. For instance, there is nothing magical about a 30/70 or a 20/80 split – one is not better than the other. Stambaugh and Yuan explicitly recognize this possibility and explain their motivation for choosing the 20/80 split. Similarly, annual/quarterly/monthly updating of the variables in the factors could be driven by data considerations (I have not seen a reconstruction of the Fama and French factors using quarterly accounting data; it remains an open question how such reconstructed factors would fare against the quarterly-formed factors in the *q*-model). The authors’ work is useful in highlighting the importance of these choices. Nevertheless, one still has little guidance on how to construct better factors. Some of these decisions will be governed by the objective function, while others may rely on statistical criteria (Grinblatt and Saxena (2017) is an important contribution in this area).

Spanning tests essentially compare Sharpe ratios from one set of factors to another set of factors. As mentioned by Barillas and Shanken (2017), comparing nested models using these metrics is straightforward but comparing non-nested models is not. An additional complication is that a simple comparison of Sharpe ratios (or GRS statistics) ignores their sampling variation. Fama and French (2018) use simulations to get a sense of the sampling variation but more formal work would be welcome.
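The Sharpe-ratio interpretation can be illustrated with the sample maximum squared Sharpe ratio, mu'Σ⁻¹mu, attainable from a set of factors (simulated data below; all premia hypothetical). In-sample, enlarging the factor set can never lower this quantity, which is exactly why the sampling variation of the improvement is the hard part:

```python
import numpy as np

def max_sq_sharpe(F):
    """Sample maximum squared Sharpe ratio mu' Sigma^{-1} mu
    attainable from the factors in F (a T x K matrix of returns)."""
    mu = F.mean(axis=0)
    sigma = np.cov(F, rowvar=False)
    return float(mu @ np.linalg.solve(sigma, mu))

rng = np.random.default_rng(3)
T = 600  # months (simulated; all premia hypothetical)
base = rng.normal(loc=[0.005, 0.003], scale=[0.04, 0.03], size=(T, 2))
extra = 0.004 + rng.normal(0, 0.02, T)  # factor with an unspanned premium

sr2_base = max_sq_sharpe(base)
sr2_full = max_sq_sharpe(np.column_stack([base, extra]))
# The enlarged set weakly dominates in-sample (set the new factor's
# weight to zero to recover the smaller set's tangency portfolio).
print(sr2_base, sr2_full)
```

Whether the in-sample gap `sr2_full - sr2_base` is larger than its sampling noise is the formal-inference question the paragraph above raises.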

Finally, while the RHS approach is elegant and useful, many readers remain interested in understanding how factors explain LHS returns. For example, practitioners would like to know which alphas could be generated. In this sense, it is still useful to know how the factors explain anomalies. Therefore, extensions of Fama and French (2016, 2017) and Hou, Xue, and Zhang (2015, 2018) would be welcome.