Yufeng Han, Ai He, David E Rapach, Guofu Zhou
Review of Finance, Volume 28, Issue 6, November 2024, Pages 1807–1831, https://doi.org/10.1093/rof/rfae027
The Fama-MacBeth regression framework is a workhorse method for analyzing cross-sectional expected stock returns. In this paper, we extend the Fama-MacBeth framework for cross-sectional return prediction to incorporate big data and machine learning. Our extension—what we call the E-LASSO approach—involves a three-step procedure for generating cross-sectional out-of-sample return forecasts based on Fama-MacBeth regressions with regularization and predictor selection as well as forecast combination and encompassing. As a byproduct, it produces estimates of characteristic payoffs that are designed to avoid overfitting, thereby providing more reliable payoff estimates. Our new E-LASSO method adds a unique procedure to the list of machine learning applications for predicting cross-sectional returns. Compared to more complicated machine learning procedures, the E-LASSO approach is straightforward to implement and interpret.
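For readers unfamiliar with the starting point, the following is a minimal sketch of the classical Fama-MacBeth procedure that E-LASSO extends: one cross-sectional OLS regression per month, followed by time-series averages of the estimated slopes (the characteristic payoffs). It is not the E-LASSO method itself and leaves out the regularization, predictor selection, combination, and encompassing steps. The function name `fama_macbeth`, the synthetic data, and the "true" payoffs of 0.5 and -0.2 are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def fama_macbeth(returns, chars):
    """Classical Fama-MacBeth estimates (illustrative sketch).

    returns: (T, N) array of stock returns for T months and N stocks.
    chars:   (T, N, K) array of lagged firm characteristics.
    Returns the time-series mean of the monthly slopes and their t-statistics,
    each of length K + 1 with the intercept first.
    """
    T, N, K = chars.shape
    slopes = np.empty((T, K + 1))
    for t in range(T):
        # Month-t cross-sectional regression of returns on characteristics.
        X = np.column_stack([np.ones(N), chars[t]])
        slopes[t], *_ = np.linalg.lstsq(X, returns[t], rcond=None)
    mean = slopes.mean(axis=0)
    # Standard error from the time-series variation of the monthly slopes.
    se = slopes.std(axis=0, ddof=1) / np.sqrt(T)
    return mean, mean / se

# Synthetic example (assumed data): two characteristics with known payoffs.
rng = np.random.default_rng(0)
T, N = 120, 200
chars = rng.standard_normal((T, N, 2))
true_payoffs = np.array([0.5, -0.2])
returns = chars @ true_payoffs + rng.standard_normal((T, N))

mean_payoffs, t_stats = fama_macbeth(returns, chars)
print(mean_payoffs, t_stats)
```

With many characteristics, unregularized monthly regressions of this kind are prone to overfitting, which is the motivation the summary gives for adding regularization and predictor selection.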
We also develop three performance measures for assessing cross-sectional return forecasts. First, we provide a generalization of the popular time-series out-of-sample R-squared statistic to the cross section. Second, we propose an encompassing test to compare the information content in cross-sectional return forecasts generated by two competing models. Third, we develop a graphical device for assessing the consistency of the accuracy of the cross-sectional return forecasts produced by a model over time.
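As a rough illustration of the first measure, the sketch below computes one plausible cross-sectional analogue of the out-of-sample R-squared. The exact definition in the paper may differ. The assumptions here are that returns and forecasts are cross-sectionally demeaned each month, and that the naive benchmark predicts the same return for every stock, so its demeaned forecast is zero. The function name `cross_sectional_r2_os` and the toy numbers are hypothetical.

```python
import numpy as np

def cross_sectional_r2_os(realized, forecast):
    """Assumed form: 1 - SSE(model) / SSE(naive), pooled over months and stocks.

    realized, forecast: (T, N) arrays of realized and forecast returns.
    """
    # Remove each month's cross-sectional mean so only relative
    # differences across stocks are evaluated.
    r = realized - realized.mean(axis=1, keepdims=True)
    f = forecast - forecast.mean(axis=1, keepdims=True)
    sse_model = np.sum((r - f) ** 2)
    # A naive forecast that is identical across stocks is zero after demeaning.
    sse_naive = np.sum(r ** 2)
    return 1.0 - sse_model / sse_naive

# Toy data (assumed): two months, three stocks.
realized = np.array([[0.02, -0.01, 0.05],
                     [0.00, 0.03, -0.03]])
r2_perfect = cross_sectional_r2_os(realized, realized)
r2_naive = cross_sectional_r2_os(realized, np.full_like(realized, 0.01))
print(r2_perfect, r2_naive)
```

Under this definition a positive value means the model's forecasts are more accurate than the naive benchmark in terms of pooled squared error, and a negative value means they are less accurate.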
We use our E-LASSO approach to generate cross-sectional stock return forecasts based on over 200 firm characteristics for the 1970:01 to 2021:12 out-of-sample period. The E-LASSO forecasts significantly improve out-of-sample predictive accuracy relative to a naïve benchmark that ignores the information in the firm characteristics. They are also more accurate than forecasts from more complicated machine learning models, such as a random forest and a deep neural network. Based on the estimated payoffs produced by the E-LASSO approach, a relatively large number of characteristics matter for determining cross-sectional expected returns.
Finally, we use the E-LASSO forecasts to construct a monthly value-weighted hedge portfolio that goes long (short) stocks with the highest (lowest) return forecasts. This portfolio provides substantial value to an investor, delivering an annualized Sharpe ratio of 1.65 (compared to a Sharpe ratio of 0.47 for the CRSP value-weighted market portfolio over the out-of-sample period). As shown by the accompanying Figure 3 from the paper, the E-LASSO portfolio generates gains consistently over time, and it performs especially well during business-cycle recessions (when the market portfolio suffers sizable losses).
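The portfolio construction and the Sharpe ratio calculation can be sketched as follows. The summary does not state the sorting cutoff, so the `frac` parameter (the fraction of stocks in each leg) is an assumption, as are the function names and the toy inputs. The annualization assumes monthly returns and scales by the square root of 12.

```python
import numpy as np

def long_short_return(forecast, realized, mktcap, frac=0.1):
    """One month's return on a value-weighted long-short portfolio.

    Goes long the stocks with the highest forecasts and short those with
    the lowest; `frac` is an assumed cutoff for each leg.
    """
    n = max(1, int(len(forecast) * frac))
    order = np.argsort(forecast)          # ascending by forecast
    short_leg, long_leg = order[:n], order[-n:]

    def value_weighted(idx):
        # Weight each stock's realized return by its market capitalization.
        return np.average(realized[idx], weights=mktcap[idx])

    return value_weighted(long_leg) - value_weighted(short_leg)

def annualized_sharpe(monthly_returns):
    """Annualized Sharpe ratio of a monthly (zero-investment) return series."""
    m = np.asarray(monthly_returns, dtype=float)
    return np.sqrt(12) * m.mean() / m.std(ddof=1)

# Toy inputs (assumed): four stocks, one stock per leg.
forecast = np.array([0.03, -0.02, 0.01, 0.00])
realized = np.array([0.04, -0.01, 0.02, 0.00])
mktcap = np.array([100.0, 50.0, 200.0, 80.0])
spread = long_short_return(forecast, realized, mktcap, frac=0.25)

sharpe = annualized_sharpe([0.01, 0.03, 0.02, 0.02])
print(spread, sharpe)
```

Because the long and short legs offset each other, the spread is already a zero-investment return, which is why no risk-free rate is subtracted in this sketch.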