We introduce a novel strategy to predict the equity premium by extracting news from more than 700,000 newspaper articles published in The New York Times and The Washington Post between 1980 and 2018. The information is extracted and quantified by a statistical machine learning algorithm, the correlated topic model (CTM). We mimic the information set of a real-time investor and construct a continuous time series that tracks which type of news was discussed at which point in time. The higher the media coverage of a topic, the greater the importance we assign to that topic at that point in time. We average these so-called topic proportions on a monthly basis and use the monthly averages as predictors of the monthly excess returns of the S&P 500 index.
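The monthly averaging step can be illustrated with a minimal sketch. It assumes the topic model has already produced one vector of topic proportions per article. The function name `monthly_topic_proportions`, the input layout, and the toy data are hypothetical and chosen for illustration only; they are not taken from the paper.

```python
from collections import defaultdict
from datetime import date


def monthly_topic_proportions(articles):
    """Average per-article topic proportions within each calendar month.

    `articles` is an iterable of (publication_date, proportions) pairs,
    where `proportions` holds the topic shares of one article.
    Returns {(year, month): [mean share of topic 0, topic 1, ...]}.
    """
    sums = {}
    counts = defaultdict(int)
    for pub_date, props in articles:
        key = (pub_date.year, pub_date.month)
        if key not in sums:
            sums[key] = [0.0] * len(props)
        for k, share in enumerate(props):
            sums[key][k] += share
        counts[key] += 1
    # Divide each accumulated share by the number of articles in that month.
    return {
        key: [s / counts[key] for s in total]
        for key, total in sorted(sums.items())
    }


# Hypothetical toy input: three articles, two topics.
toy_articles = [
    (date(2001, 9, 12), [0.8, 0.2]),
    (date(2001, 9, 20), [0.6, 0.4]),
    (date(2001, 10, 3), [0.1, 0.9]),
]
monthly = monthly_topic_proportions(toy_articles)
```

Each month's vector then serves as one observation of the predictors; a month in which a topic receives more coverage carries a larger value for that topic.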
We devise a simple yet flexible econometric strategy for mapping the estimated predictors into an overall point forecast of the equity premium. This approach effectively deals with estimation error and protects against over-fitting. We first apply univariate regressions to forecast the equity premium. We then aggregate those univariate forecasts in a data-adaptive manner by switching between model selection and model averaging, based on recent forecasting performance. Model selection predicts the equity premium using only one forecast. Model averaging predicts the equity premium as the simple mean over all forecasts. Switching between model selection and model averaging can lower estimation error while still allowing the aggregate forecast to adapt quickly to changing market environments.
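The switching idea can be sketched as follows. The text above does not spell out the exact switching rule, so this example assumes one plausible variant: over a trailing window, compare the squared-error loss of the best single model with that of the equal-weighted mean, and use whichever performed better. The function name `switching_forecast`, the `window` parameter, and the loss comparison are assumptions for illustration, not the authors' specification.

```python
def switching_forecast(past_forecasts, past_returns, new_forecasts, window=12):
    """Choose between model selection and model averaging (assumed rule).

    past_forecasts: list over months of lists over models (past forecasts)
    past_returns:   realized excess returns for the same months
    new_forecasts:  list over models with the forecasts for the next month
    Returns (mode, forecast), where mode is "selection" or "averaging".
    """
    recent_f = past_forecasts[-window:]
    recent_r = past_returns[-window:]
    n_models = len(new_forecasts)

    # Recent squared-error loss of each univariate model.
    model_loss = [
        sum((f[m] - r) ** 2 for f, r in zip(recent_f, recent_r))
        for m in range(n_models)
    ]
    best = min(range(n_models), key=model_loss.__getitem__)

    # Recent squared-error loss of the simple mean over all models.
    avg_loss = sum(
        (sum(f) / n_models - r) ** 2 for f, r in zip(recent_f, recent_r)
    )

    # Assumption: select the single best model only if it beat the average.
    if model_loss[best] < avg_loss:
        return "selection", new_forecasts[best]
    return "averaging", sum(new_forecasts) / n_models


# Hypothetical toy data: two models, two past months.
mode_a, fc_a = switching_forecast(
    [[0.01, 0.05], [0.02, -0.03]], [0.01, 0.02], [0.015, 0.04]
)
mode_b, fc_b = switching_forecast(
    [[0.0, 0.02], [0.02, 0.0]], [0.01, 0.01], [0.03, 0.01]
)
```

In the first toy call one model has tracked returns perfectly, so the rule selects it; in the second the individual errors offset each other, so the simple mean is used.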
Our empirical results document large economic and statistical out-of-sample (OOS) gains. More precisely, we obtain an OOS R² of 6.52% and sizeable utility gains for a mean-variance investor. The news-based aggregate forecast embeds predictive information that is not captured by established economic predictors of the equity premium. In line with previous empirical findings, our forecasting gains are achieved in down markets. We link our results to an economic learning mechanism that is capable of explaining why predictability is concentrated in bad times. We further document that geopolitical news is at times more valuable than economic news for predicting the equity premium. This holds particularly for the years between 1999 and 2006, a period that covers the 9/11 attacks, the Afghanistan War, and the Iraq War.
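For readers unfamiliar with the out-of-sample R-squared, the following sketch shows how the statistic is commonly computed. It assumes the usual definition, one minus the ratio of the model's sum of squared forecast errors to that of a benchmark forecast; the benchmark is typically the historical mean return, which is an assumption here because the text above does not name one. The function name `oos_r2` and the toy numbers are hypothetical.

```python
def oos_r2(returns, forecasts, benchmark):
    """Out-of-sample R-squared relative to a benchmark forecast.

    returns:   realized excess returns over the evaluation period
    forecasts: model forecasts for the same months
    benchmark: benchmark forecasts (assumed: historical mean return)
    A positive value means the model beat the benchmark in squared error.
    """
    sse_model = sum((r - f) ** 2 for r, f in zip(returns, forecasts))
    sse_bench = sum((r - b) ** 2 for r, b in zip(returns, benchmark))
    return 1.0 - sse_model / sse_bench


# Hypothetical toy numbers, not results from the paper.
toy_returns = [0.02, -0.01, 0.03]
toy_forecasts = [0.01, 0.0, 0.02]
toy_benchmark = [0.005, 0.005, 0.005]
toy_r2 = oos_r2(toy_returns, toy_forecasts, toy_benchmark)
```

A forecast identical to the benchmark yields a value of zero, and a perfect forecast yields one, so the reported 6.52% indicates a reduction in squared forecast error relative to the benchmark.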