Tarun Ramadorai, Antoine Uettwiller, and Ansgar Walther
Review of Finance, Volume 29, Issue 5, September 2025, Pages 1337–1367, https://doi.org/10.1093/rof/rfaf017
In this paper, we examine the heterogeneity in firms’ data extraction practices and privacy policies using a comprehensive dataset covering over 5,000 U.S. firms from 2016 to 2023. We analyze how firms collect, extract, and share consumer data while navigating privacy risks and regulatory requirements. Our study documents key patterns in privacy policy disclosures, web tracking practices, and cybersecurity risks, while also assessing the impact of major regulatory changes such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA).
We find that privacy policies vary significantly within industries, indicating that firms do not simply adopt standardized boilerplate language. Instead, privacy policy differences are closely linked to firms’ technical sophistication and data extraction strategies. Firms with intermediate technical sophistication tend to follow a “collect and share” model, where they gather large amounts of consumer data and share it with third parties for processing. These firms face higher cybersecurity risks as a result. In contrast, firms with high technical sophistication follow a “receive and process” model, where they acquire data from others and process it in-house, reducing third-party tracking and mitigating privacy-related risks.
Our analysis also highlights a strong relationship between firm size and data extraction intensity, with larger firms engaging in more extensive data collection. These firms tend to write longer privacy policies, which appear to serve as a legal hedge rather than a tool to inform consumers. Furthermore, firms that engage in significant data extraction are more likely to report privacy risks in their financial disclosures and experience cybersecurity incidents, leading to greater financial exposure from data breaches.
We also document substantial shifts in firms’ privacy policies in response to evolving regulations. Over time, privacy policies have become more visible, longer, and more transparent, suggesting that regulatory pressures have influenced firm behavior. However, these changes have not necessarily reduced firms’ reliance on data collection.
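Overall, our findings provide empirical support for a two-tier data market: firms of intermediate technical sophistication collect consumer data and share it with third parties, taking on greater cybersecurity risks, while highly sophisticated firms receive data from others and process it in-house.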
