Companies collect enormous datasets on consumer behavior. These datasets are extremely valuable, because they allow companies to match customers with services more accurately. (Think of Akbarpour, Li, and Gharan (2017) here: if firms can identify who is about to leave a marketplace, they can implement better matching algorithms.) Companies have a perverse incentive to keep these datasets secret, since releasing the information might let competitors surpass them. That is really unfortunate, because with sufficiently detailed datasets one can investigate almost anything.
Here are two really cool examples using data from rideshare companies. The first is Currier, Glaeser, and Kreindler (2023), which not only estimates the roughness of roads across the entirety of the United States, but also figures out how much it actually matters. Uber is able to detect vertical acceleration, or what we might call a bump. The more a car bumps, after adjusting for speed, the rougher the road. The authors combine this with the fact that Uber drivers are on the clock, and estimate a dollar value for road roughness from how much drivers slow down when faced with a rougher road. They find that a road with the median level of roughness costs a driver $1.05 per mile, a totally smooth road costs $0.74 per mile, and a one standard deviation increase in roughness costs a driver an additional $0.23 per mile. We now have some idea of when it is optimal to repave roads! Simply multiply the number of drivers by the dollar value of the expected improvement in quality, and pave whenever the benefits exceed the cost. Tullock’s farmers, eat your heart out. This is especially relevant because, as they decisively show, road repaving is scarcely related to road roughness, or to any other economic objective, at all. To even approximate what Uber has, highway agencies have to send out cars to measure road roughness, and those measurements lack any estimate of the value which people assign to smoother roads.
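To make the repaving rule concrete, here is a minimal back-of-the-envelope sketch in Python. The two per-mile costs are the ones quoted above; everything else (the traffic count, the repaving cost, the ten-year horizon, and the function names) is a hypothetical input chosen for illustration, not a figure from the paper. It also assumes that every vehicle values roughness the way Uber drivers do and that repaving takes a median road all the way to perfectly smooth, both of which are generous.

```python
# Per-mile driving costs quoted in the post (Currier, Glaeser, and Kreindler 2023).
COST_MEDIAN_ROAD = 1.05  # dollars per vehicle-mile on a road of median roughness
COST_SMOOTH_ROAD = 0.74  # dollars per vehicle-mile on a totally smooth road


def annual_repaving_benefit(vehicles_per_day, miles, cost_before, cost_after):
    """Yearly dollar value of lowering the per-mile cost of driving a road segment."""
    saving_per_vehicle_mile = cost_before - cost_after
    return vehicles_per_day * 365 * miles * saving_per_vehicle_mile


def should_repave(annual_benefit, repaving_cost, years_of_benefit):
    """Repave when (undiscounted) benefits over the horizon exceed the cost."""
    return annual_benefit * years_of_benefit > repaving_cost


# Hypothetical road: 2,000 vehicles per day over a one-mile segment.
benefit = annual_repaving_benefit(
    vehicles_per_day=2_000,
    miles=1.0,
    cost_before=COST_MEDIAN_ROAD,
    cost_after=COST_SMOOTH_ROAD,
)
print(f"Annual benefit: ${benefit:,.0f}")

# Hypothetical repaving cost of $1,000,000, with benefits lasting ten years.
print("Repave?", should_repave(benefit, repaving_cost=1_000_000, years_of_benefit=10))
```

A more careful version would discount future benefits and let the road roughen again over time, but the comparison of benefits against costs stays the same.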
Or how about another paper from Uber, by Cook, Diamond, Hall, List, and Oyer (2020)? Suppose we want to know why the difference in pay by gender exists. Uber sets its prices for drivers through an algorithm which hasn’t the slightest clue what the gender of its drivers is. The marginal returns to working are constant: working for eight hours earns you eight times as much as working for one hour. There is no wage negotiation, and the authors can show that there is no customer discrimination. Yet after all that, women still earn 7% less per hour than men. There are three reasons for this. First, male drivers are more willing to drive in rougher neighborhoods, and so part of the wage premium is a compensating differential. Second, men tend to have worked for Uber longer, which leads to higher wages through knowing when to cancel and when to accept jobs. And third, men simply drive faster. If wage gaps can persist as the result of differences in preferences, without any discrimination whatsoever, then perhaps wage gaps in the rest of the economy are due to differences in preferences too.
None of this happens in a world where companies do not make their data public. This should be balanced against the desire of consumers to keep their data private. Jones and Tonetti (2020) suggest that the optimal starting point is to give customers the rights to their data. This will not lead to the optimal allocation, since the market power of the companies means the price paid for data will be lower than optimal, but it will alleviate the concern that firms won’t release their data because they fear creative destruction. Firms make specialized investments, and a shakeup of the economy leaves them “holding the bag”.
There is absolutely no argument, however, for banning the sale of data. In Jones and Tonetti’s estimation, consumer welfare is only 40% of the optimum when data sales are banned. By contrast, when firms own the data, consumer welfare is 93% of optimal, and when consumers own the data, it’s greater than 99%. We have seen how laws which strictly regulate the sale of data have caused massive harm; the GDPR reduced the number of apps available by almost 32%. Jones (2022) reviews the research. Studying the GDPR is challenging, because there is no true control group and the exact date when it began affecting behavior is fuzzy, but the effects are so large as to be undeniable in the absence of any other plausible explanation. The cost of compliance has also been quite large: one estimate has Fortune 500 companies alone spending $7.8 billion on compliance, and 74% of small- and mid-sized organizations spending more than $100,000 each.
Regulators must be careful not to give the consumer gifts that they neither want nor care for. “Privacy” is something which is all too easy to regard as sacred; but it, like everything, trades off with other things. To maximize privacy would be to the consumers' detriment, and so when US legislators consider banning the sale of data, we must strenuously resist it.
Great post.
Unintuitive but convincing; economics is wonderful.
The gender wage gap thing was super interesting to think about