Spurious Mobility in Historical Datasets
Measurements of variance are sensitive to error, more at 11
One of the primary things which economic historians do these days is link datasets, especially in job market papers where we want to show off our skills. We might, for example, have numerous censuses, and want to show that John Jackson Jr. in 1860 is the same John Jackson Jr. as in 1870, 1880, and 1890, and that his child is the John Jackson III found in censuses beginning in 1880, and so on.
Ross Mattheis’s recent job market paper is about how errors in linking datasets systematically overstate the degree of mobility over time. This is related to the concept of regression dilution. In a linear regression, errors in measurement do not bias the slope if they occur only in the Y axis. Ordinary least squares regression finds the line which minimizes the squared vertical distance from each of the data points. Since it is the vertical distance that matters, noise in Y will reduce the precision of the estimate, but errors above the line and errors below it are equally likely, so the slope is not pushed in either direction. Errors in the X axis are different: they pull the estimated slope toward zero.
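To make the asymmetry concrete, here is a minimal simulation sketch. It is not taken from the paper; the function names, the true slope of 0.5, and the noise levels are all assumptions chosen for illustration. It fits the same least-squares line three times: on clean data, with noise added to Y only, and with noise added to X only.

```python
import random

def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov_xy / var_x

def simulate_slopes(n=20000, true_slope=0.5, noise_sd=1.0, seed=0):
    """Compare slopes with no error, error in Y only, and error in X only.

    All parameter values are illustrative assumptions, not estimates
    from any real dataset.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [true_slope * xi + rng.gauss(0, 1) for xi in x]
    # Measurement error added to the outcome only.
    y_noisy = [yi + rng.gauss(0, noise_sd) for yi in y]
    # Measurement error added to the regressor only.
    x_noisy = [xi + rng.gauss(0, noise_sd) for xi in x]
    return {
        "clean": ols_slope(x, y),
        "noise_in_y": ols_slope(x, y_noisy),
        "noise_in_x": ols_slope(x_noisy, y),
    }

slopes = simulate_slopes()
for name, value in slopes.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the slope with noise in Y should stay near the true 0.5, only less precisely estimated, while the slope with noise in X should fall to about half of that, because the noise variance here equals the variance of X.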
If social mobility is low, any linking error will bias the estimate upwards. (Likewise, if it is very high, any error will bias it down.) In a world where there is no social mobility, and each person has exactly the same relative income as their parents, a misidentification will necessarily decrease the correlation of people’s ranks, and so increase measured mobility. Because people can be mislinked to someone of any social status, any error in attribution must involve the X axis. In addition, the outcome variable of one generation is the explanatory variable of the next. The image below, courtesy of Andrew Gelman, is an excellent illustration of this effect.
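The no-mobility thought experiment can also be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's method: every child has exactly the parent's rank, a hypothetical `mislink_rate` share of children are attached to a parent drawn at random, and the correlation is computed on the linked pairs. The function names and the 20 percent error rate are made up for the example.

```python
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def linked_rank_correlation(n=10000, mislink_rate=0.2, seed=0):
    """Parent-child rank correlation when some links are wrong.

    Assumes a world with zero mobility: the true child rank equals
    the true parent rank. A mislinked child is matched to a parent
    chosen uniformly at random (an illustrative assumption).
    """
    rng = random.Random(seed)
    parent_rank = list(range(n))
    child_rank = list(parent_rank)  # no mobility: identical ranks
    linked_parent = []
    for i in range(n):
        if rng.random() < mislink_rate:
            # Wrong link: record some random person's rank as the parent's.
            linked_parent.append(parent_rank[rng.randrange(n)])
        else:
            linked_parent.append(parent_rank[i])
    # Correlation on recorded ranks (not re-ranked after linking).
    return pearson(linked_parent, child_rank)

r_perfect = linked_rank_correlation(mislink_rate=0.0)
r_mislinked = linked_rank_correlation(mislink_rate=0.2)
print(f"no linking errors: {r_perfect:.3f}")
print(f"20% mislinked:     {r_mislinked:.3f}")
```

With purely random mislinks, the measured correlation should come out close to one minus the mislink rate, so a society with no mobility at all would look noticeably mobile in the linked data.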
To point this out is valuable, but perhaps insufficient for a paper. His next contribution is to estimate how big a deal this is, and what that says about our present estimates of social mobility. His corrections overturn the conventional narrative around social mobility before the Civil War. There was no time in which the United States was a land of extremely high social mobility. Contemporary estimates (largely from Chetty, Hendren, and co-authors) put the correlation of rank-status between .24 and .4, depending on the metro. (A score of 1 would mean that there is a perfect correlation of rank status generation by generation, and thus no mobility). Rather than a correlation of .15 prior to the Civil War, he finds a correlation of .3, and around the turn of the century intergenerational correlation rose all the way to .5.
Mattheis’s results tie into a line of literature from Gregory Clark and Neil Cummins, who look at social mobility in England and other places. There as well, they find that social mobility is lower than we think, and surprisingly constant over time. Their main contribution, besides the data analysis, is showing that social mobility is not a memoryless process: people in earlier generations still influence the outcomes of their grandchildren and great-grandchildren, though with the correlation attenuating as the connection grows more distant. Including more generations reduces actual mobility, as the children of outliers regress to their own distributions. (They come to correlations between .73 and .9 in the long run in England, as shown in “Surnames and Social Mobility, 1170-2012”.) We oughtn’t be discouraged by this, though. Mobility is ultimately a measure of variance, and low mobility often correlates with aristocracy and other bad outcomes, but need not do so. Traits are passed down in families. What is far more important is that we continue to grow and make better lives for everyone, and that we use the talents which people have to their full extent.
I thought that this was a really cool, and really impressive, paper. I did not cover the method for attenuating error, which is very technical, but it is the work of a very capable person. If anyone reading this is involved in hiring professors, and is looking either for economic historians or econometricians, I recommend hiring this man.