Berry-Hausman-Pakes Should Win the Nobel Prize
A prize for revolutionizing industrial organization
I am, unapologetically, a fan of economics in the way others are a fan of baseball. They have arguments over who should be inducted into the Hall of Fame; I can therefore argue over who should win the Nobel Prize each year. My preference, and also my prediction, for the 2025 Nobel Prize in Economics is that Steven Berry, Jerry Hausman, and Ariel Pakes win it. Berry and Pakes will win it for revolutionizing industrial organization, and Hausman will get it for making the tools that Berry and Pakes built upon.
This essay will be divided into five sections, with the first three focusing on each winner individually. The fourth section will discuss the other contenders, in particular the work of Tim Bresnahan. The fifth will speculate about the future. This is the longest article I have written, and will cover technical matters. I am keeping the equations to a minimum, focusing on the intuition behind why they made the choices they did, but this will likely take a while to read.
i. Ariel Pakes
The case for Pakes (pronounced Pay-ks) is strongest. If I could pick only one, I would pick him. It is rare that somebody will have a paper which is profoundly influential on how economists approach an entire literature; Pakes has written three. He wrote Berry-Levinsohn-Pakes (1995), which is the foundational paper for estimating demand curves; Olley-Pakes (1996), which is the foundational paper for estimating production functions; and Ericson-Pakes (1994), which is a foundational paper for dynamic games. Let’s start with Olley-Pakes (1996), which uses their method of production function estimation to see the effects the break-up of AT&T had on productivity.
Firms combine various types of inputs to produce an output. The equation which describes how these combine is called a production function. A simple production function is one where output is a function of fixed and variable inputs, such as a factory and labor, respectively. Mathematically, output Y is a function f(K, L), where K is capital and L is labor. When we see output increase, it could be due to more inputs, or to the inputs being used more efficiently. We need to estimate the production function in order to say which it is. In the simple little model I laid out, we can find this by seeing in the data how more capital versus more labor affects output with a regression.
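To make that concrete, the workhorse example is Cobb-Douglas (a generic illustration, not anything specific to Olley-Pakes):

$$
Y = A\,K^{\alpha}L^{\beta} \quad\Longrightarrow\quad \ln Y = \ln A + \alpha \ln K + \beta \ln L,
$$

where the coefficients α and β tell you how much a one percent increase in capital or labor raises output, and the leftover term A is productivity: the efficiency with which the inputs are being used.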
However, we cannot just regress output on inputs in practice. Firms differ in their productivity, which they can observe but the econometrician cannot. More productive firms will invest more, which leads us to overstate the effect of investments on output. At the same time, firms which have made larger investments in the past will stick around for longer, even when they get unusually bad productivity draws. This will lead us to underestimate the effect of investments on output, and unfortunately the two biases cannot be expected to exactly balance out. We need to deal with this, and that's what OP gave us the tools to do.
Olley and Pakes have output be a function of fixed and variable inputs, with an unobserved productivity draw omega (which, in lower case, looks like a "w", so I will be writing it that way): Y = f(K, V, w). w is assumed to be a scalar, so it doesn't change the optimal ratio of fixed and variable inputs, and it is also a first-order Markov variable. (That sounds scary, but it really isn't. It just means that future productivity is uncertain, but its distribution depends only on what your productivity is now.) In each period, firms observe their productivity, decide whether to exit, and then decide how much to invest. The amount that firms invest is strictly increasing in their productivity draw. As a consequence, we can invert that relationship and solve for the productivity draws using observed investment. With the productivity draws recovered, we can fill in the rest, and we have a production function.
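Here is a toy simulation of that logic, a minimal sketch with made-up functional forms and numbers rather than Olley and Pakes' actual procedure (which also handles selection from exit and recovers the capital coefficient in a second stage). The point is just to show how a flexible function of investment and capital can stand in for the unobserved productivity draw:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Made-up data-generating process: log output y = 1 + 0.4*k + 0.6*l + w + noise,
# where the productivity draw w is seen by the firm but not by us.
k = rng.normal(size=n)
w = 0.7 * rng.normal(size=n)                      # unobserved productivity
l = 0.5 * k + 0.6 * w + 0.5 * rng.normal(size=n)  # labor responds to productivity
i = 0.8 * k + w                                   # investment is strictly increasing in w
y = 1.0 + 0.4 * k + 0.6 * l + w + 0.1 * rng.normal(size=n)

# Naive OLS of y on (k, l): w sits in the error term and is correlated with l,
# so the labor coefficient comes out too high.
X_naive = np.column_stack([np.ones(n), k, l])
b_naive, *_ = np.linalg.lstsq(X_naive, y, rcond=None)

# OP-style first stage: because investment is monotone in w given k, a flexible
# function of (i, k) absorbs w, and the labor coefficient is recovered cleanly.
phi = np.column_stack([np.ones(n), i, k, i**2, k**2, i * k])
X_op = np.column_stack([l, phi])
b_op, *_ = np.linalg.lstsq(X_op, y, rcond=None)

print("naive labor coefficient:      ", round(b_naive[2], 2))  # well above the true 0.6
print("first-stage labor coefficient:", round(b_op[0], 2))     # close to the true 0.6
```

In the real procedure, a second stage uses the first-order Markov assumption on productivity to recover the capital coefficient as well, and a survival equation corrects for the selection induced by exit.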
They find that productivity substantially increased after AT&T was no longer able to bar competitors. Olley-Pakes introduced a handy way of expressing how this happened. It could be the case that the average productivity of firms rose, or it could be that the more productive firms took a larger market share. You can capture this with a term for the average productivity (unweighted), and then the covariance of productivity and market share. The breakup of AT&T didn’t change average productivity all that much, but it did substantially reallocate business from unproductive to productive firms.
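The decomposition is worth writing down, since it gets used constantly. Share-weighted industry productivity splits into an unweighted mean and a covariance term:

$$
\Omega_t \;=\; \sum_i s_{it}\,\omega_{it} \;=\; \bar{\omega}_t + \sum_i \left(s_{it}-\bar{s}_t\right)\left(\omega_{it}-\bar{\omega}_t\right),
$$

where $s_{it}$ is firm $i$'s market share and $\bar{\omega}_t$ is the unweighted average of firm productivity. A rising covariance term is exactly the reallocation toward productive firms that they found after the breakup.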
The Olley-Pakes framework would be improved upon by Levinsohn-Petrin (2003) and Ackerberg-Caves-Frazer (2015), but these are still recognizably minor extensions. The framework has been incredibly widely used, and has been cited over 10,000 times. The most important use is in estimating markups, the ratio of price to marginal cost. Provided you have linear pricing, the markup is a sufficient statistic for misallocation in the economy. I am going to gloss over the math, but cost minimization implies that the markup is equal to the elasticity of output with respect to a variable input, divided by the share of revenue paid to that variable input. In other words, if increasing a variable input such as labor by 10% raises output by 6%, and labor is paid 40% of revenue, the markup is 0.6/0.4 = 1.5, or 50% over marginal cost. The math is explained with extreme clarity in section II of De Loecker-Eeckhout-Unger (2020), although I caution the reader against believing their empirical results without qualifications.
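In symbols, and writing it from memory rather than quoting them, the cost-minimization result they present is

$$
\mu \;\equiv\; \frac{P}{MC} \;=\; \theta_V \cdot \frac{P\,Q}{P^V V},
$$

where $\theta_V$ is the output elasticity of the variable input $V$, which comes out of an estimated production function, and $P^V V / (P Q)$ is that input's share of revenue, which is directly observable. This is why production function estimation in the style of Olley-Pakes is the hard part of measuring markups.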
Pakes won the Frisch medal, given to the best paper published in Econometrica (and awarded every two years), for his paper on patents as options. Patent offices wish to discourage people from filing frivolous patents they have no intention of actually pursuing. For that reason, many require that the patent holder pay an annual fee to renew the rights to the patent. This means that there is an implied optimal stopping problem, and solving it gives us a value for patents.
His model assumes that each year, the owner of the patent performs "experiments", which can alternately provide no information, make it clear that the patent will never be worthwhile, or discover a way to increase returns in the future. This last possibility means that patent holders will not simply abandon every patent whose current returns fall short of the renewal fee; there is option value in holding on. This information is a Markov variable, and so depends only on the profitability in the period before, plus the parameters defining how likely each type of information is. If you had these parameters, it would be simple to say what percentage of holders will drop out at each age; turning that around, observing the distribution of drop-outs lets you recover the parameters. Much like in Olley-Pakes, an assumption of monotonicity (renewal fees rise with the age of the patent, so renewal decisions follow a cutoff rule in current returns) allows us to invert observed behavior.
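To fix ideas, here is a toy version of the renewal problem, solved by backward induction over a grid of current returns. Every number and functional form in it is an illustrative assumption, not Pakes' specification (his returns process, in particular, includes the chance of learning about more profitable uses):

```python
import numpy as np

T = 20                                  # statutory patent life, in years
fees = 50.0 * 1.2 ** np.arange(T)       # renewal fee schedule, rising with age (assumed)
grid = np.linspace(0.0, 2000.0, 201)    # grid of current annual returns r
delta, sigma, beta = 0.9, 100.0, 0.95   # decay of returns, shock spread, discount factor

# First-order Markov transition: next period's return is centered on delta * r.
P = np.exp(-0.5 * ((grid[None, :] - delta * grid[:, None]) / sigma) ** 2)
P /= P.sum(axis=1, keepdims=True)

# Backward induction: V_t(r) = max(0, r - fee_t + beta * E[V_{t+1}(r') | r]).
# Letting the patent lapse is worth zero and is irreversible.
V = np.zeros(len(grid))                 # value after the patent has expired
cutoffs = []
for t in reversed(range(T)):
    keep = grid - fees[t] + beta * (P @ V)
    V = np.maximum(keep, 0.0)
    renewed = keep > 0
    cutoffs.append(grid[renewed].min() if renewed.any() else np.inf)

# The current return below which owners let the patent lapse, by patent age:
print(np.round(cutoffs[::-1], 1))
```

Observed drop-out rates at each age then pin down the parameters: Pakes simulates the model many times and picks the parameters whose implied pattern of lapses best matches the data.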
The procedure used to estimate it is essentially magic to me, and I don't think I can comment on it very much. You can't solve the model exactly, so you simulate it lots of times and pick the parameters that come closest to matching the data. We could discuss the results, which show that the value of patents is extremely skewed, but they're almost beside the point. What makes the paper beautiful and wonderful are the methods, and the problem-solving to overcome the constraints of the data. If you wanted, you could get the values today. All of the stuff discussed in this essay is, to my knowledge, used by practicing economists in corporations and at the FTC.
Pakes is also behind (with Richard Ericson) the model we use to study firms when the decisions they make affect the terms of the game. Game theory is comparatively simple to solve when the game is static or has a known end: you have some map of payoffs conditional on others' actions, and you can go to the end of the game and solve by backward induction. When players' actions affect the state of the industry, and thus the payoffs faced by themselves and by others in every subsequent period, it quickly descends into a morass of computationally infeasible conditional values.
As in Pakes (1986), firms explore the world and decide whether to enter or exit. Unlike the patents paper, these entry and exit decisions are explicitly affected by the entry and exit of other firms. There is a state space of possible industry configurations, with each firm's productivity being public information, while their entry costs and scrap values are private. The industry state follows a Markov chain as before. It's hard to estimate because the state space grows exponentially in the number of firms. To make it tractable, we assume that firms are symmetric and anonymous, which just means that firms with the same characteristics are treated exactly the same. Firms optimize based only on the current state vector and their private shocks, which also simplifies things greatly. He has several papers on actually computing and estimating this, including Pakes and McGuire (1994) and Pakes, Ostrovsky, and Berry (2007).
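A quick illustration of why the symmetry-and-anonymity assumption matters so much, using hypothetical numbers: with K possible productivity levels and N firms, tracking which firm sits at which level requires K^N states, but if only the counts matter, the state is a multiset and the space collapses.

```python
from math import comb

K, N = 10, 6                                # productivity levels and firms (hypothetical)
labelled_states = K ** N                    # who holds which level: 1,000,000 states
anonymous_states = comb(N + K - 1, K - 1)   # only the counts matter: 5,005 states
print(labelled_states, anonymous_states)
```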
Difficult though it is to estimate, many questions do in fact need the dynamic game structure. For example, suppose you wish to study "The Costs of Environmental Regulation in a Concentrated Industry", as Stephen P. Ryan did. Producing cement requires an extraordinary amount of fuel to heat cement kilns, and the production of cement is responsible for 2.5% of human carbon emissions around the world. For that reason, we in the United States have substantially regulated how much firms can emit, such as through the 1990 Clean Air Act Amendments. Entrants must get a permit after being inspected, which costs around $5 million. Skipping over the details of estimation: if you ignored the endogenous effect on the entry of firms, you would underestimate the costs to the consumer, and get the wrong sign on the cost to firms. Rather than reduce producer profits, the regulation would increase them, as the entry of future competitors is stifled.
ii. Steven Berry
Steven Berry is most famous for his work approaching markets through demand, rather than supply. We would like to sketch out what consumer demand is for a product. You have doubtless seen a supply and demand curve before – knowing its slope and shape is of utmost importance for answering questions from the effect of taxes, to the extent of market power in an economy, to the effect of a mandate to purchase health insurance. The real world is not as convenient as our most simplified expositions. Products vary substantially in their characteristics, and consumers have heterogeneous preferences. Berry contributed a way of dealing with all of this.
In discussing demand estimation, I will be frequently referring to his handbook chapter with Phil Haile, which I think is the best source on explaining this. Multiple professors have described Berry-Levinsohn-Pakes (1995) as something “nobody understands from reading the paper”, so you are best served learning it from the textbook. You can also turn to Aviv Nevo’s “Practitioner’s Guide” if you just can’t get enough.
First things first: in the BLP framework, products are not products. They are bundles of characteristics, which exist in an arbitrarily dimensioned space. Thinking of products like this allows us to make meaningful claims about how consumers will substitute between goods and respond to the introduction of new products. We also allow consumers to vary in their demand for particular characteristics, with tastes following a normal distribution around the measured mean.
The natural thing to turn to is estimating consumer demand with an instrumental variable. Suppose you want to estimate consumer demand for corn. You might argue that rainfall affects the supply of corn, but not its demand; thus you can use it to trace out the demand curve. To estimate demand for all goods in the economy, we would simply have separate shifters for each of the J goods, which is challenging but conceivable.
There are a few problems with this. Both price and quantity are endogenous, so you actually need separate shifters for both price and quantity (see section 2.4). Right away we're doubling the number of instruments needed. More importantly, demand for a good depends not only on its own price, but on the prices and characteristics of *all other goods* in the economy. Now you need not 2J instruments, but on the order of J squared. This will not do. This argument builds upon Berry and Haile (2014).
Instead, we’re going to have a discrete choice model. Each consumer chooses one (and only one) of the options available, although goods are defined broadly as to include every combination of goods and an outside good of purchasing nothing. Consumers have random utility drawn from some distribution over products and the characteristics of the products. The consumer knows their utility, while we do not. (3.1)
The standard model is given in (3.2). The utility of a consumer (on the left-hand side) is equal to a randomly drawn coefficient beta times the characteristics of the product, minus the price of the product times another randomly drawn coefficient alpha, plus an unobserved characteristic xi, plus an error term epsilon. The point of the random coefficients is that without them, substitution patterns depend only on market shares: you would see very dissimilar products have identical cross-elasticities whenever their market shares are the same.
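Written out, the standard specification (up to notation) is

$$
u_{ijt} \;=\; x_{jt}'\beta_i \;-\; \alpha_i\,p_{jt} \;+\; \xi_{jt} \;+\; \varepsilon_{ijt},
$$

where $i$ indexes consumers, $j$ products, and $t$ markets; $x_{jt}$ are the observed characteristics, $p_{jt}$ is the price, $\xi_{jt}$ is the unobserved characteristic, and the individual coefficients $(\alpha_i, \beta_i)$ are drawn from a distribution (normal, in the baseline) around their means.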
You still do need instruments, and these are often a bit shaky. (4.2) In Berry-Levinsohn-Pakes (1995), they use the characteristics of competing goods as instruments for price, which has never particularly convinced me or anyone else as actually exogenous. You could use arguably exogenous shifters of costs, like taxes or materials costs, and assumptions on pass-through. You can use Hausman instruments, which are the price of the same good in a different market. You can exploit the tendency for goods to be priced the same in different cities. You can use details on the consumer demographics of the different markets.
In the model, each product adds its own dimension of idiosyncratic taste, through the error term epsilon; and if two products with seemingly identical characteristics exist, they must presumably differ in unobserved characteristics, which is picked up in xi. Berry and Pakes (2007) explore what happens when you drop the idiosyncratic taste error (the "pure characteristics" model). In particular, the gains from new products no longer grow without bound, and you don't need to treat the characteristics of other products as fixed, as BLP does.
The model does not have a clean solution. Basically you just try out parameter values, and see how close they are to the data, and then try again. Finding an exact solution is computationally infeasible, so they use a nested fixed-point algorithm. This is a bit of a black box to me, but it follows from Pakes (1986), which we covered earlier. One can see a sort of division of labor here – Berry first sketched out the method to be applied in his 1994 paper, and Pakes comes in to do the estimation. To get a full answer on various counterfactuals, they assume Bertrand competition (where firms compete on price) and solve. (4.3.6)
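The inner loop of that nested fixed point is the BLP contraction mapping, which, for a given guess of the taste parameters, inverts observed market shares to get each product's mean utility. Here is a minimal sketch with simulated placeholder data; a real application would then use the recovered mean utilities and the instruments to form the moment conditions that pick the taste parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
J, R = 5, 1000                        # products and simulated consumers
x = rng.normal(size=(J, 2))           # product characteristics (placeholder data)
sigma = np.array([0.5, 0.3])          # guessed std. devs. of the random coefficients
nu = rng.normal(size=(R, 2))          # consumer taste draws
s_obs = rng.dirichlet(np.ones(J + 1))[:J]   # "observed" shares; the outside good gets the rest

def predicted_shares(delta):
    """Market shares implied by mean utilities delta, averaging over consumers."""
    mu = nu @ (sigma[:, None] * x.T)                        # R x J matrix of taste deviations
    u = delta[None, :] + mu
    expu = np.exp(u)
    probs = expu / (1.0 + expu.sum(axis=1, keepdims=True))  # outside good has utility zero
    return probs.mean(axis=0)

# BLP contraction: delta <- delta + log(observed shares) - log(predicted shares).
delta = np.zeros(J)
for _ in range(1000):
    step = np.log(s_obs) - np.log(predicted_shares(delta))
    delta = delta + step
    if np.max(np.abs(step)) < 1e-12:
        break

print(np.round(delta, 3))
print(np.round(predicted_shares(delta), 4), np.round(s_obs, 4))   # these now match
```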
The framework here is actually quite flexible in what it can be applied to. In work that Berry has been involved with, it has been applied to the choice of candidates in an election, where they found that the electoral map substantially favors Republicans and that raising the cost of voting would also benefit Republicans; and to estimating labor market power using data on worker choices (rather than the conventional estimate-the-production-function-first approach). You do need fairly detailed information on what you're studying, which means that most studies using the framework tend to be of narrowly defined industries with easily definable characteristics. BLP (1995) looked at the auto market, Nevo (2001) looked at ready-to-eat cereal, Syverson (2004) used ready-mixed concrete, and so on. Olley-Pakes has the advantage of needing less data, but is dependent upon the things you clump together actually sharing a production function.
Berry, Levinsohn, and Pakes followed up their 1995 paper with a 2004 paper on what you can do if you get more data. BLP (1995) requires only market-level data (shares, prices, and characteristics), without micro data on individual consumers, and assumes preferences follow a normal distribution around the mean. In the 2004 paper, they obtained the proprietary data which General Motors collects on its consumers, the most important variable being "if you did not purchase this car, what car would you purchase?". Knowing the second choice and its characteristics allows them to pin down much more of the variance in preferences. See also (6.3) in the handbook.
The structural approach replaced the previously dominant “structure-conduct-performance” paradigm. Berry, with Martin Gaynor and Fiona Scott Morton, has a JEP piece explaining why. The SCP paradigm was basically just trying to run regressions. On one side, you have output, prices, or markups as your dependent variables. On the other, you have market concentration, augmented with a number of controls for time, industry, and so on.
The troubles start with measurement. You don’t see markups in the data. In fact you barely see quantities, and since revenue is dependent both upon output and the markup, you might not be able to draw any conclusions at all. Concentration is also dependent upon defining the industry, which is often meaningless. It’s also not really possible to say which variables are exogenous. More importantly, though, the observed correlation between concentration and prices is dependent upon multiple primitives. These primitives do not map onto a monotonic increase in price or markups or quantities. The welfare implications are determined by the primitives, and nothing else.
Structural demand estimation is the principal contribution of Berry, but there are other cool papers. He won the Frisch medal for his 1992 paper on airline entry. I particularly liked his paper "Do Mergers Increase Product Variety?" with Joel Waldfogel, on the local radio industry. You might intuitively think that increasing market concentration reduces variety, but in fact it could go the opposite way. Suppose there are N firms in the market, each of which produces one variety of programming. Listeners have preferences over the characteristics of the radio shows (think of these as points in k-dimensional space), and listen to the radio station which is close enough to their point. If a firm moves in this space, changing its characteristics, it takes away some customers from another firm. Now suppose that there are a very limited number of firms, but each produces multiple varieties of programming. When one of its varieties moves around in product space, some of the customers it takes are its own, so a merged firm has an incentive to spread its offerings out. Of course, it could also just shut down varieties and offer fewer. Which effect dominates depends on the parameters.
To estimate it, they use data on the content of US radio stations from 1993-1997, and a natural experiment provided by the Telecommunications Act of 1996 which expanded the ability to merge, conditional upon the number of stations in the market. Annoyingly, the FCC did not provide a list of markets for which the different rules applied, leaving Berry and Waldfogel to guess at how they would apply the standards, but aside from that practical difficulty using it as an instrumental variable is straightforward. The mergers increased concentration, but did not reduce product variety.
iii. Jerry Hausman
Hausman’s work is incredibly wide-ranging. I can cover only a fraction of it. It ranges from estimating the value of NBA superstars, to valuing new goods, to testing the exogeneity of instrumental variables, to correcting attrition in panel data. He was not just a pure theorist, but was continually motivated by specific empirical applications. He seems to have a paper on everything. I have not met him, but there is universal agreement on two counts: he is a genius, and he talks really fast. The work most similar to Berry and Pakes is on the valuation of new goods, and its implication for making a consumer price index. He also had some work developing the sort of logit models which BLP will use for their demand estimation. In particular, the BLP framework is an extension of Hausman and Wise (1978).
A price index is not just a measure of the prices of goods in a market. We also need to weight each good by how much consumers purchase. Complicating things further, the goods which people buy are constantly changing, as new products enter and old products leave. What we're really measuring is the cost over time of buying a basket of goods which yields some amount of utility. Hausman expended an enormous amount of energy convincing the Bureau of Labor Statistics that the way they handled new goods was misstating inflation. In particular, an unmeasured increase in quality will lead us to believe that there was more inflation than actually occurred.
The BLS can’t directly observe quality, so they need to make some simplifying assumptions which need not work. For example, as Hausman and Leibtag (2004) point out, when Walmart entered a new market, the BLS assumed that the food which they sold, adjusted for quality, was the same price as the places they replaced. This is the case even if the goods were literally identical at the two establishments. Their solutions to the problem tie into another long running hobby horse of Hausman, which is that the BLS needs to make use of new and better data sources in calculating the Consumer Price Index.
Hausman proposes a method for estimating the gains from new products in what is colloquially called his "Apple-Cinnamon Cheerios" paper. The basic theory is long-settled under perfect competition. For each good, you estimate the price at which demand would fall to zero, called the virtual or reservation price. A good being invented is then equivalent to its price sliding down from that level to the observed price, and the gain in consumer surplus is the area under the demand curve between the two.
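With linear demand the arithmetic is simple; purely as an illustration (these are not Hausman's functional forms or numbers):

$$
q = a - b\,p \;\Rightarrow\; p^{*} = \frac{a}{b}, \qquad
CS(p) \;=\; \int_{p}^{p^{*}} (a - b\,\tilde p)\,d\tilde p \;=\; \frac{(a-b\,p)^2}{2b} \;=\; \frac{q^2}{2b},
$$

where $p^{*}$ is the price at which demand hits zero. Introducing the good is treated as the price falling from $p^{*}$ to the observed $p$, and the surplus gain is that triangle; the empirical work is all about estimating the demand curve, and hence $b$, credibly.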
Estimating the effect of introducing a new brand under imperfect competition, however, is not trivial, as discussed in the Berry-Waldfogel paper. It is entirely possible for a new brand to lower welfare by allowing companies that produce other brands to raise their prices. Intuitively, the new brand pulls away the customers who only cared a little about the brand of cereal they were buying before. The customers who remain are the ones who really care about the brand, and so the oligopolist can charge a higher markup.
To estimate the welfare gains from Apple-Cinnamon Cheerios, he attacks the problem at three levels: the demand for cereal as a whole; demand across segments of the market, like adult or child cereals; and demand for specific brands conditional upon the segment, estimated with instrumental variables. This nesting of demands is a very common artifice. Most demand systems which assume a constant elasticity of demand will nest it into arbitrary segments, such that like goods only substitute with like (and like categories of goods substitute with like categories of goods, and so on). The instrumental variables are the prices of the same brands in other cities, which is valid so long as demand shocks (an advertising campaign, say) are local rather than national. Putting it all together, he estimates that the introduction of Apple-Cinnamon Cheerios under imperfect competition increased consumer surplus by $66 million per year.
Related to constant-utility price indices, he contrasts two types of consumer surplus, Marshallian and Hicksian, in "Exact Consumer's Surplus and Deadweight Loss" (1981). The first is likely the one you're most familiar with – sketch out a demand curve, move the supply curve, shade in the area under the line. The second accounts for the fact that a change in the price of your basket also changes your effective income. Rather than combining the income effect and the substitution effect, you isolate the latter by holding the utility you are purchasing constant. Why does this matter? Because if you include the income effect, you're no longer measuring just the loss of efficiency, which, in the case of, say, a Giffen good, can completely flip the sign of the estimated burden of a tax. Hausman derives formulas expressing the Hicksian measures as functions of observed data, and all you need are regressions. The worked formulas are restricted to simple functional forms like linear demand curves, but that's alright.
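In symbols, the Hicksian measures are defined off the expenditure function rather than the Marshallian demand curve. For a price change from $p^0$ to $p^1$ at initial utility $u^0$,

$$
CV = e(p^1, u^0) - e(p^0, u^0), \qquad
DWL = e(p^1, u^0) - e(p^0, u^0) - (p^1 - p^0)\,h(p^1, u^0),
$$

where $h$ is the compensated (Hicksian) demand. Hausman's contribution was showing how to recover the expenditure function from an estimated Marshallian demand curve, so that these can be computed from ordinary regression output.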
A possibility he considers, but rejects, for valuing new goods is to simply ask people how much they are willing to pay for them, in what is called "contingent valuation". Diamond and Hausman (1994) and Hausman (2012) are quite entertaining survey articles on the subject, lambasting it as essentially nonsense. I wrote about this before in "Are Preferences Without Prices Meaningful?".
Like Pakes, Hausman also worked on computational methods for estimating nonlinear structural models, in "Estimation and Inference in Nonlinear Structural Models" (1974), written with Berndt, Hall, and Hall (the source of the BHHH algorithm). The approach is different, as Pakes relies on simulation while BHHH approximates the needed second derivatives analytically from the gradients, but it's in the same vein.
Hausman had the first work evaluating people's discount rates from their purchases of durable goods. People often face a tradeoff between pricier goods which will reduce their expenditures over time and cheaper goods which will be more costly to operate over time. For instance, automobiles with greater fuel efficiency will be (all else equal!!) more expensive. Hausman considers this in the context of air conditioners. He runs a hedonic regression, which figures out how much an increase in a given characteristic of a product raises its price by decomposing each product into a sum of characteristics. He then calculates demand from how many hot days a person will experience, calculates the rate at which air conditioners break down from an (admittedly small) sample, and then solves for the only thing remaining, the individual discount rate. He estimates that consumers discount future consumption at 20% per year. This is very large, and suggests that agents are myopic about future consumption, providing a case for fuel efficiency standards. The work he subsequently inspired is surveyed on page 23 of "Measuring Time Preferences". (As an aside, you can find my thoughts on what I think the "real" rate of pure time preference is in this article.) His work on time preferences builds upon Hausman and Wise (1978), which is also the basis for Berry-Levinsohn-Pakes.
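The logic of backing out the discount rate is just present-value arithmetic. With made-up numbers: if a more efficient air conditioner costs $100 more up front and saves $25 a year in electricity over an assumed ten-year life, the implied discount rate r solves

$$
100 = \sum_{t=1}^{10} \frac{25}{(1+r)^t},
$$

which gives r of roughly 21%. Buyers who pass up such savings are revealing a very high discount rate; that is the sense in which the 20% figure comes straight out of observed purchase behavior.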
His most cited paper, by far, is "Specification Tests in Econometrics". Now, citation counts for econometrics papers don't map neatly onto citation counts for normal articles, and are quite inflated relative to contribution: we converge on a consensus paper for everyone using a method to cite. Still a cool paper, though.
Suppose we want to run a regression, where Y = Xb + e. The expectation of e (given X) should be zero, and the errors should be homoskedastic and uncorrelated with one another. The first condition means that the model should not systematically miss one way or the other; the second means that the spread of the data around the line of best fit should be constant throughout. If the first fails, your estimate of the effect of X on Y will be wrong, because the estimated b is biased. If the second fails, your standard errors will be wrong: ordinary least squares (which draws the line minimizing the squared vertical distance to each data point) needs homoskedasticity for its usual standard errors to be right. The Hausman test compares an estimator which is efficient only if some such assumption holds against one which is consistent either way; if the two estimates are close, you can use the more powerful method.
A practical application of this is in dealing with panel data. There are some things you can't observe about your participants. How do you deal with that? The two standard methods are fixed effects and random effects. Fixed effects means you only compare changes within an individual: everyone gets their own intercept, and you only compare self to self. This means you can only use variation over time, throwing away the cross-sectional comparisons. Random effects assumes that the unobserved individual effects are uncorrelated with the regressors (that your sample is effectively a random draw from the population), which lets you use both the within-individual and the between-individual variation.
Let's say you're measuring the returns to education. Fixed effects means you look at the change in income over time for each individual, making some argument about changes in education being as good as randomly assigned. Random effects allows you to also use the cross-section of people, with the key assumption being that education is uncorrelated with unobserved ability. Obviously I think that isn't true; the Hausman test will show that the two estimates differ, and that you need to use fixed effects.
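The test itself is a few lines once you have the two sets of estimates. A minimal sketch with placeholder numbers for the returns-to-schooling example (not Hausman's data):

```python
import numpy as np
from scipy import stats

def hausman(b_consistent, b_efficient, V_consistent, V_efficient):
    """Hausman statistic comparing a consistent estimator (e.g. fixed effects)
    with an estimator that is efficient under the null (e.g. random effects)."""
    d = b_consistent - b_efficient
    V = V_consistent - V_efficient                 # valid variance of d under the null
    stat = float(d @ np.linalg.solve(V, d))
    return stat, stats.chi2.sf(stat, df=len(d))

b_fe = np.array([0.082])      # say, the fixed-effects return to a year of schooling
b_re = np.array([0.110])      # the random-effects estimate
V_fe = np.array([[0.0002]])   # their (placeholder) variances
V_re = np.array([[0.0001]])
stat, pval = hausman(b_fe, b_re, V_fe, V_re)
print(round(stat, 2), round(pval, 4))   # a small p-value rejects the random-effects assumption
```

Under the null, the statistic is chi-squared with as many degrees of freedom as coefficients being compared; here the small p-value says to stick with fixed effects.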
Hausman contributed to methods for dealing with attrition in panel data. Despite the best efforts of an experimenter, some of your subjects are going to drop out. If we are using fixed effects, each individual is their own best control, so the power of our experiment is reduced. We can’t very well compare the group before to the group after. Still worse, attrition might be correlated with the effects of the treatment. What can we do about this?
"Attrition Bias in Experimental and Panel Data" (1979), with David Wise, is the paper that won Hausman the Frisch medal. In the late 1970s, a number of places in the United States experimented with negative income taxes (NIT) under the auspices of the Department of Health, Education, and Welfare. One such place was Gary, Indiana. There, they guaranteed a minimum income of around $4,000, with subsequent labor earnings being deducted from the amount paid. The authors want to find the effect on income and labor supply, accounting for the 35% of black male study participants who dropped out. Their approach is to explicitly model the decision to drop out. You end up doing something much like two-stage least squares, the workhorse approach to instrumental variable estimation: the observed characteristics are assumed to follow a normal distribution, and you can fill in for the folks who dropped out.
Relatedly, the NIT experiments necessarily only include a small part of the population. Hausman and Wise (1977) use the data collected in the New Jersey negative income tax experiment to assess the impact of education and intelligence on pay, even among low-income people. The fundamental challenge is that while we have a random sample of the people eligible, eligibility is determined by the very outcome we are trying to explain. Take the effects of education on income, as shown in the figure below. Since we miss the high-education observations with higher incomes, we understate the effect of education on income.
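A quick simulation of the problem, with invented numbers and a cutoff standing in for the eligibility rule:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
educ = rng.normal(12, 2, n)                       # years of schooling
income = 5 + 0.8 * educ + rng.normal(0, 2, n)     # true return to schooling: 0.8 per year

slope_full = np.polyfit(educ, income, 1)[0]
eligible = income < 14                            # eligibility truncates on the outcome
slope_trunc = np.polyfit(educ[eligible], income[eligible], 1)[0]
print(round(slope_full, 2), round(slope_trunc, 2))   # the truncated slope is biased toward zero
```

The Hausman-Wise estimator writes down the likelihood of each observed (education, income) pair conditional on being below the cutoff, and, under normality and homoskedasticity, recovers the untruncated slope.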
The solution, yet again, is to exploit normality assumptions and use a maximum likelihood estimator. If you assume homoskedasticity, you can fill in the unobserved curve. This is rather similar in spirit to the Tobit, which corrects for when observations are heaped up at some point. (For example, if people are predicting the likelihood of an event from 0 to 100, and the actual probability is quite close to 100. Even if the guesses were right on average, the people who miss high can’t go above 100, biasing the results). Hausman and Wise extend it to simultaneous equations in order to find actual productivity, as total wages are affected both by hours and productivity.
Hausman has other papers on negative income taxes and panel data. Burtless and Hausman (1978) deals with taxes and subsidies making budget sets both nonlinear and nonconvex. Putting it less technically, the standard model assumes that hours of work are determined by an hourly wage rate which is unaffected by how many hours of labor are supplied. In the NIT experiments, different effective tax rates kicked in at discrete points. The paper is a predecessor of the bunching literature, which uses these kinks to estimate the effect of taxes on labor supply. Intuitively, if people's preferences follow a smooth distribution, then you can measure the distortion away from that distribution around the point where the tax rate changes: there must be some people who would have worked a little bit more at the old tax rate, but chose to stop earlier at the new, higher rate. Now, they don't find the bunching in the data – but that's almost entirely beside the point! This is about methods!
In non pareils, he and Gregory Leonard estimated the value of Michael Jordan to the NBA. This might surprise you, but he was, in fact, pretty valuable. Just the local average treatment effects, comparing television ratings with and without Jordan, plus some effects on merchandise, implied a value of $53 million a year to other teams. The general equilibrium effects were likely larger.
Something I rather liked about his papers, especially the ones in Econometrica, was how short they were. They do not waste words. I rather miss this style, although the empirical turn of economics does rightfully necessitate longer papers. I am aware that I have likely missed some of the econometrics papers, but we need to get a move on.
iv. James Levinsohn, Tim Bresnahan, and the rest
Unfortunately, every positive judgement of status carries an implicit negative judgement, so I would like to address the arguments for other possible candidates. The easiest case is not including James Levinsohn, of Berry-Levinsohn-Pakes fame, in the list of laureates. I think that BLP is essentially Berry's paper, that BLP 2004 is a simple extension of BLP 1995, and that Levinsohn's only notable publication otherwise is a somewhat trivial extension of Olley-Pakes which uses intermediate inputs as the proxy instead of investment. A good career, but simply not good enough.
The other person most commonly mentioned is Tim Bresnahan, currently of Stanford University. I think his work is excellent. However, I can pick only three. His two most important papers are "The Oligopoly Solution Concept is Identified" (1982) and "Competition and Collusion in the American Automobile Industry" (1987). The two lead into each other, so we will discuss them in sequence.
The point of the first is that the way firms in an oligopoly compete matters enormously for predicting price and quantity. Whether producers compete by choosing quantities (Cournot competition) or by choosing prices (Bertrand competition) leads to totally different outcomes. We would like to boil down the nature of competition to a single parameter. Take quantity demanded to be a function of price, an instrumental variable Y, and parameters to be estimated. On the supply side, price under perfect competition is a function of the quantity produced, an instrumental variable W, and various parameters to be estimated. We can model imperfect competition by adding a term in which the demand-side parameters are multiplied by a coefficient lambda, because the demand-side parameters are what determine marginal revenue. If lambda is 0, we're back at perfect competition; if it is 1, we have a monopoly; if it is somewhere in between, it's a particular mode of competition (with Cournot, for instance, putting lambda at 1/n, where n is the number of firms). The question before us is how to identify lambda.
The central difficulty is that, without marginal cost data, the price-quantity points we trace out by shifting the demand curve with Y are consistent with either story: a competitive industry with one marginal cost curve, or a collusive one with another. So long as the shifts in demand are parallel (the demand curve keeps the same slope), the two hypotheses fit the data equally well, and we can't distinguish them.
To distinguish the two, we need to change the slope of demand with another instrumental variable, Z. This might be best seen graphically, as below. We're rotating the demand curve around the point E1. If the market is competitive, nothing changes: demand still crosses marginal cost at the same point. If it is not, though, the marginal cost curve consistent with oligopoly (MCm) would now cross marginal revenue at a different quantity. Put intuitively, Z is something like the price of a substitute good, which makes people more inclined to switch to other goods and thus makes demand more elastic, while Y is income.
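A linear version of the setup, in the spirit of Bresnahan (1982) but in my own notation, makes the role of the rotation instrument concrete:

$$
\text{Demand:}\quad Q = \alpha_0 + \alpha_1 P + \alpha_2 Y + \alpha_3 P Z + \alpha_4 Z + \varepsilon_d,
$$
$$
\text{Supply relation:}\quad P = c_0 + c_1 Q + c_2 W - \lambda\,\frac{Q}{\alpha_1 + \alpha_3 Z} + \varepsilon_s .
$$

The last term is the perceived marginal-revenue wedge, lambda times $Q \cdot \partial P / \partial Q$; with $\lambda = 0$ firms price at marginal cost, and with $\lambda = 1$ they price like a single monopolist. Because $Z$ enters the wedge only through the demand slope, rotating demand with $Z$ shifts the supply relation if and only if $\lambda \neq 0$, which is what identifies it.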
His 1987 paper on the auto industry tries to explain why, in 1955, the quantity of cars produced was 45% greater than in 1954 and 1956, with the quality-adjusted price being lower. His hypothesis was that the mode of competition changed, which is consistent with historical accounts of a price war during that period. His Z here is the similarity of car models to one another, which alters the slope of demand under competition but not under collusion. He shows that models which were more similar to each other had bigger price drops than those far apart.
Bresnahan was also the first to point out the meaninglessness of regressions of prices on concentration, as discussed in the section on Steve Berry, in his 1989 handbook chapter on industrial organization.
Bresnahan hated the Apple-Cinnamon Cheerios paper of Hausman, a view he expressed with considerable vigor. My selection of Hausman should not be taken as my view on who won the Apple-Cinnamon-Cheerios War, which I honestly think Bresnahan did. The paper assumed that demand shocks were purely local, but any national demand shocks would bias the estimated demand curve toward being too inelastic, leading us to overestimate the value of a new good. Bresnahan's critique also passes the smell test, as it is with difficulty that I believe that Apple-Cinnamon Cheerios contributed $76 million in 1990 dollars, although I distrust my intuition somewhat with regard to consumer surplus.
v. Who Next?
While I'm here, I'd like to state some desires, and make some predictions, about the next few years of Nobel Prizes. First and foremost, Sam Kortum and Marc Melitz need to win for their work on trade. It would have been Jonathan Eaton too, if it weren't for his unfortunate passing last year. Eaton-Kortum (2002) and Melitz (2003) are the two pillars upon which modern trade theory is built. After them, I would like to see Nicholas Bloom, John van Reenen, and Chad Jones, but I expect it will be too early for them. Instead, it's gonna be Aghion-Howitt.
My longlist – and this is a *very* long list, down to current grad students – of people who I feel are or will be contenders is as follows: Amy Finkelstein, Dave Donaldson, Samuel Kortum, Raj Chetty, Matthew Gentzkow, Jesse Shapiro, Isaiah Andrews, Emi Nakamura, Jon Steinsson, Pete Klenow, Ufuk Akcigit, Jacob Moscona, Adrien Auclert, Matthew Rognlie, Ludwig Straub, Greg Kaplan, Stephen Morris, Drew Fudenberg, Rebecca Diamond, Michael Kremer (second time), Arnaud Costinot, Treb Allen, Costas Arkolakis, Stephen Redding, Susan Athey, Victor Chernozhukov, Sendhil Mullainathan, David Baqaee, Kunal Sangani, Ivan Werning, Emmanuel Farhi (after the Singularity), Xavier Gabaix, Bob Hall, Robert Barro, Elhanan Helpman, Takuo Sugaya (I've been assured by a friend), Emmanuel Saez, Stefanie Stantcheva, Ed Glaeser, Larry Katz, Chad Jones, Jonathan Hall, Nobuhiro Kiyotaki, John Moore, Daniel Luo, Petra Moser, Ariel Rubinstein, Arnold Harberger, Nancy Stokey, Peter Hull, Xavier Jaravel, Chad Syverson, Aidan Toner-Rodgers, John List, Tishara Garg, Frank Yang, Dev Patel, Matthew O. Jackson, Raffaella Sadun, Nicholas Bloom, John van Reenen, Mert Demirer, Richard Schmalensee, Karthik Sastry, Joel Mokyr, Jon Dingel, Suresh Naidu, Ernest Liu, Michael Woodford, Joel Flynn, Pascaline Dupas, John Grigsby, Philippe Aghion, Peter Howitt, Gene Grossman, Avinash Dixit, John Rust, Edward Miguel, Oleg Itskhoki, Pascual Restrepo, Philipp Strack, Jidong Zhou, and Ezra Oberfield.
And of course, omissions should be taken as a deliberate slight :) (but in all seriousness, ask in the comments). I claim no objective standard, and it is merely a matter of taste.
So, in conclusion, my predictions are:
2025: Steve Berry, Jerry Hausman, Ariel Pakes
2026: Sam Kortum, Marc Melitz, (Elhanan Helpman?)
2027: Philippe Aghion, Peter Howitt
2028: Stephen Morris, Drew Fudenberg
2029: Nicholas Bloom, John van Reenen
2030: Raj Chetty
I add, as a bit of housekeeping, that this is the fourth article giving an in-depth profile of the work of some of my heroes. Previously, I covered Duflo, Banerjee, and Kremer; Paul Krugman; and Raj Chetty.


