We shouldn’t necessarily start with a prior of no effect. Rather, we should first figure out what the most reasonable position is, and update with empirical evidence if and only if that evidence is capable of finding the effects we believe exist. Here are some concrete examples.
In the development economics literature, there is a debate about the effectiveness of mass-deworming programs. The 2004 paper “Worms” by Edward Miguel and Michael Kremer kicked this off by finding large positive effects on health and school attendance, and follow-up work found higher earnings later in life. Later work cast doubt on the effectiveness, however, with a 2019 Cochrane meta-analysis finding insufficient evidence to recommend giving everyone in countries with endemic worms an annual dose.
This is bizarre, though. Everyone agrees that the medicines definitely work if you do have worms, and that being infected with worms is bad for you. Testing for worms is considerably more expensive than giving somebody a pill, and there’s no complicated dosing schedule: you just take a single dose of albendazole. The social effect should simply be the individual effect multiplied by the proportion of the population infected, and any plausible numbers in the developing world come out positive.
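That multiplication fits in a few lines. A minimal sketch of the mass-treatment-versus-testing logic, where every number (pill cost, test cost, benefit, prevalence) is a hypothetical placeholder rather than a real estimate:

```python
# Back-of-the-envelope comparison of mass treatment vs. test-then-treat.
# All numbers below are hypothetical illustrations, not real cost estimates.

pill_cost = 0.10            # cost of one deworming dose, dollars
test_cost = 2.00            # cost of testing one person for infection
benefit_if_infected = 30.0  # value of treating someone who actually has worms
prevalence = 0.40           # fraction of the population infected

# Mass treatment: everyone gets the pill; the benefit accrues only to the infected.
mass_net = prevalence * benefit_if_infected - pill_cost

# Test-then-treat: everyone gets the test; only the infected get the pill.
targeted_net = prevalence * (benefit_if_infected - pill_cost) - test_cost

print(f"mass treatment net benefit per person:  {mass_net:.2f}")
print(f"test-then-treat net benefit per person: {targeted_net:.2f}")
```

With any prevalence high enough that the test costs more than the pills it saves, treating everyone dominates, which is the point being made above.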
A very recent paper from the Center for Global Development finds that the null results arise simply because the studies weren’t large enough to detect the effect sizes posited. Pooling more analyses pushes the estimated effects over the threshold for significance. If we start from the prior that nothing does anything, and demand evidence to shift it, we would never have been deworming. Many things aren’t particularly testable, and so we have to reason about what is best even without the possibility of a dispositive test. Why should we stop doing that when such tests are available?
How about another example, this time more controversial. Starting in the early 1990s, crime began to drop, and with the exception of the post-2020 spike in violence, it has been dropping since. In 2001, Donohue and Levitt proposed that the legalization of abortion was responsible: 28 years prior, the Supreme Court had legalized abortion across the nation. The people getting abortions are, almost by definition, people who don’t want the baby, and if those babies would be more likely to grow up and commit crimes, then their absence should reduce the average level of criminality. To support this, Donohue and Levitt used a difference-in-differences approach, comparing trends between states which had legal abortion before 1973 and those that didn’t. States which already had legal abortion should see no break in trend in the years after legalization, while states which did not should diverge from trend.
The empirical evidence in favor was always going to be incredibly shaky. Difference-in-differences (or DiD) is the worst of estimators. You have to assume that, but for the event occurring, the places would have had identical trends. This is an enormous assumption! Imagine that having legal abortion before 1973 is driven by liberal politics, which correlates with having policies that poor people prefer. If migration in response to those policies changes over time, you would conclude that legalizing abortion reduces crime, even though it’s simply people with a higher propensity to commit crime moving. You can easily construct other examples of spurious causation, especially since you are searching backwards and can simply decline to report the times when you looked for an effect and found nothing.
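The mechanics, and the failure mode, fit in a 2x2 comparison. Below is a minimal sketch with invented crime rates, constructed so that the treated state drifts down for reasons unrelated to the policy (say, differential migration); DiD still reports an “effect” even though the true policy effect is zero:

```python
# Minimal 2x2 difference-in-differences on made-up numbers, showing how a
# violated parallel-trends assumption produces a spurious "effect".
# All figures are hypothetical crime rates per 100k, not real data.

treated_pre, treated_post = 500.0, 450.0  # legalized early; also drifting down anyway
control_pre, control_post = 500.0, 490.0  # flat apart from a common shock

# DiD estimate: (change in treated) minus (change in control).
did = (treated_post - treated_pre) - (control_post - control_pre)
print(f"DiD estimate: {did:.1f}")  # -40.0, despite a true policy effect of zero here
```

The estimator can only subtract off trends the control state shares; any differential trend, whatever its cause, gets attributed to the treatment.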
The thing they are trying to measure is ill-suited to DiD. The estimator is at its best when treatment and effect are instant, or at the very least when you know when the effects should kick in. Here the uptake is fuzzy, and we don’t know when the trends should start diverging: anywhere between 14 and 25 years after legalization, I suppose! Units also shouldn’t be able to cross over from treated to untreated, so people crossing state lines to get abortions before legalization will bias the results, although toward finding no divergence. People are in general able to move between states, and may move because of abortion policies. What’s more, the initial Donohue and Levitt results were largely not even real: they were driven by a coding error, and correcting it cut their estimate in half.
But to some degree, arguing over the evidence from the econometric estimator misses the point. Finding a null should reduce your credence that abortion reduces crime, but it should not eliminate it. It is not consistent to be skeptical of difference-in-differences estimators and yet take a null finding as evidence that there is no effect. That move works only if one starts from the prior that there is no effect, and one could arrive at that prior only by ignoring sound reasoning about what we should expect to find. Donohue and Levitt made predictions in their 2001 paper which were vindicated by the passage of time: in their 2019 re-analysis, they found that their predictions of a continued decline of 1% per year, settling into a steady state 20 years on, had been right.
I suppose this is me coming out as a Bayesian, not a frequentist. The point of theory is not just to suggest possible tests of a hypothesis, but to tell us what our prior should be. If we test it, and don’t find significant results, that should make us shift our views – not totally change them.
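The shift-versus-eliminate distinction can be made concrete with Bayes’ rule. In the sketch below, the prior, the study’s power, and the significance level are all invented for illustration: an underpowered study returning a null moves the posterior down, but nowhere near zero.

```python
# How a single null result should shift, not erase, belief in an effect.
# The prior, power, and alpha below are hypothetical, chosen for illustration.

prior = 0.70   # prior probability the effect is real
power = 0.50   # chance an underpowered study detects the effect if it is real
alpha = 0.05   # false-positive rate of the study

p_null_given_effect = 1 - power      # effect is real, study misses it
p_null_given_no_effect = 1 - alpha   # no effect, study correctly finds nothing

# Bayes' rule: P(effect | null result)
posterior = (prior * p_null_given_effect) / (
    prior * p_null_given_effect + (1 - prior) * p_null_given_no_effect
)
print(f"belief after one null result: {posterior:.2f}")  # 0.55: shifted, far from zero
```

The lower the study’s power, the less a null result tells you, which is exactly the deworming situation described above.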