Should Papers Report Their Results?
Or should we review them solely on the basis of methods?
Recently, I read a paper. I would like to tell you all about this paper, but I must keep all of the details anonymous. Nevertheless, this is a paper in economics from the last ten years which studied an industry, in which something does a thing; doing this thing gives us an outcome. This paper tests a different way of doing the thing, but it gives its headline result not as a change in the outcome, but as an equivalent increase in the number of things, a quantity for which we lack any real intuition. If one wants a precise figure for the outcome, one will have to dig through the whole paper to find it on some undisclosed page between 20 and 40. This struck me as strange – the industry is of such importance and public interest that surely you would want to trumpet just how much you can improve the outcome by altering the thing.
I asked the professor about the choice, and they told me that it was deliberate, and only to be maintained while the paper remains unpublished. If they report the number directly, then the discussion of the paper becomes about just that number, not the methods or how they reached it – and a number that has already circulated is harder to change in revision. The choice comes at a cost – journalists tend to focus on the outcome and not so much on the process of reaching it – but the authors believe it increases their chance of getting into a top journal, and my interlocutor agrees.
This got me thinking. Why do we report results at all? Does the quality of a paper depend at all on the results? When we referee papers for publication in a journal, should the reviewers be allowed to see the results, or should they all be redacted like it’s a classified report from the CIA?
Journals now play two roles, which pull against each other. On the one hand, a journal is there to disseminate information. It is the news. I like Armona, Gentzkow, Kamenica, and Shapiro’s definition of the optimal rule for a benevolent information provider: given a certain prior about the state of the world, the journal should report the information which produces the greatest change in one’s posterior beliefs, weighted by the importance of the topic on which beliefs are changing. Thus, if you live in a place which is almost always sunny, the news should report when it will rain; and if the rate of inflation is almost always steady, it should report when it changes.
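One loose way to formalize the rule (a sketch in my own notation, not the authors’):

```latex
% \pi(\theta) is the audience's prior over the state of the world on some
% topic, w is the importance weight on that topic, and a candidate report s
% induces the posterior \pi(\theta \mid s). The benevolent provider reports
\[
  s^{*} \;=\; \arg\max_{s} \; w \cdot D\big(\pi(\cdot \mid s)\,\big\|\,\pi(\cdot)\big),
\]
% where D is some measure of how far beliefs move (a KL divergence, say).
% Rain in an almost-always-sunny city moves beliefs a lot, so it gets reported.
```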
On the other hand, journals have increasingly become about quality certification, rather than about dissemination. Publishing in a top journal does matter – there are a few freaks, like me, who read through the QJE, and receiving their imprimatur helps with outsiders (though frankly, not enough! I am sick of people having no clue how wildly journals differ in quality, and citing very poor studies as peer reviewed, and thus equivalent to the best!) – but it matters less for the paper’s sake than for the career of the professor. There, we do not care in the slightest about the results which were found, conditional on the question which was asked. We only care whether they have a sense for asking important questions, finding good data and using it well, and making theoretical advances – in other words, the things which will, in expectation, lead to finding important information. The results are irrelevant.
The rule I described above is optimal if we take the process of finding information as totally exogenous, and researchers are bound to fully and honestly report the results of their inquiries. This is, of course, not the case. If journals favor significant results, then researchers can try out different specifications of their tests until they get something which is spuriously significant. The biggest advantage of blinding referees to the results is that this incentive is largely mooted. There is still the intrinsic incentive to find a result, but blinding would remove the professional reward for manufacturing one.
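To make the incentive concrete, here is a toy simulation (my own sketch, not taken from any paper discussed here) of a specification search: regress pure noise on pure noise across many arbitrary specifications and keep the best p-value.

```python
# A toy specification search: if referees reward p < 0.05 and the author is
# free to try many specifications, pure noise will "work" most of the time.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_obs, n_specs, n_sims = 200, 20, 1000
spurious = 0

for _ in range(n_sims):
    y = rng.normal(size=n_obs)          # outcome: pure noise, no true effect
    best_p = 1.0
    for _ in range(n_specs):            # try 20 arbitrary "specifications"
        x = rng.normal(size=n_obs)      # an unrelated regressor each time
        best_p = min(best_p, stats.linregress(x, y).pvalue)
    if best_p < 0.05:                   # report only the best one
        spurious += 1

# Expect roughly 1 - 0.95**20, i.e. about 64% of "studies," to find something.
print(f"share with a spuriously significant result: {spurious / n_sims:.2f}")
```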
The main problem with hiding the results, I think, is that it makes replication and robustness checking impossible. A referee cannot recreate the results using the code provided (replication), much less test their robustness by trying out different specifications. Seeing the results may also give one some intuition for the model, and for why it leads to the results it does. I would also be concerned that finding topics where people’s priors are wrong is itself something of a talent, and one which speaks to the quality of a researcher.
The other main concern is practical, given how publishing in economics works now. Papers circulate first as working papers, often for years before they are eventually published. Especially at the top journals, it is likely that the reviewer has already seen or heard of the paper. Maintaining secrecy over papers would be impossible if we want to have genuine scientific collaboration.
I think it is possible to strike a middle ground. When I consider a paper’s quality, I consider mainly the methods and the question. I am downright unenthusiastic about the results themselves! (When I tweet about a paper, I sometimes have to stop writing a thread when I discover, partway through, that the result is in fact a disappointment.) I do think we’ve gotten better about this, although all too often a null result gets published only because it is surprising.
How economics can avoid this, I think, is by moving away from purely reduced-form results. We are not restricted to simply testing one hypothesis, seeing if there’s a significant difference, and hitting publish. With a structural model, even if your estimates are not significant, you have still said something important about what is optimal – and learning that the current system is optimal is always surprising.

As far as peer review goes, TBH I think the whole system we have now is kinda dumb. In physics, we have arxiv.org, where pretty much everyone puts their papers, often as pre-prints, or even drafts, before they're published. This is great because it makes it easy to find anything, and it provides a good home for articles that are valuable but would be difficult to publish in a journal (like notes from a conference, or some review articles).
So I think that is a good solution for the "availability" part, and I think it's important to divorce it entirely from the review part, since the two have very different purposes and incentives.
"Review" is always tricky because sub-specialties are often small enough that everyone knows everyone anyway, so any anonymity can be difficult. Making only the referees anonymous ends up with the bad incentives we have now, where you get asked to cite irrelevant papers, add unrelated trivia, and so on.
I think the right thing to do isn't to modify procedures like you're proposing, but to carefully re-engineer the review part of the system to have better incentives for high quality.
Eg, one can imagine a system where referees are anonymous when reviewing, but authors do not have to accept their advice to get published, and when published, the referees become public along with all of their comments, addressed and not addressed.
It would be very embarrassing to publicly see your name next to your ignored "please cite my papers" suggestion!
It would also be nice to see comments and questions posted under papers, though it would be hard to keep that useful, even if you only restrict commenters to previously-published scientists or something.
For methodology, it would make sense to split the paper into two independently published and reviewed parts: a pre-registered methodology part that's done at conception, and a (possibly years-later) results part. It's much harder to sneak in methodology changes that way. And if referee comments are public in both cases, you get the benefit for each part independently, without mixing the review incentives around results with those around good methodology.
I'm glad to see someone outside of the hard sciences understanding that results per se are not interesting. That's a key reason that there's no replication crisis in hard sciences.
Though I would say it's not quite the "methods" that are critical. They're very important, but what's more important is knowing the underlying model as well as possible, how reliable it is, what its nature is, and carefully keeping track of its assumptions and the argument's assumptions.
For example, in intro physics, you learn that the force due to friction is equal to the normal force times a material-dependent coefficient of friction 'mu'. But it's important to know that this is *fundamentally* an approximation (i.e., it is *never* exactly true), and that it is approximate in several ways (mu being treated as a material-dependent constant, the equation being linear in N, no dependence on contact area, etc.). And it's important to know that this approximation was chosen primarily for computational convenience, and because it's easy enough to check, situation by situation, whether the approximation is good enough.
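Written out (standard intro-mechanics notation; the commentary is just a gloss on the point above):

```latex
% The intro-textbook sliding-friction model, with N the normal force and
% \mu the material-dependent coefficient:
\[
  F_{\text{friction}} \;\approx\; \mu N
\]
% Every piece is approximate: \mu is treated as a constant of the material
% pair, the dependence on N is taken to be exactly linear, and the contact
% area does not appear at all.
```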
But that's very different from, e.g., F = ma, which is (more or less) definitionally true. Or the statement that energy is conserved, which is a theorem you can prove from how Newton's laws are structured (Noether's theorem) and which must apply exactly to every system, even ones whose descriptions are approximate or unknown.
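For contrast, here is the standard one-line derivation for a single particle in a conservative force field (a special case, but it shows where the exactness comes from):

```latex
% With F = ma and a conservative force F = -dV/dx, the energy
% E = (1/2) m v^2 + V(x) is exactly conserved along any motion:
\[
  \frac{dE}{dt}
  \;=\; m v \,\frac{dv}{dt} \;+\; \frac{dV}{dx}\,\frac{dx}{dt}
  \;=\; v\,\big(m a - F\big)
  \;=\; 0 .
\]
% Nothing approximate enters: the conservation law follows from the structure
% of the equations of motion (time-translation symmetry, via Noether's theorem).
```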
When I read papers outside of the hard sciences, authors rarely seem to make a clear distinction between these cases, and moreover often inappropriately assume that their model is of a different "kind" than it really is.