Peer review is ostensibly one of the central pillars of modern science. A paper is not taken seriously by other scientists unless it is published in a “peer reviewed” journal. Jobs, grants and tenure are parceled out, in no small part, on the basis of lists of “peer reviewed” papers. The public has been trained to accept as established truth any science that has gone through the gauntlet of “peer review”. And any attempt to upend, reform or even tinker with it is regarded as an apostasy.
But the truth is that peer review as practiced in 21st-century biomedical research poisons science. It is conservative, cumbersome, capricious and intrusive. It slows down the communication of new ideas and discoveries, while failing to accomplish most of what it purports to do. And, worst of all, the mythical veneer of peer review has created the perception that a handful of journals stand as gatekeepers of success in science, ceding undue power to them, and thereby stifling innovation in scientific communication.
This has to stop. In honor of Open Access Week, I am going to lay out what is wrong with peer review, how its persistence in its current form harms science, scientists and the public, and how we can restructure peer review to everyone’s benefit. [These ideas have emerged from over a decade’s worth of conspiring on this topic with Pat Brown, as well as myriad discussions with Harold Varmus, David Lipman, Vitek Tracz, my brother Jonathan, Gerry Rubin, Sean Eddy, other board members and staff at PLoS, and various and sundry people at meeting bars].
Peer review and its problems
To understand what’s wrong with peer review, you have to understand at least the basics of how it works. When a scientist has a result they want to share with their colleagues they write a paper and submit it to one of nearly 10,000 biomedical research journals.
The choice of journal is governed by many factors, but most scientists try to get their papers into the highest profile journal that covers their field and will accept it. Authors with the highest aspirations for their work send it to one of the wide-circulation general science journals, Science and Nature, or to a handful of high impact field-specific journals. In my field, molecular genetics/genomics, this would be Cell and PLoS Biology (a journal we started in 2003 to provide an open access alternative to these other three). In more clinical fields this would be something like the New England Journal of Medicine. [I want to make it clear that I am not endorsing these choices, just describing what people do].
When any of these top-tier journals receives a paper, it is evaluated by a professional editor (usually a Ph.D. scientist) who makes an initial judgment as to its suitability for their journal. They’re not trying to determine if the paper is technically sound – they are trying to figure out if the work described represents a sufficiently significant advance to warrant one of the coveted spots in their journal. If they think it might, they send the paper to 3 or 4 scientists – usually, but not always, lab heads – who are knowledgeable about the subject at hand, and ask them to read and comment on the manuscript.
The reviewers are asked to comment on several things:
- The technical merits of the paper: are the methods sound, the experiments reproducible, the data believable, the proper controls included, the conclusions justified – that is, is it a valid work of science.
- The presentation: is the writing understandable, are the figures clear, is relevant earlier work properly cited.
- The importance: are the results and conclusions of the paper sufficiently important for the journal to which it was submitted.
For most journals, the reviewers address these questions in a freeform review, which they send to the editor, who weighs their various comments to arrive at a decision. Decisions come in essentially three flavors: outright acceptance (rare), outright rejection (common at top-tier journals), and rejection with the option to address the reviewers’ objections and resubmit. Often the editors and reviewers demand a series of additional experiments that might lead them to accept an otherwise unacceptable paper. Papers that are rejected have to go through the process all over again at another journal.
There are many things wrong with this process, but I want to focus on two here:
1) The process takes a really long time. In my experience, the first round of reviews rarely takes less than a month, and often takes a lot longer, with papers sitting on reviewers’ desks being the primary rate-limiting step. But even more time consuming is what happens after the initial round of review, when papers have to be rewritten, often with new data collected and analyses done. For typical papers from my lab it takes 6 to 9 months from initial submission to publication.
The scientific enterprise is all about building on the results of others – but this can’t be done if the results of others are languishing in the hands of reviewers, or suffering through multiple rounds of peer review. There can be little doubt that this delay slows down scientific discovery and the introduction to the public of new ways to diagnose and treat disease [this is something Pat Brown and I have talked about trying to quantify, but I don’t have anything yet].
Of course this might be worth it if this manifestation of peer review were an essential part of the scientific enterprise that somehow made the ultimate product better, in spite of – or even because of – the delays. But this leads to:
2) The system is not very good at what it purports to do. The values that people primarily ascribe to peer review are maintaining the integrity of the scientific literature by preventing the publication of flawed science; filtering the mass of papers to identify those one should read; and providing a system for evaluating the contribution of individual scientists for hiring, funding and promotion. But it doesn’t actually do any of these things effectively.
The kind of flawed science that people are most worried about is deceptive or fraudulent papers, especially those dealing with clinical topics. And while I am sure that some egregious papers are prevented from being published by peer review, the reality is that with 10,000 or so journals out there, most papers that are not obviously flawed will ultimately get published if the authors are sufficiently persistent. The peer reviewed literature is filled with all manner of crappy papers – especially in more clinical fields. And even the supposedly more rigorous standards of the elite journals fail to prevent flawed papers from being published (witness the recent arsenic paper published by Science). So, while it might be a nice idea to imagine peer review as some kind of defender of scientific integrity – it isn’t.
And even if you believed that peer review could do this – several aspects of the current system make it more difficult. First, the focus on the importance of a paper in the publishing decision often deemphasizes technical issues. And, more importantly, the current system relies on three reviewers judging the technical merits of a paper under a fairly strict time constraint – conditions that are not ideally suited to recognize anything but the most obvious flaws. In my experience the most important technical flaws are uncovered after papers are published. And yet, because we have a system that places so much emphasis on where a paper is published, we have no effective way to annotate previously published papers that turn out to be wrong: once a Nature paper, always a Nature paper.
And as for classification, does anyone really think that assigning every paper to one journal, organized in a loose and chaotic hierarchy of topics and importance, is the best way to help people browse the literature? It made some sense when journals had to be printed and mailed – but with virtually all dissemination of the literature now done electronically, this system no longer makes any sense whatsoever. While some people still read journals cover to cover, most people now find papers by searching for them in PubMed, Google Scholar or the equivalent. While the classification into journals has some value, it certainly doesn’t justify the delays in publication that it currently requires.
I could go on about the problems with our current peer review system, but I’m 1,500 words into this thing and I want to stop kvetching about the problem and get to the solution.
The way forward: decoupling publication and assessment
Despite the impression I may have left in the previous section, I am not opposed to the entire concept of peer review. I think there is tremendous value generated when scientists read their colleagues’ papers, and I think science needs efficient and effective ways to capture and utilize this information. We could do this without the absurd time-wasting and frivolity of the current system, by decoupling publication from assessment.
The outlines of the system are simple. Papers are submitted to a journal and assigned to an editor, who makes an initial judgment of the paper’s suitability – rejecting things that manifestly do not belong in the scientific literature. If it passes this initial screen, the paper is sent out to peer reviewers (with the authors given the option of having their paper published immediately in a preliminary form).
Reviewers are given two separate tasks. First, to assess the technical validity of the paper, commenting on any areas where it falls short. Second, and completely independently, they are asked to judge the importance of the paper in several dimensions (methodological innovation, conceptual advance, significant discovery, etc.) and to determine who should be interested in the paper (all biologists; geneticists; Drosophila developmental biologists; etc.). This assessment of importance and audience would be recorded in a highly structured (and therefore searchable and computable) way – and would, in its simplest manifestation, amount to reviewers saying “this paper is good enough to have been published in Nature” or “this is a typical Genetics paper”.
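To make the idea of a structured, computable assessment concrete, here is a minimal sketch of what a single review record might look like. The field names, the 1–5 scale and the audience categories are purely illustrative assumptions on my part, not a proposed standard – the point is only that such a record can be stored, searched and aggregated by machines in a way that a freeform review cannot.

```python
from dataclasses import dataclass, field, asdict
import json

# Hypothetical importance dimensions, each scored 1-5 by a reviewer.
# The specific dimensions and scale are illustrative assumptions.
@dataclass
class ImportanceScores:
    methodological_innovation: int
    conceptual_advance: int
    significance_of_discovery: int

@dataclass
class StructuredReview:
    technically_sound: bool        # the independent validity judgment
    technical_comments: str        # freeform notes on any shortcomings
    importance: ImportanceScores   # structured, searchable importance ratings
    audience: list[str] = field(default_factory=list)  # e.g. ["geneticists"]

# Example record, roughly equivalent to "this is a typical Genetics paper"
review = StructuredReview(
    technically_sound=True,
    technical_comments="Controls adequate; statistics appropriate.",
    importance=ImportanceScores(2, 3, 2),
    audience=["geneticists", "Drosophila developmental biologists"],
)

# Because the record is structured, it can be serialized, indexed and queried
print(json.dumps(asdict(review), indent=2))
```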
The reviews would go back to the editor (whose main job would be to resolve any disagreement among the reviewers about the technical merits of the paper, and perhaps lead a discussion of its importance), who would pass the decision of whether or not to publish (based entirely on the technical merits) on to the authors, along with the reviewers’ structured assessment of importance and any comments they may have. If the technical review was positive, and the authors were happy with the assessment of importance and audience, they could have the paper published immediately. Or they could choose to modify the paper according to the reviewers’ comments and seek a different verdict.
This system – pieces of which are already implemented in PLoS One and its mimics – has several immediate and obvious advantages.
First, it would be much faster. Most papers would only go through a single round of review, after which they would be published. No ping-ponging from one journal to another. And this dramatic increase in speed of publication would not come at the price of assessment – after all, the main result of the existing peer review system, the journal in which a paper is published, is really just an assessment of the likely importance and audience for a paper – which is exactly the decision reviewers would make in the new system.
Second, by replacing the current journal hierarchy with a structured classification of research areas and levels of interest, this new system would undermine the generally poisonous “winner take all” attitude associated with publication in Science, Nature and their ilk. This new system for encoding the likely impact of a paper at the time of publication could easily replace the existing system (journal titles).
Third, by devaluing the assessment made at the time of publication, this new system might facilitate the development of a robust system of post-publication peer review, in which individuals or groups would submit their own assessments of papers at any point after they were published. These assessments could be reviewed by an editor or not, depending on what type of validation readers and other users of these assessments want. One could imagine editorial boards that select editors with good judgment and select their own readers to assess papers in the field, the results of which would bear the board’s imprimatur.
Finally, this system would be extremely easy to create. We already have journals (PLoS One is the biggest) that make publication decisions based purely on technical merits. We need to put some more thought into exactly what the structured review form would look like, what types of questions it would ask, and how we would record and transmit it. But once we do this, such a system would be relatively easy to build. We are moving towards such a system at PLoS One and PLoS Currents, and I’m optimistic that it will be built at PLoS. And with your ideas and support, we can – with remarkably little pain – fix peer review.
[Update] This is not just a problem with elite journals
In the comments Drug Monkey suggests that this problem is restricted to “Glamour Mags” like Science and Nature. While they are particularly bad practitioners of the dark art, virtually all existing journals impose a significance standard on their submissions and end up rejecting a large number of technically sound papers because they are not deemed by the reviewers and editors to be important enough for their journals. All of the statistics that I’ve seen show that most of the mainstream society journals reject a majority of the papers submitted to them – most, I would bet, because they do not meet the journal’s standards for significance. In my experience as an author of almost a hundred papers and a reviewer and editor for many more, reviewers see their primary job as determining the significance of a paper, and often prioritize this over assessing its technical merits. One of the funny (i.e. tragic) things I’ve noticed is that reviewers don’t actually modify their behavior very much when they review for different journals – they have one way of reviewing papers, and they do basically the same thing for every journal. Indeed, I’ve had the absurd experience of getting reviews from PLoS One – a journal that explicitly tells reviewers only to assess technical merits – that said the paper was technically sound but did not rise to the significance of PLoS One.