Darwin’s Tangled Bank in verse

My daughter has to memorize a poem for a school performance, and asked me if I knew a good poem about nature. There are, of course, many good ones, but I really wanted her to have the most poetic thing ever written about nature – the last paragraph of Darwin’s Origin of Species – rendered in verse. So I gave it a try.

 

The Tangled Bank

Contemplate a tangled bank
Clothed with many kinds of plant
Insects and birds flitting about
Worms crawling through the damp

Reflect that these elaborate
And differently constructed forms
Have been produced by such a simple set
Of ever acting norms

Growth, reproduction and inheritance
Variation to transmit
Natural selection then leading to
Extinction of the less fit

From the war of nature
From famine and from death
Follow the most exalted species
To have ever drawn a breath

There is grandeur in this view of life
And its powers not yet gone
Having been originally breathed
Into a few forms or just one

From as simple a beginning
As could ever be resolved
Endless forms most beautiful
Are continuously evolved.

 

Here’s the original:

It is interesting to contemplate an entangled bank, clothed with many plants of many kinds, with birds singing on the bushes, with various insects flitting about, and with worms crawling through the damp earth, and to reflect that these elaborately constructed forms, so different from each other, and dependent on each other in so complex a manner, have all been produced by laws acting around us. These laws, taken in the largest sense, being Growth with Reproduction; inheritance which is almost implied by reproduction; Variability from the indirect and direct action of the external conditions of life, and from use and disuse; a Ratio of Increase so high as to lead to a Struggle for Life, and as a consequence to Natural Selection, entailing Divergence of Character and the Extinction of less-improved forms. Thus, from the war of nature, from famine and death, the most exalted object which we are capable of conceiving, namely, the production of the higher animals, directly follows. There is grandeur in this view of life, with its several powers, having been originally breathed into a few forms or into one; and that, whilst this planet has gone cycling on according to the fixed law of gravity, from so simple a beginning endless forms most beautiful and most wonderful have been, and are being, evolved.

 

Posted in Darwin, evolution | Comments closed

Is the NIH a cult?

As many of you know, I spent a fair amount of time last month engaged in debates about the wisdom of California’s Proposition 37, which would have mandated the labeling of genetically modified foods. While many of these discussions were civil, one particularly energetic fellow accused me of having been brainwashed by the “cult of the NIH” into believing that anything science does must be good.

At the time I just giggled. But his tweet stuck in my head. After the election I looked back on my twenty years as a scientist in the “NIH system”, and I began to see the signs. So I read about cults – about what differentiates them from your normal, run-of-the-mill organization. And I started keeping score (on a 1-9 scale, of course, with 1 being the most cultish).

Charismatic Leader

Every cult has a charismatic, Svengali-like leader at its helm, obsessed with their own self-image, who ingratiates themselves with powerful leaders to further the cult’s agenda, demands that everyone call him “The Director”, and publishes books and other materials espousing a personal philosophy and inviting comparison to deities.

 

Score: 1

Isolated Compound

Cults always have a compound, with high levels of security, where The Director and his minions sit ensconced in a grandiose “Building 1”, complete with easily recognized signs of Roman imperial dominance.

Score: 1

Membership

Cults commonly have an elaborate process for selecting new members to ensure that they are appropriate and will not cause undue trouble. These often involve lengthy screening periods during which potential members undergo different types of hazing.

Aspirants for membership in the NIH must go through a grueling initiation ritual in which they are forced to recite a set of personal “Aims” and explain how they will further the cult’s – er, organization’s – “mission”. These applications are reviewed by existing members, who are locked in windowless rooms without food or drink for extended periods of time while the merits of aspiring members are dissected by their potential “peers”, flown in expressly for this purpose from across the country. Many are rejected out of hand for a wide range of undisclosed failings. Those who survive this “triage” process are subjected to further scrutiny and, after extensive wrangling and wheeling and dealing under the supervision of a “program manager” assigned to prevent violence, every aspirant is given a score. The solemn receipt of the score and the assessment of the “peers”, printed on ceremonial pink paper, is one of the most stressful moments in the life of an NIH aspirant. They are then forced to endure months of additional waiting, while receiving little or no information or encouragement, until their application is reviewed again by a mysterious organization of elders known as Councils, who select new members according to the needs of their cell – also known as an Institute.

Newly selected members – known to insiders as grantees – are immediately given an obscure code containing a unique identifier along with a signifier of their rank, decipherable only by other members of the organization. I have deciphered some of them. “K99” designates novices. Routine workers are known by “R01”. A select group of “pioneers” bear the mark “DP2”. Many aspire to be a local leader, known as a “P01”. And, in recent years, a new group of members known as “U01”s has emerged to carry out special missions at the express behest of “The Director”.

Score: 2

Recruitment

Cults often have an active process for recruiting new members, often by indoctrinating children and naive young adults who have the misfortune of finding themselves under the influence of existing members.

The NIH has several national indoctrination programs, but the most dangerous and effective is something known as the “Training Grant”. These NIH cells, found on most university campuses across the country and always led by an established “grantee”, prey on impressionable youths just out of college and eager to shed the structure of their parents’ worlds. The NIH takes them under its wing and gives them a generous personal stipend and a structured program of research and experimentation. It dangles the carrot of one day becoming a “grantee”, but does not tell them about the lonely, grueling years to come, or that only a handful of them will actually make it to the point where they are even allowed to submit their first application for membership. By the time they are done with this program, most have drunk the NIH Kool-Aid, and can think of nothing they want more than to become a grantee. And those who have not feel they have sunk too much of their time and energy into these first steps along the grantee path to give up.

Score: 3

Bait-and-switch

People who have broken free of cults often complain that they were initially given easy rewards within the organization – special quarters, access to leaders, choice of the best jobs – but that once they were in for a few years, these perks were no longer so easy to obtain.

The NIH has been known to give special status to first-time applicants – making it far easier for them to get selected than those already in the system, who are forced to endure a “renewal” process every 3-5 years during which they have to justify their continued status in the organization (there are few things as tragic as an ex-grantee).

Score: 4

Separating members from family

Cults almost always work to separate members from friends and family who are not members of the organization – cutting them off from support and from the outside world.

The highly competitive review process engineered by the NIH forces aspiring and existing members to work all hours of the day, night and weekends, eschewing family and friends in the interest of furthering the NIH agenda and their own place within it. This often leads members to marry within the organization, creating additional challenges, including something called a “two body problem”.

Score: 2

Pet projects

Many cult members end up subjugating their own aspirations to the pet projects of the cult leader, which are often insanely grandiose and lead to the financial ruin first of the members and then of the organization itself.

Increasing amounts of money siphoned from public coffers by the NIH are going to pet projects of The Director.

Score: 1

Giving away Possessions

Cults are renowned for forcing their members to give away all of their possessions.

The NIH has a “Public Access Policy” which forces members to give away the single most valuable thing they produce while members of the system. And now they threaten to expel grantees who do not obey.

Score: 2

 

Posted in NIH | Comments closed

Prop 37 and the Right to Know Nothing

As we approach election day, my neighborhood in Berkeley has sprouted dozens of blue and orange yard signs supporting Proposition 37, which would require the labeling of genetically modified foods.

The “Right to Know” has become the rallying cry of the initiative’s backers, who meet any criticism of the initiative, its motivation or of the “science” used to back it with the same refrain: “We have the right to know what’s in our food!!”.

It is, of course, hard to argue that people should not have this right. I am a very strong supporter of consumer rights and of providing information, even if people use it stupidly. But I have closely followed the debate over Prop 37, reading and listening to and occasionally arguing with its proponents. And I have been struck throughout by just how little the initiative’s backers actually want to know.

The law would require the application of a catchall “Contains GMOs” label to any product containing any ingredient from a genetically modified plant, animal or microbe. This language reflects the belief of its backers that GMOs are intrinsically bad and deserve to be labeled – and avoided – en masse, no matter what modification they contain or toward what end they were produced. This is not a quest for knowledge – it is an attempt to reify ignorance.

Sure, if you think, as some people do, that moving genes from one species to another is some kind of crime against nature that risks destroying life on Earth, a blanket prohibition against GMOs makes sense. But the bulk of Prop 37 supporters I have heard or spoken to express more rational concerns, primarily:

  1. The specific modifications in common GM crops – the production of insecticidal proteins or of genes for herbicide tolerance – make them unsafe for human consumption.
  2. Whether safe or unsafe for humans, GM crops encourage an industrialized monoculture approach to farming that is unsustainable and bad for the planet.
  3. GM technology is wielded by multinational conglomerates like Monsanto who have little regard for the public interest and produce GM crops solely to make more money, and who use intellectual property in their creations to squeeze farmers and increase their control over global agriculture.

Whether one agrees with these points or not – I disagree with 1, but agree with 2 and 3 to varying degrees – none of them apply uniformly to all GMOs.

If you’re worried that the GMOs you’re eating might kill you, then you should want to know what specific modification your food contains. I don’t think there is any harm in eating food containing the insecticidal “Bt” protein, but even if it were dangerous this would have no bearing on the safety of golden rice.

Similarly, if you are concerned that the transgenic production of plants resistant to certain herbicides encourages the excessive use of herbicides and triggers an herbicide treadmill, then you can boycott crops containing these modifications. But it doesn’t make sense to oppose the use of crops engineered to resist diseases, or to produce essential vitamins. Indeed, there are many, like UC Davis’s Pam Ronald, who believe that advanced development of GMOs is the best way to advance organic and sustainable agriculture. You may disagree with her, but it should be clear that the effect on agricultural practices varies depending on the specific plant and type of modification being considered.

And, while I share much of the disdain anti-GMO advocates feel for the business practices of companies like Monsanto, not every seed company uses the same practices, and there are plenty of academic researchers, non-profits and companies laboring to use GMOs to solve major challenges in global food production, distribution and nutrition. To hamper what they are doing in the name of sticking it to Monsanto – whose questionable business practices extend far beyond GMOs – makes no sense.

Thus the very reasons supporters of GMO labeling cite for labeling GMOs demand more information than “This product contains genetically modified ingredients”. And it is the central irony of Prop 37 that, in backing the initiative, its supporters are, in tangible ways, working to ensure they do not get information that would actually be useful to them.

Some backers of Prop 37 say that it is the first step towards more comprehensive food labeling. If, in the push to pass the initiative, I saw a thirst for real knowledge and understanding of where crops come from and how food is produced, then I’d share their optimism.

But everything I’ve seen from proponents of Prop 37 suggests something else – a lazy and self-satisfied acceptance of an internally incoherent piece of legislation that, rather than giving consumers the “right to know”, will actually protect their desire to know nothing.

Posted in GMO | Comments closed

Science is healthy for children and other living things

Posted in GMO | Comments closed

Retraction action, what’s your faction: the dangers of citation worship

If you ask scientists to list the words they are most afraid to hear associated with their work, I suspect “retraction” would rank high on the list. Retraction is a kind of death sentence, applied only when papers contain serious methodological errors or are tainted by fraud.

So the recent retraction of a PLoS Pathogens paper linking the virus XMRV to prostate cancer, following a new PLoS ONE paper that demonstrated that the original results were due to contamination, caught many (including the authors of the original paper, many of whom were involved in the followup study) off guard. Martin Enserink at ScienceNOW and Retraction Watch have excellent posts with details on the story.

Before offering my thoughts on this, I want to state at the outset that I have more than a passing interest in the story. I was one of the co-founders of PLoS, am a member of its Board of Directors, and continue to play an active role in its activities. I also worked closely with the senior author on the original paper – Joe DeRisi – for three years while we were in Pat Brown’s lab at Stanford, and he remains a good friend. He is not only one of the most creative people I know, he is one of the best, and most careful, experimentalists I have ever met.

Putting aside the question of retraction for a moment, this is exactly how science is supposed to work. Several very good scientists found an intriguing and potentially important result and published a paper on it. Subsequent efforts failed to confirm their initial result. Rather than digging in their heels and defending their initial study – as many scientists do – the original authors accepted the newer results, and went to great lengths to figure out what had gone wrong. Their new paper is a model of detective work, and a cautionary tale about the challenges of working with clinical samples and viruses that everyone should read.

So it is now pretty clear that the major conclusion of the original paper – the association between XMRV and prostate cancer – is wrong. Obviously, people working in the field and anyone interested in prostate cancer or chronic fatigue syndrome (the subject of a subsequent paper) who come upon the 2006 PLoS Pathogens paper need to know that subsequent studies have shown that the samples were contaminated and that the conclusions are no longer accepted by the authors. The question is how to do this.

Unfortunately, in the current world of scientific publishing, there aren’t a lot of ways to do this, and the editors at PLoS Pathogens chose to retract the paper. This retraction was accompanied by an editorial from PLoS Pathogens editor Kasturi Haldar and PLoS Medicine editor Ginny Barbour on the role of retractions in correcting the literature. I don’t agree with the decision to retract this paper, but it is worth understanding their logic:

There is much misunderstanding about retractions. Authors and editors have been notoriously unwilling to use them, for the perceived shame that they bring upon authors, editors, and journals. Journalists regularly note the fact that retractions are increasing and ask whether the scientific literature is thus becoming less reliable. Websites such as Retraction Watch list and dissect retractions – an extra exposure at what is already a difficult time for authors and editors. In addition there is much confusion about how to effect retractions practically. In an effort to bring some clarity to this issue in 2009 the Committee on Publication Ethics of which PLOS Pathogens is a member and one of us (VB) is currently Chair, issued guidelines on retractions, which explicitly state that retractions are appropriate when findings are unreliable, either as a result of misconduct (e.g. data fabrication) or honest error.

In essence, they are trying to expand the definition of retraction away from its common usage as a way to indicate misconduct to include all cases in which the findings of a paper should now be judged unreliable. They go on to explain how they will wield this redefined tool in the future:

We firmly believe that acceleration also requires being open about correcting the literature as needed so that research can be built on a solid foundation. Hence as editors and as a publisher we encourage the publication of studies that replicate or refute work we have previously published. We work with authors (through communication with the corresponding author) to publish corrections if we find parts of articles to be inaccurate. If a paper’s major conclusions are shown to be wrong we will retract the paper. By doing so, and by being open about our motives, we hope to clarify once and for all that there is no shame in correcting the literature. Despite the best of efforts, errors occur and their timely and effective remedy should be considered the mark of responsible authors, editors and publishers.

No matter what Haldar and Barbour want, they cannot erase the stigma of retraction by fiat. When a word means something in the community, it doesn’t matter what a dictionary or some unknown committee says. Retractions are viewed by scientists and the public as marks of shame. Imagine how the students and postdocs who carried out the work described in the 2006 paper must feel. They did nothing wrong. Indeed, several participated in the effort to figure out what went wrong – going above and beyond what most people would have done. And the reward for their effort is to have “RETRACTED” show up every time someone searches for them on PubMed? This is not the right solution.

I understand the instinct to want a way to correct the literature, especially in cases like this that have attracted a lot of public attention. But isn’t science ultimately all about correcting the literature? It’s not a singular act to look back at previous work and find things that could have been done better, and even things that are outright wrong. This is a large part of what we do. If you look back at the literature from five years, ten years or longer ago, you will find myriad papers that, given what we know now, have findings that are unreliable and conclusions that are now clearly wrong. Are we going to go back and retract all of these papers? Of course not. That would be insane.

As easy as it might be to dismiss this incident as an isolated example of editorial overreach, this is really just the latest manifestation of a broader problem that plagues scientific publication and poisons the scientific process: the reification of the citation. Going back and correcting published papers only makes sense if you view the scientific literature as an isolated collection of discrete, singular events – publications – commemorated with a sacred mark – the citation. If papers are supposed to stand forever as vessels of truth, then of course you have to purge those that are shown to be wrong – both to protect people from untruths, and to defend the sanctity of the citation.

Researchers dread retractions for the same reason they will sell their souls to publish in high impact journals – because the currency of academic success is not achievement – it is citations. Sure, the two are not unlinked. But where they come into conflict, citations almost always win. A Nature paper is a Nature paper forever – even if the results turn out to be insignificant, or, as is often the case, outright wrong. The only thing that can change that is a retraction.

Thus, in some ways, the proposal by Haldar and Barbour is not reactionary, as many have suggested – it is deeply subversive. By exposing all citations – not just those achieved dishonestly – to the threat of retraction, it strips the citation of one of its most valuable properties – permanence. But despite my love for all things subversive, I do not think this is the right solution, as it ultimately reinforces the idea of the scientific literature as a collection of discrete events.

An obvious solution to all of these problems follows from thinking about the literature as what it really is: a historical record of ideas, discoveries and, yes, mistakes – whose value comes not from static individual pieces, but from the ways in which they are connected and change over time. It is often said that science is “self-correcting”, recognizing that our views of the value and validity of previously published work inevitably change over time as we use, build on and expand upon the work of our colleagues – something perfectly demonstrated by the XMRV story. What we need to do is not to isolate and protect ourselves from the dynamic nature of science, but to embrace it.

It’s disheartening that, in this day of electronic publications and databases, the editors felt that the only way they could ensure that people reading the 2006 XMRV paper would look at it in the context of newer findings was to retract the paper. If we had a way of capturing how new methods, data and ideas were changing our view of earlier work, they would not have needed to even consider something as dire or as clumsy as a retraction. And there is no reason we can’t do this – we have the technical means to switch from one-time assessments of a paper to a system of ongoing evaluation and reevaluation whose output changes as our understanding grows. The only thing stopping us is the continued reification of the citation in science, and our unwillingness to discard it.
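As a purely illustrative aside – my sketch, not anything PLoS has proposed or built, and with all names and fields hypothetical – one way to picture such a system is to treat a paper’s current status as something recomputed from an ever-growing stream of post-publication assessments, rather than as a label fixed at acceptance and changeable only by retraction:

```python
# Toy sketch of "ongoing evaluation": a paper's status is derived from the full
# record of post-publication assessments instead of being fixed at acceptance or
# flipped once by a retraction. All names here are hypothetical illustrations.
from dataclasses import dataclass, field
from datetime import date
from typing import List

@dataclass
class Assessment:
    when: date
    source: str   # e.g. "replication study", "contamination analysis", "comment"
    verdict: str  # one of "supports", "questions", "refutes"

@dataclass
class Paper:
    title: str
    assessments: List[Assessment] = field(default_factory=list)

    def current_status(self) -> str:
        """The community's current view, computed from all evidence to date."""
        if any(a.verdict == "refutes" for a in self.assessments):
            return "major conclusions no longer supported (see linked studies)"
        if any(a.verdict == "questions" for a in self.assessments):
            return "under active reassessment"
        return "no substantive challenges recorded"

paper = Paper("XMRV and prostate cancer (2006)")
paper.assessments.append(
    Assessment(date(2012, 1, 1), "follow-up contamination study", "refutes"))  # illustrative date
print(paper.current_status())
```

The toy model’s only point is that the “current view” of a paper becomes an output that updates as new evidence accumulates – something a retract-or-nothing system cannot express.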

UPDATE: I want to emphasize that my goal here was not to take the editors to task. I don’t completely support what they did, but they were trying to deal with a real, immediate problem – people acting on conclusions from a paper whose results nobody now believes to be true. What I was primarily lamenting was the fact that our system does not provide them with any tool other than retraction.

Posted in publishing, science | Comments closed

Blinded by Big Science: The lesson I learned from ENCODE is that projects like ENCODE are not a good idea

When the draft sequence of the human genome was finished in 2001, the accomplishment was heralded as marking the dawn of the age of “big biology”. The high-throughput techniques and automation developed to sequence DNA on a massive scale would be wielded to generate not just genomes, but reference data sets in all areas of biomedicine.

The NHGRI moved quickly to expand the universe of sequenced genomes, and to catalog variation within the human population with HapMap, HapMap 2 and 1000 Genomes. But they also began to dip their toe into the murkier waters of “functional genomics”, launching ENCODE, a grand effort to build an encyclopedia of functional elements in the human genome. The idea was to simultaneously annotate the human genome and provide basic and applied scientists working on human disease with reference data sets that they would otherwise have had to generate themselves. Instead of having to invest in expensive equipment and learn complex protocols, they would often be able to just download the results, thereby making everything they did faster and better.

Now, a decade and several hundred million dollars later, the winding down of ENCODE and the publication of dozens of papers describing its results offer us a vital opportunity to take stock of what we learned, whether it was worth it, and, most importantly, whether this kind of project makes sense moving forward. This is more than just an idle intellectual question. NHGRI is investing $130m in continuing the project, and NHGRI, and the NIH as a whole, have signalled their intention to do more projects like ENCODE in the future.

I feel I have a useful perspective on these issues. I served as a member of the National Advisory Committee for the ENCODE and related modENCODE projects throughout their lifespans. As a postdoc with Pat Brown and David Botstein in the late 90’s I was involved in the development of DNA microarrays, and had seen first hand the transformative potential of genome sequences and the experimental genomic techniques they enabled. I believed then, and still believe now, that looking at biology on a big scale is often very helpful, and that it can make sense to let people who are good at doing big projects, and who can take advantage of economies of scale, generate data for the community.

But the lesson I learned from ENCODE is that projects like ENCODE are not a good idea.

American biology research achieved greatness because we encouraged individual scientists to pursue the questions that intrigued them and the NIH, NSF and other agencies gave them the resources to do so. And ENCODE and projects like it are, ostensibly at least, meant to continue this tradition, empowering individual scientists by producing datasets of “higher quality and greater comprehensiveness than would otherwise emerge from the combined output of individual research projects”.

But I think it is now clear that big biology is not a boon for individual discovery-driven science. Ironically, and tragically, it is emerging as the greatest threat to its continued existence.

The most obvious conflict between little science and big science is money. In an era when grant funding is getting scarcer, it’s impossible not to view the $200m spent on ENCODE in terms of the ~125 R01s it could have funded. It is impossible to score the value lost from these hundred or so unfunded small projects against the benefits of one big one. But an awful lot of amazing science comes out of R01s, and it’s hard not to believe that at least one of these projects would have been transformative.
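For the record, here is the back-of-the-envelope arithmetic behind that comparison, taking the post’s own numbers at face value; the per-grant figure is simply what the two numbers imply for the total, multi-year cost of a fully loaded R01, not an official NIH number:

\[
\frac{\$200\,\mathrm{M}\ \text{(ENCODE)}}{\sim 125\ \text{R01s}} \approx \$1.6\,\mathrm{M}\ \text{per R01 (total, multi-year, direct plus indirect costs)}
\]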

But, as bad as the loss of individual research grants is, I am far more concerned about the model of independent research upon which big science projects are based.

For a project like ENCODE to make sense, one has to assume that when a problem in my lab requires high-throughput data, someone – or really a committee of someones – who has no idea about my work predicted, years in advance, precisely the data that I would need and generated it for me. This made sense with genome sequences, which everyone already knew they needed to have. But for functional genomics this is nothing short of lunacy.

There are literally trillions of cells in the human body. Multiply that by life stage, genotype, environment and disease state, and the number of possible conditions to look at is effectively infinite. Is there any rational way to predict which ones are going to be essential for the community as a whole, let alone for individual researchers? I can’t see how the answer could possibly be yes. What’s more, many of the data generated by ENCODE were obsolete by the time they were collected. For example, if you were starting to map transcription factor binding sites today, you would almost certainly use some flavor of exonuclease ChIP, rather than the ChIP-seq techniques that dominate the ENCODE data.

I offer up an example from my own lab. We study Drosophila development. Several years ago a postdoc in my lab got interested in sex chromosome dosage compensation in the early fly embryo, and planned to use genome-wide mRNA abundance measurements in male and female embryos to study it. It just so happened that the modENCODE project was generating genome-wide mRNA abundance measurements in Drosophila embryos. Seems like a perfect match. But these data were all but useless to us – not because the data weren’t good (the experiment was beautifully executed) but because they could not answer the question we were pursuing. We needed sex-specific expression; they pooled males and females. We needed extremely precise time resolution (to within a few minutes); they looked at two hour windows. There was no way they could have anticipated this – or any of the hundreds of other questions about developmental gene expression that came up in other labs.

We were fortunate. I have money from HHMI and was able to generate the data we needed. But a lot of people would not have been in my position, and in many ways would have been worse off because the existence of ENCODE/modENCODE makes it more difficult to get related genomics projects funded. At this point the evidence for such an effect is anecdotal – I have heard from many people that reviewers explicitly cited an ENCODE project as a reason not to fund their genomics proposal – but it’s naive to think that these big science projects will not affect the way that grants are allocated.

Think about it this way. If you’re an NIH institute looking to justify your massive investment in big science projects, you are inevitably going to look more favorably on proposals that use data that has already been, or is about to be, generated by expensive projects that feature in the institute’s portfolio. And the result will be a concentration of research effort on datasets of high technical quality but little intrinsic value, with scientists wanting to pursue their own questions left out in the cold, and the most interesting and important questions at risk of never being answered, or even asked.

You can already see this mentality at play in discussions of the value of ENCODE. As I and many others have discussed, the media campaign around the recent ENCODE publications was, at best, unseemly. The empty and often misleading press releases and quotes from scientists were clearly masking the fact that, despite publishing 30 papers, they actually had very little of grand import to say, today, about what they found. The most pensive of them realized this, and went out of their way to emphasize that other people were already using the data, and that the true test was how much the data would be used over the coming years.

But this is the wrong measure. These data will be used. It is inevitable. And I’m sure this usage will be cited often to justify other big science projects ad infinitum. And we will soon have a generation of scientists for whom an experiment is figuring out what kinds of things they can do with data selected three years earlier by a committee sitting in a windowless Rockville hotel room. I don’t think this is the model of science anyone wants – but it is precisely where we are headed if the metastasis of big science is not checked.

I want to be clear that I am not criticizing the people who have carried out these projects. The staff at the NIH who ran ENCODE, and the scientists who carried it out worked tirelessly to achieve its goals, and the organizational and technical feat they achieved is impressive. But that does not mean it is ultimately good for science.

When I have raised these concerns privately with my colleagues, the most common retort I get is that, in today’s political climate, Congress is more willing to fund big, ambitious-sounding projects like ENCODE than it is to simply fund the NIH extramural budget. I can see how this might be true. Maybe the NIH leadership is simply feeding Congress what it wants in order to preserve the NIH budget. And maybe this is why there’s been so little push back from the general research community against the expansion of big biology.

But it will be a disaster if, in the name of protecting the NIH budget and our labs’ funding, we pursue big projects that destroy investigator driven science as we know it in the process.

Posted in ENCODE, NOT junk, science, science and politics | Comments closed

A neutral theory of molecular function

In 1968 Motoo Kimura published a short article in Nature in which he argued that “most mutations produced by nucleotide replacement are almost neutral in natural selection”. This fantastic paper is generally viewed as having established the “neutral theory” of molecular evolution, whose central principle was set out by Jack King and Thomas Jukes in a Science paper the following year:

Evolutionary change at the morphological, functional, and behavioral levels results from the process of natural selection, operating through adaptive change in DNA. It does not necessarily follow that all, or most, evolutionary change in DNA is due to the action of Darwinian natural selection.

It is hard to overstate the importance of these papers. They offered an immediate challenge to the deeply flawed, but widely held, belief that all changes to DNA must be adaptive – an assumption that was poisoning the way that most biologists were reckoning with the first wave of protein sequence data. And, as their ideas were rapidly accepted in the nascent field of molecular evolution, the neutral theory loomed over virtually all analyses of sequence variation within and between species for decades to come.

What Kimura, King and Jukes really did was to establish a new “null model” against which any putative example of adaptive molecular change must be judged. Indeed, neutrality offered such a good explanation for sequence changes over time that when I entered the field in the early 90’s researchers were still struggling to find a single example of molecular change for which a neutral explanation could be rejected.

While the explosion of sequence data in the past decade ultimately yielded unambiguous evidence for large-scale adaptive molecular evolution, it is hard to overstate just how powerful the neutral null model was in forcing people to think clearly about what adaptive change means, and how one would go about identifying clear examples of it.

I think a lot about Kimura, the neutral theory, and the salutary effects of clear null models every time I get involved in discussions about the function, or lack thereof, of biochemical events observed in genomics experiments, such as those triggered this week by publications from the ENCODE project.

It is easy to see the parallels between the way people talk about transcribed RNAs, protein-DNA interactions, DNase hypersensitive regions and what not, and the way people talked about sequence changes PK (pre Kimura). While many of the people carrying out RNA-seq, ChIP-seq, CLIP-seq, etc… have been indoctrinated with Kimura at some point in their careers, most seem unable to apply his lesson to their own work. The result is a field suffused with implicit or explicit thinking along the following lines:

I observed A bind to B. A would only have evolved to bind to B if it were doing something useful. Therefore the binding of A to B is “functional”.

One can understand the temptation to think this way. In the textbook view of molecular biology, everything is highly regulated. Genes are transcribed with a purpose. Transcription factors bind to DNA when they are regulating something. Kinases phosphorylate targets to alter their activity or sub-cellular location. And so on. Although there have always been lots of reasons to dismiss this way of thinking, until about a decade ago this is what the scientific literature looked like. In the days when papers described single genes and single interactions, who would bother to publish a paper about a non-functional interaction they had observed?

But experimental genomics blew this world of Mayberry molecular biology wide open. For example, when Mark Biggin and I started to do ChIP-chip experiments in Drosophila embryos, we found that factors were binding not just to their dozen or so known targets, but to thousands, and in some cases tens of thousands, of places across the genome. Having studied my Kimura, I just assumed that the vast majority of these interactions had arisen by chance – a natural, inevitable consequence of the neutral fixation of nucleotide changes that happened to create transcription factor binding sites. And so I was shocked that almost everyone I talked to about these data assumed that every one of these binding events was doing something – we just hadn’t figured out what yet.

But if you think about this, you will realize that it simply cannot be true. As we and many others have now shown, molecular interactions are not rare. Transcripts, transcription factor binding sites, DNA modifications, chromatin modifications, RNA binding sites, phosphorylation sites, protein-protein interactions, etc… are everywhere. This suggests that these kinds of biochemical events are easy to create – change a nucleotide here and, wham, a new transcription factor binds, a splice site is lost, a new promoter is created, a glycosylation site is eliminated.

Does this conflict with the neutral theory? Not at all! Indeed, it is perfectly consistent with it. The neutral theory does not demand that most sequence changes have no measurable effect on the organism. Rather, the only thing you have to assume is that the vast majority of the biochemical events that happen as a consequence of random mutations do not significantly affect organismal fitness. Given that such a large fraction of the genome is biochemically active, the same basic logic Kimura, King and Jukes used to argue for neutrality – that it is simply impossible for such a large number of molecular traits to have been driven to fixation by selection – argues strongly that most biochemical events do not contribute significantly to fitness. Indeed, given the apparent frequency with which new molecular interactions arise, it is all but impossible that we would still exist if every new molecular event had a strong phenotypic effect.
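To put a rough number on how easily binding sites arise by chance, here is a minimal sketch (mine, not from the post) of the expected count of exact matches to a short transcription-factor-like motif in an idealized random genome with equal base frequencies; real recognition sequences are degenerate, which only pushes these numbers higher:

```python
# Minimal null-model sketch (illustrative assumptions: i.i.d. random genome,
# equal base frequencies, one exact k-mer motif). Real motifs are degenerate
# position weight matrices, so the true expectations are even larger.

def expected_random_matches(genome_length: int, motif_length: int,
                            both_strands: bool = True) -> float:
    """Expected number of exact matches to one k-mer in random DNA."""
    positions = genome_length - motif_length + 1
    per_position = 0.25 ** motif_length          # P(exact match) at one position
    strands = 2 if both_strands else 1
    return strands * positions * per_position

if __name__ == "__main__":
    human_genome = 3_000_000_000                 # ~3 Gb haploid genome
    for k in (6, 8, 10):
        n = expected_random_matches(human_genome, k)
        print(f"{k}-bp motif: ~{n:,.0f} matches expected by chance alone")
    # An 8-bp site is expected ~90,000 times in 3 Gb purely by chance.
```

Chance alone puts a fully specified 8-bp site in the genome tens of thousands of times – the same order of magnitude as the binding described above, and exactly the background a neutral null model tells us to expect before declaring any individual event functional.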

This, of course, does not mean that all these molecular events do nothing – their very existence is a form of function. But we are generally interested in different types of function – things that did arise through natural selection, are maintained by purifying selection, and whose disruption will cause a disease or other significant phenotype. Of course these things exist amidst the rubble. The question is how to find them. And here, I think we should once again take our cue from Kimura.

As I argued above, the field of molecular evolution developed a powerful intellectual core in no small part because researchers had to reckon with the powerful neutral null hypothesis – meaning that adaptive change had to be demonstrated, not assumed. We need to apply the same logic to molecular interactions.

Rather than assuming – as so many of the ENCODE researchers apparently do – that the millions (or is it billions?) of molecular events they observe are a treasure trove of functional elements waiting to be understood, they should approach each and every one of them with Kimurian skepticism. We should never accept the existence of a molecule, or the observation that it interacts with something, as prima facie evidence that it is important. Rather we should assume that all such interactions are non-functional until proven otherwise, and develop better, more compelling ways to reject this null hypothesis.

To paraphrase King and Jukes:

Life is dependent on the production of and interaction between DNAs, RNAs, proteins and other biomolecules. It does not necessarily follow that all, or most, biomolecules and interactions among them are due to the action of Darwinian natural selection.

I want to end by pointing out that there are lots of people (me and my group included) who have already been wrestling with this issue, with lots of interesting ideas and results already out there. From an intellectual standpoint I’d like to particularly point out the influence the writings of Mike Lynch have had on me – see especially this.

=======

NOTE: There’s a lot more to say about this, and in the interests of time (I have to give a genetics lecture first thing in the morning) I haven’t gone into as much depth as some of these issues deserve. I will update this post as time permits.

 

Posted in ENCODE, evolution, gene regulation, genetics, NOT junk, science | Comments closed

This 100,000 word post on the ENCODE media bonanza will cure cancer

It is oddly fitting that the papers describing the results of the NIH’s massive $200m ENCODE project were published in the midst of political convention season. For this was no typical scientific publication, but a carefully orchestrated spectacle, meant to justify a massive, expensive undertaking, and to convince us that we are better off now than we were five years ago.

I’ll touch more on details of the science, and the way it was carried out, in another, longer, post. But I want to try to explain to people who were asking on twitter why I found today’s media blitz to promote the ENCODE publications so off-putting. Because, as cynical as I am about this kind of thing, I still found myself incredibly disheartened by the degree to which the ENCODE press release and many of the interviews published today push a narrative about their results that is, at best, misleading.

The issues all stem, ultimately, from the press releases issued by the ENCODE team, one of which begins:

The hundreds of researchers working on the ENCODE project have revealed that much of what has been called ‘junk DNA’ in the human genome is actually a massive control panel with millions of switches regulating the activity of our genes. Without these switches, genes would not work – and mutations in these regions might lead to human disease. The new information delivered by ENCODE is so comprehensive and complex that it has given rise to a new publishing model in which electronic documents and datasets are interconnected.

The problems start before the first line ends. As the authors undoubtedly know, nobody actually thinks that non-coding DNA is ‘junk’ any more. It’s an idea that pretty much only appears in the popular press, and then only when someone announces that they have debunked it. Which is fairly often. And has been for at least the past decade. So it is more than just intellectually lazy to start the story of ENCODE this way. It is dishonest – nobody can credibly claim this to be a finding of ENCODE. Indeed it was a clear sense of the importance of non-coding DNA that led to the ENCODE project in the first place. And yet, each of the dozens of news stories I read on this topic parroted this absurd talking point – falsely crediting ENCODE with overturning an idea that didn’t need to be overturned.

But the deeper problem with the PR, and to some extent the main paper, is the way that they slip and slide around the extent and nature of the functions they have “discovered”. The pullquote from the press release is that the human genome is a “massive control panel with millions of switches regulating the activity of our genes”. So let’s untangle this a bit. It is true that the paper describes millions of sequences bound by transcription factors or prone to digestion by DNase. And it is true that many bona fide regulatory sequences will have these properties. But as even the authors admit, only some fraction of these sequences will actually turn out to be involved in gene regulation. So it is simply false to claim that the papers have identified millions of switches.

Ewan Birney, who led the data analysis for the entire ENCODE project, wrote an excellent, measured post on the topic today in which he makes it clear that when they claim that 80% of the genome is “functional”, they are simply referring to its having biochemical activity. And yet even his quotes in the press release play a bit fast and loose with this issue, repeating the millions of switches line. Surely it’s a sign of a toxic process when people let themselves be quoted saying something they don’t really believe.

The end result is some fairly disastrous coverage of the project in the popular press. Gina Kolata’s story on the topic in the New York Times is, sadly, riddled with mistakes. It’s commonplace amongst scientists to blame this kind of thing on reporters not knowing what they’re talking about. But in this case at least, the central problems with her story trace directly back to the misleading way in which the results were presented by the authors.

The NYT piece is titled “Bits of Mystery DNA, Far From ‘Junk,’ Play Crucial Role” (wonder where they got that idea), and goes on to herald the “major medical and scientific breakthrough” that:

the human genome is packed with at least four million gene switches that reside in bits of DNA that once were dismissed as “junk” but that turn out to play critical roles in controlling how cells, organs and other tissues behave

This is complete crap. Yet it’s nothing more than a paraphrasing of the line the ENCODE team were promoting. Same thing with a statement later on that “At least 80 percent of this [junk] DNA is active and needed.” You can blame the reporter if you want for incorrectly mixing in the “needed” part there, which is not something the studies asserted. But this is actually a perfectly logical conclusion to reach from the 80% functional angle the authors were pitching.

I don’t mean to pick too harshly on the ENCODE team here. They didn’t invent the science paper PR machine, nor are they the first to traffic in various levels of misrepresentation to make their story seem sexier to journals and the press. But today’s activities may represent the apotheosis of the form. And it’s too bad – whatever one thinks about the wisdom of the whole endeavor, ENCODE has produced a tremendous amount of data, and both the research community and the interested public would have benefited from a more sober and realistic representation of what the project did and did not accomplish.

Posted in ENCODE, NOT junk, publishing, science | Comments closed

The Glacial Pace of Change in Scientific Publishing

I was excited today when my Twitter stream started lighting up with links to an article titled “The Glacial Pace of Scientific Publishing: Why It Hurts Everyone and What We Can Do To Fix It”. Sounded right up my alley.

I was even more excited when I clicked and saw that it was written by Leslie Vosshall, a colleague who not only does amazing work, but has always been extremely thoughtful when I’ve talked to her about things like scientific publishing.

Her diagnosis of the problem is spot on:

Why is it that in these days of instant information dissemination via blogs, Twitter, Facebook, and other social media sites, our scientific publishing system has ground to a medieval, depressing, counterproductive near-halt?

I could not agree more. Consider that most papers submitted to journals last November 26th have still not been published. That’s not a random date – it happens to be the day NASA launched an Atlas rocket carrying the Mars Science Laboratory from Cape Canaveral.

While, on Earth, scientific papers were languishing in editorial purgatory and peer review – bouncing back and forth as authors attempted to cater to some reviewer’s whim, maybe heading off to another journal, and then sitting around in production for months while they awaited online publication – an SUV-sized robot made its way to another planet, landed with pinpoint accuracy on the surface and started beaming back pictures.

NASA 1. Publishing 0.

Leslie starts her proposal for how to fix the problem with a crucial observation:

Scientific publishing is an enterprise handled by scientists for scientists, which can be fixed by scientists.

Again, spot on. Far too often scientists treat the myriad problems in scientific publishing as if they were some kind of externally applied force within which we are doomed to eternally labor in a Sisyphean punishment ritual, when in reality the system is precisely what WE make it.

Since she had so eloquently dissected the problem, and recognized that fixing it is well within our power, I dared to hope that Leslie would come to the same conclusion that I have – that the whole way we go about publishing papers is crazy and needs to be reinvented from the ground up.

Instead she takes the “mend it, don’t end it” approach, proposing a series of fixes – or really a set of guiding aphorisms for authors (calibrate and accept rejection), editors (triage judiciously, seek advice and be decisive) and reviewers (advise honestly and promptly).

The piece is thoughtful and constructive. Everything she says is spot on, and if people listened, the process of scientific publishing would be more productive, less unpleasant, and even a bit faster.

But it still would not be fast.

Would it be better if things were published in 3 months instead of 6, 9 or 12? Sure. Would it be better if authors didn’t have to run the gauntlet of reviewer “suggestions” and navigate the whims of a capricious editor to get their work published? Sure. But does a 3 month publishing process, no matter how congenial, really measure up in an era of instant communication?

If you believe as Leslie clearly does, and I do, that delays in publication are bad for science, then you should strive not to minimize them, but to eliminate them.

In a world in which technology makes it possible to share information instantly, there is no need to brook ANY delay in publication. When I have a piece of work from my lab that I am ready to share with my colleagues, I should be able to share it. Immediately. To paraphrase Clay Shirky: Publishing is not a process. Publishing is a button.

The major obstacle to achieving this goal is not the efficiency of pre-publication peer review, but the fact that we do it at all.

I am not proposing that we do away with peer assessment and editorial selection. Just that the order of events be reversed – moving from the current “assess then publish” to “publish then assess”. I’ve written before about how I think this could work (see Peer review is f***ed up – let’s fix it), and I won’t repeat those details here. And there are plenty of other people out there with great ideas about how we can retain most of the benefits of peer review while ceasing to use it as a publishing screen. Any of them would be immeasurably better than the system we have now – even with Leslie’s reforms in place. We just have to harness the power scientists have to reshape the way we communicate our science and make it work.

And then we can dare to dream that the time it takes to publish a paper would not just be less than the nine months it takes for a rocket to get to Mars, but less than the 14 minutes it takes for its photos to make their way back to Earth.

 

Posted in publishing | Comments closed

Senators Boxer and Sanders fill Agriculture Bill with anti-GMO nonsense

Anastasia Bodnar alerted me on Twitter to the following amendment to the Farm Bill, which would have authorized states to require the labeling of genetically modified foods. As I’ve said before, I’m not opposed to providing consumers with accurate information about what they’re eating (emphasis on accurate). But I am saddened to see this amount of scientific misinformation reach the US Senate, including references to bogus and/or irrelevant studies and an evident lack of understanding of the technology and its value. It didn’t pass, but here’s the text:

SA 2256. Mr. SANDERS (for himself and Mrs. BOXER) submitted an amendment intended to be proposed by him to the bill S. 3240, to reauthorize agricultural programs through 2017, and for other purposes; which was ordered to lie on the table; as follows:

On page 1009, after line 11, add the following:

SEC. 12207. CONSUMERS RIGHT TO KNOW ABOUT GENETICALLY ENGINEERED FOOD ACT.

(a) Short Title.–This section may be cited as the “Consumers Right to Know About Genetically Engineered Food Act”.

(b) Findings.–Congress finds that–

(1) surveys of the American public consistently show that 90 percent or more of the people of the United States want genetically engineered or modified foods to be labeled as such;

(2) a landmark public health study in Canada found that–

(A) 93 percent of pregnant women had detectable toxins from genetically engineered or modified foods in their blood; and

(B) 80 percent of the babies of those women had detectable toxins in their umbilical cords;

First of all, several people have pointed out that the methods used in this study are not reliable. But even if they were, there is no evidence that the natural bacterial insecticide in question has any negative effects on people. This is just alarmist nonsense being passed off as justification for legislation.

(3) the tenth Amendment to the Constitution of the United States clearly reserves powers in the system of Federalism to the States or to the people; and

(4) States have the authority to require the labeling of foods produced through genetic engineering or derived from organisms that have been genetically engineered.

(c) Definitions.–In this section:

(1) GENETIC ENGINEERING.–

(A) IN GENERAL.–The term “genetic engineering” means a process that alters an organism at the molecular or cellular level by means that are not possible under natural conditions or processes.

Here are some other things that alter an organism at the molecular and cellular level by means that are not possible under natural conditions or processes: fertilization, weeding, ploughing, mechanical irrigation…

(B) INCLUSIONS.–The term “genetic engineering” includes–

(i) recombinant DNA and RNA techniques;

(ii) cell fusion;

(iii) microencapsulation;

(iv) macroencapsulation;

No idea what this has to do with GMOs. 

(v) gene deletion and doubling;

(vi) introduction of a foreign gene; and

(vii) changing the position of genes.

(C) EXCLUSIONS.–The term “genetic engineering” does not include any modification to an organism that consists exclusively of–

(i) breeding;

(ii) conjugation;

(iii) fermentation;

(iv) hybridization;

(v) in vitro fertilization; or

(vi) tissue culture.

Because that’s possible under natural conditions….

(2) GENETICALLY ENGINEERED AND GENETICALLY MODIFIED INGREDIENT.–The term “genetically engineered and genetically modified ingredient” means any ingredient in any food, beverage, or other edible product that–

(A) is, or is derived from, an organism that is produced through the intentional use of genetic engineering; or

(B) is, or is derived from, the progeny of intended sexual reproduction, asexual reproduction, or both of 1 or more organisms described in subparagraph (A).

(d) Right to Know.–Notwithstanding any other Federal law (including regulations), a State may require that any food, beverage, or other edible product offered for sale in that State have a label on the container or package of the food, beverage, or other edible product, indicating that the food, beverage, or other edible product contains a genetically engineered or genetically modified ingredient.

(e) Regulations.–Not later than 1 year after the date of enactment of this Act, the Commissioner of Food and Drugs and the Secretary of Agriculture shall promulgate such regulations as are necessary to carry out this section.

(f) Report.–Not later than 2 years after the date of enactment of this Act, the Commissioner of Food and Drugs, in consultation with the Secretary of Agriculture, shall submit a report to Congress detailing the percentage of food and beverages sold in the United States that contain genetically engineered or genetically modified ingredients.

 

Posted in GMO | Comments closed