Our new paper in PLoS One: why we should sequence big genomes

We have a new paper out in PLoS One that I think is particularly cool:

Peterson BK*, Hare EE*, Iyer VN, Storage S, Conner L, et al. (2009) Big Genomes Facilitate the Comparative Identification of Regulatory Elements. PLoS ONE 4(3): e4688. 

The paper is about species with big genomes, in particular species from the Dipteran family Tephritidae, a group of flies distantly related to Drosophila (the two lineages diverged roughly 150 million years ago). Tephritids are economically important: several species, most famously the Medfly Ceratitis capitata, are serious agricultural pests. But what’s interesting about them here is that they have big genomes, significantly bigger than those of typical Drosophila species (450 to 850 Mb, compared with 175 Mb).

No one has yet sequenced a tephritid genome (although we are working on that now). But as part of a project to study the evolution of fly regulatory sequences (see also this paper), we sequenced a few developmentally important loci from each of these tephritid species. When the two (really fantastic) students who did this work (Brant Peterson and Emily Hare) got these sequences, they did the natural thing and aligned them to each other. The result of this comparison is the heart of the paper:


The three panels show plots of conservation (computed using PhastCons) across orthologous Drosophila and tephritid loci, and a vertebrate locus for comparison. (The Drosophila and vertebrate analyses use sets of species at molecular distances equivalent to those separating the tephritid species we analyzed. All plots are on the same scale, and protein-coding genes are marked with blue boxes.)

The point should be clear. The tephritid comparison looks much more like the vertebrate comparison than the Drosophila comparison.

Why is this important? Because for over a decade, genome comparisons like those shown above have been used successfully to identify regulatory sequences in vertebrate genomes. Comparisons of the human, mouse, dog, chicken and fish genomes, for example, have identified thousands of discrete blocks of conserved non-coding DNA flanked by large stretches of rapidly evolving sequence. These blocks are known generally as conserved non-coding sequences (CNSs). When assayed in mice, many of these CNSs turn out to be transcriptional enhancers that control gene expression during development.

The success of such methods in vertebrates inspired the sequencing of the Drosophila pseudoobscura genome, in the hope that comparisons of the D. melanogaster and D. pseudoobscura genomes would lead to the rapid identification of enhancers across these genomes. But, outside of a few isolated cases, it didn’t happen. The reason should be clear from the picture shown above: unlike in vertebrates, where islands of conservation stand out against a backdrop of rapidly evolving non-coding DNA, in Drosophila virtually all enhancer-sized chunks of non-coding DNA are conserved. Thus in Drosophila, while one can conclude that most non-coding DNA is functional, it is essentially impossible to determine where one regulatory sequence ends and the next begins.
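To make the "where does one element end" problem concrete, here is a minimal sketch of calling conserved blocks from per-base conservation scores. It assumes scores between 0 and 1 have already been computed (for example, per-base posterior probabilities like those PhastCons reports). The simple threshold-and-minimum-length rule is a swapped-in illustration, not PhastCons itself and not the method used in the paper; the function name `call_conserved_blocks`, the threshold, the minimum length and the toy scores are all hypothetical.

```python
from typing import List, Optional, Tuple


def call_conserved_blocks(scores: List[float],
                          threshold: float = 0.5,
                          min_length: int = 3) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals where every score is >= threshold
    and the run is at least min_length positions long."""
    blocks: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, s in enumerate(scores):
        if s >= threshold:
            if start is None:          # entering a conserved run
                start = i
        elif start is not None:        # leaving a conserved run
            if i - start >= min_length:
                blocks.append((start, i))
            start = None
    # close a run that extends to the end of the region
    if start is not None and len(scores) - start >= min_length:
        blocks.append((start, len(scores)))
    return blocks


# Hypothetical toy per-base scores, NOT data from the paper.
# Vertebrate/tephritid-like: conserved islands in a rapidly evolving background.
vertebrate_like = [0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.8, 0.9, 0.9, 0.1]
# Drosophila-like: nearly every position is conserved.
drosophila_like = [0.8, 0.9, 0.9, 0.7, 0.9, 0.8, 0.9, 0.9, 0.6, 0.9, 0.9, 0.8, 0.9]

print(call_conserved_blocks(vertebrate_like))   # [(2, 6), (9, 12)]
print(call_conserved_blocks(drosophila_like))   # [(0, 13)]
```

With the vertebrate-like profile the rule returns two discrete blocks. With the uniformly high, Drosophila-like profile it returns a single block covering the whole region, which is exactly the problem: nothing in the conservation signal marks where one element ends and the next begins.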

Why the difference between vertebrates and Drosophila? It has been generally assumed that the difference arises from fundamental differences in the way that non-coding DNA is organized in vertebrates and invertebrates. But the tephritids show that this is not the case. The landscape of non-coding sequence conservation in tephritids looks like that of vertebrates: blocks of conserved DNA flanked by areas of rapidly evolving sequence. Drosophila looks different not because its non-coding sequences function in some less complex way than those of vertebrates, but rather because most of the rapidly evolving non-coding DNA found in other species has been deleted. The differences between Drosophila and vertebrates are real, but they are a product of Drosophila’s small genome, not of some essential difference in the structure and function of Drosophila and vertebrate genomes.

In part because Drosophila is such a high-profile invertebrate, many people extrapolated from vertebrate-Drosophila differences to conclude that there are fundamental differences in the organization of vertebrate and invertebrate genomes. This view has been immensely strengthened by a pervasive bias against sequencing large invertebrate genomes. Since most sequenced invertebrate genomes are small, the shared properties of small genomes – such as densely packed functional elements in non-coding DNA – have been mistakenly assumed to be shared properties of all invertebrate genomes.

Our data from the tephritids show that this is not the case, as these large invertebrate genomes look, at least in this regard, like vertebrate genomes. So, until we sequence more large invertebrate genomes, we are going to have a very biased view of the evolution of genome structure.

There’s also a much more practical consequence of our observations about tephritid genomes. As we show in our paper, many of the tephritid CNSs we identify function as enhancers in D. melanogaster embryos, even though the species diverged around 150 million years ago. What’s more, we showed that despite extensive sequence divergence, we can map tephritid CNSs to the D. melanogaster genome, and that the tephritid CNSs drive expression patterns that are identical or very similar to those driven by the D. melanogaster sequences to which they map.

This immediately suggests that we can use comparisons of multiple tephritid species, together with our mapping methods, to systematically annotate regulatory sequences in Drosophila genomes, something that comparisons of Drosophila species have not so far permitted.

There’s much more detail in the paper. Read it! And more importantly – comment on it, trash it, ask questions using the PLoS One commenting system. PLoS One is all about building a viable system of post-publication peer review – and we need you to participate to make it work.

Posted in evolution, genome size, NOT junk | Comments closed

Is John Conyers Shilling for Special Interests? [HuffPost]

http://www.huffingtonpost.com/lawrence-lessig-and-michael-eisen/is-john-conyers-shilling_b_171189.html

Posted in politics, science and politics | Comments closed

I love Anna Eshoo!!

A friend posted this letter on Facebook:

February 27, 2009

Dear ——-,

Thank you for writing to me about H.R. 801, the Fair Copyright in Research Works Act. 

As you may know, Congress directed that all NIH funded studies be available for free online. H.R. 801 would effectively overturn this important and much needed policy by prohibiting agencies from making the results of tax-payer funded research free to the public. 

I support open access of all federally funded research because it fosters innovation, collaboration and advances scientific discovery. This legislation is not in the best interest of the taxpayers who fund research or the scientific community and if H.R. 801 comes to the floor for a vote, I will vote against it.

If you have any other questions or comments, let me hear from you. I value what my constituents say to me because I need your thoughts and benefit from your ideas.

I’ve created an ongoing e-newsletter to keep constituents informed on a variety of congressional issues and legislation. Many constituents tell me how much they value reading it, and if you would like to as well, you can go to my website at http://eshoo.house.gov and click on “E-Mail Sign-Up.” Your email address will never be used by anyone except my office to communicate with you, and your tax dollars will be conserved by using electronic communications rather than traditional mailings.

Sincerely,
Anna G. Eshoo
Member of Congress

Posted in open access, science and politics | Comments closed

Stem Cells: The Future of Skin Rejuvenation

As if there wasn’t enough stem cell hype already….


Posted in misc stuff, science | Comments closed

Evans and Reimer greatly underestimate effect of free access

A lot is being made of a new paper (no link – I only link to articles that are freely available) by University of Chicago sociologist James Evans in which he analyzes citation and free access histories of journals in various disciplines, and concludes that the effect of free access (which he mistakenly calls open access) on citations is much more modest than previously estimated.

This paper – and the response to it – has many flaws. A few of the most egregious:

1) The analysis shows that there are a lot of scientists out there benefiting from free access 

First, the authors and a lot of people responding to this paper seem to assume that the modest increase in the number of citations arising from the transition to free access is somehow an argument against free access. This is silly. Even if free access didn’t change citation numbers at all, it would still be an unambiguously good thing for a wealth of other reasons that I won’t rehash here.

The reason that open access opponents are so excited by the supposed conclusions of this paper is that a small citation increase associated with free access bolsters one of their favorite tropes: that all the important people (i.e. people who might eventually cite your paper) already have access to it because they are affiliated with a major research university in the developed world. If this were true, journals making their contents freely available would have very little effect on which articles these researchers read and cite.

But even this paper – now being cited as evidence that free access is unimportant – reports at least an 8% increase in the number of citations associated with free access – suggesting that there are a significant number of active researchers out there who are getting access to articles only because they are available freely online. This may sound like a small number, but collectively across the global scientific community we’re talking about tens or hundreds of thousands of scientists.

Do the publishers really want to argue that even this modest increase in citations is unimportant? If so, I’ll remind them of it next time they issue a press release touting the 5% increase in their impact factor…

2) The 8% number comes from analysis of a lot of old articles. If you look only at articles that become freely available online within two years of publication, the increase in citations is 20%.

If you look at the online supplement for the article, you’ll see that it analyzes a lot of old literature. Very little old literature gets cited, so it’s unsurprising that free access to it has little effect on how often it is cited. However, if you look at their supplemental Figure S1(c), you see that the effect on more recently published articles – the bulk of what they analyze – is MUCH larger:

Analysis of effect of transition to free access as a function of time since publication.

So, for articles that are less than 2 years old, the effect is close to 20%. And the curve is clearly rising as you get to shorter time frames. Extrapolating a little, it looks like the effect of immediate free access should be at least 50%.

3) The raw data for the paper are not available to confirm and/or reanalyze the authors’ claims

Where the hell are the data for this paper? I’d love to look at the validity of their analyses and to do some of my own. But, whoops, I can’t, because the data are private and not provided anywhere by the authors or Science. This is despite the fact that Science’s own publication policy makes it clear that:

After publication, all data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader of Science

Are we really supposed to take seriously a paper presenting a bunch of complex analyses of data that the authors and the journal don’t make available?

Posted in open access | Comments closed

Great op-ed on Conyers’ Bill

Misunderestimating open science

By James Boyle

Published: February 24 2009 02:19 | Last updated: February 24 2009 02:19

It is hard for politicians to do anything that would shock me but I have to say that John Conyers, a US Congressman, has done it. In the process, he has taught us a lot about how far we have to go, all over the world, before we get our science policy right. Since science and technology are major engines of growth, that is a point of pressing interest for governments everywhere.

Rep. Conyers has introduced a bill, misleadingly called the “Fair Copyright in Research Works Act,” that would eviscerate public access to taxpayer funded research. The bill is so badly drafted that it would also wreak havoc on federal information policy more generally. It is supported by the commercial science publishers, but opposed by a remarkable set of groups — ranging from the American Research Libraries, to 33 Nobel Prize Winners, to a coalition of patients’ rights organizations.

[continues here]

Posted in open access, science and politics | Comments closed

Unfair attacks on Sandlers

There’s a very good article from a respected bank analyst on the Sandlers that shows how the 60 Minutes attack was totally unwarranted.

Posted in misc stuff | Comments closed

Herb and Marion Sandler and the Financial Crisis

I just received an email from Herb Sandler:

Dear friends

We are living in a benighted time. It is hard to believe that the attacks on Marion and me, and the company, are really happening. As you know, we were one of the very few companies which was monomaniacally focused on loan quality, doing what’s right for the customer, and operating at all times with the utmost integrity. Unfortunately, there is much pain out there, and the media and those with hidden and not so hidden agendas are looking for scapegoats. It’s especially difficult to be accused of practices which we abhor and fought against. The factual misstatements and false innuendos are astonishing.

We have created a web site which sets forth facts about the company’s history. You may link on at

www.goldenwestworld.com  

We will update it from time to time and if you have any suggestions or corrections, please send them to me or

goldenwestworld@gmail.com 

Herb and Marion

The background for this email is a series of attacks, beginning with a skit on Saturday Night Live, that all but accuse the Sandlers of triggering the mortgage crisis (you can read about some of this on their Wikipedia page). I know very little about the mortgage industry, so I will let the Sandlers’ website speak for itself. But I will say that I have known the Sandlers for many years and consider them to be amongst the most decent and honorable people I have met.

I came to know the Sandlers because of the support their family foundation gave to the Public Library of Science (which I co-founded) during a critical time in its growth. Prior to our initial meeting, I researched the Sandlers’ company, Golden West Financial (whose sale to Wachovia is at the center of the attacks on them). I was astonished at how well-regarded the company and the Sandlers were: as entrepreneurs, as executives, as employers, and as people. Not the kind of reputation usually possessed by bankers. And in all of my interactions with them I found them to be tough, hard-nosed, practical and wise. I respected their opinions, even when they were taking us to task. And above all else, I always found them to be exceptionally good people (and lest you think I am kissing up to donors: PLoS no longer receives money from the Sandlers, nor have we asked them for future support).

So, if any of you saw the SNL piece or have read things trying to scapegoat the Sandlers for the current financial mess, or if you are just interested in learning more about the financial mess we are currently in, I urge you to read the Sandlers’ website. 

Posted in misc stuff, PLoS | Comments closed

Stats on Commenting in PLoS One

Euan Adie has a great post over at Nascent (Nature’s web tech blog) about commenting in PLoS One. My thoughts on it later, but it’s definitely worth checking out what he’s done.

Posted in open access, PLoS | Comments closed

How should the NIH spend its stimulus money?

Steve Quake has an interesting post on Olivia Judson’s blog (Quake is a guest columnist while she is on sabbatical) about what life is like for a scientist at a modern research university. The interesting stuff is at the end, when he talks about how labs are funded. It’s a particularly important time to think about how we spend money in science, given that the economic stimulus bill working its way through Congress contains billions of dollars for the NIH, a substantial part of which would go to the NIH extramural program.

Here is Quake’s discussion: 

You may have noticed that one of my lifelines actually came from the N.I.H. — an agency not known for taking risks. I could write pages about the last presidential administration’s disastrous approach to science. However, for whatever reason (and I suspect it was dumb luck: the exception that proves the rule) George W. Bush appointed an N.I.H. director who was both visionary and an adept leader — Elias Zerhouni. Dr. Zerhouni changed the process for awarding grants, which had become inbred and conservative. Among other steps, he created a series of special awards — for “Pioneers” and “Innovators” — to fund highly risky research, and it is one of these that I was the recipient of.

As we think about how to heed President Obama’s call to “put science back in its rightful place,” I wonder if this should also be the time to rethink the basic foundations of how science is funded. Could we stimulate more discovery and creativity if more scientists had the security of their own salary and a long-term commitment to a minimal level of research support? Would this encourage risk-taking and lead to an overall improvement in the quality of science?

As we consider the monumental challenges facing our generation — climate change, energy needs and health care — and look to science for solutions, it would behoove us to remember that it is almost impossible to predict where the next great discoveries will be made — and thus we should invest broadly and let scientists off their leashes.

What, however, does it really mean to let scientists off their leashes? And how do we accomplish it? Given the pending infusion of at least $1.5 billion from the stimulus package to the NIH extramural budget, this is a particularly timely question.

From the standpoint of economic stimulus, the important thing is simply to fund more research. There is a large pool of talented biomedical researchers, and the funding crunch of the last few years means that there is a large reservoir of projects that are ready to go. Funding them will immediately create new, good jobs that will help stabilize the economy.

But it’s interesting that the House justified this spending as follows:

$1.5 billion for expanding good jobs in biomedical research to study diseases such as Alzheimer’s, Parkinson’s, cancer, and heart disease – NIH is currently able to fund less than 20% of approved applications.

The House is citing that 20% figure in part to show that there are lots of projects that can be funded immediately, which is what the economy needs. But there’s also a clear implication that pumping more money into biomedical research will make it easier for scientists to get funded. Unfortunately, recent history shows that this is not necessarily true.

The NIH budget effectively doubled during the Clinton administration. In most ways this was a fantastic thing for biomedical science. It eliminated a huge funding bottleneck that was leaving many very talented scientists out in the cold, and helped new faculty (like me) to get stable funding early in their careers. However, as has been pointed out elsewhere, this increase in funding also led to a big increase in the number of scientists, as universities invested in biomedical research infrastructure and hired many new faculty to fill their shiny new buildings.

Because of this hiring binge, it did not become easier for scientists to get grants. The number of applicants for grants simply increased, and continued to increase even after the budget increases stopped. This led, somewhat paradoxically, to a smaller fraction of grant applications getting funded after the doubling than before.
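The paradox is just a ratio: the success rate is funded grants divided by applications, so it falls whenever applications grow faster than the number of grants the budget can support. A minimal sketch with made-up round numbers (not NIH data), assuming the number of fundable grants scales in proportion to the budget; the function name `success_rate` is hypothetical:

```python
def success_rate(funded_grants: int, applications: int) -> float:
    """Fraction of grant applications that get funded."""
    return funded_grants / applications


# Hypothetical round numbers for illustration only; NOT real NIH statistics.
# Assumes the number of fundable grants scales with the budget
# (i.e. average grant size stays the same).
before = success_rate(funded_grants=1_000, applications=3_000)

# The budget doubles, so about twice as many grants can be funded...
# ...but new hiring pushes the number of applications up more than twofold.
after = success_rate(funded_grants=2_000, applications=7_000)

print(f"before doubling: {before:.1%}")   # before doubling: 33.3%
print(f"after doubling:  {after:.1%}")    # after doubling:  28.6%
```

Under these assumptions, any growth in applications beyond a factor of two is enough to leave the post-doubling success rate lower than it was before the doubling.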

Given that experience, it seems prudent to think about how we can ensure that as many worthy projects get funded as possible without creating perverse incentives for universities to hire more biomedical researchers than the system can support. From an economic perspective, it’s not clear that it’s substantially better to increase the number of labs that are getting NIH support than it is to increase funding to existing labs: both likely lead to proportionally more students and staff and expenditures on research supplies and equipment.

There are obviously a lot of good scientists with worthy projects whose grants have not been funded. But funded scientists also suffer in current conditions, as effective funding levels are lower, and the amount of time required to get funding has significantly increased. So there’s a choice to be made between directing new money to people currently outside of the system and increasing funding to existing grants and grant-holders. (Full disclosure: I am funded by the NIH.) The former spreads the wealth and encourages people to go into science, all worthy goals, but will not alter the funding landscape from the perspective of individual labs. The latter is arguably less equitable but, by relieving some of the funding tension on existing labs, is more likely to unleash scientists in the way that Quake so rightly advocates.

Posted in politics, science, science and politics | Comments closed