It is NOT junk

Friday, September 12, 2008

Congressional testimony on NIH Public Access repeal effort

The House Judiciary Committee held hearings today on the newly introduced "Fair Copyright in Research Works Act (H.R. 6845)," which is a publisher-promoted effort to repeal the NIH Public Access Policy.

Peter Suber's Open Access News discusses the hearings here and here. Karen Rustad at Little Green River has a great post about the hearing.

I'm still reading and watching the testimony from the good guys: NIH Director Elias Zerhouni and SPARC's Heather Joseph, and the bad guys: GWU Law Professor Ralph Oman and the APS's Marty Frank. I'll have more to say (and if the video is good, maybe some attack ads) later.

Now I disagree with Marty Frank on almost everything: he's short-sighted, often misleading and places the interests of his journal over the broader scientific good. However, I understand why he was invited to testify - he has become the chief spokesman (some would say apologist) for the anti-open access publishers.

But I have to say when I saw the list of witnesses, my first reaction was "Who the hell is Ralph Oman?" OK, I'm not a copyright lawyer, and maybe he's well-known in those circles. But as far as I can tell he's never been interested in science publishing.

So, of course, I googled him. And here's the top hit. A list of campaign contributions he's made. These lists are fascinating. Oman is clearly no Democrat. He gave money to Bill Frist, Henry Hyde and even Katherine Harris when she ran for Congress!! So it's curious that he also gave $500 to John Conyers, head of the House Judiciary Committee who is holding this hearing. Hmm. I wonder why he was invited.... Are our representatives really this cheap?

Wednesday, September 10, 2008

Identifying individuals in "anonymous" genetic studies

Most people who participate in genetic studies do so with the expectation that their participation - and more importantly their phenotype - will be anonymous. To preserve this anonymity, raw data (individuals' genotypes and phenotypes) are not made publicly available. However, to enable validation and further research, pooled data - the average allele frequencies in cases and controls - have been made available through public databases like dbGaP.

But a really cool new paper in PLoS Genetics demonstrates that if you know an individual's genotype you can actually figure out whether they participated in a particular study. This may seem counterintuitive, but if you think about it for a sec it makes sense. An individual's inclusion in a dataset leaves a fingerprint - in terms of shifting the allele frequencies in the direction of their particular genotype. Obviously, if you only have a small number of genotypes this is meaningless. But if you have 500,000+, as most modern genotyping platforms do, the authors show that you can essentially just count up the number of times the pool's allele frequencies diverge from the expected allele frequencies in the direction of an individual's genotype. If this number is significantly higher than expected by chance, it is very likely that the individual was part of the pool.
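To make the counting idea concrete, here is a rough Python sketch of it. This is my simplification of the description above, not the statistic used in the paper, which may well differ. Everything in it is an assumption made for illustration: the function names, the simulated pool of 50 people, the 10,000 independent SNPs (real SNPs are correlated through linkage), and the use of the true population frequencies as the "expected" frequencies.

```python
import math
import random


def count_shifts(genotype, pool_freq, ref_freq):
    """Count SNPs where the pool deviates from the reference frequency
    toward the person's genotype, and SNPs where it deviates away."""
    toward = away = 0
    for g, pool, ref in zip(genotype, pool_freq, ref_freq):
        person = g / 2.0  # 0, 1 or 2 copies of the allele -> 0.0, 0.5 or 1.0
        agreement = (pool - ref) * (person - ref)
        if agreement > 0:
            toward += 1
        elif agreement < 0:
            away += 1
        # agreement == 0 carries no information, so it is skipped
    return toward, away


def sign_test_z(toward, away):
    """Normal-approximation sign test: by chance, 'toward' and 'away'
    should each make up about half of the informative SNPs."""
    n = toward + away
    if n == 0:
        return 0.0
    return (toward - n / 2.0) / math.sqrt(n / 4.0)


def draw_genotype(rng, freqs):
    """Simulate one person: two independent allele draws per SNP."""
    return [(rng.random() < f) + (rng.random() < f) for f in freqs]


def simulate(n_snps=10000, pool_size=50, seed=1):
    """Hypothetical toy study: returns z-scores for one pool member
    and for one person who was not in the pool."""
    rng = random.Random(seed)
    # Assumed "expected" frequencies: the true population frequencies.
    ref_freq = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]
    pool = [draw_genotype(rng, ref_freq) for _ in range(pool_size)]
    # The only thing a public database would release: pooled frequencies.
    pool_freq = [
        sum(person[i] for person in pool) / (2.0 * pool_size)
        for i in range(n_snps)
    ]
    member = pool[0]
    outsider = draw_genotype(rng, ref_freq)
    z_member = sign_test_z(*count_shifts(member, pool_freq, ref_freq))
    z_outsider = sign_test_z(*count_shifts(outsider, pool_freq, ref_freq))
    return z_member, z_outsider


if __name__ == "__main__":
    z_in, z_out = simulate()
    print("pool member z-score:", round(z_in, 2))
    print("outsider z-score:   ", round(z_out, 2))
```

In this toy setup the pool member's z-score should come out well above what chance allows, while the outsider's should hover around zero. The per-SNP signal is tiny, which is why the trick only works once you have many thousands of SNPs to add up.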

A cute trick, no? But are there any practical implications? Clearly yes. As more and more people get whole genome genotyping from companies like 23andMe, Navigenics or DecodeMe (full disclosure - I am an advisor to 23andMe), and as many people start to share their genotypes - either intentionally or unintentionally - it would be theoretically possible for someone to take that person's genotype and scan all existing genome-wide association studies to see if they participated. And if they haven't had their genotype done, and someone else REALLY wanted to know if they participated in a study, that someone could steal a piece of hair and pay to have it genotyped. (It's not discussed in the paper, but I bet you could use a sibling, parent or even perhaps a more distant relative and get a similar answer - although presumably with less certainty.)

Surprisingly, the paper has received little notice in the popular press. But it's created quite a stir in the human genetics community. The National Institutes of Health immediately shut off public access to its genome-wide association data, and urged others with similar data to follow suit. This is a rather shocking reversal for a community that had been pushing the open availability of these data.

It's really rather amazing that no one thought about this before. There are a lot of very bright people involved in human genetic mapping, yet none of them realized that individuals could be relatively easily "unpooled". I bet there are a lot of quantitative geneticists kicking themselves. And I hope some of them are working on a way around this.

Interestingly, the authors seem more interested in the forensics angle here. They offer up their method as a way to tell if a particular individual was in a room, handled a weapon, or anything else where a lot of different people might have left their DNA in the same place. You can see where this is going - get an individual's genotype and you can trace them all over the world.

Scientists' cynical use of "Junk DNA"

This blog - like many others I presume - was started to give me a place to vent about a pet peeve. The target of my particular ire is the way that scientists who should know better continue to tout every new paper on the function of non-coding DNA as a new discovery that - GASP - "junk DNA is not really junk after all".

The latest example surrounds a paper from my friends and former neighbors Jim Noonan, Shyam Prabhakar and Eddy Rubin published last week in Science (I won't link the paper because it's not in an open-access journal - another pet peeve...).

Prabhakar et al. Human-specific gain of function in a developmental enhancer. Science 321(5894):1346-50.

The paper reports the discovery of a conserved noncoding sequence (named HACNS1) that acts as a developmental enhancer, has evolved extremely rapidly in humans, and has gained a strong limb expression domain relative to the orthologous elements from chimpanzee and rhesus macaque. It's a beautiful piece of work that has intriguing implications for human evolution and will serve as a paradigm for similar studies in the future.

What bothers me is not the paper, but the press release that accompanied it. Here's the headline and beginning:

Yale Researchers Find "Junk DNA" May Have Triggered Key Evolutionary Changes in the Thumb and Foot.

New Haven, Conn. — Out of the 3 billion genetic letters that spell out the human genome, Yale scientists have found a handful that may have contributed to the evolutionary changes in human limbs that enabled us to manipulate tools and walk upright.
Results from a comparative analysis of the human, chimpanzee, rhesus macaque and other genomes reported in the journal Science suggest our evolution may have been driven not only by sequence changes in genes, but by changes in areas of the genome once thought of as “junk DNA.”

So here's a fascinating observation about genome evolution, and yet they feel compelled to - once again - peg the story on the discovery that there is actually something going on in "areas of the genome once thought of as 'junk DNA'".

Of course Noonan and colleagues know better. They work on non-coding DNA precisely because they know it is NOT junk. So why, when it's time to make a pitch to the local press officer, do they fall back on this old bromide? It obviously appeals to writers - who love it when they can pitch a story as overturning orthodoxy. It seems minor, but pegging it this way leads to some really atrocious misrepresentations of current biological knowledge.

Here are some headlines on news stories that followed the press release:

Who Says It's 'Junk DNA'? (Hartford Courant)

Enjoy Your Opposable Thumb? Thank your "Junk DNA" (Discover Magazine)

and my personal favorite

Meaningless Genetic Code Helped Form Human Hands (Telegraph)

Why is this such a problem? Well, first it's just WRONG. We've known almost since the dawn of the DNA age that not all DNA is protein-coding, and that there are essential functions encoded in non protein-coding DNA. Unfortunately, for initially practical reasons, a disproportionate amount (surely in excess of 90%) of research has focused on protein-coding genes, fostering the faulty impression - amongst scientists as well as science writers - that the ~3% of the human genome that is protein-coding contains > 90% of the function. And it would be great if scientists who, because they work on non-coding DNA are particularly aware that this view is incorrect, would stop promoting it in the popular press.

A second, and less obvious, problem is that this view has played into the hands of the intelligent design crowd. For reasons that baffle me, smart scientists continue to cite the "fact" that much of the human genome is non-functional as evidence against intelligent design. And every time a new study comes out reporting that "junk DNA" is not junk, the ID'ers jump on it as validation of the predictions of ID. It's hooey of course, but we needn't give them the opportunity.

So, I am making it my mission to shame everyone who uses the term "junk DNA" or its equivalent when talking about new research on the function of non-coding DNA. And every year I'm going to give out "JUNKY" awards to the most egregious examples. Send me your candidates!

Tuesday, September 9, 2008

Legislative threat to NIH public access policy

Two Democratic congressmen (Howard Berman of CA-28 and John Conyers, Jr. of MI-14) are planning to introduce a bill into the US House of Representatives that would effectively kill the NIH Public Access Policy. They are responding to complaints (and donations) from the Association of American Publishers (the lobbying wing of the journal publishers).

The bill - Orwellianly titled "Fair Copyright in Research Works Act" - would make it illegal for any federal agency to require grantees to transfer to the federal government any aspect of the rights given to authors under copyright. Oddly, this legislation follows on the highly dubious assertion by the AAP that the current NIH policy violates copyright law. But by attempting to modify copyright law to make this transfer illegal, the AAP is saying that it currently is legal.

My sources tell me that this bill is unlikely to pass, but supporters of open access and the NIH public access policy should write to their representatives, especially if they are on the House Judiciary Committee. Hearings are apparently scheduled for September 11th.