Friday, March 15, 2013

On the Meaning of the Word "Function"

A lot of the debate over ENCODE's publicity campaign concerns the meaning of the word "function." In the summary article published in Nature last September the authors said, "These data enabled us to assign biochemical functions for 80% of the genome ...." (The ENCODE Project Consortium, 2012).

Here's how they describe function.

Operationally, we define a functional element as a discrete genome segment that encodes a defined product (for example, protein or non-coding RNA) or displays a reproducible biochemical signature (for example, protein binding, or a specific chromatin structure).
What, exactly, do the ENCODE scientists mean? Do they think that junk DNA might contain "functional elements"? If so, that doesn't make a lot of sense, does it?

Ewan Birney tried to address this definitional morass on his blog [ENCODE: My own thoughts] where he says ....
It’s clear that 80% of the genome has a specific biochemical activity – whatever that might be. This question hinges on the word “functional” so let’s try to tackle this first. Like many English language words, “functional” is a very useful but context-dependent word. Does a “functional element” in the genome mean something that changes a biochemical property of the cell (i.e., if the sequence was not here, the biochemistry would be different) or is it something that changes a phenotypically observable trait that affects the whole organism? At their limits (considering all the biochemical activities being a phenotype), these two definitions merge. Having spent a long time thinking about and discussing this, not a single definition of “functional” works for all conversations. We have to be precise about the context. Pragmatically, in ENCODE we define our criteria as “specific biochemical activity” – for example, an assay that identifies a series of bases. This is not the entire genome (so, for example, things like “having a phosphodiester bond” would not qualify). We then subset this into different classes of assay; in decreasing order of coverage these are: RNA, “broad” histone modifications, “narrow” histone modifications, DNaseI hypersensitive sites, Transcription Factor ChIP-seq peaks, DNaseI Footprints, Transcription Factor bound motifs, and finally Exons.
That's about as clear as mud.

We all know what the problem is. It's whether all binding sites have a biological function or whether many of them are just noise arising as a property of DNA-binding proteins. It's whether all transcripts have a biological function or whether many of those detected by ENCODE are just spurious transcripts or junk RNA. These questions were debated extensively when the ENCODE pilot project was published in 2007. Every ENCODE scientist should know about this problem, so you might expect that they would take steps to distinguish between real biological function and nonfunctional noise.

Their definition of "function" is not helpful. In fact, it seems deliberately designed to obfuscate.

Let's see how other scientists interpret the ENCODE results. In a News & Views article published in Nature last September, Joseph R. Ecker (Salk Institute scientist) said ...
One of the more remarkable findings described in the consortium's 'entrée' paper is that 80% of the genome contains elements linked to biochemical function, dispatching the widely held view that the human genome is mostly 'junk DNA.'
That makes at least one genomics worker who thinks that "biochemical function" and junk DNA are mutually exclusive.

Recently a representative of GENCODE responded to Dan Graur's criticism [On the annotation of functionality in GENCODE (or: our continuing efforts to understand how a television set works)]. This person (JM) says ...
Q1: Does GENCODE believe that 80% of the genome is functional?

As noted, we will only discuss here the portion of the genome that is transcribed. According to the main ENCODE paper, while 80% of the genome appears to have some biological activity, only “62% of genomic bases are reproducibly represented in sequenced long (>200 nucleotides) RNA molecules or GENCODE exons”. In fact, only 5.5% of this transcription overlaps with GENCODE exons. So we have two things here: existing GENCODE models largely based on mRNA / EST evidence, and novel transcripts inferred from RNAseq data. The suggestion, then, is that there is extensive transcription occurring outside of currently annotated GENCODE exons.
There's another scientist who thinks that 80% of the genome has some "biological activity," even though the ENCODE paper says it has "biochemical function." I don't think "biological activity" is compatible with "junk DNA," but who knows what they think?

Since this person is part of the ENCODE team, we can assume that at least some of the scientists on the team are confused.

The Sanger Institute (Cambridge, UK) was an important player in the ENCODE Consortium. It put out a press release on the day the papers were published [Google Earth of Biomedical Research]. The opening paragraph is ...
The ENCODE Project, today, announces that most of what was previously considered as 'junk DNA' in the human genome is actually functional. The ENCODE Project has found that 80 per cent of the human genome sequence is linked to biological function.
It looks like the Sanger Institute equates "biochemical function" and "biological function" and it looks like neither one is compatible with junk DNA.

I think the ENCODE leaders, including Ewan Birney, knew exactly what they were doing when they defined function. They meant "biological function" even though they equivocated by saying "biochemical function." And they meant for this to be interpreted as "not junk" even though they are attempting to backtrack in the face of criticism.


The ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489: 57-74. (E. Birney, corresponding author)