Friday, September 27, 2013

The Extraordinary Human Epigenome

We learned a lot about genes and gene expression in the second half of the 20th century. We learned that genes are transcribed and we have a pretty good understanding of how transcription initiation complexes are formed and how transcription works.

We learned how transcription is regulated through promoter strength, activators, and repressors. Activators and repressors bind to DNA, and those binding sites can lie at some distance from the promoter, leading to the formation of loops of DNA that bring the regulatory proteins into contact with the transcription complex. Much of our basic understanding of this process was derived from detailed studies of bacteriophage and bacterial genes.

Later on we learned that eukaryotic gene expression was very similar and that its regulation also required repressors and activators. We discovered that gene expression was associated with chromatin remodeling that opened up regions of the chromosome that were tightly bound to histones in 30 nm or higher-order structures.

Building on studies in prokaryotes, we learned about temporal gene regulation and differentiation. Much of the work was done in model organisms like Drosophila, yeast, C. elegans, and various mammalian cells in culture.

By the end of the century I was pretty confident that what I wrote in my textbook was a fair representation of the fundamental concepts in gene expression and regulation.

Turns out I was wrong, as I discovered this morning when I read the opening paragraph of a review by Rivera and Ren (2013). Here's what they say ...
More than a decade has passed since the human genome was completely sequenced, but how genomic information directs spatial- and temporal-specific gene expression programs remains to be elucidated (Lander, 2011). The answer to this question is not only essential for understanding the mechanisms of human development, but also key to studying the phenotypic variations among human populations and the etiology of many human diseases. However, a major challenge remains: each of the more than 200 different cell types in the human body contains an identical copy of the genome but expresses a distinct set of genes. How does a genome guide a limited set of genes to be expressed at different levels in distinct cell types?
Wow! The textbooks need to be rewritten! We didn't learn anything in the last century!

It took me the whole first paragraph of this paper to realize that the rest of it was probably going to be worthless unless you were interested in technical details about the field. That's because I'm not as smart as Dan Graur. He only read the title, "Mapping Human Epigenomes," and the abstract before concluding that the authors were speaking in newspeak¹ [A “Leading Edge Review” Reminds Me of Orwell (and #ENCODE)].

The Rivera and Ren paper is a "Leading Edge" review in the prestigious journal Cell. It covers all the techniques used to study methylation, histone modification and binding, transcription factor binding, and nucleosome positioning at the genome level. According to the authors, people like me were fooled by studies on individual genes, purified factors, and in vitro binding assays. That didn't really tell us what was going on.

Apparently, the most effective way of learning about the regulation of gene expression in humans is to analyze the entire genome all at once and read off the data from microarrays and computer monitors. (After shoving it through a bunch of code.)
Overwhelming evidence now indicates that the epigenome serves to instruct the unique gene expression program in each cell type together with its genome. The word "epigenetics," coined half a century ago by combining "epigenesis" and "genetics," describes the mechanisms of cell fate commitment and lineage specification during animal development (Holliday, 1990; Waddington, 1959). Today, the "epigenome" is generally used to describe the global, comprehensive view of sequence-independent processes that modulate gene expression patterns in a cell and has been liberally applied in reference to the collection of DNA methylation state and covalent modification of histone proteins along the genome (Bernstein et al., 2007; Bonasio et al., 2010). The epigenome can differ from cell type to cell type, and in each cell it regulates gene expression in a number of ways—by organizing the nuclear architecture of the chromosomes, restricting or facilitating transcription factor access to DNA, and preserving a memory of past transcriptional activities. Thus, the epigenome represents a second dimension of the genomic sequence and is pivotal for maintaining cell-type-specific gene expression patterns.

Not long ago, there were many points of trepidation about the value and utility of mapping epigenomes in human cells (Madhani et al., 2008). At the time, it was suggested that histone modifications simply reflect activities of transcription factors (TFs), so cataloging their patterns would offer little new information. However, some investigators believed in the value of epigenome maps and advocated for concerted efforts to produce such resources (Feinberg, 2007; Henikoff et al., 2008; Jones and Martienssen, 2005). The last five years have shown that epigenome maps can greatly facilitate the identification of potential functional sequences and thereby annotation of the human genome. Now, we appreciate the utility of epigenomic maps in the delineation of thousands of lincRNA genes and hundreds of thousands of cis-regulatory elements (ENCODE Project Consortium et al., 2012; Ernst et al., 2011; Guttman et al., 2009; Heintzman et al., 2009; Xie et al., 2013b; Zhu et al., 2013), all of which were obtained without prior knowledge of cell-type-specific master transcriptional regulators. Interestingly, bioinformatic analysis of tissue-specific cis-regulatory elements has actually uncovered novel TFs regulating specific cellular states.
So, what are all these new discoveries that now elucidate what was previously unknown; namely, "how genomic information directs spatial- and temporal-specific gene expression programs"?

This is a very long review full of technical details so let's skip right to the conclusions.
Six decades ago, Watson and Crick put forward a model of DNA double helix structure to elucidate how genetic information is faithfully copied and propagated during cell division (Watson and Crick, 1953). Several years later, Crick famously proposed the "central dogma" to describe how information in the DNA sequence is relayed to other biomolecules such as RNA and proteins to sustain a cell’s biological activities (Crick, 1970). Now, with the human genome completely mapped, we face the daunting task to decipher the information contained in this genetic blueprint. Twelve years ago, when the human genome was first sequenced, only 1.5% of the genome could be annotated as protein coding, whereas the rest of the genome was thought to be mostly "junk" (Lander et al., 2001; Venter et al., 2001). Now, with the help of many epigenome maps, nearly half of the genome is predicted to carry specific biochemical activities and potential regulatory functions (ENCODE Project Consortium, et al., 2012). It is conceivable that in the near future the human genome will be completely annotated, with the catalog of transcription units and their transcriptional regulatory sequences fully mapped.
I hope they hurry up. Not only do I have to re-write my description of the Central Dogma,² but I'm going to have to re-write everything I thought I knew about regulation of gene expression and the organization of information in the human genome. That's going to take time, so I hope the epigeneticists will publish lots more whole genome studies in the near future so I can understand the new model of gene expression.

Keep in mind that this paper was published in Cell where it was rigorously reviewed by the leading experts in the field. It must be right.


[Image Credit: Moran, L.A., Horton, H.R., Scrimgeour, K.G., and Perry, M.D. (2012) Principles of Biochemistry 5th ed., Pearson Education Inc. page 647 [Pearson: Principles of Biochemistry 5/E] © 2012 Pearson Education Inc.]

1. Newspeak was first described in the novel 1984, proving, once again, that George Orwell (Eric Arthur Blair) was a really smart and prescient guy. For another example see: What Is "Science" According to George Orwell?

2. Apparently I didn't read the Crick (1970) paper as carefully as they did.

Rivera, C.M. and Ren, B. (2013) Mapping Human Epigenomes. Cell 155:39-55 [doi: 10.1016/j.cell.2013.09.011]