4.3 Evolution of the Genome and the Epigenome
Compared with the genome of E. coli that Jacob and Monod used for their research, the human genome is extremely complex. E. coli has about 4,000 genes and humans have about 25,000. Although this is only about a sixfold difference in gene number, one human zygote develops into an adult with about 60 trillion cells. Furthermore, in identical twins, whose genomes are nearly identical, genes are regulated so accurately that the twins' phenotypic features are very similar.
Mapping of the human genome has revealed human DNA sequences and clarified the characteristics of genes that respond to changes in the environment. As explained in Chapter 3, in human genes the exons, which contain information about protein sequences, are separated by many large introns (Fig. 4-6A). Only about 1.3% of the human genome contains information about the structures of proteins.
Another characteristic of genes is that they are duplicated and can increase in number during the evolution of organisms. Clusters of "homeotic genes" are involved in forming the shapes of living organisms (Column in Section 2 of Chapter 5). If the homeotic gene cluster in Drosophila is considered 1 set, then there are 4 sets of such gene clusters in humans.
Fig. 4-6. Gene Diversity Produced by Splicing
(A) Structure of a human gene
(B) Activation of transcription. A transcription-related protein such as an activator or a repressor binds to a regulatory sequence called a "promoter," and RNA is synthesized. This RNA is then spliced to produce mature RNA containing only exons, and various proteins are made from this mature RNA.
Diversity Produced by Split Genes
As described above, exons containing information about protein structure are scattered throughout human genes and are split by many large introns. During gene expression, RNA polymerase is activated and transcribes RNA. First, a long RNA including introns is transcribed. Next, the introns are removed, forming mature RNA, which is involved in determining the structure of proteins. The process of removing introns is called "splicing."
Response to the environment requires many proteins. There are about 25,000 genes in the human genome, but this number is remarkably lower than what was first predicted.
In human cells, the genetic information for coding 1 protein is split by many introns. Actually, during splicing that occurs at the time of gene expression, the exons are used in combination, and consequently, a variety of proteins are made from a single gene. For example, the gene in Figure 4-6 has 3 exons, but 7 proteins can be made from exons 1, 2, 3, 1-2, 1-3, 2-3, and 1-2-3 during splicing (Fig. 4-6B). By this method, about 1 million proteins can be made from only 25,000 genes.
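The count of 7 in the example above can be checked by listing every non-empty combination of the 3 exons while keeping them in their original order. The following is a minimal sketch of that counting only; the function name `exon_combinations` is introduced here for illustration, and the sketch assumes, as the figure does, that every combination is possible, which real splicing does not guarantee.

```python
from itertools import combinations

def exon_combinations(exons):
    """Return every non-empty subset of exons, keeping the original exon order."""
    result = []
    for size in range(1, len(exons) + 1):
        # combinations() preserves input order, so exon 1 never follows exon 2
        result.extend(combinations(exons, size))
    return result

isoforms = exon_combinations([1, 2, 3])
labels = ["-".join(str(exon) for exon in combo) for combo in isoforms]
print(labels)         # ['1', '2', '3', '1-2', '1-3', '2-3', '1-2-3']
print(len(isoforms))  # 7
```

For n exons this simple counting gives 2^n - 1 combinations, so the number of possible products grows quickly as the number of exons increases.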
The mechanisms that produce the proteins involved in immunity are especially complex. For example, when a pathogenic bacterium or virus enters the body, proteins called "antibodies" are made to defend the body. Although there are many types of pathogenic bacteria and viruses, the human body can produce more than 10 trillion types of antibodies to eliminate them. DNA recombination plays an important role in the production of antibody proteins (see Column in Section 3 of Chapter 9).
Mechanism by which New Genes are Created
Amino acid sequences that form proteins are sometimes divided structurally and functionally into multiple units (domains). Various proteins may have been produced by multiple uses of these domains during evolution.
Genetic mutations affect the phenotypes of organisms, thus promoting their evolution. However, if a new protein with a function advantageous to an organism's survival had to arise solely through the accumulation of random single-base substitutions in DNA, a tremendous amount of waste would occur by sheer probability. It is like rolling several dice at once: the numbers rarely all match.
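The dice analogy can be made concrete with a small calculation. The sketch below is only an illustration under simple assumptions that are not in the text: the dice are fair, and each position in a random DNA sequence is chosen independently and uniformly from the 4 bases. The function names are hypothetical.

```python
from fractions import Fraction

def prob_all_match(n_dice, sides=6):
    """Probability that n fair dice all show the same face."""
    # The first die may show anything; each later die must match it.
    return Fraction(1, sides) ** (n_dice - 1)

def prob_random_hit(length, alphabet=4):
    """Probability that a uniformly random sequence equals one specific target.

    Assumption for illustration: every position is independent and uniform.
    """
    return Fraction(1, alphabet) ** length

print(prob_all_match(2))    # 1/6
print(prob_all_match(5))    # 1/1296
print(prob_random_hit(10))  # 1/1048576
```

Even for a target only 10 bases long, a random sequence matches about once in a million tries under these assumptions, which is the kind of waste the paragraph describes.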
On the other hand, the exon-intron structure of genes described above sometimes plays a part in creating new genes. When two genes recombine within a pair of introns, the amino acid-coding information in the exons does not change, but the DNA sequences flanking the exons are exchanged (Fig. 4-7).
In other words, new exon-intron structure combinations can be created from fragmented gene structures. The newly created genes thus have structures and functions similar to the original genes before recombination, but can also be expected to have new and different functions. This phenomenon of making new genes by combining exon-intron structures is referred to as "exon shuffling." Domains that are necessary for the functions of proteins are shuffled like a deck of cards, and their redistributions create new combinations of functional proteins.
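Exon shuffling as described above can be sketched with a toy model in which a gene is a list of labeled segments. This is a simplified illustration, not a model of the molecular mechanism: the segment labels, the function names, and the rule that the breakpoint simply merges the two introns are all assumptions made for the example.

```python
def recombine_at_introns(gene_a, gene_b, cut_a, cut_b):
    """Join the start of gene_a to the end of gene_b, breaking inside introns.

    Each gene is a list of (kind, name) segments; cut_a and cut_b are the
    positions of the introns where the break occurs (hypothetical model).
    """
    if gene_a[cut_a][0] != "intron" or gene_b[cut_b][0] != "intron":
        raise ValueError("both breakpoints must lie inside introns")
    # The break falls inside non-coding sequence, so no exon is altered.
    hybrid_intron = ("intron", gene_a[cut_a][1] + "/" + gene_b[cut_b][1])
    return gene_a[:cut_a] + [hybrid_intron] + gene_b[cut_b + 1:]

def exons_of(gene):
    """Return the exon names in order, as they would appear after splicing."""
    return [name for kind, name in gene if kind == "exon"]

gene_a = [("exon", "A1"), ("intron", "a1"), ("exon", "A2")]
gene_b = [("exon", "B1"), ("intron", "b1"), ("exon", "B2")]

shuffled = recombine_at_introns(gene_a, gene_b, 1, 1)
print(exons_of(shuffled))  # ['A1', 'B2']
```

The new gene combines intact exon A1 with intact exon B2, so each exon keeps its amino acid-coding information while the combination is new.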
Fig. 4-7. Mechanism for Creating a New Gene by Exon Shuffling
(A) When recombination occurs at an intron, the DNA sequences flanking the exons become substituted, while the amino acid-coding information in the exons is retained. Consequently, new sequences of exon-intron structures are formed.
(B) Even genes that differ between one another often have amino acid sequences (domains) in common. These common domains are thought to have arisen from exon shuffling.
Redundancies Created by Duplicated Genes
Another characteristic of genomes is that as living organisms become more complex, they contain more "repeat sequences," which repeat the same base sequences. Repeat sequences constitute about 50% of the human genome. When these repeat sequences are present, recombination tends to occur, and the number of repeats tends to change. Genes between the repeat sequences are duplicated and increase in number during evolution, thus resulting in many copies of the same and similar genes (Fig. 4-8).
Fig. 4-8. Mechanism by which Repeat Sequences and Genes are Duplicated
When identical genes of a pair of homologous chromosomes are each surrounded by homologous repeat DNA sequences, the genes sometimes get mixed up and recombine. Consequently, genes with identical sequences are sometimes duplicated on the same chromosome. When gene duplication occurs, large homologous repeat sequences are formed. If such DNA repeat sequences are to some degree clustered, the gene duplication will occur more easily and more frequently.
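The duplication mechanism in this caption can be sketched as a misaligned exchange between two homologous chromosomes. The model below is a toy illustration with hypothetical names: each chromosome is a list of segments, and the exchange happens where a repeat on one chromosome pairs with a different repeat on the other.

```python
def unequal_crossover(chrom_1, chrom_2, repeat_idx_1, repeat_idx_2):
    """Exchange two chromosomes at misaligned repeat sequences.

    chrom_1[repeat_idx_1] pairs with chrom_2[repeat_idx_2]; both must be repeats.
    Returns the two recombinant products.
    """
    if chrom_1[repeat_idx_1] != "REPEAT" or chrom_2[repeat_idx_2] != "REPEAT":
        raise ValueError("crossover points must both be repeat sequences")
    product_1 = chrom_1[:repeat_idx_1] + chrom_2[repeat_idx_2:]
    product_2 = chrom_2[:repeat_idx_2] + chrom_1[repeat_idx_1:]
    return product_1, product_2

homolog_1 = ["REPEAT", "GENE", "REPEAT"]
homolog_2 = ["REPEAT", "GENE", "REPEAT"]

# The right-hand repeat of homolog_1 pairs with the left-hand repeat of homolog_2.
duplicated, depleted = unequal_crossover(homolog_1, homolog_2, 2, 0)
print(duplicated)  # ['REPEAT', 'GENE', 'REPEAT', 'GENE', 'REPEAT']
print(depleted)    # ['REPEAT']
```

One product carries two copies of the gene and an extra repeat, which in turn offers more places for a later misalignment, while the reciprocal product has lost the gene.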
However, even though the genes are duplicated, they are rarely exactly the same. Changes in the base sequences of the genes induce changes in the proteins that they code. Changes in the sequences that regulate the expression of genes provoke changes in terms of where, when, and how much RNA they transcribe. In reality, it is often the case that the sequences of genes and the sequences that regulate them both change (Fig. 4-9).
For example, interferons, which are proteins that protect organisms against viral infections, are encoded by more than 20 similar genes. Even among the comparatively similar α interferons, there are 13 types. All α interferons act on the same receptor and thus have similar functions. Different α interferons are expressed in response to different viruses, and the α interferons expressed also differ from cell to cell, which helps the organism adapt and survive. In such cases, even if the proteins are nearly the same, what matters is when and where they act. Duplication creates a system that has both waste and flexibility. This property is called "redundancy."
In the process of evolution, living organisms do not rely on random genetic mutations alone. Rather, they combine existing functional genetic information to create new functional genes (diversity by combination), and they duplicate genetic information and use it in various ways for various circumstances (redundancy by duplication). Phenomena such as exon shuffling and gene duplication are the driving forces that increase diversity and redundancy in the genomes of living organisms. This diversity and redundancy is the foundation upon which organisms expand their ability to respond flexibly to changes in the environment.
Abnormalities in the Epigenome, and Diseases
When the epigenome is reset (demethylated) in a zygote, sometimes only the maternal copy of a gene is not demethylated. In such cases, only the maternal copy of that gene does not function in the child.
If the regulation of the epigenome does not function properly, diseases such as cancer occur more easily. IGF2 is a hormone that stimulates cell proliferation. The maternal gene for IGF2 is methylated and does not function, so the IGF2 hormone is made only from the paternal gene. However, if the maternal IGF2 gene is not methylated, it will also produce the IGF2 hormone, and the stimulation of cell proliferation will double (Column Fig. 4-2). In animal experiments, this phenomenon increases intestinal cancer. It has been discovered that this kind of cancer can be treated by suppressing IGF2 activity. Abnormalities in the epigenome increase the likelihood of developing diseases.
The Genome Modified and Changed after Birth: The Epigenome
In the past, it was thought that the genetic information of living organisms did not change after birth. Indeed, quality assurance mechanisms normally prevent mutations in DNA sequences when DNA is replicated during the cell divisions that form the 60 trillion cells of our bodies from one zygote.
However, it has recently been discovered that our DNA is modified and altered after we are born, as shown in Figure 4-10. DNA comprises the 4 bases indicated by A, G, C, and T. Methyl groups (1 carbon atom and 3 hydrogen atoms: CH3) sometimes attach to cytosines (indicated by C). DNA consists of 2 strands, and if the C of a CG sequence on one strand and the C on the opposite strand of the double helix are both methylated, as shown in Figure 4-10, this methylation is copied when the DNA is replicated. Expression of RNA is repressed in genes that contain many methylated Cs in the regulatory sequences that control their expression. This repression is referred to as "silencing by methylation."
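The way a methylation mark survives replication can be sketched with a toy model. The text states only that methylation on both strands of a CG site is copied during replication; the step-by-step rule used below (each daughter keeps one old strand, and the mark on the old strand is copied onto the new strand) is a simplifying assumption for illustration, and all names are hypothetical.

```python
def maintain(site):
    """Copy the methyl mark from the old strand onto the new strand (assumed rule)."""
    top, bottom = site
    marked = top or bottom
    return (marked, marked)

def replicate(sites):
    """Replicate a list of CG sites; each site is (top_methylated, bottom_methylated).

    Returns the methylation states of the two daughter DNA molecules.
    """
    daughter_1, daughter_2 = [], []
    for top, bottom in sites:
        # Each daughter inherits one parental strand plus a new, unmethylated strand.
        daughter_1.append(maintain((top, False)))
        daughter_2.append(maintain((False, bottom)))
    return daughter_1, daughter_2

parent = [(True, True), (False, False), (True, True)]
first, second = replicate(parent)
print(first)   # [(True, True), (False, False), (True, True)]
print(second)  # [(True, True), (False, False), (True, True)]
```

In this sketch both daughters end up with the same pattern as the parent, which is what allows a silenced gene to stay silenced through successive cell divisions.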
When human DNA is replicated, its methylation modifications are replicated along with it. To describe this kind of genome modification that occurs in the nucleus, the prefix "epi," meaning "after," is added to "genome" to make the word "epigenome" (see Column at the bottom).
The epigenome is involved in producing multiple cell types from one genome. In the human body, about 200 broad categories of specialized cells are known, including cells of organs such as the liver, cells of blood vessels throughout the body, and neurons of nerves. These cells are formed by differentiation from one zygote, but once they have differentiated, they are stable and keep the characteristics of their own cell type. For example, a transplanted kidney will function as a kidney, and a transplanted liver will function as a liver. Analysis of each kind of cell reveals differences in their DNA methylation. In other words, cells that have different epigenomes become different cells.
Human development begins when the DNA of the father's sperm and the DNA of the mother's egg are brought together in a zygote. Shortly after fertilization, most DNA methylation and similar modifications are reset by removal of the methyl groups. In undifferentiated cells, which can differentiate into various kinds of cells, the epigenome changes as the cell differentiates.
Once the genome is modified, the modification is replicated and inherited as it is even when the cell divides. Consequently, the number of genes with repressed expression increases. Methylated DNA is also increased by various kinds of environmental stimuli. Methylation modifications are replicated, and if such modifications accumulate in the DNA over many years, the amount of non-functional DNA with repressed expression will increase. This type of increase may accompany aging of the cells. On the other hand, when the sperm and egg unite as described above, the methylation in the zygote is reset, and thus a baby is born with healthy cells and fresh skin.
The Histone Proteins around which DNA is Wound, and the Epigenome
Column Fig. 4-3. Chromosome Structure and Histone Modification
(A) DNA coils itself around histone and forms chromatin.
(B) Histone modifications are replicated during cell division.
DNA, the so-called "thread of life," is a thread-like molecule that forms a double helix structure. In human cells, these thread-like DNA molecules make up 46 chromosomes, and their total length is about 2 meters. As shown in Column Figure 4-3, chromosomal DNA is wrapped around proteins called "histones" and folded up. This DNA and its histone cores are referred to as "chromatin."
Chromatin changes dynamically. DNA does not function well when it is wrapped around histones and folded up, so gene expression is repressed in that state. In contrast, genes are easily expressed from DNA that has separated from its histones and opened up. It is now known that gene expression changes when the amino acid lysine in the histone protein core is modified by methylation or acetylation. When proteins that activate expression bind to the DNA, the histone is acetylated, and the DNA opens up. RNA polymerase binds at that location, and RNA is transcribed. It is also now known that gene expression is sometimes repressed when histones are methylated. Gene expression changes radically depending on the combination of modifications of the various amino acids in histones.
It is now known that histone modifications are replicated in the same manner as DNA methylation during cell division and gene replication. The epigenome is determined by the methylation of nuclear DNA as well as the modification of the genome made by the acetylation and methylation of the histones.