After the Discovery of the Double-Helix Structure of DNA
Research in the life sciences reached a major turning point once it was found that DNA, and specifically its base sequence, is the material basis of genes. Different molecules were found to play different roles; for example, DNA is an information-storage molecule, and proteins are important functional molecules for the maintenance of life. The sequence of DNA was also found to play an important role in life. In addition, since all living organisms on the earth store their genetic information in DNA, many researchers expect that DNA can reveal principles common to all life.
Replication, Transcription, and Translation—DNA, RNA, and Proteins
Fig. 3-4. The Flow of Genetic Information
The arrows indicate the direction of the flow of genetic information.
All of the information necessary for life is coded in DNA, which comprises four bases: A, G, C, and T. More precisely, the information is coded in the order in which these four bases appear, that is, in the base sequence.
The information carried by DNA is converted into proteins, including enzymes that catalyze chemical reactions in the organism and cytoskeletal proteins that maintain cell shape. The conversion of information from DNA to proteins is mediated by ribonucleic acid (RNA)*4. While the information of DNA is coded by the four bases A, G, C, and T, RNA uses U instead of T, and its information is therefore coded by A, G, C, and U.
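The substitution of U for T can be sketched in a few lines of Python. This is only an illustration of the change of alphabet, not a model of the cellular machinery. The function name `transcribe` and the example sequence are assumptions introduced here, and the input is assumed to be the DNA strand that reads the same as the resulting RNA.

```python
DNA_BASES = set("AGCT")

def transcribe(dna: str) -> str:
    """Return the RNA spelling of a DNA sequence by writing U in place of T.

    Assumption: `dna` is the strand whose sequence matches the RNA,
    so no complementing is needed.
    """
    dna = dna.upper()
    unknown = set(dna) - DNA_BASES
    if unknown:
        raise ValueError(f"not a DNA base: {sorted(unknown)}")
    return dna.replace("T", "U")

# Hypothetical 12-base example sequence.
print(transcribe("ATGGAACTGTAA"))  # AUGGAACUGUAA
```

The result uses only A, G, C, and U, as described above.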
The relationship between DNA, mRNA, and proteins closely resembles the manufacturing of products in a factory. The blueprint written in the DNA is first transcribed to mRNA, and then the products, i.e. proteins, are manufactured according to the information in the blueprint. DNA is like an encyclopedia containing the blueprints of all products that can possibly be manufactured. Not all of these products are manufactured all the time; only the blueprints for necessary products are transcribed into mRNA, and only at the times they are needed.
Genetic information prescribed by DNA sequences in organisms is thus transcribed into mRNA, and the transcribed information is translated into proteins. This flow of information is accepted as a common principle that applies to all living organisms including humans (Fig. 3-4).
Proteins are formed from 20 kinds of amino acids that are chemically joined*5 together (see Chapter 2, Fig. 2-3A). Although there are only four bases, they can code for these 20 amino acids because base sequences are read in triplets, such as GAA or CUG, each of which is treated as one piece of information. The triplets of bases that code for amino acids in this way are called "codons" (Fig. 3-5). Three positions with four possible bases each give 4^3 = 64 possible codons, yet these code for only 20 amino acids, so most amino acids are coded by more than one codon. In fact, codons that differ only in their third base often code for the same amino acid.
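The counting argument can be checked directly. The short Python sketch below (the names `count_words` and `codons` are introduced here for illustration) shows that one or two bases give too few combinations for 20 amino acids, while three bases give 64.

```python
from itertools import product

RNA_BASES = "UCAG"
N_AMINO_ACIDS = 20

def count_words(length: int) -> int:
    """Number of distinct base sequences of the given length."""
    return len(RNA_BASES) ** length

for length in (1, 2, 3):
    n = count_words(length)
    verdict = "enough" if n >= N_AMINO_ACIDS else "too few"
    print(f"{length} base(s): {n} combinations, {verdict} for {N_AMINO_ACIDS} amino acids")

# Enumerate every triplet explicitly to confirm the count.
codons = ["".join(triplet) for triplet in product(RNA_BASES, repeat=3)]
print(len(codons))  # 64
```

A triplet is therefore the shortest word length that can distinguish 20 amino acids, and it leaves spare codons, which is why several codons can share one amino acid.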
Fig. 3-5. Codon Table
The first base of each codon is written in the vertical heading on the left, and the second base is written in the horizontal heading on the top. The third base is written in the vertical heading on the right in a location corresponding to each combination of the first and second bases. The amino acid coded by each base triplet is indicated by a single letter (see Chapter 2, Figure 2-3A). Three codons, UAG, UAA, and UGA, do not code for amino acids, but stop synthesis of proteins. These are called "stop codons." Translation of all of the genes begins with AUG (methionine). This AUG translation start site is specifically called the "start codon."
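Reading a codon table can be mimicked in Python as follows. This is a simplified sketch under stated assumptions: the table is the standard genetic code written with single-letter amino acid symbols, `*` is used here to mark stop codons, the function name `translate` and the example mRNA are hypothetical, and translation is assumed to begin at the first AUG found in the sequence.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, with codons enumerated in
# UCAG order (first base varies slowest). '*' marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
print(stop_codons)                    # ['UAA', 'UAG', 'UGA']
print(translate("GGAUGGAACUGUAAGC"))  # MEL
```

In the example, reading starts at AUG (M), continues with GAA (E) and CUG (L), and stops at UAA.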
*4 The RNA onto which information from DNA is transcribed is called messenger RNA (mRNA). The RNA found in ribosomes, the apparatus that translates genes, is called ribosomal RNA (rRNA). The RNA that matches codons to amino acids is called transfer RNA (tRNA). It has recently been found that there are also "small RNAs" that have important functions within organisms.
*5 This chemical joining is called a peptide bond.
The Word "Gene" and the Concept of "Genome"
Ever since the idea took hold that DNA sequences carry important genetic information, attempts have been made to read the DNA of many kinds of organisms. In particular, a groundbreaking DNA sequencing technology was developed in the 1970s, leading to a sharp increase in the amount of sequence information. As more DNA sequences were read, it became clear that DNA sequences do more than just code for the amino acid sequences of proteins. Genes are expressed only in the necessary quantities and at the necessary times, and the regions that regulate the timing and quantity of gene expression are themselves part of the DNA sequence. It was also discovered that, mainly in eukaryotic cells, there are DNA sequences that appear to carry no meaningful genetic information.
Further investigation revealed that there are also sequences called "pseudogenes," which closely resemble genes but have no biological function, and "repeat sequences," which consist of simple sequences repeated continuously, such as AGAGAGAGA.*6
Therefore, the terms "DNA sequence" and "gene" are not at all equivalent, and the total DNA sequence of any organism is referred to as its "genome."*7 Genes are defined as parts of the genomic DNA sequence that code information necessary for the production of certain proteins, and also as DNA sequences that code the sequences of a special kind of RNA,*8 which performs important functions in the production of proteins. Under this definition, DNA sequences that regulate the timing and amount of gene expression count as genes, whereas pseudogenes and repeat sequences do not.
*6 These sequences are called "junk DNA." However, important physiological functions of junk DNA might be discovered in the future.
*7 This is a coined term combining the word "gene" and the suffix "-ome," which means "total."
*8 tRNA and rRNA
Fig. 3-6. Schematic Diagram of the Genes of Eukaryotes
First, RNA is transcribed in a form containing both exons and introns. Then, after going through a splicing process that removes the introns, mRNA capable of being translated into amino acids is formed.
The genes of eukaryotes such as humans have a special characteristic that is not found in prokaryotes. Information to produce proteins is normally coded in genes. If the information is for a protein consisting of 100 amino acids, it would seem reasonable for the corresponding DNA sequence to run continuously through all 100 amino acids. In prokaryotes, genetic information is in fact written in essentially this way. In eukaryotes, however, the DNA sequence for a protein is often split into several regions. The entire sequence of such a split gene is transcribed into one RNA molecule, but the parts that do not code for amino acids are subsequently removed, leaving only the parts that do. The sequences that remain in the mRNA are called "exons," the sequences that are removed are called "introns," and the process of removing the introns is called "splicing" (Fig. 3-6). Why do exons and introns exist, and why does splicing occur? Modern research has shown that this seemingly pointless phenomenon produces genes with diverse functions and has a major effect on the evolution of life through genetic mutations (see Chapter 4).
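The bookkeeping behind splicing can be illustrated with a toy Python sketch. Everything in it is hypothetical: the function name `splice`, the transcript, and the exon coordinates are made up for illustration, and the sketch says nothing about how a cell actually recognizes where introns begin and end.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments, given as (start, end) half-open index pairs, in order."""
    previous_end = 0
    parts = []
    for start, end in exons:
        if not (previous_end <= start < end <= len(pre_mrna)):
            raise ValueError("exons must be ordered, non-overlapping, and in range")
        parts.append(pre_mrna[start:end])
        previous_end = end
    return "".join(parts)

# Toy transcript: exons are written in upper case and the intron in lower
# case purely so the result is easy to read.
pre_mrna = "AUGGAA" + "guaaguccuuag" + "CUGUAA"
exons = [(0, 6), (18, 24)]  # hypothetical exon coordinates

mature_mrna = splice(pre_mrna, exons)
print(mature_mrna)  # AUGGAACUGUAA
```

The 24-base transcript contains both exons and the intron; after the intron is removed, only the 12 exon bases remain in the mature mRNA.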
Summary of the Human Genome
Completion of the attempt to read all DNA sequences necessary to form humans (the Human Genome Project) was announced in 2003. Now let us take a quick look at humans from a genomic perspective.
Humans have 46 chromosomes in each of their cells, receiving 23 from their mother and 23 from their father. One of these chromosomes, the Y chromosome, is present only in males and can therefore be inherited only from the father. There is also an X chromosome, which corresponds to the Y chromosome. These two chromosomes are called "sex chromosomes," and the rest of the chromosomes are called "autosomes" (Fig. 3-7). In other words, 22 autosomes and one X chromosome are passed on from the mother, and 22 autosomes and either one X chromosome or one Y chromosome are passed on from the father.
The DNA sequences that make up the human genome contain about three billion base pairs. Here, the word "genome" refers to the DNA sequences of the 22 autosomes, the X and Y sex chromosomes, and the mitochondrial DNA.
There are about 25,000 human genes, and each gene uses base sequences that code for an average of about 450 amino acids. Regions containing information for proteins occupy only 1.3% of the human genome; the remaining 98.7% are regions that do not code for proteins.
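As a rough plausibility check, the figures quoted above can be combined in a short calculation. The Python sketch below uses only the rounded numbers given in the text (about 25,000 genes, about 450 amino acids per gene, about three billion base pairs) plus the fact that each amino acid is coded by three bases; the variable names are introduced here for illustration.

```python
GENOME_BP = 3_000_000_000   # about three billion base pairs
N_GENES = 25_000            # approximate number of human genes
AVG_AMINO_ACIDS = 450       # average amino acids coded per gene
BASES_PER_CODON = 3         # one codon = three bases

coding_bp = N_GENES * AVG_AMINO_ACIDS * BASES_PER_CODON
coding_fraction = coding_bp / GENOME_BP

print(f"{coding_bp:,} coding bases")  # 33,750,000 coding bases
print(f"about {coding_fraction:.1%} of the genome")
```

The estimate comes out at a little over 1%, which is of the same order as the 1.3% figure. Because the inputs are rounded averages, exact agreement should not be expected.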
Today, research using genomic sequences is actively under way to search for the causes of human diseases and to compare humans with other animals.
Fig. 3-7. Overview of Human Chromosomes
This figure shows male chromosomes. Females have an XX pair of sex chromosomes. In the cells that make up the human body, each numbered chromosome is present in two copies, one from the father and one from the mother, as shown in this figure. These pairs of chromosomes designated by the same number are called "homologous chromosomes."
Can Organisms Be Created If Their Genomic Sequence Is Known?
Modern technology has made it possible to artificially synthesize DNA with a desired sequence by chemically joining the A, G, C, and T bases. In fact, chemical synthesis of DNA oligonucleotides several dozen bases long is common. If it were possible to synthesize the human genomic sequence, which has 3 billion base pairs, could it be used to artificially create a human being?
Based on current science and technology, at least, the answer is "no." It is impossible to create humans, or even the simplest organisms such as bacteria. One major reason is that the structure of the cell cannot be rebuilt. Cells are separated from the outside environment by cell membranes made of lipids, and many proteins are embedded in these cell membranes. Various organelles are scattered throughout the cell. In prokaryotes, DNA is naked, but in eukaryotes, DNA is protected inside a nucleus. Furthermore, DNA binds three-dimensionally to various proteins inside the cell to form complexes. There are myriad examples of such structures, all of which are necessary for the cell to function. No matter how well the sequence of a genome is understood, unless a complete cell structure is prepared, the genome sequence cannot function as genetic information. Ultimately, many hurdles must be overcome before life can be created artificially. It will not be an easy task.