History of Molecular Evolution
The history of molecular evolution starts in the early 20th century with “comparative biochemistry“, but the field of molecular evolution came into its own in the 1960s and 1970s, following the rise of molecular biology. The advent of protein sequencing allowed molecular biologists to create phylogenies based on sequence comparison, and to use the differences between homologous sequences as a molecular clock to estimate the time since the last common ancestor. In the late 1960s, the neutral theory of molecular evolution provided a theoretical basis for the molecular clock, though both the clock and the neutral theory were controversial, since most evolutionary biologists held strongly to panselectionism, with natural selection as the only important cause of evolutionary change. After the 1970s, nucleic acid sequencing allowed molecular evolution to reach beyond proteins to highly conserved ribosomal RNA sequences, the foundation of a reconceptualization of the early history of life.
Before the rise of molecular biology in the 1950s and 1960s, a small number of biologists had explored the possibilities of using biochemical differences between species to study evolution. Alfred Sturtevant predicted the existence of chromosomal inversions in 1921 and with Dobzhansky constructed one of the first molecular phylogenies on 17 Drosophila Pseudo-obscura strains from the accumulation of chromosomal inversions observed from the hybridization of polyten chromosomes. Ernest Baldwin worked extensively on comparative biochemistry beginning in the 1930s, and Marcel Florkin pioneered techniques for constructing phylogenies based on molecular and biochemical characters in the 1940s. However, it was not until the 1950s that biologists developed techniques for producing biochemical data for the quantitative study of molecular evolution.
The first molecular systematics research was based on immunological assays and protein “fingerprinting” methods. Alan Boyden—building on immunological methods of George Nuttall—developed new techniques beginning in 1954, and in the early 1960s Curtis Williams and Morris Goodman used immunological comparisons to study primate phylogeny. Others, such as Linus Pauling and his students, applied newly developed combinations of electrophoresis and paper chromatography to proteins subject to partial digestion by digestive enzymes to create unique two-dimensional patterns, allowing fine-grained comparisons of homologous proteins.
Beginning in the 1950s, a few naturalists also experimented with molecular approaches—notably Ernst Mayr and Charles Sibley. While Mayr quickly soured on paper chromatography, Sibley successfully applied electrophoresis to egg-white proteins to sort out problems in bird taxonomy, soon supplemented that with DNA hybridization techniques—the beginning of a long career built on molecular systematics.
While such early biochemical techniques found grudging acceptance in the biology community, for the most part they did not impact the main theoretical problems of evolution and population genetics. This would change as molecular biology shed more light on the physical and chemical nature of genes.
Genetic load, the classical/balance controversy, and the measurement of heterozygosity
At the time that molecular biology was coming into its own in the 1950s, there was a long-running debate—the classical/balance controversy—over the causes of heterosis, the increase in fitness observed when inbred lines are outcrossed. In 1950, James F. Crow offered two different explanations (later dubbed the classical and balance positions) based on the paradox first articulated by J. B. S. Haldane in 1937: the effect of deleterious mutations on the average fitness of a population depends only on the rate of mutations (not the degree of harm caused by each mutation) because more-harmful mutations are eliminated more quickly by natural selection, while less-harmful mutations remain in the population longer. H. J. Muller dubbed this “genetic load”.
Muller, motivated by his concern about the effects of radiation on human populations, argued that heterosis is primarily the result of deleterious homozygous recessive alleles, the effects of which are masked when separate lines are crossed—this was the dominance hypothesis, part of what Dobzhansky labeled the classical position. Thus, ionizing radiation and the resulting mutations produce considerable genetic load even if death or disease does not occur in the exposed generation, and in the absence of mutation natural selection will gradually increase the level of homozygosity. Bruce Wallace, working with J. C. King, used the overdominance hypothesis to develop the balance position, which left a larger place for overdominance (where the heterozygous state of a gene is more fit than the homozygous states). In that case, heterosis is simply the result of the increased expression of heterozygote advantage. If overdominant loci are common, then a high level of heterozygosity would result from natural selection, and mutation-inducing radiation may in fact facilitate an increase in fitness due to overdominance. (This was also the view of Dobzhansky.)
Debate continued through 1950s, gradually becoming a central focus of population genetics. A 1958 study of Drosophila by Wallace suggested that radiation-induced mutations increased the viability of previously homozygous flies, providing evidence for heterozygote advantage and the balance position; Wallace estimated that 50% of loci in natural Drosophila populations were heterozygous. Motoo Kimura’s subsequent mathematical analyses reinforced what Crow had suggested in 1950: that even if overdominant loci are rare, they could be responsible for a disproportionate amount of genetic variability. Accordingly, Kimura and his mentor Crow came down on the side of the classical position. Further collaboration between Crow and Kimura led to the infinite alleles model, which could be used to calculate the number of different alleles expected in a population, based on population size, mutation rate, and whether the mutant alleles were neutral, overdominant, or deleterious. Thus, the infinite alleles model offered a potential way to decide between the classical and balance positions, if accurate values for the level of heterozygosity could be found.
By the mid-1960s, the techniques of biochemistry and molecular biology—in particular protein electrophoresis—provided a way to measure the level of heterozygosity in natural populations: a possible means to resolve the classical/balance controversy. In 1963, Jack L. Hubby published an electrophoresis study of protein variation in Drosophila; soon after, Hubby began collaborating with Richard Lewontin to apply Hubby’s method to the classical/balance controversy by measuring the proportion of heterozygous loci in natural populations. Their two landmark papers, published in 1966, established a significant level of heterozygosity for Drosophila (12%, on average). However, these findings proved difficult to interpret. Most population geneticists (including Hubby and Lewontin) rejected the possibility of widespread neutral mutations; explanations that did not involve selection were anathema to mainstream evolutionary biology. Hubby and Lewontin also ruled out heterozygote advantage as the main cause because of the segregation load it would entail, though critics argued that the findings actually fit well with overdominance hypothesis.
Protein sequences and the molecular clock
While evolutionary biologists were tentatively branching out into molecular biology, molecular biologists were rapidly turning their attention toward evolution.
After developing the fundamentals of protein sequencing with insulin between 1951 and 1955, Frederick Sanger and his colleagues had published a limited interspecies comparison of the insulin sequence in 1956. Francis Crick, Charles Sibley and others recognized the potential for using biological sequences to construct phylogenies, though few such sequences were yet available. By the early 1960s, techniques for protein sequencing had advanced to the point that direct comparison of homologous amino acid sequences was feasible. In 1961, Emanuel Margoliash and his collaborators completed the sequence for horse cytochrome c (a longer and more widely distributed protein than insulin), followed in short order by a number of other species.
In 1962, Linus Pauling and Emile Zuckerkandl proposed using the number of differences between homologous protein sequences to estimate the time since divergence, an idea Zuckerkandl had conceived around 1960 or 1961. This began with Pauling’s long-time research focus, hemoglobin, which was being sequenced by Walter Schroeder; the sequences not only supported the accepted vertebrate phylogeny, but also the hypothesis (first proposed in 1957) that the different globin chains within a single organism could also be traced to a common ancestral protein. Between 1962 and 1965, Pauling and Zuckerkandl refined and elaborated this idea, which they dubbed the molecular clock, and Emil L. Smith and Emanuel Margoliash expanded the analysis to cytochrome c. Early molecular clock calculations agreed fairly well with established divergence times based on paleontological evidence. However, the essential idea of the molecular clock—that individual proteins evolve at a regular rate independent of a species’ morphological evolution—was extremely provocative (as Pauling and Zuckerkandl intended it to be).
The “molecular wars”
From the early 1960s, molecular biology was increasingly seen as a threat to the traditional core of evolutionary biology. Established evolutionary biologists—particularly Ernst Mayr, Theodosius Dobzhansky and G. G. Simpson, three of the founders of the modern evolutionary synthesis of the 1930s and 1940s—were extremely skeptical of molecular approaches, especially when it came to the connection (or lack thereof) to natural selection. Molecular evolution in general—and the molecular clock in particular—offered little basis for exploring evolutionary causation. According to the molecular clock hypothesis, proteins evolved essentially independently of the environmentally determined forces of selection; this was sharply at odds with the panselectionism prevalent at the time. Moreover, Pauling, Zuckerkandl, and other molecular biologists were increasingly bold in asserting the significance of “informational macromolecules” (DNA, RNA and proteins) for all biological processes, including evolution. The struggle between evolutionary biologists and molecular biologists—with each group holding up their discipline as the center of biology as a whole—was later dubbed the “molecular wars” by Edward O. Wilson, who experienced firsthand the domination of his biology department by young molecular biologists in the late 1950s and the 1960s.
In 1961, Mayr began arguing for a clear distinction between functional biology (which considered proximate causes and asked “how” questions) and evolutionary biology (which considered ultimate causes and asked “why” questions) He argued that both disciplines and individual scientists could be classified on either the functional or evolutionary side, and that the two approaches to biology were complementary. Mayr, Dobzhansky, Simpson and others used this distinction to argue for the continued relevance of organismal biology, which was rapidly losing ground to molecular biology and related disciplines in the competition for funding and university support. It was in that context that Dobzhansky first published his famous statement, “nothing in biology makes sense except in the light of evolution”, in a 1964 paper affirming the importance of organismal biology in the face of the molecular threat; Dobzhansky characterized the molecular disciplines as “Cartesian” (reductionist) and organismal disciplines as “Darwinian”.
Mayr and Simpson attended many of the early conferences where molecular evolution was discussed, critiquing what they saw as the overly simplistic approaches of the molecular clock. The molecular clock, based on uniform rates of genetic change driven by random mutations and drift, seemed incompatible with the varying rates of evolution and environmentally-driven adaptive processes (such as adaptive radiation) that were among the key developments of the evolutionary synthesis. At the 1962 Wenner-Gren conference, the 1964 Colloquium on the Evolution of Blood Proteins in Bruges, Belgium, and the 1964 Conference on Evolving Genes and Proteins at Rutgers University, they engaged directly with the molecular biologists and biochemists, hoping to maintain the central place of Darwinian explanations in evolution as its study spread to new fields.
Gene-centered view of evolution
Though not directly related to molecular evolution, the mid-1960s also saw the rise of the gene-centered view of evolution, spurred by George C. Williams’s Adaptation and Natural Selection (1966). Debate over units of selection, particularly the controversy over group selection, led to increased focus on individual genes (rather than whole organisms or populations) as the theoretical basis for evolution. However, the increased focus on genes did not mean a focus on molecular evolution; in fact, the adaptationism promoted by Williams and other evolutionary theories further marginalized the apparently non-adaptive changes studied by molecular evolutionists.
The neutral theory of molecular evolution
The intellectual threat of molecular evolution became more explicit in 1968, when Motoo Kimura introduced the neutral theory of molecular evolution. Based on the available molecular clock studies (of hemoglobin from a wide variety of mammals, cytochrome c from mammals and birds, and triosephosphate dehydrogenase from rabbits and cows), Kimura (assisted by Tomoko Ohta) calculated an average rate of DNA substitution of one base pair change per 300 base pairs (encoding 100 amino acids) per 28 million years. For mammal genomes, this indicated a substitution rate of one every 1.8 years, which would produce an unsustainably high substitution load unless the preponderance of substitutions was selectively neutral. Kimura argued that neutral mutations occur very frequently, a conclusion compatible with the results of the electrophoretic studies of protein heterozygosity. Kimura also applied his earlier mathematical work on genetic drift to explain how neutral mutations could come to fixation, even in the absence of natural selection; he soon convinced James F. Crow of the potential power of neutral alleles and genetic drift as well.
Kimura’s theory—described only briefly in a letter to Nature—was followed shortly after with a more substantial analysis by Jack L. King and Thomas H. Jukes—who titled their first paper on the subject “non-Darwinian evolution”. Though King and Jukes produced much lower estimates of substitution rates and the resulting genetic load in the case of non-neutral changes, they agreed that neutral mutations driven by genetic drift were both real and significant. The fairly constant rates of evolution observed for individual proteins was not easily explained without invoking neutral substitutions (though G. G. Simpson and Emil Smith had tried). Jukes and King also found a strong correlation between the frequency of amino acids and the number of different codons encoding each amino acid. This pointed to substitutions in protein sequences as being largely the product of random genetic drift.
King and Jukes’ paper, especially with the provocative title, was seen as a direct challenge to mainstream neo-Darwinism, and it brought molecular evolution and the neutral theory to the center of evolutionary biology. It provided a mechanism for the molecular clock and a theoretical basis for exploring deeper issues of molecular evolution, such as the relationship between rate of evolution and functional importance. The rise of the neutral theory marked synthesis of evolutionary biology and molecular biology—though an incomplete one.
With their work on firmer theoretical footing, in 1971 Emile Zuckerkandl and other molecular evolutionists founded the Journal of Molecular Evolution.
The neutralist-selectionist debate and near-neutrality
The critical responses to the neutral theory that soon appeared marked the beginning of the neutralist-selectionist debate. In short, selectionists viewed natural selection as the primary or only cause of evolution, even at the molecular level, while neutralists held that neutral mutations were widespread and that genetic drift was a crucial factor in the evolution of proteins. Kimura became the most prominent defender of the neutral theory—which would be his main focus for the rest of his career. With Ohta, he refocused his arguments on the rate at which drift could fix new mutations in finite populations, the significance of constant protein evolution rates, and the functional constraints on protein evolution that biochemists and molecular biologists had described. Though Kimura had initially developed the neutral theory partly as an outgrowth of the classical position within the classical/balance controversy (predicting high genetic load as a consequence of non-neutral mutations), he gradually deemphasized his original argument that segregational load would be impossibly high without neutral mutations (which many selectionists, and even fellow neutralists King and Jukes, rejected).
From the 1970s through the early 1980s, both selectionists and neutralists could explain the observed high levels of heterozygosity in natural populations, by assuming different values for unknown parameters. Early in the debate, Kimura’s student Tomoko Ohta focused on the interaction between natural selection and genetic drift, which was significant for mutations that were not strictly neutral, but nearly so. In such cases, selection would compete with drift: most slightly deleterious mutations would be eliminated by natural selection or chance; some would move to fixation through drift. The behavior of this type of mutation, described by an equation that combined the mathematics of the neutral theory with classical models, became the basis of Ohta’s nearly neutral theory of molecular evolution.
In 1973, Ohta published a short letter in Nature suggesting that a wide variety of molecular evidence supported the theory that most mutation events at the molecular level are slightly deleterious rather than strictly neutral. Molecular evolutionists were finding that while rates of protein evolution (consistent with the molecular clock) were fairly independent of generation time, rates of noncoding DNA divergence were inversely proportional to generation time. Noting that population size is generally inversely proportional to generation time, Tomoko Ohta proposed that most amino acid substitutions are slightly deleterious while noncoding DNA substitutions are more neutral. In this case, the faster rate of neutral evolution in proteins expected in small populations (due to genetic drift) is offset by longer generation times (and vice versa), but in large populations with short generation times, noncoding DNA evolves faster while protein evolution is retarded by selection (which is more significant than drift for large populations).
Between then and the early 1990s, many studies of molecular evolution used a “shift model” in which the negative effect on the fitness of a population due to deleterious mutations shifts back to an original value when a mutation reaches fixation. In the early 1990s, Ohta developed a “fixed model” that included both beneficial and deleterious mutations, so that no artificial “shift” of overall population fitness was necessary. According to Ohta, however, the nearly neutral theory largely fell out of favor in the late 1980s, because the mathematically simpler neutral theory for the widespread molecular systematics research that flourished after the advent of rapid DNA sequencing. As more detailed systematics studies started to compare the evolution of genome regions subject to strong selection versus weaker selection in the 1990s, the nearly neutral theory and the interaction between selection and drift have once again become an important focus of research.
While early work in molecular evolution focused on readily sequenced proteins and relatively recent evolutionary history, by the late 1960s some molecular biologists were pushing further toward the base of the tree of life by studying highly conserved nucleic acid sequences. Carl Woese, a molecular biologist whose earlier work was on the genetic code and its origin, began using small subunit ribosomal RNA to reclassify bacteria by genetic (rather than morphological) similarity. Work proceeded slowly at first, but accelerated as new sequencing methods were developed in the 1970s and 1980s. By 1977, Woese and George Fox announced that some bacteria, such as methanogens, lacked the rRNA units that Woese’s phylogenetic studies were based on; they argued that these organisms were actually distinct enough from conventional bacteria and the so-called higher organisms to form their own kingdom, which they called archaebacteria. Though controversial at first (and challenged again in the late 1990s), Woese’s work became the basis of the modern three-domain system of Archaea, Bacteria, and Eukarya (replacing the five-domain system that had emerged in the 1960s).
Work on microbial phylogeny also brought molecular evolution closer to cell biology and origin of life research. The differences between archaea pointed to the importance of RNA in the early history of life. In his work with the genetic code, Woese had suggested RNA-based life had preceded the current forms of DNA-based life, as had several others before him—an idea that Walter Gilbert would later call the “RNA world”. In many cases, genomics research in the 1990s produced phylogenies contradicting the rRNA-based results, leading to the recognition of widespread lateral gene transfer across distinct taxa. Combined with the probable endosymbiotic origin of organelle-filled eukarya, this pointed to a far more complex picture of the origin and early history of life, one which might not be describable in the traditional terms of common ancestry.