It has been estimated that Earth formed about 4.54 billion years ago. Early Earth was probably a hellish place, with frequent meteorite bombardments, giant impacts, volcanism and magma oceans (figure 1), and a surface temperature ranging between 55 and 85 °C (1). As early Earth cooled, about 4.4 billion years ago, water accumulated into oceans. It is generally believed that the first life, comprising primitive microorganisms, developed in these primordial oceans, possibly as early as 4.28 billion years ago. A key question in biology is whether proteins from these ancient organisms were fundamentally different from modern proteins. Computational approaches now enable biologists to “travel back in time”, deducing the sequences of ancestral proteins and recreating them in the lab. This resurrection of ancient proteins is called ancestral sequence reconstruction (ASR). It is reminiscent of Jurassic Park, and it has provided important insights into the origins of protein sequence, structure and function. In the absence of DNA from these organisms or their fossils, ancient protein sequences are instead inferred at the ancestral nodes of a phylogenetic tree, which is constructed from a multiple sequence alignment of modern homologs (2,3,4). The inferred primary sequences are back-translated into DNA coding sequences, which are produced by gene synthesis. These synthetic genes are then expressed, allowing experimental characterization of the resurrected proteins. Here I will discuss general trends in protein evolution as well as the biochemical and structural features of recreated Precambrian enzymes.
Resurrecting ancestral proteins
Advances in computational approaches allow scientists to go back in evolution and deduce the primary sequences of ancient proteins. The basic steps of this process are shown in figure 2 (adapted from 4). Publicly available online tools for each step of the ASR process are discussed elsewhere (3). First, a multiple sequence alignment of related modern protein sequences, typically retrieved from databases such as NCBI or UniProt, is used to build a phylogenetic tree. Next, the primary sequence of the ancestral protein at each node of the tree (underlined in figure 2) is inferred from the multiple sequence alignment and the phylogenetic tree using a statistical algorithm. Finally, the ancestral sequences are back-translated, and the resulting DNA coding sequences are obtained through gene synthesis, expressed in a suitable host organism and experimentally characterized (2,3).
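To make the "statistical algorithm" step more concrete, the sketch below computes the marginal posterior probability of each amino acid at the root of a small tree for a single alignment column, using Felsenstein's pruning algorithm. It is a minimal illustration under stated assumptions, not the method of any particular ASR tool: it assumes a simple equal-rates substitution model over the 20 amino acids with uniform equilibrium frequencies, whereas practical reconstructions typically use empirical substitution matrices such as JTT or LG and model rate variation among sites. The `Node` class, the toy tree, its branch lengths and the leaf residues are all hypothetical. Repeating the calculation for every column and keeping the most probable residue at each one would give a maximum a posteriori ancestral sequence.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field
from typing import Optional

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_SET = frozenset(AMINO_ACIDS)
K = len(AMINO_ACIDS)


@dataclass
class Node:
    """Hypothetical tree node; branch_length is the branch leading to this node."""
    branch_length: float = 0.0
    residue: Optional[str] = None          # observed residue (leaves only)
    children: list[Node] = field(default_factory=list)


def transition_prob(i: str, j: str, t: float) -> float:
    """P(j at the end of a branch | i at its start) under an equal-rates model.

    Assumption: a Jukes-Cantor-like model generalized to K states, with the
    branch length t measured in expected substitutions per site.
    """
    decay = math.exp(-K * t / (K - 1))
    if i == j:
        return 1.0 / K + (K - 1) / K * decay
    return 1.0 / K - decay / K


def conditional_likelihoods(node: Node) -> dict[str, float]:
    """Pruning step: L[s] = P(leaf data below node | node carries residue s)."""
    if not node.children:
        if node.residue not in AA_SET:     # gap or unknown -> missing data
            return {s: 1.0 for s in AMINO_ACIDS}
        return {s: 1.0 if s == node.residue else 0.0 for s in AMINO_ACIDS}
    result = {s: 1.0 for s in AMINO_ACIDS}
    for child in node.children:
        child_lik = conditional_likelihoods(child)
        for s in AMINO_ACIDS:
            result[s] *= sum(
                transition_prob(s, x, child.branch_length) * child_lik[x]
                for x in AMINO_ACIDS
            )
    return result


def root_posterior(root: Node) -> dict[str, float]:
    """Marginal posterior of each residue at the root (uniform prior 1/K)."""
    lik = conditional_likelihoods(root)
    total = sum(lik.values())  # the constant uniform prior cancels out
    return {s: lik[s] / total for s in AMINO_ACIDS}


def leaf(residue: str, branch_length: float) -> Node:
    return Node(branch_length=branch_length, residue=residue)


# Toy alignment column: three descendants carry Leu, one carries Ile.
tree = Node(children=[
    Node(branch_length=0.05, children=[leaf("L", 0.1), leaf("L", 0.1)]),
    Node(branch_length=0.05, children=[leaf("L", 0.2), leaf("I", 0.1)]),
])
posterior = root_posterior(tree)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

The posterior probability of the chosen residue also indicates how uncertain the reconstruction is at that position, which is one reason why some studies resurrect several alternative ancestors.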
Using this approach, ancestral proteins have been recreated from organisms in all domains of life, spanning the three eons: the Archean (4000-2500 Mya), the Proterozoic (2500-550 Mya) and the Phanerozoic (550 Mya to the present). The Phanerozoic is further subdivided into the Paleozoic (550-250 Mya), Mesozoic (250-65 Mya) and Cenozoic (65-0 Mya) eras. All resurrected proteins are detailed in figure 3 (adapted from 3). Structural information is available for 46 of these ancestral proteins, which are listed in italics together with the corresponding structure.
Figure 3 shows that the oldest ancestral protein for which structural information is available is Precambrian thioredoxin. In fact, seven ancient thioredoxin variants originating from different last common ancestors (e.g. the common bacterial, archaeal and eukaryotic ancestors), up to approximately 4 billion years old, were resurrected and crystallized (5). Despite significant sequence differences from modern thioredoxin variants, the resurrected enzymes adopt a structure similar to that of extant thioredoxin. This remarkable degree of structural conservation over 4 billion years suggests that protein structures evolve slowly. Moreover, all ancestral thioredoxins displayed increased thermostability, which is consistent with a hotter Precambrian environment. Similar findings were reported for ancestral nucleoside diphosphate kinases (NDKs) from the last common ancestors of archaea and of bacteria, which probably lived 3.8 billion years ago (6). Specifically, the resurrected NDK variants exhibited superior thermal stability compared with modern NDKs, while their structures showed the same hexameric architecture and overall conformation as extant NDKs (6). Likewise, the leucine biosynthetic enzyme LeuB from the last common ancestor of Bacillus, which lived approximately 1 billion years ago, was resurrected using different phylo-statistical algorithms, yielding several ancestral LeuB variants (7). All variants were catalytically active and thermophilic. The structure of the best-performing ancestral variant closely resembles that of the corresponding enzyme from the modern thermophilic bacterium Thermotoga maritima. These studies highlight two important features of resurrected ancestral proteins, namely structural conservation and increased thermal stability (8).
The origins of catalytic specificity have also been investigated, for example in serine proteases. To this end, the common ancestor of granzyme B, chymase and cathepsin G, which existed about 170 million years ago, was resurrected (9). The ancestor had a broad catalytic preference covering all specificities found in its modern descendants, suggesting that a generalist ancestor gave rise to highly specialized descendants over time. Interestingly, similar results have been reported for other resurrected ancient enzymes (3,4). These findings corroborate the view that a functionally promiscuous enzyme can evolve into variants with a novel or highly specific activity (10).
An example: resurrected Precambrian β-lactamases
β-Lactamases, a large group of microbial enzymes that degrade β-lactam antibiotics, currently comprise over 2000 variants. They are commonly divided into classes A-D based on sequence homology. Members of classes A, C and D use an active-site serine to hydrolyze the β-lactam ring, whereas class B enzymes are metalloenzymes that employ zinc within their active site to facilitate hydrolysis of the β-lactam (11). A phylogenetic analysis suggested that β-lactamases are ancient enzymes that originated approximately 2 billion years ago (12). Recently, several β-lactamases from different common ancestors (e.g. of enterobacteria, Gram-negative bacteria and Gram-positive bacteria), 1-3 billion years old, were resurrected, characterized and crystallized (13). The structures of the resurrected enzymes from the last common ancestors of Gram-negative bacteria (PDB 4B88) and enterobacteria (PDB 3ZDJ) were solved at 2.0 and 2.4 Å resolution, respectively. Both structures are shown in figure 4, together with the structure of an extant enzyme, TEM1 β-lactamase from E. coli at 1.8 Å resolution (PDB 1BTL), for reference (14). The structures are colored by secondary structure. Despite considerable sequence differences, both resurrected enzymes fold into a structure similar to that of TEM1 β-lactamase. In fact, the structures of the resurrected β-lactamase from the ancestor of Gram-negative bacteria and of TEM1 β-lactamase are superimposable without major conformational changes (lower right panel), revealing a high degree of structural conservation. Further characterization showed that the resurrected β-lactamases are remarkably robust: their melting temperatures are about 35 °C higher than those of modern variants. This high thermostability of ancestral β-lactamases agrees well with the results of other protein resurrection studies (8).
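Whether two structures are "superimposable" is usually quantified as the root-mean-square deviation (RMSD) between matched atoms, often the Cα atoms, after optimal rigid-body superposition. The sketch below shows one way to compute this, the Kabsch algorithm implemented with NumPy. It assumes that the two coordinate arrays are already paired residue by residue (in practice that correspondence comes from a sequence or structural alignment), and the coordinates in the usage example are random, hypothetical stand-ins, not atoms taken from 4B88, 3ZDJ or 1BTL. Established libraries such as Biopython's `Bio.PDB.Superimposer` offer equivalent functionality.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two paired N x 3 coordinate sets after optimal superposition."""
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two N x 3 arrays of paired coordinates")
    p_c = p - p.mean(axis=0)              # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c                       # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    p_fit = p_c @ rotation.T              # rotate the centered p onto q
    return float(np.sqrt(np.mean(np.sum((p_fit - q_c) ** 2, axis=1))))


# Hypothetical coordinates: a rotated and shifted copy should give an RMSD of ~0 Å.
rng = np.random.default_rng(0)
ancestral = rng.normal(size=(50, 3)) * 10.0
angle = np.deg2rad(40.0)
rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle),  np.cos(angle), 0.0],
               [0.0, 0.0, 1.0]])
modern = ancestral @ rz.T + np.array([5.0, -3.0, 12.0])
print(kabsch_rmsd(ancestral, modern))
```

A rigid copy gives an RMSD near zero; real homologous structures with "no major conformational changes" would give a small but non-zero value.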
Assessment of their catalytic properties showed that the ancestral enzymes degrade a wide variety of β-lactam antibiotics, whereas TEM1 β-lactamase has a narrow substrate scope, with penicillin as its preferred substrate. These data indicate that the resurrected β-lactamases are promiscuous, whereas modern variants are specialist enzymes. Interestingly, this difference in substrate preference is not reflected in structural differences, suggesting that conformational dynamics may control substrate selection. Collectively, the structural and biochemical characterization of the recreated β-lactamases supports the notion that Precambrian conditions favored thermostability and that modern specialist enzymes evolved from functionally diverse ancestors. Hence, like thermostability, enzyme promiscuity is probably a primitive trait.
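One simple way to put a number on "promiscuous" versus "specialist" is to treat an enzyme's catalytic efficiencies (kcat/KM) across a panel of substrates as a distribution and compute its normalized Shannon entropy: a value near 0 means activity is concentrated on a single substrate, and a value of 1 means all substrates are processed equally well. This sketch is an illustrative assumption, not the analysis performed in (13), and the efficiency values in the usage example are hypothetical numbers chosen only to show the contrast between the two profiles.

```python
from __future__ import annotations

import math


def substrate_breadth(efficiencies: dict[str, float]) -> float:
    """Normalized Shannon entropy of catalytic efficiencies over a substrate panel.

    Returns 0.0 when all activity is on a single substrate and 1.0 when
    every substrate in the panel is processed with equal efficiency.
    """
    n = len(efficiencies)
    if n < 2:
        return 0.0
    values = [v for v in efficiencies.values() if v > 0]
    total = sum(values)
    if total == 0:
        return 0.0
    probs = [v / total for v in values]
    entropy = sum(p * math.log(1.0 / p) for p in probs)
    return entropy / math.log(n)  # divide by the maximum possible entropy


# Hypothetical kcat/KM values (M^-1 s^-1), for illustration only.
specialist = {"penicillin": 1e7, "cefotaxime": 1e3, "ceftazidime": 1e2}
generalist = {"penicillin": 1e5, "cefotaxime": 5e4, "ceftazidime": 2e4}
print(round(substrate_breadth(specialist), 3), round(substrate_breadth(generalist), 3))
```

Because the score depends on which substrates are in the panel, it is only meaningful when ancestral and modern enzymes are compared on the same set of substrates.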
It is generally assumed that the Precambrian surface temperature ranged between 55 and 85 °C (1). Although this is difficult to confirm experimentally, the results of different protein resurrection studies support the view that the environment of early Earth was hot, because the recreated ancient proteins are generally extremely thermostable (4,8). Moreover, these molecular resurrection studies have provided profound insight into protein evolution by showing, for example, that protein structures evolve very slowly and that today’s specialized enzymes probably descended from functionally diverse ancestors. Although different methods of ancestral sequence reconstruction have been described, they all rely on a multiple sequence alignment of homologous extant proteins to generate a phylogenetic tree. A statistical algorithm is then used to deduce the primary sequence of the ancestor at each node of the tree. Subsequently, DNA coding sequences are obtained through gene synthesis, using the back-translated ancestral sequences as a template. These synthetic DNA constructs are expressed in a suitable host, enabling purification and characterization of the resurrected protein. Despite the promising results of protein resurrection studies, the exact residue at a given position in an inferred ancestral sequence is difficult to confirm (2,3,4). In this regard, more robust phylogenetic algorithms have recently been developed that improve the reliability of ancestral sequence reconstruction (2). However, the exact correctness of the inferred sequence matters less in protein engineering. Owing to their extreme thermostability and broad substrate scope, resurrected enzymes are an ideal starting point for further engineering to obtain variants with a novel enzymatic function, altered cofactor preference or improved solvent stability (2).
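The back-translation step summarized above can be sketched in a few lines. This is a deliberately naive illustration: it assigns one fixed codon to each amino acid (the table is an assumption chosen for illustration, not a codon-optimized design for any particular host) and appends a TAA stop codon. Real gene-synthesis designs are usually codon-optimized for the expression host and take further sequence features, such as GC content, into account. The `translate` helper only re-translates codons from this table as a round-trip check, and the example peptide is hypothetical.

```python
# Illustrative one-codon-per-residue table (assumption; not host-optimized).
CODON_FOR = {
    "A": "GCG", "C": "TGC", "D": "GAT", "E": "GAA", "F": "TTT",
    "G": "GGC", "H": "CAT", "I": "ATT", "K": "AAA", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCG", "Q": "CAG", "R": "CGT",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAT",
}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def back_translate(protein: str, stop: str = "TAA") -> str:
    """Return a DNA coding sequence for a protein sequence, ending in a stop codon."""
    protein = protein.upper()
    unknown = set(protein) - set(CODON_FOR)
    if unknown:
        raise ValueError(f"no codon defined for: {''.join(sorted(unknown))}")
    if stop not in STOP_CODONS:
        raise ValueError(f"{stop} is not a stop codon")
    return "".join(CODON_FOR[aa] for aa in protein) + stop


def translate(dna: str) -> str:
    """Translate codons from CODON_FOR back to residues, stopping at a stop codon."""
    residue_for = {codon: aa for aa, codon in CODON_FOR.items()}
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if codon in STOP_CODONS:
            break
        protein.append(residue_for[codon])
    return "".join(protein)


ancestor = "MKTAYIAKQR"            # hypothetical ancestral peptide
cds = back_translate(ancestor)
assert translate(cds) == ancestor  # round-trip check
print(cds)
```

Since the genetic code is degenerate, many different DNA sequences encode the same ancestral protein; the choice among them affects expression in the host but not the resurrected protein's sequence.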
1. Kasting JF, Ono S. 2006. Palaeoclimates: the first two billion years. Philos Trans R Soc Lond B Biol Sci. 361: 917-929.
2. Hochberg GKA, Thornton JW. 2017. Reconstructing Ancient Proteins to Understand the Causes of Structure and Function. Annu Rev Biophys. 46: 247-269.
3. Gumulya Y, Gillam EM. 2017. Exploring the past and the future of protein evolution with ancestral sequence reconstruction: the 'retro' approach to protein engineering. Biochem J. 474: 1-19.
4. Wheeler LC, Lim SA, Marqusee S, Harms MJ. 2016. The thermostability and specificity of ancient proteins. Curr Opin Struct Biol. 38: 37-43.
5. Ingles-Prieto A, Ibarra-Molero B, Delgado-Delgado A, Perez-Jimenez R, Fernandez JM, Gaucher EA, Sanchez-Ruiz JM, Gavira JA. 2013. Conservation of protein structure over four billion years. Structure. 21: 1690-1697.
6. Akanuma S, Nakajima Y, Yokobori S, Kimura M, Nemoto N, Mase T, Miyazono K, Tanokura M, Yamagishi A. 2013. Experimental evidence for the thermophilicity of ancestral life. Proc Natl Acad Sci U S A. 110: 11067-11072.
7. Hobbs JK, Shepherd C, Saul DJ, Demetras NJ, Haaning S, Monk CR, Daniel RM, Arcus VL. 2012. On the origin and evolution of thermophily: reconstruction of functional precambrian enzymes from ancestors of Bacillus. Mol Biol Evol. 29: 825-835.
8. Risso VA, Gavira JA, Sanchez-Ruiz JM. 2014. Thermostable and promiscuous Precambrian proteins. Environ Microbiol. 16: 1485-1489.
9. Wouters MA, Liu K, Riek P, Husain A. 2003. A despecialization step underlying evolution of a family of serine proteases. Mol Cell. 12: 343-354.
10. Baier F, Copp JN, Tokuriki N. 2016. Evolution of Enzyme Superfamilies: Comprehensive Exploration of Sequence-Function Relationships. Biochemistry. 55: 6375-6388.
11. Bonomo RA. 2017. β-Lactamases: A Focus on Current Challenges. Cold Spring Harb Perspect Med. doi: 10.1101/cshperspect.a025239.
12. Hall BG, Barlow M. 2004. Evolution of the serine beta-lactamases: past, present and future. Drug Resist Updat. 7: 111-123.
13. Risso VA, Gavira JA, Mejia-Carmona DF, Gaucher EA, Sanchez-Ruiz JM. 2013. Hyperstability and substrate promiscuity in laboratory resurrections of Precambrian β-lactamases. J Am Chem Soc. 135: 2899-2902.
14. Jelsch C, Mourey L, Masson JM, Samama JP. 1993. Crystal structure of Escherichia coli TEM1 beta-lactamase at 1.8 Å resolution. Proteins. 16: 364-383.