It is well established that proteins represent an important class of biopolymers and play a key role in many biological processes, ranging from immunological responses to cell-cell signaling. However, despite our in-depth functional understanding of proteins, the field of protein (bio)chemistry is still relatively young. In 1789, the French chemist Antoine Fourcroy described three different animal proteins (albumin, fibrin and gelatin). This finding contributed to the understanding that proteins make up a separate group of biological molecules. Asparagine, the first amino acid found in proteins, was discovered by the French chemist Louis Nicolas Vauquelin in 1806. Cysteine, the second amino acid occurring in proteins, was described by the English chemist and physicist William Hyde Wollaston in 1810, while threonine, the last of the 20 amino acids that make up the primary structure of proteins, was identified in 1936 by the American biochemist William Cumming Rose. The first detailed assessment of the composition of proteins was performed by the Dutch chemist Gerrit J. Mulder in 1837, revealing that proteins contain carbon, hydrogen, nitrogen and oxygen as well as phosphorus and sulfur. In fact, he found that fibrin and albumin have roughly the same empirical formula, C400H620N100O120P1S1, and proposed that all proteins share a common core substance (grundstoff). Based on these results, the Swedish chemist Jacob Berzelius coined the term protein to describe this principal core substance. The first attempts to purify proteins from different sources were performed by Fourcroy, and during the second half of the 19th century many plant proteins were purified. The optimization of protein purification ultimately enabled the determination of primary sequences, such as that of the B chain of insulin by Frederick Sanger in 1951.
Moreover, the ability to isolate proteins allowed the crystallization of the first enzyme, jack bean urease, by the American chemist James B. Sumner in 1926. This was followed by the first detailed X-ray diffraction pattern of crystallized pepsin by J. D. Bernal and Dorothy Crowfoot Hodgkin in 1934 and the low-resolution structure of a folded protein, myoglobin, by John C. Kendrew in 1958. In 1965, X-ray crystallography was employed by David C. Phillips to obtain the first three-dimensional structure (at 2 Å resolution) of an enzyme, lysozyme from hen egg white. This detailed structure allowed, for the first time, the assessment of an enzyme’s catalytic mechanism.
A structural view on biology
The ability to obtain detailed structural information has fueled research on the functional and mechanistic aspects of proteins, as indicated by the rise from 830 publications (listed in PubMed) reporting structural information in 1971 to about 27000 scientific papers dealing with the same topic in 2015. The Protein Data Bank is a crystallographic database for biological macromolecules and currently holds about 131000 experimentally determined structures, the vast majority of which are protein structures. The Protein Data Bank, which was established in 1971 by the Cambridge Crystallographic Data Centre and Brookhaven National Laboratory, contains the structures of about 36000 human proteins, over 8500 E. coli proteins and 3600 yeast proteins. Moreover, a significant number of murine proteins are also included in this database.
Membrane proteins take center stage in a plethora of biochemical processes such as signal transduction, nutrient uptake, cell division and photosynthesis. Moreover, misfolded or dysfunctional membrane proteins are the underlying cause of different human diseases, such as cystic fibrosis and Wilson’s disease. It is therefore not surprising that membrane proteins are an important class of potential drug targets. Membrane proteins are either an integral part of the membrane or are peripherally associated with it. Here, I will focus only on integral membrane proteins. Structurally, integral membrane proteins are either alpha-helical bundles or beta-barrels (figure 2; coloring of all proteins is based on secondary structure). Helix-bundle proteins are part of all biological membranes and typically contain stretches of about 20 hydrophobic amino acids that assume an alpha-helical structure. Based on the results of different genome sequencing projects, it was suggested that 20-25% of all open reading frames encode helix-bundle proteins. Beta-barrel proteins are prominent in the outer membrane of mitochondria, chloroplasts and Gram-negative bacteria. These proteins are made up of large anti-parallel beta-sheets, organized in a cylindrical barrel structure. Unlike helix-bundle proteins, beta-barrel proteins are difficult to recognize in silico, and the precise number of these proteins present in a typical proteome is therefore uncertain. However, it was suggested that about 12% of all E. coli open reading frames encode outer membrane proteins.
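The in silico recognition of helix-bundle proteins mentioned above rests on finding stretches of roughly 20 hydrophobic residues. A minimal sketch of one classic approach, a sliding-window hydropathy scan using the Kyte-Doolittle scale, is given below. The scale, the 19-residue window and the cutoff of 1.6 are assumptions chosen for illustration and are not taken from the text; the function names are hypothetical, and dedicated topology predictors are considerably more sophisticated.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
# (assumption: this scale is used here purely for illustration).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Return the mean hydropathy of every full window along the sequence.

    Element i of the result is the mean over residues i .. i + window - 1.
    Non-standard residue codes (e.g. "X") raise a KeyError.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    profile = [total / window]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]
        profile.append(total / window)
    return profile


def candidate_tm_segments(sequence, window=19, threshold=1.6):
    """Return (start, end) spans, 0-based and end-exclusive, obtained by
    merging overlapping or adjacent windows whose mean hydropathy reaches
    the threshold. These are candidates only, not confirmed helices."""
    segments = []
    for start, mean in enumerate(hydropathy_profile(sequence, window)):
        if mean < threshold:
            continue
        end = start + window
        if segments and start <= segments[-1][1]:
            # Window touches the previous span: extend it.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments
```

Calling `candidate_tm_segments` on a protein sequence in one-letter code returns the spans that would merit closer inspection as possible transmembrane helices. Beta-barrel strands are shorter and alternate hydrophobic and polar residues, so a scan of this kind does not detect them, which is consistent with the difficulty of recognizing beta-barrel proteins in silico.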
Prototypical examples of a helix-bundle protein and a beta-barrel protein are shown in figure 2. YidC is an integral membrane protein found in the cytoplasmic membrane of bacteria, while homologues are found in the cell membranes of archaea and in the inner membranes of chloroplasts and mitochondria. This protein facilitates the insertion and folding of newly synthesized membrane proteins and is (partly) associated with the protein translocation machinery (Sec complex) in bacteria, eukaryotes and chloroplasts. The structure of E. coli YidC (3WVF) was elucidated in 2014 at a resolution of 3.2 Å (1) and clearly shows the six alpha-helical transmembrane segments (red) as well as the large periplasmic domain (yellow). OmpT is a protease found in the outer membrane of E. coli. This protein is involved in the degradation of extracellular proteins and peptides and seems to be required for colonization of the urinary tract and intestines. It has therefore been linked with urinary tract and gastrointestinal infections. The structure of OmpT (1I78) was solved in 2001 at a resolution of 2.6 Å (2), revealing the anti-parallel beta-sheets (yellow) organized into a barrel structure with small periplasmic and extracellular loops (green).
Following insertion into the cytoplasmic membrane, newly synthesized membrane proteins are folded and often assembled into multimeric complexes. Though structure determination of such complexes is experimentally challenging, the Protein Data Bank currently contains almost 2700 structures of membrane protein complexes. Two interesting examples are shown in figure 3. The SecYEG complex facilitates protein translocation across and insertion of proteins into the cytoplasmic membrane. Protein translocation is energized by the motor ATPase SecA. Several SecYEG structures have been reported from different organisms, providing profound insight into the mechanistic aspects of protein translocation and membrane insertion. The structure shown in figure 3 (5AWW) was elucidated in 2015 at a resolution of 2.7 Å and shows the core translocon of Thermus thermophilus (3). The structure clearly reveals SecY (purple) and its 10 transmembrane domains, SecG (green) and its two transmembrane domains and SecE (green) with two transmembrane domains. Several small loops of SecY are located at the periplasmic face (top) of the complex. Cytochrome c oxidase is the terminal oxidase of the electron transport chain and facilitates the reduction of molecular oxygen to water, which is coupled to the translocation of protons across the mitochondrial inner membrane. This vectorial transport of protons generates a proton motive force, which ultimately drives the production of ATP catalyzed by ATP synthase. Ubiquinol oxidases are part of the bacterial electron transport chain and function as proton pumps similar to cytochrome c oxidases. The structure of E. coli ubiquinol oxidase (1FFT) was solved in 2000 at a resolution of 3.5 Å (4), showing that the overall structure of these enzymes is similar to that of cytochrome c oxidases. Ubiquinol oxidase comprises four subunits (CyoA, B, C and D), totaling 25 transmembrane domains.
Subunit I is shown in green with 15 transmembrane domains, subunit II is in blue and contains two transmembrane domains as well as a large periplasmic domain, subunit III in pink comprises five transmembrane helices and subunit IV in yellow contains three transmembrane domains.
The biogenesis of beta-barrel outer membrane proteins (OMPs) remained enigmatic for a long time. However, in 2005 a protein complex located in the outer membrane of E. coli was described that is required for the assembly of proteins in the outer membrane (5). This complex, known as the Bam complex, comprises five subunits (BamABCDE). Several structures of this complex or of individual subunits have been presented, offering detailed insight into the folding and insertion of OMPs. However, the first high-resolution (2.9 Å) crystal structure of the entire complex (5D0O) was solved only in 2016 (6). This structure is shown in figure 4 and reveals that BamA (in green) is the central component of the complex, with its beta-barrel embedded in the outer membrane. The other subunits of the complex are all lipoproteins localized at the periplasmic face of the outer membrane and are organized in a ring around BamA. BamB is in blue, BamC is in purple, BamD is in yellow and BamE is in red.
Since the pioneering steps of Antoine Fourcroy in 1789 and Louis Nicolas Vauquelin in 1806, the field of protein biochemistry has become well established. Elucidation of the first (low-resolution) protein structures represented a turning point in protein science. Following this scientific breakthrough, the number of solved structures has grown extensively. Despite their key biological importance, membrane proteins are highly underrepresented in the Protein Data Bank: they account for only about 3% of all structures currently in the database. This is in part because membrane proteins are notoriously difficult to express in recombinant form and hard to purify and manipulate due to their hydrophobic nature. It can be expected that recent technical advances to alleviate these bottlenecks will soon translate into more structural data on membrane proteins (7). Nevertheless, the available structures have furthered the molecular understanding of their function, mechanism and evolution, while also providing insight into the role of membrane proteins as etiologic agents and contributing to drug design and the study of drug interactions.
1. Kumazaki, K. et al. (2014). Crystal structure of Escherichia coli YidC, a membrane protein chaperone and insertase. Sci. Rep. 4: 7299. doi: 10.1038/srep07299.
2. Vandeputte-Rutten, L. et al. (2001). Crystal structure of the outer membrane protease OmpT from Escherichia coli suggests a novel catalytic site. EMBO J. 20: 5033-5039.
3. Tanaka, Y. et al. (2015). Crystal Structures of SecYEG in Lipidic Cubic Phase Elucidate a Precise Resting and a Peptide-Bound State. Cell Rep. 13: 1561-1568.
4. Abramson, J. et al. (2000). The structure of the ubiquinol oxidase from Escherichia coli and its ubiquinone binding site. Nat. Struct. Biol. 7: 910-917.
5. Wu, T. et al. (2005). Identification of a multicomponent complex required for outer membrane biogenesis in Escherichia coli. Cell 121: 235-245.
6. Gu, Y. et al. (2016). Structural basis of outer membrane protein insertion by the BAM complex. Nature 531: 47-52.
7. Hardy, D. et al. (2016). Overcoming bottlenecks in the membrane protein structural biology pipeline. Biochem. Soc. Trans. 44: 838-844.