Chapter 02 Proteins
2 Proteins
Charles P. Woodbury, Jr., Ph.D.
PROTEIN STRUCTURE
CLASSIFICATION OF PROTEINS
Proteins are often put into one of two categories, globular or fibrous, on the basis
of their overall structure. Globular proteins have compactly folded structures and
tend to resemble globes or spheroids in overall shape. Some are soluble in water
and can function in the cytosol or in extracellular fluids; other globular proteins are
closely associated with lipid bilayers, being buried in part or in whole in the
biological membranes where they function. Globular proteins include enzymes,
6 Pharmaceutical Biotechnology
AMINO ACIDS
The building blocks of proteins are the 20 different naturally occurring amino acids.
These amino acids each have a central carbon atom, the alpha (α)-carbon, to which
are attached a carboxyl group, an amino group (or in the case of proline, an imino
group), a hydrogen atom, and a side chain. The side chain is the feature that
distinguishes one amino acid from another; the other three groups about the α-carbon
are common to all 20 amino acids. The α-carbon is a chiral center, which means
that each amino acid can exist in either of two enantiomeric forms (denoted by D
and L); the L form predominates in nature.
Based on the properties of the side chains, the 20 amino acids can be put into
six general classes. The first class contains amino acids whose side chains are
aliphatic, and is usually considered to include glycine, alanine, valine, leucine, and
isoleucine. The second class is composed of the amino acids with polar, nonionic
side chains, and includes serine, threonine, cysteine, and methionine. The cyclic
amino acid proline (actually, an imino acid) constitutes a third class by itself. The
fourth class contains amino acids with aromatic side chains: tyrosine, phenylalanine,
and tryptophan. The fifth class has basic groups on the side chains and is made up
of the three amino acids lysine, arginine, and histidine. The sixth class is composed
of the acidic amino acids and their amides: aspartate and asparagine, and glutamate
and glutamine.
The exact ionic state of the side chains in the last two classes will depend on
the pH of the solution. At pH 7.0 the side chains of glutamate and aspartate have
ionized carboxylates, and the side chains of lysine and arginine carry positively
charged, protonated amino and guanidinium groups, respectively. Since the pKa of
the imidazolium side chain of histidine is about 6.0, we expect to find a mixture of
uncharged and charged side chains here, with the uncharged species predominating
at pH 7.0.
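The expected mixture for histidine can be estimated with the Henderson-Hasselbalch relation. The following is a minimal Python sketch, assuming the side-chain pKa of about 6.0 quoted above; the function name `fraction_protonated` is illustrative and is not taken from the chapter.

```python
def fraction_protonated(pH: float, pKa: float) -> float:
    """Fraction of a titratable group in its protonated form,
    from the Henderson-Hasselbalch relation."""
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))

# Histidine side chain, using the pKa of about 6.0 quoted in the text
charged = fraction_protonated(7.0, 6.0)   # protonated imidazolium form
uncharged = 1.0 - charged                 # neutral imidazole form

print(f"charged fraction at pH 7.0:   {charged:.3f}")
print(f"uncharged fraction at pH 7.0: {uncharged:.3f}")
```

Under this assumption roughly one side chain in eleven is charged at pH 7.0, consistent with the uncharged species predominating.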
Amino acids are commonly represented by three-letter abbreviations, for example
Pro for proline. There is also an even more succinct one-letter abbreviation or code to
represent each amino acid. These abbreviations, the structures of the amino acids, and
their isoelectric points (the pH at which the amino acid has no net electrical charge,
i.e., a balance of positive and negative charge has been struck) are summarized in
Table 2.1.
TABLE 2.1
FIGURE 2.1 The peptide bond, showing the partial double bond character of the amide
linkage and the development of partial charges on the nitrogen and oxygen atoms.
side chain in proline, the cis and trans isomers are more nearly equal in terms of
energy, and the cis isomer occurs relatively frequently.
The constraints on rotation are much weaker for the carbon–carbon bond joining
an amino acid’s α-carbon and its carbonyl carbon; this bond enjoys a great deal of
rotational freedom, especially by comparison to the highly restricted amide bond.
Rotational freedom is also enjoyed by the nitrogen–carbon bond that joins the amide
nitrogen and the adjacent α-carbon. (The exception here is again proline because of
its ring structure.) Thus, of the three covalent bonds contributed by each amino acid
to the protein backbone, one is strongly constrained in terms of rotation while the
other two bonds are relatively free to rotate. This rotational freedom in each amino
acid allows protein chains to wind about in a large number of conformations. Some
of these conformations will be energetically preferred over others, however, leading
to the formation of secondary and higher levels of structure for the polypeptide chain.
The side chains of cysteine residues contain a terminal thiol (−SH) group. These
thiols are sensitive to oxidation/reduction reactions, and can form covalent disulfide
(-S-S-) bridges among themselves. The two joined cysteine residues are then said
to be combined into a single cystine unit. There may be one or several of these
disulfide bridges present in a polypeptide. These disulfide bridges can be formed
between cysteine residues that may be separated by tens or hundreds of residues
along the polypeptide chain, and they can thus bring two distant regions of the
polypeptide chain into close spatial proximity, a factor that may be quite important
in determining the overall shape of the protein. Disulfide bridges can also be formed
between cysteine residues on separate polypeptide chains, and can serve to hold two
chains together covalently.
For naturally occurring proteins the polypeptide chains will have between 50
and 2000 amino acids joined together. On the average an amino acid residue in these
chains will have a molecular weight of about 110, so the typical polypeptide chain
will have a molecular weight between 5500 and 220,000.
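The arithmetic behind this range is simple enough to sketch in Python. The average residue weight of 110 is the approximate figure given above; the helper name is hypothetical, and a real protein's molecular weight depends on its actual composition.

```python
AVERAGE_RESIDUE_WEIGHT = 110  # approximate average per residue, as quoted in the text

def estimate_molecular_weight(n_residues: int) -> int:
    """Rough molecular weight of a polypeptide chain of n_residues amino acids."""
    return n_residues * AVERAGE_RESIDUE_WEIGHT

for n in (50, 2000):
    print(n, "residues ->", estimate_molecular_weight(n))
```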
FIGURE 2.2 The alpha (α) helix, showing the pattern of intrachain hydrogen bonds that
stabilize the structure, and the radial extension of amino acid side chains from the helix axis.
(Adapted from Richardson, J.S. [1981]. The anatomy and taxonomy of protein structure.
Advances in Protein Chemistry, 34, 167–339, copyright 1981, with permission from Elsevier
Science.)
the nitrogen–carbon bond in the pyrrolidine ring of proline cannot rotate into the
proper conformation to keep up the hydrogen bonding network (proline can sometimes
be found at the end of an α helix, however).
β sheets come in two types, and both involve almost complete extension of the
protein backbone, in contrast to the α helix. In the parallel β sheet, two or more
peptide chains align their backbones in the same general direction such that the
hydrogen bonding moieties of the amide linkages on adjacent chains line up in a
complementary fashion with donors opposing acceptors. Again, there is a hydrogen
bonding network involving carbonyl oxygens and amide hydrogens, but the bonding
is between chains, not along the same chain as was the case with the α helix. The
antiparallel β sheet is quite similar to the parallel type, with adjacent chains having
parallel but opposite orientations. Frequently, a long peptide chain will fold back
on itself in such a way that two or more regions of the chain will line up next to
each other to form either a parallel or an antiparallel β sheet structure. For either
kind of β sheet structure the side chains will project away from (up or down from)
the plane of the sheet. The two types of β sheet are represented in Figure 2.3.
As with the α helix, some amino acids are found more frequently in β sheets
than are others. Because of packing constraints, small nonpolar amino acids are the
most frequently found here, including glycine, alanine, and serine along with some
others. Proline is sometimes found in β sheets, but tends to disrupt them. This is
because the pyrrolidine ring of proline constrains the protein backbone from adopting
the almost completely extended conformation required to form the β sheet. Amino
acids with bulky or charged side chains also disrupt the packing and alignment
needed to form a β sheet. Thus glutamate, aspartate, arginine, tryptophan, and some
others are not common in either type of β sheet.
FIGURE 2.3 The antiparallel and the parallel beta (β) sheet or ribbon, showing the pattern
of interchain hydrogen bonds and the protrusion out of the ribbon’s plane of the amino acid
side chains (denoted by R). The arrows indicate the relative directions of the peptide chains.
The third major type of secondary structure is the β bend or reverse turn, where
the polypeptide chain turns back on itself (for example, to form an antiparallel β
sheet). There are actually several different subtypes of reverse turn that differ in
details of the bond angles in the participating amino acids. Such turns typically
extend over four adjacent amino acids in the chain, with hydrogen bonding of the
carbonyl oxygen of the first residue in the turn with the amide hydrogen of the fourth
residue. Proline is often found as the second residue in these turns; its rigid imino
ring helps to bend the backbone of the chain. Glycine is often found as the third
residue in β bends, because its hydrogen side chain offers little steric repulsion to
the tight packing required in this region of the turn.
Fibrillar collagen, the fibrous protein found in connective tissue, has its own
characteristic primary and higher levels of structure. A strand of fibrillar collagen
is composed of three peptide chains. Each chain typically contains glycine in every
third position, and contains a high proportion of proline, lysine, hydroxyproline, and
hydroxylysine residues. (Hydroxyproline is a modified proline with a hydroxy group
on the ring at the 4 position, while hydroxylysine carries a hydroxyl group on the
number 5 carbon.) The primary structure can be written as Gly-X-Y, where the X
position often contains proline. The hydroxylated amino acids are formed after
biosynthesis of the peptide chain by enzymatic hydroxylation of the corresponding
unmodified amino acids. Enzyme specificity restricts these unusual amino acid
derivatives to the Y position in the Gly-X-Y sequence.
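The Gly-X-Y repeat lends itself to a simple sequence check. The sketch below is illustrative only: it uses one-letter codes, the example sequence is made up rather than taken from a real collagen, and the function names are hypothetical.

```python
def follows_gly_x_y(sequence: str) -> bool:
    """True if every third residue, starting with the first, is glycine (G)."""
    return len(sequence) >= 3 and all(res == "G" for res in sequence[0::3])

def proline_fraction_at_x(sequence: str) -> float:
    """Fraction of X positions (second residue of each triplet) occupied by proline (P)."""
    x_positions = sequence[1::3]
    return x_positions.count("P") / len(x_positions) if x_positions else 0.0

example = "GPAGPKGPRGEK"  # made-up sequence with triplets GPA, GPK, GPR, GEK

print(follows_gly_x_y(example))
print(proline_fraction_at_x(example))
```

Hydroxylated residues are not distinguished here, since the standard one-letter alphabet describes only the unmodified amino acids.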
In fibrillar collagen, each participating amino acid chain forms a left-handed helix
(unlike the α helix, which is right-handed), then three of these chains wind about
each other to form a right-handed super helix. This triple-stranded super helix is
stabilized by hydrogen bonds between adjacent peptide strands (again different from
the α helix, where the hydrogen bonds are all among residues along the same strand),
and by covalent chemical cross-links that are formed after the individual chains have
wound around each other. These cross-links are Schiff base linkages between the side
chains of unmodified and modified lysine residues of adjacent peptide strands. Overall,
the combination of crosslinking and superhelical structure makes for a protein
that is mechanically very strong, one that is rigid and resists bending and stretching;
these are very desirable characteristics in a protein used in connective tissue.
FIGURE 2.4 The levels of protein structure. A. The primary level, showing the peptide
backbone and side chains. B. Elements of secondary structure, an α helix and three strands
of an antiparallel β sheet. C. The tertiary structure of hen egg white lysozyme, showing the
packing of α helices and β sheet structures. D. An example of quaternary structure: the dimer
of glycerol phosphate dehydrogenase from E. coli. (Images for B, C, and D created using the
Swiss protein data bank viewer spdbv version 3.7 and protein data bank files 1HEW and 1dc6.)
up independently of other regions of the peptide chain. Domains are of course still
connected one to another by the peptide backbone. Once formed, two or more
domains may pack together in a characteristic fashion that defines the overall tertiary
structure of that region of the protein.
Quaternary structure refers to the specific aggregation or association of separate
protein chains to form a well-defined structure. Part D of Figure 2.4 compares the
quaternary structure of a dimeric protein (two polypeptide chains) to the lower levels
of protein structure. The separate protein chains are often referred to as subunits or
monomers; these subunits may be identical or may be of quite different sequence
and structure. The forces holding the subunits together are weak, noncovalent interactions.
These include hydrophobic interactions, hydrogen bonds, and van der Waals
interactions. Because of the large number of such possible interactions between two
protein surfaces, the aggregate can be quite stable, despite the weakness of any single
noncovalent interaction. Furthermore, the association can be quite specific in matching
protein surfaces to one another: because of the short range of the stabilizing
interactions, surfaces that do not fit closely against one another will lack a large
fraction of the stabilization enjoyed by those protein surfaces that do fit snugly
together. Thus, mismatched subunits will not form aggregates that are as stable as
subunits that are properly matched.
It is common practice to include under the umbrella of quaternary structure other
kinds of complexes between biopolymers. For example, the complex of DNA with
histones to form nucleosomes may be said to have quaternary structure, the DNA
also being regarded as a component subunit.
PROTEIN BIOSYNTHESIS
MESSENGER RNA AND RNA POLYMERASE
The primary repository of genetic information in the cell is DNA. For this information
to be expressed in the form of an enzyme, an antibody, or other protein, an
intermediate between the protein and the DNA is used. This intermediate is composed
of RNA, and (after some intermediate processing) is called messenger RNA,
or mRNA. The general flow of information, from DNA to RNA and finally to protein,
was summarized by Francis Crick as the “Central Dogma” for molecular biology.
Figure 2.5 presents schematically the main features of the Central Dogma as applied
to a eukaryotic cell with a nucleus.
For a given gene, the corresponding mRNA is complementary to (matches the
base pairing of) the strand of the DNA where the information in the gene is stored.
The conventions on nucleic acid nomenclature, nucleic acid structure, and the
Watson-Crick rules for complementary base-pairing are given in Chapter 3.
The copying of the template DNA into a strand of RNA is called transcription,
and the RNA that results is referred to as a transcript. It is conventional to write
double-stranded DNA sequences from left to right, with the top strand having its 5′
end on the left and its 3′ end on the right (the numbering convention here refers to
particular carbons of the nucleotide sugar moieties). Also by convention, the bottom
DNA strand is the template for the RNA transcript; this is the DNA strand that is
complementary in sequence to the RNA transcript. The RNA transcript is identical
in sequence to the top (or “coding”) DNA strand, except that the RNA contains the
base uracil where the coding DNA contains thymine, and the sugar deoxyribose of
DNA is replaced by ribose in RNA.
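These strand conventions can be made concrete with a short Python sketch. The nine-base coding sequence is hypothetical, all strands are written 5′ to 3′, and the function names are illustrative. The sketch shows that pairing against the template strand gives the same transcript as copying the coding strand with uracil in place of thymine.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}  # DNA template base -> RNA base

def template_strand(coding: str) -> str:
    """Template (bottom) strand for a coding strand, both written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(coding))

def transcript_from_coding(coding: str) -> str:
    """Transcript read off the coding strand: identical, with U in place of T."""
    return coding.replace("T", "U")

def transcript_from_template(template: str) -> str:
    """Transcript built by base-pairing against the template, written 5'->3'."""
    return "".join(RNA_PAIRING[base] for base in reversed(template))

coding = "ATGGCTTAA"  # hypothetical coding-strand sequence
template = template_strand(coding)

print(template)
print(transcript_from_coding(coding))
print(transcript_from_template(template))
```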
Enzymatic synthesis of RNA from a DNA template runs along the template
strand toward its 5′ end. In a gene, this corresponds to movement in the 5′ to 3′
direction along the complementary coding strand. The 5′ to 3′ direction on the coding
FIGURE 2.5 The “Central Dogma”: Double-stranded DNA is transcribed to messenger RNA
(in eukaryotes, with processing of the transcript), which in turn is translated by the ribosome
into the chain of amino acids making up a protein.
MRNA PROCESSING
In prokaryotes (single-cell organisms lacking organelles), the mRNA can immediately
be used to direct protein synthesis. This is not so for eukaryotes (higher
organisms consisting of nucleated cells), where the mRNA receives extensive processing
before it is ready to be used in protein synthesis. This processing includes
removal of certain RNA sequences, the addition of RNA bases at the ends of the
molecule (these additional bases are not directly specified by the transcribed gene),
and the chemical modification of certain bases in the RNA.
In eukaryotes, transcription takes place in the cell’s nucleus (Figure 2.5). The
primary RNA transcript often includes copies of DNA regions that do not code for
protein sequences; these DNA regions are the so-called intervening regions, or
introns, found in many eukaryotic genes. The DNA regions that actually code for
amino acid sequences are known as exons (expressible regions). The parts of the
RNA transcript corresponding to the introns must be removed before the RNA can
faithfully direct synthesis of the protein from the copy of the exons. The process of
removing the intron RNA is referred to as splicing, and it occurs before the transcript
leaves the nucleus. Figure 2.6 presents this series of operations in RNA processing.
There are further modifications of the transcript, however. To protect the transcript
from being digested by RNA nucleases outside the nucleus, the RNA gains a
run of polyadenylic acid residues (known as a poly A tail) at the 3′ end. At the 5′
end, the transcript gains a “cap” of a guanine residue attached via an unusual 5′–5′
triphosphate link (Figure 2.7). Furthermore, this guanine is usually methylated at
the N7 position, and in some cases the sugar residues of the first one or two bases
in the original transcript are methylated at the 2′ hydroxyl group. Besides protecting
the RNA transcript against degradation, the cap serves as a binding point for the
ribosome. Finally, after all these processing steps, the transcript RNA is ready to be
used in protein biosynthesis, and it passes out of the nucleus to the endoplasmic
reticulum or into the cytosol as a fully processed messenger RNA.
FIGURE 2.6 Processing of mRNA in the eukaryotic cell, emphasizing the splicing of the
messenger with excision of intervening sequences (introns) away from the RNA that codes
for amino acid sequences (exons). The reversed G indicates the methylated guanine cap found
at the 5′ end of processed messenger RNA (mRNA), and the symbol An indicates the run of
adenine residues that forms a tail at the 3′ end of the mRNA.
FIGURE 2.7 The “cap” structure found in eukaryotic cells, showing the modified terminal
nucleotide containing N7-methylguanine, the unusual linkage of this nucleotide to the rest of the
chain, and the methylation of two neighboring sugar groups.
TABLE 2.2
The Genetic Code
Column headings: First Base (5′-end); Second Base (U, C, A, G); Third Base (3′-end)
codons for some amino acids. This is sometimes referred to as “degeneracy” in the
genetic code.
Since a messenger RNA is an exact copy of a gene’s coding strand (with uracil
replacing thymine), the mRNA carries a copy of the codons that determine the amino
acid sequence in the protein as specified by the gene. The codons here are in RNA
form, not in DNA form. This genetic information can be read from the mRNA by
a ribosome, codon by codon, as the ribosome covalently joins together the corresponding
amino acids, in exactly the order of the codons on the original coding
strand of DNA.
A standard version of the genetic code for the triplet codons on a messenger
RNA is given in Table 2.2.
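Reading an mRNA codon by codon can be sketched in Python. The table below is built from the standard genetic code in the usual U, C, A, G ordering of first, second, and third bases, with amino acids in one-letter code and "*" marking stop codons. The example message is hypothetical, and the sketch simply starts at the first base rather than modeling how a ribosome locates its start site.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks a stop codon
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna: str) -> str:
    """Read an mRNA (5'->3') three bases at a time, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

# Degeneracy: count how many codons specify leucine (L)
leucine_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "L")

print(translate("AUGGCUUAA"))  # hypothetical message: Met, Ala, stop
print(len(leucine_codons), leucine_codons)
```

Counting the codons per amino acid in this way makes the degeneracy of the code explicit: leucine, for example, is specified by six codons, while methionine has only one.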
FIGURE 2.8 The general structure of transfer RNA (tRNA) showing the phosphodiester
backbone as a ribbon, with bases or base pairs projecting from the backbone. (Adapted from
Rich, A. [1977]. Three-dimensional structure and biological function of transfer RNA.
Accounts of Chem. Res., 10, 388–396, copyright 1977, American Chemical Society, with
permission from Accounts of Chemical Research.)
(called stems) and short single-stranded loops at the ends of most of the stems. The
overall spatial shape resembles a lumpy L (Figure 2.8).
At one end of the tRNA strand there is a short stem with a protruding single-stranded
region that has a free 3′ hydroxyl group. This is the site where an amino
acid is attached covalently via its carboxyl group. The joining of the proper amino
acid to its corresponding tRNA is catalyzed by a special enzyme, one of the
aminoacyl-tRNA synthetases. There is a specific synthetase for joining each type of
amino acid and its corresponding tRNA. This avoids the problem of having, say, an
alanine residue attached to the tRNA for glutamine, which might lead to insertion
of the wrong amino acid residue in a protein and the consequent loss of activity or
specificity in that protein.
On each of the tRNA molecules, one of the single-stranded loops contains a
trinucleotide sequence that is complementary to the triplet codon sequence used in
the genetic code to specify a particular amino acid. This loop on the tRNA is known
as the anticodon loop, and it is used to match the tRNA with a complementary codon
on the mRNA. In this way the amino acids carried by the tRNA molecules can be
aligned in the proper sequence for polymerization into a functional protein.
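The codon-anticodon match is an antiparallel base pairing, so the anticodon is the reverse complement of the codon when both are written 5′ to 3′. A minimal Python sketch follows, assuming strict Watson-Crick pairing only; it ignores the relaxed third-position pairing and modified bases that occur in real tRNAs, and the function name is illustrative.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon_for(codon: str) -> str:
    """Strict Watson-Crick anticodon for an mRNA codon, both written 5'->3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(codon))

print(anticodon_for("AUG"))  # methionine codon
```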
for simplicity this discussion will be centered on the synthesis of proteins in the bacterium
Escherichia coli. The general organization and functions are the same, however, in both
prokaryotic and eukaryotic systems.
All ribosomes have two subunits, and each subunit contains several protein
chains and one or more chains of RNA (ribosomal RNA, or rRNA). In the ribosome
from E. coli, the smaller of the two subunits is known as the 30S subunit and the
larger is referred to as the 50S subunit. (The unit S stands for Svedberg, a measure
of how rapidly a particle sediments in a centrifuge.) The two subunits combine to
form the active 70S ribosomal assembly. The special RNA molecules that are a part
of the ribosome are quite distinct from messenger or transfer RNA molecules, and
they play important roles in forming the overall ribosomal quaternary structure and
in aligning mRNA and tRNA molecules during protein biosynthesis.
At one point or another during protein synthesis, several other proteins will be
associated with the ribosome. These include factors that help in initiating the synthetic
process, others that help in elongating the peptide chain, and yet others that
play a role in terminating the synthesis of a peptide chain. Beyond this, there is also
the mRNA to consider, as well as the aminoacylated tRNA molecules. Finally, since
protein biosynthesis consumes energy, there is the hydrolysis of ATP and GTP to
AMP and GDP, respectively, by the ribosome.
(RF-1, RF-2, and RF-3 in E. coli) help to dissociate the newly synthesized protein
chain, the mRNA and the ribosomal subunits from one another. The mRNA is free
to be used in another cycle of protein synthesis, as are the ribosomal subunits.
Protein synthesis can be carried out by ribosomes free in the cytosol. In eukaryotes,
ribosomes also carry out protein synthesis while bound to the surface of the
endoplasmic reticulum. In addition, a given mRNA molecule usually has more than
one active ribosome translating it into protein; an assembly of several ribosomes on
a single mRNA is called a polyribosome, or polysome for short.
PROTEIN MODIFICATION
TYPES OF MODIFICATION
During protein synthesis and afterwards, proteins can undergo substantial covalent
modification. The type(s) and extent of modification will depend on the protein,
and often play an important role in the biological function of the protein. In general,
protein modifications can be divided into two major classes: reactions on the side
chains of the amino acids, and cleavages of the peptide backbone.
The side chains of the amino acids may be modified by
One very important type of modification involves the joining of sulfhydryl groups
from two cysteine residues into a disulfide bridge, as mentioned earlier in the discussion
of the primary level of protein structure. The pairing of the cysteines for
bridge formation is done quite specifically, with help from specialized enzyme systems.
Disulfide bridge formation frequently occurs during the folding of the protein.
Some are even formed on an incompletely synthesized chain during translation.
Bridge formation is an important factor in guiding the folding process toward the
correct product. The bridges help to stabilize the protein against denaturation and
loss of activity. Proteins with mispaired cysteines and incorrectly formed disulfide
bridges are often folded in the wrong fashion and are usually biologically inactive.
Backbone cleavages may involve removal of the N-terminal formyl moiety from
the methionine, the removal of one or more amino acids at either the N- or C-
terminus of the chain, and cuts in the backbone at one or more sites, possibly with
the removal of internal peptide sequences.
Besides these modifications, certain proteins can acquire prosthetic groups (e.g.,
hemes, flavins, iron-sulfur centers, and others). The prosthetic groups may be
attached covalently (usually to amino acid side chains) in some cases, noncovalently
in others, but in either case they cause a substantial change in the properties of the
protein.
PROTEIN STABILITY
THE IMPORTANCE OF NONCOVALENT INTERACTIONS
Although a protein’s primary structure is determined by covalent bonds along the
peptide backbone, the secondary and higher levels of structure depend in large part
FIGURE 2.10 Weak interactions that stabilize protein structures. A, Hydrophobic interactions;
B, hydrogen bonding; C, ionic interactions.
bonds have a range around 2 Å (0.2 nm). The three atoms participating in the
interaction typically are collinear with a bond angle of 180°, though distortions of
the bond angle of 20°–30° are common. A hydrogen bond would typically cost 2
to 9 kcal/mol (8 to 38 kJ/mol) to disrupt.
Electrostatic interactions can be either attractive or repulsive, depending on the
charges and/or the orientation of dipoles involved; charges of opposite sign attract
and like charges repel one another. The strength of the interaction depends on the
local dielectric constant, and this in turn is strongly dependent on the nature and
organization of the medium surrounding the interacting species. For example, in the
interior of a protein the dielectric constant may vary from 2 to 40, so that electrostatic
interactions are very strong there, while in an aqueous environment the interactions
(with the same species, distances, and orientations) can be up to 40 times weaker
because the dielectric constant is around 80. The interaction energy for a pair of
follows. The energy changes in a system at constant temperature and pressure are
conveniently represented by ∆H, the change in enthalpy, while the changes in order
or organization can be represented by ∆S, the change in entropy. As is well known,
∆H and ∆S can be combined into a single thermodynamic function, the free energy
change ∆G:
∆G = ∆H − T∆S
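As a worked illustration of this relation, the Python sketch below evaluates ∆G at two temperatures. The enthalpy and entropy values are hypothetical, chosen only to show how the sign of ∆G can change with temperature, and they are treated as temperature-independent, which is a simplification.

```python
def free_energy_change(delta_h: float, delta_s: float, temperature: float) -> float:
    """Free energy change = enthalpy change - T * entropy change.
    delta_h in kJ/mol, delta_s in kJ/(mol K), temperature in K."""
    return delta_h - temperature * delta_s

# Hypothetical values for an unfolding process (illustration only)
DELTA_H = 200.0  # kJ/mol
DELTA_S = 0.6    # kJ/(mol K)

dg_298 = free_energy_change(DELTA_H, DELTA_S, 298.0)
dg_350 = free_energy_change(DELTA_H, DELTA_S, 350.0)

print(f"free energy change at 298 K: {dg_298:.1f} kJ/mol")
print(f"free energy change at 350 K: {dg_350:.1f} kJ/mol")
```

With these assumed numbers the process is unfavorable (positive ∆G) at 298 K and favorable (negative ∆G) at 350 K, because the T∆S term grows with temperature.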
folds up. (Hydration of the folded protein is limited mainly to the protein’s surface
and its crevices and corrugations.) Exchange interactions put an upper limit on how
tightly the amino acids can be packed in the interior of a folded protein, and of
course play a role in the constraints on secondary and tertiary structure. Solvation
effects favor the placement of highly polar and ionic groups where they are exposed
to solvent and not buried in the (generally nonpolar) interior of a protein.
TABLE 2.3
Thermodynamic Data on the Denaturation of Selected Proteins
Column headings: Protein; Molecular Weight; ∆H (298 K); ∆S (298 K); ∆Cp (323 K)
respect to secondary structure, and the prediction of tertiary structure is often a good
deal less accurate. But a great deal of progress has been made and a general
mechanism for protein folding has emerged.
Biophysical studies have shown that many denatured proteins can spontaneously
refold in vitro, upon removal of a denaturing agent (urea, detergent, acid, and so
on). Certain enzymes and other proteins can accelerate the folding process in vitro,
and it has been concluded that their role in vivo is to prevent misfolding or aggregation.
However, these protein factors serve more to facilitate the folding process
than to specify a particular spatial shape for the product.
The biophysical studies have also shown that folding is a relatively rapid process,
with folding being completed in a matter of seconds to perhaps a few minutes in
most cases. This rules out a “trial and error” approach to folding, where the protein
randomly folds into all possible conformations in a search for the most stable
conformation. For a typical protein of 100 residues, with about 10 different conformations
for each amino acid residue, the total number of conformations would be
around 10^100, far too many to be searched in a reasonable amount of time.
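The scale of this search can be made concrete with a short Python calculation. The residue count and conformations per residue are the figures quoted above; the sampling rate of 10^13 conformations per second is an assumption introduced here purely for illustration.

```python
CONFORMATIONS_PER_RESIDUE = 10
RESIDUES = 100
total_conformations = CONFORMATIONS_PER_RESIDUE ** RESIDUES

SAMPLING_RATE = 10 ** 13            # assumed conformations tried per second (hypothetical)
SECONDS_PER_YEAR = 365 * 24 * 3600

# Integer arithmetic keeps the result exact despite the enormous numbers
years_to_search = total_conformations // (SAMPLING_RATE * SECONDS_PER_YEAR)
order_of_magnitude = len(str(years_to_search)) - 1

print(f"exhaustive search would take on the order of 10^{order_of_magnitude} years")
```

Even at this generous assumed rate, the search time exceeds any biologically meaningful timescale by many orders of magnitude, which is why a random search is ruled out.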
Instead of proceeding at random, protein folding follows a stepwise process,
with three main steps. These are: (1) formation of unstable, fluctuating regions of
secondary structure; (2) aggregation or collapse of these embryonic structures into
a more compact intermediate structure or structures; and (3) rearrangement or adjustment
of the intermediate(s) into the final conformation (or family of conformations)
that we identify as the native structure. From in vitro studies it is known that the
first step is very fast, typically occurring within 0.01 seconds. The second step is
fast, taking about 1 second for most proteins, while the third step can be rather slow
in vitro, running from a few seconds up to 2500 seconds. The slowness of this last
step is often connected with isomerization of proline from the cis to the trans isomer,
and what happens in vivo may be considerably faster, thanks to facilitation by proline
isomerase enzymes.
The solution of the protein folding problem will have wide ramifications. For
example, the completion of the Human Genome Project, with the full DNA sequence
of the human genome, has generated the sequences of many genes whose protein
products are as yet uncharacterized. It would be a great help in deducing the function
of each of these gene products if it were possible to predict the overall three-dimensional
folding of the protein, and thus relate it to possible enzymatic, structural,
or signalling functions in the cell. Eventually, improved diagnostic tools or therapies
would result from this. Also, molecular biology laboratories are developing novel
artificial proteins, not derived from any preexisting gene, in attempts to obtain more
efficient enzymes, enzyme inhibitors, and signalling factors, for use in medicine and
in industrial chemical processes. An effective way of predicting the final folded
structure from a peptide’s sequence would greatly increase the efficiency of these
research and development efforts.
FURTHER READING
Berman, H.M., Goodsell, D.S., and Bourne, P.E. (2003). Protein structures: From famine to
feast. American Scientist, 90, 350–359.
Dill, K.A. and Chan, H.S. (1997). From Levinthal to pathways to funnels. Nature Structural
Biol., 4, 10–19.
King, J., Haase-Pettingell, C., and Gossard, D. (2002). Protein folding and misfolding.
American Scientist, 90, 445–453.
Privalov, P.L. and Gill, S.J. (1988). Stability of protein structure and hydrophobic interaction.
Adv. Protein Chem., 39, 191–234.
Stryer, L. (1995). Biochemistry, 4th ed. W. H. Freeman, New York.