Chapter 02 Proteins


PH1873_C002.fm Page 5 Friday, June 24, 2005 11:35 AM

2 Proteins
Charles P. Woodbury, Jr., Ph.D.

Pharmaceutical Biotechnology
© 2006 by Taylor & Francis Group, LLC

CONTENTS

Protein Structure
  Classification of Proteins
  Amino Acids
  Primary Protein Structure
  Protein Secondary Structure
  Tertiary and Quaternary Structure in Globular Proteins
Protein Biosynthesis
  Messenger RNA and RNA Polymerase
  mRNA Processing
  The Genetic Code
  Activated Amino Acids and Transfer RNA
  The Ribosome and Associated Factors
  Message Translation and Protein Synthesis
Protein Modification
  Types of Modification
  Protein Transport and Modification
Protein Stability
  The Importance of Noncovalent Interactions
  Types of Noncovalent Interactions
  The Hydrophobic Effect
  Protein Folding and Noncovalent Interactions
  Thermodynamics and Kinetics of Protein Folding
Further Reading

PROTEIN STRUCTURE
CLASSIFICATION OF PROTEINS
Proteins are often put into one of two categories, globular or fibrous, on the basis
of their overall structure. Globular proteins have compactly folded structures and
tend to resemble globes or spheroids in overall shape. Some are soluble in water
and can function in the cytosol or in extracellular fluids; other globular proteins are
closely associated with lipid bilayers, being buried in part or in whole in the
biological membranes where they function. Globular proteins include enzymes,
antibodies, membrane-bound receptors, and so on. Fibrous proteins, in comparison,
are generally insoluble in water or in lipid bilayers, and they have extended confor-
mations, often forming rodlike structures or filaments. In connection with their
solubility, fibrous proteins are generally found as aggregates. Their major biological
role is a structural or mechanical one, to support or connect cells or tissues. These
proteins include keratin (a major component of skin and hair), collagen (in connec-
tive tissue), fibroin (the protein found in silk), and many others.
Another way of classifying proteins is by their biological function. Viewed this
way, proteins can be put into one of several categories. Below are some of these
categories and one or more examples for each category.

• Structural support and connection: Collagen, elastin
• Catalysis: Enzymes such as alcohol dehydrogenase, acetylcholinesterase,
or lysozyme
• Communication: Peptide hormones, hormone receptors
• Defense: Antibodies, toxins from snake venom (these toxins may them-
selves be enzymes)
• Transport: Hemoglobin, albumin, ion “pumps”
• Energy capture: Rhodopsin
• Mechanical work: Actin, myosin, tubulin, dynein

AMINO ACIDS
The building blocks of proteins are the 20 different naturally occurring amino acids.
These amino acids each have a central carbon atom, the alpha (α)-carbon, to which
are attached a carboxyl group, an amino group (or in the case of proline, an imino
group), a hydrogen atom, and a side chain. The side chain is the feature that
distinguishes one amino acid from another; the other three groups about the α-carbon
are common to all 20 amino acids. The α-carbon is a chiral center, which means
that each amino acid can exist in either of two enantiomeric forms (denoted by D
and L); the L form predominates in nature.
Based on the properties of the side chains, the 20 amino acids can be put into
six general classes. The first class contains amino acids whose side chains are
aliphatic, and is usually considered to include glycine, alanine, valine, leucine, and
isoleucine. The second class is composed of the amino acids with polar, nonionic
side chains, and includes serine, threonine, cysteine, and methionine. The cyclic
amino acid proline (actually, an imino acid) constitutes a third class by itself. The
fourth class contains amino acids with aromatic side chains: tyrosine, phenylalanine,
and tryptophan. The fifth class has basic groups on the side chains and is made up
of the three amino acids lysine, arginine, and histidine. The sixth class is composed
of the acidic amino acids and their amides: aspartate and asparagine, and glutamate
and glutamine.
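
The six-class grouping above can be written down as a small lookup table. The following Python sketch is only an illustration: the class labels and the names SIDE_CHAIN_CLASSES and class_of are chosen here for convenience and do not come from the chapter, and the three-letter abbreviations are the standard ones.

```python
# Side-chain classes as grouped in the text. The labels are informal names
# chosen for this sketch, not official nomenclature.
SIDE_CHAIN_CLASSES = {
    "aliphatic": ["Gly", "Ala", "Val", "Leu", "Ile"],
    "polar nonionic": ["Ser", "Thr", "Cys", "Met"],
    "cyclic (imino acid)": ["Pro"],
    "aromatic": ["Tyr", "Phe", "Trp"],
    "basic": ["Lys", "Arg", "His"],
    "acidic and amides": ["Asp", "Asn", "Glu", "Gln"],
}


def class_of(residue):
    """Return the class label for a three-letter amino acid abbreviation."""
    for label, members in SIDE_CHAIN_CLASSES.items():
        if residue in members:
            return label
    raise KeyError(f"unknown amino acid: {residue}")


if __name__ == "__main__":
    # The six classes together should account for all 20 amino acids.
    print(sum(len(members) for members in SIDE_CHAIN_CLASSES.values()))
    print(class_of("His"))
```

Other textbooks draw the class boundaries differently (cysteine and methionine, for example, are often grouped separately as sulfur-containing amino acids), so the table reflects this chapter's grouping only.
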
The exact ionic state of the side chains in the last two classes will depend on
the pH of the solution. At pH 7.0 the side chains of glutamate and aspartate have
ionized carboxylates, and the side chains of lysine and arginine carry positively
charged, protonated groups (an amino group in lysine, a guanidino group in
arginine). Since the pKa of the imidazolium side chain of
histidine is about 6.0, we expect to find a mixture of uncharged and charged side
chains here, with the uncharged species predominating at pH 7.0.
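
The statement about histidine can be checked with a short calculation. The sketch below assumes the standard Henderson-Hasselbalch relation for a single, independent ionizable group (the relation is not derived in this chapter) and uses the pKa of about 6.0 quoted above; the function name fraction_protonated is chosen here for illustration. In a folded protein the local environment can shift the pKa, so the number is a rough guide only.

```python
def fraction_protonated(pH, pKa):
    """Fraction of an ionizable group present in its protonated (acid) form.

    Follows from the Henderson-Hasselbalch relation
    pH = pKa + log10([base] / [acid]) for one independent group.
    """
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))


if __name__ == "__main__":
    # Histidine imidazolium side chain, pKa of about 6.0 (value from the text).
    charged = fraction_protonated(pH=7.0, pKa=6.0)
    print(f"charged fraction at pH 7.0: {charged:.3f}")
    print(f"uncharged fraction at pH 7.0: {1.0 - charged:.3f}")
```

Under these assumptions roughly one histidine side chain in eleven carries a charge at pH 7.0, which is consistent with the uncharged species predominating.
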
Amino acids are commonly represented by three-letter abbreviations, for example
Pro for proline. There is also an even more succinct one-letter abbreviation or code to
represent each amino acid. These abbreviations, the structures of the amino acids, and
their isoelectric points (the pH at which the amino acid has no net electrical charge,
i.e., a balance of positive and negative charge has been struck) are summarized in
Table 2.1.

PRIMARY PROTEIN STRUCTURE


In a protein the amino acids are joined together in a linear order by amide linkages,
also known as peptide bonds. The sequence of the covalently linked amino acids is
referred to as the primary level of structure for the protein. In writing the sequence
of amino acids in a chain, it is conventional to orient the chain so that the amino acid
on the left is the one with a free amino group on its α-carbon, while the last amino
acid on the right is the one whose α-carbon carboxylate is free. In other words, the
amino- or N-terminus of the peptide chain is written on the left, and the carboxyl-
or C-terminus is written on the right. One more convention: the term “backbone” for
a protein refers to the series of covalent bonds joining one α-carbon in a chain to the
next α-carbon.
In a peptide bond, the carboxylate on the α-carbon of one amino acid is joined
to the amino group on the α-carbon of the next amino acid. The peptide bond that
joins the amino acids in a protein has partial double bond character. In a standard
representation of an amide group, the carbonyl oxygen and carbon share a double
bond, while the amide nitrogen carries a lone pair of electrons. Actually, there are
more ways than this to share the electrons among the atoms involved. The result is
a shift of electron density into the region between the carbon and nitrogen atoms,
with development of a partial positive charge on the nitrogen and partial double
bond between the carbon and nitrogen. Because of the electronegativity of the
oxygen in the carbonyl group, another result is a shift in electron density toward
that atom and development of a partial negative charge (Figure 2.1).
The partial double bond character of the carbon–nitrogen bond requires co-
planarity of the carbonyl carbon and oxygen, the amide nitrogen and its hydrogen,
and the α-carbons of both amino acids involved—a total of six atoms. The double
bond character of the linkage would be disrupted by rotation about the bond joining
the amide carbon and nitrogen, and so the peptide linkage resists torsional rotation
and tends to stay planar. Because of the planarity of the peptide link, there are
essentially only two different configurations allowed here, cis and trans. In the cis
configuration the α-carbons of the first and second amino acids are closer together
than they are in the trans configuration. This leads to steric repulsion between the
side chains attached to these two atoms, and so the cis configuration is energetically
much less favorable than the trans. The great majority of amino acids in proteins
are thus found to be joined by peptide linkages in the trans configuration. Proline
is the only major exception to this rule. Because of steric constraints on the cyclic
side chain in proline, the cis and trans isomers are more nearly equal in terms of
energy, and the cis isomer occurs relatively frequently.

TABLE 2.1

FIGURE 2.1 The peptide bond, showing the partial double bond character of the amide
linkage and the development of partial charges on the nitrogen and oxygen atoms.

The constraints on rotation are much weaker for the carbon–carbon bond joining
an amino acid’s α-carbon and its carbonyl carbon; this bond enjoys a great deal of
rotational freedom, especially by comparison to the highly restricted amide bond.
Rotational freedom is also enjoyed by the nitrogen–carbon bond that joins the amide
nitrogen and the adjacent α-carbon. (The exception here is again proline because of
its ring structure.) Thus, of the three covalent bonds contributed by each amino acid
to the protein backbone, one is strongly constrained in terms of rotation while the
other two bonds are relatively free to rotate. This rotational freedom in each amino
acid allows protein chains to wind about in a large number of conformations. Some
of these conformations will be energetically preferred over others, however, leading
to the formation of secondary and higher levels of structure for the polypeptide chain.
The side chains of cysteine residues contain a terminal thiol (−SH) group. These
thiols are sensitive to oxidation/reduction reactions, and can form covalent disulfide
(-S-S-) bridges among themselves. The two joined cysteine residues are then said
to be combined into a single cystine unit. There may be one or several of these
disulfide bridges present in a polypeptide. These disulfide bridges can be formed
between cysteine residues that may be separated by tens or hundreds of residues
along the polypeptide chain, and they can thus bring two distant regions of the
polypeptide chain into close spatial proximity, a factor that may be quite important
in determining the overall shape of the protein. Disulfide bridges can also be formed
between cysteine residues on separate polypeptide chains, and can serve to hold two
chains together covalently.
For naturally occurring proteins the polypeptide chains will have between 50
and 2000 amino acids joined together. On the average an amino acid residue in these
chains will have a molecular weight of about 110, so the typical polypeptide chain
will have a molecular weight between 5500 and 220,000.
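
The arithmetic behind this range is simple enough to sketch. The snippet below uses the average residue weight of about 110 given above; the function name approximate_molecular_weight is illustrative, and the result is an estimate only, since real proteins differ in amino acid composition.

```python
AVERAGE_RESIDUE_WEIGHT = 110  # approximate average per residue, from the text


def approximate_molecular_weight(n_residues, residue_weight=AVERAGE_RESIDUE_WEIGHT):
    """Estimate a polypeptide's molecular weight from its number of residues."""
    return n_residues * residue_weight


if __name__ == "__main__":
    for n in (50, 2000):
        print(n, approximate_molecular_weight(n))
```
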

PROTEIN SECONDARY STRUCTURE


Secondary structure refers to regularities or repeating features in the conformation
of the protein chain’s backbone. Four major types of secondary structure in proteins
are: (1) the alpha (α) helix, formed from a single strand of amino acids; (2) the beta
(β) sheet, formed from two or more amino acid strands (from either the same chain
or from different chains); (3) the beta (β) bend or reverse turn, in a single strand;
and (4) the collagen helix, composed of three strands of amino acids.


FIGURE 2.2 The alpha (α) helix, showing the pattern of intrachain hydrogen bonds that
stabilize the structure, and the radial extension of amino acid side chains from the helix axis.
(Adapted from Richardson, J.S. [1981]. The anatomy and taxonomy of protein structure.
Advances in Protein Chemistry, 34, 167–339, copyright 1981, with permission from Elsevier
Science.)

The α helix most commonly found in proteins is represented in Figure 2.2.
This is a right-handed helix built from L-amino acids. The α helix is stabilized
principally by a network of hydrogen bonds. These hydrogen bonds link amino
acids that are otherwise separated along the protein chain. Specifically, an amino
acid residue at position i in the chain will be hydrogen bonded to the amino acid
at position i + 4; the hydrogen bond is between the carbonyl oxygen of residue i
and the amide hydrogen of residue i + 4, and the bond is nearly parallel to the long
axis of the helix. This interaction among neighboring residues compacts the
polypeptide chain by comparison to a fully extended conformation. Functional groups
on the amino acid side chains do not participate in the network of stabilizing hydrogen
bonds. Instead, the side chains extend radially outward as the backbone forms the
helical coil.
Some amino acids are found more frequently in α helices than are others. Some
amino acids often found in α helices are alanine, phenylalanine, and leucine, since
it is relatively easy for these to rotate into the proper conformation without steric
clash or other repulsions between the side chains. Some other amino acids are rarely
found in α helices, primarily because of such unfavorable interactions; these include
arginine and glutamate. Also, proline is not found in the middle of helices because
the nitrogen–carbon bond in the pyrrolidine ring of proline cannot rotate into the
proper conformation to keep up the hydrogen bonding network (proline can some-
times be found at the end of an α helix, however).
β sheets come in two types, and both involve almost complete extension of the
protein backbone, in contrast to the α helix. In the parallel β sheet, two or more
peptide chains align their backbones in the same general direction such that the
hydrogen bonding moieties of the amide linkages on adjacent chains line up in a
complementary fashion with donors opposing acceptors. Again, there is a hydrogen
bonding network involving carbonyl oxygens and amide hydrogens, but the bonding
is between chains, not along the same chain as was the case with the α helix. The
antiparallel β sheet is quite similar to the parallel type, with adjacent chains having
parallel but opposite orientations. Frequently, a long peptide chain will fold back
on itself in such a way that two or more regions of the chain will line up next to
each other to form either a parallel or an antiparallel β sheet structure. For either
kind of β sheet structure the side chains will project away from (up or down from)
the plane of the sheet. The two types of β sheet are represented in Figure 2.3.
As with the α helix, some amino acids are found more frequently in β sheets
than are others. Because of packing constraints, small nonpolar amino acids are the
most frequently found here, including glycine, alanine, and serine along with some
others. Proline is sometimes found in β sheets, but tends to disrupt them. This is
because the pyrrolidine ring of proline constrains the protein backbone from adopting
the almost completely extended conformation required to form the β sheet. Amino
acids with bulky or charged side chains also disrupt the packing and alignment
needed to form a β sheet. Thus glutamate, aspartate, arginine, tryptophan, and some
others are not common in either type of β sheet.

FIGURE 2.3 The antiparallel and the parallel beta (β) sheet or ribbon, showing the pattern
of interchain hydrogen bonds and the protrusion out of the ribbon’s plane of the amino acid
side chains (denoted by R). The arrows indicate the relative directions of the peptide chains.

The third major type of secondary structure is the β bend or reverse turn, where
the polypeptide chain turns back on itself (for example, to form an antiparallel β
sheet). There are actually several different subtypes of reverse turn that differ in
details of the bond angles in the participating amino acids. Such turns typically
extend over four adjacent amino acids in the chain, with hydrogen bonding of the
carbonyl oxygen of the first residue in the turn with the amide hydrogen of the fourth
residue. Proline is often found as the second residue in these turns; its rigid imino
ring helps to bend the backbone of the chain. Glycine is often found as the third
residue in β bends, because its hydrogen side chain offers little steric repulsion to
the tight packing required in this region of the turn.
Fibrillar collagen, the fibrous protein found in connective tissue, has its own
characteristic primary and higher levels of structure. A strand of fibrillar collagen
is composed of three peptide chains. Each chain typically contains glycine in every
third position, and contains a high proportion of proline, lysine, hydroxyproline, and
hydroxylysine residues. (Hydroxyproline is a modified proline with a hydroxyl group
on the ring at the 4 position, while hydroxylysine carries a hydroxyl group on the
number 5 carbon.) The primary structure can be written as Gly-X-Y, where the X
position often contains proline. The hydroxylated amino acids are formed after
biosynthesis of the peptide chain by enzymatic hydroxylation of the corresponding
unmodified amino acids. Enzyme specificity restricts these unusual amino acid
derivatives to the Y position in the Gly-X-Y sequence.
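
The Gly-X-Y pattern lends itself to a simple check. The sketch below is an illustration under stated assumptions: the function name follows_gly_x_y and the example sequences are made up for this purpose (they are not real collagen sequences), "Hyp" is used as the abbreviation for hydroxyproline, and the sequence is assumed to start in frame with a glycine.

```python
def follows_gly_x_y(residues):
    """Return True if every third residue, starting with the first, is glycine.

    `residues` is a list of three-letter abbreviations, assumed to begin
    at the Gly position of a Gly-X-Y repeat.
    """
    if not residues:
        return False
    return all(residue == "Gly" for residue in residues[0::3])


if __name__ == "__main__":
    # Hypothetical collagen-like stretch, for illustration only.
    collagen_like = ["Gly", "Pro", "Hyp", "Gly", "Pro", "Ala", "Gly", "Leu", "Hyp"]
    print(follows_gly_x_y(collagen_like))
    print(follows_gly_x_y(["Gly", "Pro", "Hyp", "Ala", "Pro", "Hyp"]))
```

The check looks only at the glycine positions; it does not test the preferences described above for proline at X or for the hydroxylated residues at Y.
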
In fibrillar collagen, each participating amino acid chain forms a left-handed helix
(unlike the α helix, which is right-handed); three of these chains then wind about
each other to form a right-handed super helix. This triple-stranded super helix is
stabilized by hydrogen bonds between adjacent peptide strands (again different from
the α helix, where the hydrogen bonds are all among residues along the same strand),
and by covalent chemical cross-links that are formed after the individual chains have
wound around each other. These cross-links are Schiff base linkages between the side
chains of unmodified and modified lysine residues of adjacent peptide strands. Over-
all, the combination of crosslinking and superhelical structure makes for a protein
that is mechanically very strong, one that is rigid and resists bending and stretching—
very desirable characteristics in a protein used in connective tissue.

TERTIARY AND QUATERNARY STRUCTURE IN GLOBULAR PROTEINS


Tertiary structure is defined by the packing in space of the various elements of
secondary structure. A globular protein may have several α helices and two or more
regions of β sheet structure that are all tightly packed together in a way characteristic
of the protein. This regularity in packing of α helices and β sheets, together with
the β bends, constitutes the third level of structure of proteins. Parts A, B, and C of
Figure 2.4 compare the primary, secondary, and tertiary levels of structure for a
polypeptide chain.
In large proteins, tertiary structures can often be divided into domains. A domain
is a region of a single peptide chain with a relatively compact structure; it has folded
up independently of other regions of the peptide chain. Domains are of course still
connected one to another by the peptide backbone. Once formed, two or more
domains may pack together in a characteristic fashion that defines the overall tertiary
structure of that region of the protein.

FIGURE 2.4 The levels of protein structure. A. The primary level, showing the peptide
backbone and side chains. B. Elements of secondary structure, an α helix and three strands
of an antiparallel β sheet. C. The tertiary structure of hen egg white lysozyme, showing the
packing of α helices and β sheet structures. D. An example of quaternary structure: the dimer
of glycerol phosphate dehydrogenase from E. coli. (Images for B, C, and D created using the
Swiss protein data bank viewer spdbv version 3.7 and protein data bank files 1HEW and 1DC6.)

Quaternary structure refers to the specific aggregation or association of separate
protein chains to form a well-defined structure. Part D of Figure 2.4 compares the
quaternary structure of a dimeric protein (two polypeptide chains) to the lower levels
of protein structure. The separate protein chains are often referred to as subunits or
monomers; these subunits may be identical or may be of quite different sequence
and structure. The forces holding the subunits together are weak, noncovalent inter-
actions. These include hydrophobic interactions, hydrogen bonds, and van der Waals
interactions. Because of the large number of such possible interactions between two
protein surfaces, the aggregate can be quite stable, despite the weakness of any single
noncovalent interaction. Furthermore, the association can be quite specific in match-
ing protein surfaces to one another: because of the short range of the stabilizing
interactions, surfaces that do not fit closely against one another will lack a large
fraction of the stabilization enjoyed by those protein surfaces that do fit snugly
together. Thus, mismatched subunits will not form aggregates that are as stable as
subunits that are properly matched.
It is common practice to include under the umbrella of quaternary structure other
kinds of complexes between biopolymers. For example, the complex of DNA with
histones to form nucleosomes may be said to have quaternary structure, the DNA
also being regarded as a component subunit.

PROTEIN BIOSYNTHESIS
MESSENGER RNA AND RNA POLYMERASE
The primary repository of genetic information in the cell is DNA. For this informa-
tion to be expressed in the form of an enzyme, an antibody, or other protein, an
intermediate between the protein and the DNA is used. This intermediate is com-
posed of RNA, and (after some intermediate processing) is called messenger RNA,
or mRNA. The general flow of information, from DNA to RNA and finally to protein,
was summarized by Francis Crick as the “Central Dogma” for molecular biology.
Figure 2.5 presents schematically the main features of the Central Dogma as applied
to a eukaryotic cell with a nucleus.
For a given gene, the corresponding mRNA is complementary to (matches the
base pairing of) the strand of the DNA where the information in the gene is stored.
The conventions on nucleic acid nomenclature, nucleic acid structure, and the Wat-
son-Crick rules for complementary base-pairing are given in Chapter 3.
The copying of the template DNA into a strand of RNA is called transcription,
and the RNA that results is referred to as a transcript. It is conventional to write
double-stranded DNA sequences from left to right, with the top strand having its 5′
end on the left and its 3′ end on the right (the numbering convention here refers to
particular carbons of the nucleotide sugar moieties). Also by convention, the bottom
DNA strand is the template for the RNA transcript; this is the DNA strand that is
complementary in sequence to the RNA transcript. The RNA transcript is identical
in sequence to the top (or “coding”) DNA strand, except that the RNA contains the
base uracil where the coding DNA contains thymine, and the sugar deoxyribose of
DNA is replaced by ribose in RNA.
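
The relationship among the coding strand, the template strand, and the transcript can be sketched in a few lines of Python. This is an illustration only: the function names and the short example sequence are made up here, all sequences are written 5′ to 3′, and no promoter, start site, or termination signal is modeled.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}  # template base -> RNA base


def template_strand(coding):
    """Template (bottom) strand, written 5' to 3': reverse complement of coding."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(coding))


def transcript_from_coding(coding):
    """Transcript read directly off the coding strand: uracil replaces thymine."""
    return coding.replace("T", "U")


def transcript_from_template(template):
    """Transcript built by base pairing against the template strand.

    The template is read from its 3' end toward its 5' end, so the RNA
    grows in the 5' to 3' direction.
    """
    return "".join(RNA_PAIRING[base] for base in reversed(template))


if __name__ == "__main__":
    coding = "ATGGCTTAA"  # hypothetical coding-strand fragment
    template = template_strand(coding)
    print(template)
    print(transcript_from_coding(coding))
    print(transcript_from_template(template))
```

Both routes give the same RNA sequence, which is the point of the convention: the transcript is complementary to the template strand and therefore matches the coding strand, with uracil in place of thymine.
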
Enzymatic synthesis of RNA from a DNA template runs along the template
strand toward its 5′ end. In a gene, this corresponds to movement in the 5′ to 3′
direction along the complementary coding strand. The 5′ to 3′ direction on the coding
strand is referred to as “downstream” in the gene, while the 3′ to 5′ direction along
the coding strand is described as “upstream.”

FIGURE 2.5 The “Central Dogma”: Double-stranded DNA is transcribed to messenger RNA
(in eukaryotes, with processing of the transcript), which in turn is translated by the ribosome
into the chain of amino acids making up a protein.

The process of transcription runs under quite complicated control mechanisms.
Among other things, these mechanisms direct the RNA synthesis to start and stop
in precise places on the DNA, and they control the rate of mRNA synthesis. In this
way gene expression can be integrated with the metabolic state of the cell, and with
the cell cycle for cellular replication and differentiation.
The synthesis of mRNA is catalyzed by the enzyme RNA polymerase. Bacteria
have only one type of RNA polymerase, which is responsible for virtually all
bacterial RNA polynucleotide synthesis. Higher organisms have three types of RNA
polymerase. Type II RNA polymerase transcribes genes into mRNA molecules and
synthesizes certain small RNAs found in the nucleus. This enzyme is the one closest
in general function to the bacterial RNA polymerase. Type I RNA polymerase in
eukaryotes transcribes genes for certain RNA species that are a part of the ribosome
(ribosomal RNA, or rRNA; see below). Type III RNA polymerase transcribes genes
for certain types of small RNA molecules used in RNA processing and in protein
transport. It also transcribes certain genes involved in protein synthesis, including
the genes for a small ribosomal RNA species and for all the transfer RNA species.


mRNA PROCESSING
In prokaryotes (single-cell organisms lacking organelles), the mRNA can immedi-
ately be used to direct protein synthesis. This is not so for eukaryotes (higher
organisms consisting of nucleated cells), where the mRNA receives extensive pro-
cessing before it is ready to be used in protein synthesis. This processing includes
removal of certain RNA sequences, the addition of RNA bases at the ends of the
molecule (these additional bases are not directly specified by the transcribed gene),
and the chemical modification of certain bases in the RNA.
In eukaryotes, transcription takes place in the cell’s nucleus (Figure 2.5). The
primary RNA transcript often includes copies of DNA regions that do not code for
protein sequences; these DNA regions are the so-called intervening regions, or
introns, found in many eukaryotic genes. The DNA regions that actually code for
amino acid sequences are known as exons (expressible regions). The parts of the
RNA transcript corresponding to the introns must be removed before the RNA can
faithfully direct synthesis of the protein from the copy of the exons. The process of
removing the intron RNA is referred to as splicing, and it occurs before the transcript
leaves the nucleus. Figure 2.6 presents this series of operations in RNA processing.
There are further modifications of the transcript, however. To protect the tran-
script from being digested by RNA nucleases outside the nucleus, the RNA gains a
run of polyadenylic acid residues (known as a poly A tail) at the 3′ end. At the 5′
end, the transcript gains a “cap” of a guanine residue attached via an unusual 5′-5′
triphosphate link (Figure 2.7). Furthermore, this guanine is usually methylated at
the N7 position, and in some cases the sugar residues of the first one or two bases
in the original transcript are methylated at the 2′ hydroxyl group. Besides protecting
the RNA transcript against degradation, the cap serves as a binding point for the
ribosome. Finally, after all these processing steps, the transcript RNA is ready to be
used in protein biosynthesis, and it passes out of the nucleus to the endoplasmic
reticulum or into the cytosol as a fully processed messenger RNA.

FIGURE 2.6 Processing of mRNA in the eukaryotic cell, emphasizing the splicing of the
messenger with excision of intervening sequences (introns) away from the RNA that codes
for amino acid sequences (exons). The reversed G indicates the methylated guanine cap found
at the 5′ end of processed messenger RNA (mRNA), and the symbol Aₙ indicates the run of
adenine residues that form a tail at the 3′ end of the mRNA.

FIGURE 2.7 The “cap” structure found in eukaryotic cells, showing the modified terminal
nucleotide containing N7-guanine, the unusual linkage of this nucleotide to the rest of the
chain, and the methylation of two neighboring sugar groups.

THE GENETIC CODE


The way in which the coding strand of DNA specifies the sequence of amino acids
in a protein is known as the genetic code. The code is made up of triplets of nucleic
acid bases, known as codons. A given series of three bases in the coding strand DNA
of a gene will unambiguously specify a particular amino acid and no other.
There are 64 possible triplet codons (64 = 4³; there are four choices of base at
each of the three positions in a codon). However, only 61 codons are used to designate
amino acids; three codons are used to signal the end of the amino acid sequence for
a protein, and these three codons are called termination or stop codons. With 20
amino acids used in most proteins and 61 codons to specify amino acids, some
amino acids are coded for by more than one codon, i.e., there are synonymous
codons for some amino acids. This is sometimes referred to as “degeneracy” in the
genetic code.

TABLE 2.2
The Genetic Code

First Base            Second Base             Third Base
(5′-end)      U       C       A       G       (3′-end)

U             Phe     Ser     Tyr     Cys     U
              Phe     Ser     Tyr     Cys     C
              Leu     Ser     STOP    STOP    A
              Leu     Ser     STOP    Trp     G
C             Leu     Pro     His     Arg     U
              Leu     Pro     His     Arg     C
              Leu     Pro     Gln     Arg     A
              Leu     Pro     Gln     Arg     G
A             Ile     Thr     Asn     Ser     U
              Ile     Thr     Asn     Ser     C
              Ile     Thr     Lys     Arg     A
              Met     Thr     Lys     Arg     G
G             Val     Ala     Asp     Gly     U
              Val     Ala     Asp     Gly     C
              Val     Ala     Glu     Gly     A
              Val     Ala     Glu     Gly     G

Since a messenger RNA is an exact copy of a gene’s coding strand (with uracil
replacing thymine), the mRNA carries a copy of the codons that determine the amino
acid sequence in the protein as specified by the gene. The codons here are in RNA
form, not in DNA form. This genetic information can be read from the mRNA by
a ribosome, codon by codon, as the ribosome covalently joins together the corre-
sponding amino acids, in exactly the order of the codons on the original coding
strand of DNA.
A standard version of the genetic code for the triplet codons on a messenger
RNA is given in Table 2.2.
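
The use of Table 2.2 can be illustrated with a short Python sketch. It is only an illustration: the one-letter amino acid codes, the compact string layout, and the function names coding_strand_to_mrna and translate are assumptions made for this example and do not come from the text. The table is written in the same order as Table 2.2 (first base, then second base, then third base, each in the order U, C, A, G), with "*" standing for a stop codon.

```python
from itertools import product

BASES = "UCAG"  # row and column order used in Table 2.2

# One-letter amino acid codes in table order; "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)

# Map each of the 64 triplets (third base varying fastest) to its amino acid.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def coding_strand_to_mrna(dna):
    """Copy a DNA coding strand into mRNA form (uracil replaces thymine)."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """Read an mRNA codon by codon from its 5' end until a stop codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # termination codon: end of the coding sequence
            break
        peptide.append(aa)
    return "".join(peptide)


if __name__ == "__main__":
    stops = [c for c, aa in CODON_TABLE.items() if aa == "*"]
    print(len(CODON_TABLE), "codons,", len(stops), "stop codons")
    print(translate(coding_strand_to_mrna("ATGTTTGGCTAA")))
```

For the coding-strand sequence ATGTTTGGCTAA, the sketch reads the codons AUG, UUU, GGC, and UAA and returns the peptide Met-Phe-Gly (MFG), stopping at UAA. Counting the entries for leucine ("L") gives six synonymous codons, an example of the degeneracy described above.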

ACTIVATED AMINO ACIDS AND TRANSFER RNA


Before amino acids can be joined together by the ribosome, they must be activated
by attachment to a special species of RNA known as transfer RNA (tRNA). The
activation serves two purposes: first, activation chemically prepares the amino acid
for forming an amide linkage to another amino acid; and second, its attachment to
a tRNA helps to direct the proper sequential joining of the amino acids in a protein.
Transfer RNA comes in many different types that can be distinguished by details
in their nucleotide sequence. Regardless of sequence differences, all tRNA molecules
fold up into the same general structure with several short double-helical regions
(called stems) and short single-stranded loops at the ends of most of the stems. The
overall spatial shape resembles a lumpy L (Figure 2.8).

FIGURE 2.8 The general structure of transfer RNA (tRNA) showing the phosphodiester
backbone as a ribbon, with bases or base pairs projecting from the backbone. Labeled
features: 3′ end, 5′ end, and anticodon. (Adapted from Rich, A. [1977]. Three-dimensional
structure and biological function of transfer RNA. Accounts of Chem. Res., 10, 388–396,
copyright 1977, American Chemical Society, with permission from Accounts of Chemical
Research.)

At one end of the tRNA strand there is a short stem with a protruding single-
stranded region that has a free 3′ hydroxyl group. This is the site where an amino
acid is attached covalently via its carboxyl group. The joining of the proper amino
acid to its corresponding tRNA is catalyzed by a special enzyme, one of the ami-
noacyl-tRNA synthetases. There is a specific synthetase for joining each type of
amino acid and its corresponding tRNA. This avoids the problem of having, say, an
alanine residue attached to the tRNA for glutamine, which might lead to insertion
of the wrong amino acid residue in a protein and the consequent loss of activity or
specificity in that protein.
On each of the tRNA molecules, one of the single-stranded loops contains a
trinucleotide sequence that is complementary to the triplet codon sequence used in
the genetic code to specify a particular amino acid. This loop on the tRNA is known
as the anticodon loop, and it is used to match the tRNA with a complementary codon
on the mRNA. In this way the amino acids carried by the tRNA molecules can be
aligned in the proper sequence for polymerization into a functional protein.
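
The complementarity between codon and anticodon can be sketched in a few lines of Python. This is a simplified illustration that assumes strict Watson-Crick pairing (A with U, G with C) at all three positions; it ignores modified bases in real tRNA molecules and any relaxed pairing at the third codon position. The function name anticodon_for is hypothetical. Because the two strands pair in antiparallel fashion, the anticodon written 5′ to 3′ is the reverse complement of the codon written 5′ to 3′.

```python
# Watson-Crick base pairs in RNA.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon):
    """Return the anticodon (5'->3') that pairs with an mRNA codon (5'->3').

    The strands are antiparallel, so the codon is reversed before each
    base is replaced by its complement.
    """
    return "".join(COMPLEMENT[base] for base in reversed(codon.upper()))


if __name__ == "__main__":
    for codon in ("AUG", "GCU"):
        print(codon, "pairs with anticodon", anticodon_for(codon))
```

For the methionine codon AUG, for example, the sketch gives the anticodon CAU.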

THE RIBOSOME AND ASSOCIATED FACTORS


The ribosome is a huge complex of protein and nucleic acid that catalyzes protein
synthesis. There are differences between prokaryotic and eukaryotic ribosomal structure;
for simplicity this discussion will be centered on the synthesis of proteins in the bacterium
Escherichia coli. The general organization and functions are the same, however, in both
prokaryotic and eukaryotic systems.
All ribosomes have two subunits, and each subunit contains several protein
chains and one or more chains of RNA (ribosomal RNA, or rRNA). In the ribosome
from E. coli, the smaller of the two subunits is known as the 30S subunit and the
larger is referred to as the 50S subunit. (The unit S stands for Svedberg, a measure
of how rapidly a particle sediments in a centrifuge.) The two subunits combine to
form the active 70S ribosomal assembly. The special RNA molecules that are a part
of the ribosome are quite distinct from messenger or transfer RNA molecules, and
they play important roles in forming the overall ribosomal quaternary structure and
in aligning mRNA and tRNA molecules during protein biosynthesis.
At one point or another during protein synthesis, several other proteins will be
associated with the ribosome. These include factors that help in initiating the syn-
thetic process, others that help in elongating the peptide chain, and yet others that
play a role in terminating the synthesis of a peptide chain. Beyond this, there is also
the mRNA to consider, as well as the aminoacylated tRNA molecules. Finally, since
protein biosynthesis consumes energy, there is the hydrolysis of ATP to AMP (when
the amino acids are activated by the aminoacyl-tRNA synthetases) and of GTP to GDP
(on the ribosome).

MESSAGE TRANSLATION AND PROTEIN SYNTHESIS


Protein synthesis has three main stages: initiation, elongation, and termination. In
the initiation stage, the 30S subunit binds an mRNA molecule, then binds a 50S
subunit. The mRNA is aligned for proper translation by complementary hydrogen
bonding to a portion of an rRNA molecule found in the 30S subunit. The assembly
now binds a particular tRNA that has been aminoacylated with methionine (more
precisely, with formyl-methionine). This tRNA pairs up with the first codon on the
mRNA, and so guarantees that the first amino acid in the peptide chain will be
methionine. In E. coli, three different proteins, called initiation factors (IF-1, IF-2,
and IF-3), also aid in forming this initiation complex.
After initiation comes peptide chain elongation. In this stage, the mRNA mol-
ecule is read by the ribosome from its 5′ end toward its 3′ end. Activated amino
acids are added step by step to the growing peptide chain, the sequence of addition
being governed by the order of codons on the mRNA and the binding and alignment
of the appropriate aminoacylated tRNA species. This phase of peptide synthesis
involves the formation of peptide bonds between the amino acids. The peptide chain
is elongated from amino terminus to carboxy terminus, i.e., the first codon on the
mRNA specifies the amino acid at the N-terminus of the protein to be synthesized.
During chain elongation two additional proteins help in the binding of tRNA and
in peptide bond formation (elongation factors EF-Tu and EF-Ts in E. coli). The
ribosome reads the mRNA at the rate of about 15 codons per second, so that the
synthesis of a protein with 300 amino acids takes about 20 seconds.
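
The arithmetic behind this estimate is simply the chain length divided by the reading rate. A minimal sketch, using the rate of about 15 codons per second quoted above for E. coli (the function name is hypothetical, and the time needed for initiation and termination is ignored):

```python
def translation_time_s(n_residues, codons_per_second=15.0):
    """Approximate elongation time, in seconds, for a chain of n_residues."""
    return n_residues / codons_per_second


if __name__ == "__main__":
    print(translation_time_s(300))  # a 300-residue protein
```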
The last stage of peptide chain synthesis is termination. The genetic code spec-
ifies three stop codons, indicating the termination of a coding sequence. When the
ribosome encounters one of these stop codons on the mRNA, certain release factors
(RF-1, RF-2, and RF-3 in E. coli) help to dissociate the newly synthesized protein
chain, the mRNA and the ribosomal subunits from one another. The mRNA is free
to be used in another cycle of protein synthesis, as are the ribosomal subunits.
Protein synthesis can be carried out by ribosomes free in the cytosol. In eukary-
otes, ribosomes also carry out protein synthesis while bound to the surface of the
endoplasmic reticulum. In addition, a given mRNA molecule usually has more than
one active ribosome translating it into protein; an assembly of several ribosomes on
a single mRNA is called a polyribosome, or polysome for short.

PROTEIN MODIFICATION
TYPES OF MODIFICATION
During protein synthesis and afterwards, proteins can undergo substantial covalent
modification. The type(s) and extent of modification will depend on the protein,
and often play an important role in the biological function of the protein. In general,
protein modifications can be divided into two major classes: reactions on the side
chains of the amino acids, and cleavages of the peptide backbone.
The side chains of the amino acids may be modified by

• Hydroxylation (e.g., proline to hydroxyproline)
• Methylation (e.g., serine or threonine, at the hydroxyl group)
• Acylation (e.g., attachment of a fatty acid, such as stearic, myristic or
palmitic acid, to one of several different amino acid side chains)
• Acetylation (at the side chain amino group of lysine residues in histones)
• Phosphorylation (of serine, threonine, or tyrosine)
• Carbohydrate attachment (to serine, threonine, or asparagine; this glycosy-
lation gives rise to “glycoproteins”)

One very important type of modification involves the joining of sulfhydryl groups
from two cysteine residues into a disulfide bridge, as mentioned earlier in the dis-
cussion of the primary level of protein structure. The pairing of the cysteines for
bridge formation is done quite specifically, with help from specialized enzyme sys-
tems. Disulfide bridge formation frequently occurs during the folding of the protein.
Some are even formed on an incompletely synthesized chain during translation.
Bridge formation is an important factor in guiding the folding process toward the
correct product. The bridges help to stabilize the protein against denaturation and
loss of activity. Proteins with mispaired cysteines and incorrectly formed disulfide
bridges are often folded in the wrong fashion and are usually biologically inactive.
Backbone cleavages may involve removal of the N-terminal formyl moiety from
the methionine, the removal of one or more amino acids at either the N- or C-
terminus of the chain, and cuts in the backbone at one or more sites, possibly with
the removal of internal peptide sequences.
Besides these modifications, certain proteins can acquire prosthetic groups (e.g.,
hemes, flavins, iron-sulfur centers, and others). The prosthetic groups may be
attached covalently (usually to amino acid side chains) in some cases, noncovalently
in others, but in either case they cause a substantial change in the properties of the
protein.

FIGURE 2.9 The protein processing pathway in eukaryotes.

PROTEIN TRANSPORT AND MODIFICATION


As the ribosome synthesizes a new peptide chain, the chain usually begins to fold,
creating regions of secondary and even incomplete tertiary structure. Enzymes then
act on these folded residues to modify them. As noted above, an important modification
that occurs at this stage is the formation of disulfide bridges. Another is the cleavage
of the peptide backbone at specific sites, which may be important for the transport of
the protein across membranes in the cell. These modifications occur during the process
of translation, and so they are described as cotranslational modifications.
In eukaryotes, the mature form of a protein may be separated by several mem-
brane barriers from its site of synthesis. For example, proteins secreted from the cell
must pass through the membrane of the endoplasmic reticulum (ER) into its lumen,
then through channels of the ER (where glycosylation usually occurs) to the Golgi
complex. Inside the Golgi complex, there may be further glycosylation; there may
also be removal of certain carbohydrate residues or proteolytic cleavage of the
protein. Finally the protein passes to secretory vesicles which release the protein
outside the plasma membrane of the cell. Proteins destined for other locations inside
the cell will receive different types and levels of glycosylation and proteolytic
cleavage. Figure 2.9 summarizes the overall pathway for protein processing.

PROTEIN STABILITY
THE IMPORTANCE OF NONCOVALENT INTERACTIONS
Although a protein’s primary structure is determined by covalent bonds along the
peptide backbone, the secondary and higher levels of structure depend in large part
on relatively weak, noncovalent interactions for stability. In terms of standard free
energy change, the strength of these interactions is typically in the range of 0 to 7
kcal/mol (0 to 30 kJ/mol), much less than that involved in covalent bond formation.
Furthermore, such weak interactions can be individually disrupted fairly easily by
thermal agitation under physiological conditions, unlike covalent bonds which sel-
dom break under these conditions.
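
A rough sense of why individual weak interactions are easily disrupted comes from comparing their strength with the thermal energy RT, which is about 0.6 kcal/mol at 298 K. The sketch below makes that comparison and also evaluates the Boltzmann factor exp(−E/RT) as a crude indicator of how readily an interaction of strength E is overcome by thermal agitation. Treating a single interaction as a simple two-state system is an assumption made for illustration only, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987e-3    # gas constant in kcal/(mol*K)
KJ_PER_KCAL = 4.184  # unit conversion


def thermal_energy_kcal(temperature_k=298.0):
    """Thermal energy RT in kcal/mol."""
    return R_KCAL * temperature_k


def boltzmann_factor(energy_kcal, temperature_k=298.0):
    """exp(-E/RT) for an interaction of strength E (kcal/mol)."""
    return math.exp(-energy_kcal / (R_KCAL * temperature_k))


if __name__ == "__main__":
    print(f"RT at 298 K = {thermal_energy_kcal():.2f} kcal/mol")
    # Energies spanning the 0 to 7 kcal/mol range quoted in the text.
    for e in (0.5, 1.0, 3.0, 7.0):
        print(f"E = {e:3.1f} kcal/mol ({e * KJ_PER_KCAL:4.1f} kJ/mol): "
              f"exp(-E/RT) = {boltzmann_factor(e):.2e}")
```

The weakest interactions in the quoted range are comparable to RT, so they are broken and re-formed continually, whereas the factor falls off steeply toward the upper end of the range.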
The persistence of secondary and higher levels of structure is due to the large
number of these weak interactions in a folded protein. While it is possible to disrupt
one or a few of these weak interactions at a given moment, in a folded protein there
are many more undisrupted noncovalent interactions that maintain the folded struc-
ture. Only on raising the temperature, perturbing the solution pH, or otherwise
changing conditions away from the physiological, will one see a significant shift in
the equilibrium from the native to the denatured conformation.
On the other hand, the weakness of these interactions does give a protein in solution
an appreciable amount of conformational flexibility. The solution conformation of a
protein is not absolutely rigid. Instead, because of momentary disruptions of weak
interactions, most proteins in solution are constantly flexing, stretching, and bending,
while still maintaining their overall shape. The result is that the protein molecules are
distributed in a dynamic equilibrium over a host of closely related conformations, all
very nearly the same in stability. This flexibility may be quite important in the bio-
logical functioning of proteins. Enzymes, for example, often undergo conformational
changes on binding substrate, and a number of membrane-bound receptor proteins
change conformation on binding their respective chemical messengers.

TYPES OF NONCOVALENT INTERACTIONS


The fundamental noncovalent interactions of interest here include: (1) so-called
exchange interactions, the very short range repulsion between atoms due to the Pauli
exclusion effect; (2) polarization interactions, due to changes in the electron distri-
bution about the interacting atoms; (3) electrostatic interactions, including interac-
tions among charged groups and dipoles; and (4) the hydrogen bond, where two
electronegative atoms partially share a hydrogen atom between them. The term van
der Waals interaction is often used to summarize interactions in classes 1 and 2
above. Figure 2.10 presents these different interactions as they might occur between
two polypeptide chains.
Exchange interactions are very short ranged, on the order of 1.2 to 1.5 Å
(0.12–0.15 nm) for carbon, nitrogen, and hydrogen; these interactions essentially
define what we usually regard as atomic and molecular shapes. Exchange interactions
are strong enough to block any substantial interpenetration of atoms, and they are
primarily responsible for what is called steric repulsion.
The hydrogen bond is a fairly short-range attractive interaction. The hydrogen
atom involved lies between two electronegative atoms, usually oxygen, nitrogen, or
a halogen species, although sulfur can also participate. The hydrogen atom usually
has a strong covalent attachment to one of the electronegative atoms, and thus it lies
closer to this atom than to the other. This group is referred to as the donor of the
hydrogen bond, and the other electronegative center is the acceptor group. Hydrogen
bonds have a range around 2 Å (0.2 nm). The three atoms participating in the
interaction typically are collinear with a bond angle of 180°, though distortions of
the bond angle of 20°–30° are common. A hydrogen bond would typically cost 2
to 9 kcal/mol (8 to 38 kJ/mol) to disrupt.

FIGURE 2.10 Weak interactions that stabilize protein structures. A, Hydrophobic interactions;
B, hydrogen bonding; C, ionic interactions.

Electrostatic interactions can be either attractive or repulsive, depending on the
charges and/or the orientation of dipoles involved; charges of opposite sign attract
and like charges repel one another. The strength of the interaction depends on the
local dielectric constant, and this in turn is strongly dependent on the nature and
organization of the medium surrounding the interacting species. For example, in the
interior of a protein the dielectric constant may vary from 2 to 40, so that electrostatic
interactions are very strong there, while in an aqueous environment the interactions
(with the same species, distances, and orientations) can be up to 40 times weaker
because the dielectric constant is around 80. The interaction energy for a pair of
singly charged ions in water at 298 K separated by 1 nm is about one-half of a
kcal/mol (about 2 kJ/mol), while in the interior of a membrane, with a dielectric
constant around 2, it would be about 17 kcal/mol (70 kJ/mol).
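
These figures are consistent with Coulomb's law for two point charges in a uniform dielectric, E = q1 q2 e² N_A / (4π ε0 ε r). The sketch below is a minimal illustration under that assumption; a real protein or membrane interior is not a uniform continuum, so the numbers are order-of-magnitude guides. The prefactor 138.935 kJ·nm/mol is the value of e² N_A / (4π ε0) for unit charges, and the function name is hypothetical.

```python
COULOMB_KJ_NM_PER_MOL = 138.935  # e^2 * N_A / (4*pi*eps0), in kJ*nm/mol
KJ_PER_KCAL = 4.184


def coulomb_energy_kj(q1, q2, distance_nm, dielectric):
    """Interaction energy (kJ/mol) of two point charges, in units of e.

    Negative values are attractive, positive values repulsive.
    """
    return COULOMB_KJ_NM_PER_MOL * q1 * q2 / (dielectric * distance_nm)


if __name__ == "__main__":
    # A +1/-1 ion pair 1 nm apart, in water and in a membrane interior.
    for label, eps in (("water", 80.0), ("membrane interior", 2.0)):
        e_kj = coulomb_energy_kj(+1, -1, 1.0, eps)
        print(f"{label:18s} eps = {eps:4.0f}: "
              f"{e_kj:7.2f} kJ/mol = {e_kj / KJ_PER_KCAL:6.2f} kcal/mol")
```

With a dielectric constant of 80 the magnitude is about 1.7 kJ/mol (0.4 kcal/mol), and with a dielectric constant of 2 it is about 69 kJ/mol (17 kcal/mol), a 40-fold difference.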
Polarization interactions for atoms and small molecules or functional groups are
much weaker than the other interactions listed above. For example, in vacuum the
attractive energy between two methyl groups is only about 0.15 kcal/mol (0.6 kJ/mol)
at a separation of 0.4 nm. However, polarization interactions are additive, so that
for large bodies with many individual polarization interactions (e.g., a protein bind-
ing a large substrate molecule) the overall contribution may be 10 to 20 kcal/mol
(40–80 kJ/mol). Furthermore, these interactions will be present for both nonpolar
and polar (even ionic) groups.

THE HYDROPHOBIC EFFECT


Nonpolar molecules tend to have low solubilities in water, and large nonpolar solutes
tend to form aggregates in aqueous solution. In the past these tendencies were
sometimes explained by invoking a special “hydrophobic bond” between nonpolar
groups. However, “bond” is a misnomer here, and it is better to refer to an “effect,”
because there is no exchange of bonding electrons involved in either of the tendencies
noted above. Instead, the hydrophobic effect is a combination of several of the
fundamental noncovalent interactions, and it involves details of the organization of
water molecules around nonpolar solute molecules.
In the bulk phase, water molecules are hydrogen bonded to one another and
attract one another by polarization interactions. A single water molecule can partic-
ipate in up to four hydrogen bonds at the same time, with the bonds oriented in
tetrahedral fashion about the central oxygen atom. In ice, this leads to a rigid three-
dimensional lattice of molecules. In liquid water, the hydrogen bonding interactions
are broken up enough so that there is no longer any long-range order or rigidity;
but on average, water molecules will still have about four hydrogen-bonded partners
around them and the local (as opposed to long-range) structure will still resemble
that of the ice lattice.
When a small nonpolar solute molecule is dissolved in water, the local network
of hydrogen bonds will rearrange to make room for the new solute. Polarization
interactions among the water molecules will also be disrupted. There will be, how-
ever, new polarization interactions between the solute and the surrounding water
molecules, and for the most part these new polarization interactions will almost
exactly balance the lost ones. The rearrangement of the hydrogen bonding network
is more important. Because of the strength of the hydrogen bond, there will be a
strong tendency for the water molecules immediately surrounding the nonpolar
solute to reorient themselves so as to each keep four hydrogen bonding partners.
This results in a cage-like water structure about the solute which can fluctuate in
shape, and which mostly maintains the favorable hydrogen bonding energies. The
local water structure is, however, somewhat more organized than the loose structure
in pure liquid water.
Elementary thermodynamics provides a framework to interpret these changes in
energy and structural organization. The key points can be summarized briefly as
follows. The energy changes in a system at constant temperature and pressure are
conveniently represented by ∆H, the change in enthalpy, while the changes in order
or organization can be represented by ∆S, the change in entropy. As is well known,
∆H and ∆S can be combined into a single thermodynamic function, the free energy
change ∆G:

∆G = ∆H − T∆S

where T is the absolute temperature in kelvins. The quantity ∆G determines whether
a process will occur spontaneously; it can also be used to determine the point of
equilibrium in chemical and biochemical systems. Any process where ∆G is negative
will occur spontaneously, and equilibrium is reached when ∆G for the process goes
to zero.
The thermodynamic relations above can be applied to the dissolution of nonpolar
solutes in water. The enthalpic contribution to the free energy change is rather small,
because of the compensations in hydrogen bonding and polarization interactions.
The organization of water molecules around the nonpolar solute (an increase in the
system’s order) lowers the entropy of the system, that is, ∆S is negative for this
process. This makes −T∆S positive, which results in an unfavorable contribution to
the free energy change for dissolving the nonpolar solute. The magnitude of the T∆S
term for such processes is usually much larger than that of the ∆H term, and entropy
changes dominate the thermodynamics. Since the entropy change is unfavorable,
the overall free energy change is unfavorable, and nonpolar solutes tend to have low
solubilities.
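
The sign bookkeeping in this argument can be checked with a short numerical sketch. The values of ∆H and ∆S used here are hypothetical, chosen only to represent a small enthalpy change and a negative entropy change; they are not measurements from the text. The function name is likewise an assumption.

```python
def gibbs_free_energy_kj(delta_h_kj, delta_s_j_per_k, temperature_k=298.0):
    """Free energy change (kJ/mol) from dH (kJ/mol) and dS (J/(K*mol))."""
    return delta_h_kj - temperature_k * delta_s_j_per_k / 1000.0


if __name__ == "__main__":
    # Hypothetical values for dissolving a nonpolar solute in water:
    # a small enthalpy change and a negative entropy change.
    delta_h = -2.0   # kJ/mol (illustrative only)
    delta_s = -60.0  # J/(K*mol) (illustrative only)
    entropy_term = -298.0 * delta_s / 1000.0  # the -T*dS contribution
    delta_g = gibbs_free_energy_kj(delta_h, delta_s)
    print(f"-T*dS = {entropy_term:+.2f} kJ/mol")
    print(f"dG    = {delta_g:+.2f} kJ/mol")
```

With these illustrative numbers the −T∆S term is +17.88 kJ/mol, which outweighs the small negative ∆H and leaves ∆G positive (+15.88 kJ/mol), so dissolution is unfavorable. Reversing the sign of ∆S, as happens when ordered water is released on aggregation, makes the same term negative and the process favorable.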
The tendency for large nonpolar solutes to aggregate can be explained on the
same basis. There will be a sheath of water about each isolated solute molecule, and
the water in the sheath is again more organized than the water in the bulk solution
(though its organization at an extensive nonpolar surface is different from the cagelike
structure formed around small nonpolar solutes). When two such large hydrated
molecules form an aggregate, some of the organized water that was between the
solutes will be released to the bulk solution as the solutes make contact with one
another. Again, there will be relatively small changes in the energy (enthalpy)
because the hydrogen bonding and polarization interactions are largely compensated,
but there will be a major change in the entropy. The water molecules released from
the hydration sheaths will pass to a more disorganized state, which is entropically
favorable. This entropic contribution dominates the enthalpic term in the free energy
of aggregation, and makes the process favorable overall.

PROTEIN FOLDING AND NONCOVALENT INTERACTIONS


The folding of proteins into their characteristic three-dimensional shape is governed
primarily by noncovalent interactions. Hydrogen bonding governs the formation of
α helices and β sheets and bends, while hydrophobic effects tend to drive the
association of nonpolar side chains. Hydrophobicity also helps to stabilize the overall
compact native structure of a protein over its extended conformation in the denatured
state, because of the release of water from the chain’s hydration sheath as the protein
folds up. (Hydration of the folded protein is limited mainly to the protein’s surface
and its crevices and corrugations.) Exchange interactions put an upper limit on how
tightly the amino acids can be packed in the interior of a folded protein, and of
course play a role in the constraints on secondary and tertiary structure. Solvation
effects favor the placement of highly polar and ionic groups where they are exposed
to solvent and not buried in the (generally nonpolar) interior of a protein.

THERMODYNAMICS AND KINETICS OF PROTEIN FOLDING


Calorimetric studies of protein denaturation have revealed the following general
points. First, proteins generally are folded into the most stable conformation avail-
able; the native structure has the lowest free energy among all conformations.
Second, for small globular proteins, the transition from native to denatured form
can be adequately represented as a two state process. That is, the ∆H, ∆S, and ∆G
values are consistent with a model where there are only two macroscopic states, the
native and the denatured. This in turn implies that the transition is highly cooperative
(“all-or-none”); the individual amino acids do not switch states independently of
one another, but instead tend to switch states in a concerted way. Third, for large
proteins with several structural blocks or domains, the native structure is lost in
stages, each stage corresponding to the all-or-none denaturation of individual struc-
tural blocks. Fourth, these cooperative transitions are accompanied by characteristic
increases in the heat capacity of the system. Furthermore, the change in heat capacity
is strongly correlated with an increase in the surface area of the protein chain that
is exposed to solvent. This points to the importance of the contact of nonpolar groups
with water in determining the thermodynamics of the unfolding or folding of pro-
teins. The increase in heat capacity upon denaturation is likely due to ordering of
water about the nonpolar groups that are exposed as the protein unfolds, and the
order of these water molecules decreases more rapidly than that of bulk water as
the temperature rises.
Table 2.3 presents thermodynamic data on the denaturation of selected proteins
that have a single compact globular shape and that fit the pattern of an all-or-none
transition from native to denatured form. In general, both ∆H(298 K) and ∆S(298 K)
for the transition are positive, and ∆H and T∆S nearly cancel one another. This
means that ∆G for denaturation is in general small, though positive, at 298 K. A
typical free energy of denaturation at 298 K would be around 20 to 50 kJ/mol of
protein. Experiments and theory are consistent with a temperature of maximum
stability between 273 and 298 K (0–25°C) for most proteins. As might be expected,
raising the temperature destabilizes the native form of the protein with respect to
the denatured form. Interestingly, cooling the protein below the temperature of
maximum stability also destabilizes the native form, and can lead to the curious
phenomenon of cold denaturation.
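
The near cancellation of ∆H and T∆S can be seen by working through the entries of Table 2.3. The sketch below computes ∆G = ∆H − T∆S per mole of residue at 298 K, then scales it to a whole protein. The scaling step rests on an assumption that is not in the text: the number of residues is estimated from the molecular weight using a mean residue mass of about 110 Da, so the per-protein values are rough estimates only. The names used are hypothetical.

```python
MEAN_RESIDUE_MASS_DA = 110.0  # assumption: average mass of one residue

# Values from Table 2.3: molecular weight, dH (kJ/mol residue),
# dS (J/K per mol residue), all at 298 K.
TABLE_2_3 = {
    "Ribonuclease":       (13600, 2.37, 6.70),
    "Carbonic anhydrase": (29000, 0.80, 1.76),
    "Lysozyme (hen)":     (14300, 2.02, 5.52),
    "Papain":             (23400, 0.93, 1.60),
    "Myoglobin":          (17900, 0.04, -0.80),
    "Cytochrome c":       (12400, 0.65, 0.90),
}


def denaturation_free_energy(protein, temperature_k=298.0):
    """Return (dG per mol residue, estimated dG per mol protein) in kJ."""
    mol_weight, delta_h, delta_s = TABLE_2_3[protein]
    per_residue = delta_h - temperature_k * delta_s / 1000.0
    n_residues = mol_weight / MEAN_RESIDUE_MASS_DA  # rough estimate
    return per_residue, per_residue * n_residues


if __name__ == "__main__":
    for name in TABLE_2_3:
        per_res, per_protein = denaturation_free_energy(name)
        print(f"{name:20s} {per_res:6.3f} kJ/mol residue, "
              f"~{per_protein:5.1f} kJ/mol protein")
```

For ribonuclease, for example, ∆H = 2.37 and T∆S ≈ 2.00 kJ per mole of residue, leaving only about 0.37 kJ per mole of residue, or roughly 46 kJ/mol of protein under the 110 Da assumption. Every entry in the table gives a small positive ∆G per residue, in keeping with the marginal stability described above.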
An important topic of current research is how the sequence of amino acids in a
newly synthesized protein can direct the folding of the chain into a precise, biolog-
ically active shape. Can the amino acid sequence be used to predict the final three-
dimensional shape of the protein? The short answer to this question is, “Not completely,
not yet.” Present computer-aided predictions are about 70% accurate with
respect to secondary structure, and the prediction of tertiary structure is often a good
deal less accurate. But a great deal of progress has been made and a general
mechanism for protein folding has emerged.

TABLE 2.3
Thermodynamic Data on the Denaturation of Selected Proteinsᵃ

Protein               Molecular Weight   ∆H (298 K)   ∆S (298 K)   ∆CP (323 K)

Ribonuclease          13,600             2.37          6.70        43.5
Carbonic anhydrase    29,000             0.80          1.76        63.3
Lysozyme (hen)        14,300             2.02          5.52        51.7
Papain                23,400             0.93          1.60        60.1
Myoglobin             17,900             0.04         −0.80        74.5
Cytochrome c          12,400             0.65          0.90        67.3

ᵃ ∆H in kJ/mol of amino acid residue; ∆S in J/K per mole of amino acid residue; ∆CP
in J/K per mole of amino acid residue.

Data adapted from Privalov, P.L., and Gill, S.J. (1988).

Biophysical studies have shown that many denatured proteins can spontaneously
refold in vitro, upon removal of a denaturing agent (urea, detergent, acid, and so
on). Certain enzymes and other proteins can accelerate the folding process in vitro,
and it has been concluded that their role in vivo is to prevent misfolding or aggre-
gation. However, these protein factors serve more to facilitate the folding process
than to specify a particular spatial shape for the product.
The biophysical studies have also shown that folding is a relatively rapid process,
with folding being completed in a matter of seconds to perhaps a few minutes in
most cases. This rules out a “trial and error” approach to folding, where the protein
randomly folds into all possible conformations in a search for the most stable
conformation. For a typical protein of 100 residues, with about 10 different confor-
mations for each amino acid residue, the total number of conformations would be
around 10¹⁰⁰, far too many to be searched in a reasonable amount of time.
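
The scale of this problem is easy to check numerically. In the sketch below, the chain length and the number of conformations per residue are the values given above; the sampling rate of 10¹³ conformations per second is an assumption chosen for illustration (roughly the time scale of a bond vibration) and is not given in the text. The function name is hypothetical.

```python
SECONDS_PER_YEAR = 3.156e7


def random_search_time(n_residues=100, conformations_per_residue=10,
                       samples_per_second=1e13):
    """Return (total conformations, seconds needed to visit each one once)."""
    total = conformations_per_residue ** n_residues
    return total, total / samples_per_second


if __name__ == "__main__":
    total, seconds = random_search_time()
    print(f"conformations: 10^{len(str(total)) - 1}")
    print(f"search time:   {seconds:.1e} s = "
          f"{seconds / SECONDS_PER_YEAR:.1e} years")
```

Even at this generous sampling rate, an exhaustive search would take on the order of 10⁸⁷ seconds, compared with the seconds to minutes actually observed for folding.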
Instead of proceeding at random, protein folding follows a stepwise process,
with three main steps. These are: (1) formation of unstable, fluctuating regions of
secondary structure; (2) aggregation or collapse of these embryonic structures into
a more compact intermediate structure or structures; and (3) rearrangement or adjust-
ment of the intermediate(s) into the final conformation (or family of conformations)
that we identify as the native structure. From in vitro studies it is known that the
first step is very fast, typically occurring within 0.01 seconds. The second step is
fast, taking about 1 second for most proteins; while the third step can be rather slow
in vitro, running from a few seconds up to 2500 seconds. The slowness of this last
step is often connected with isomerization of proline from the cis to the trans isomer,
and what happens in vivo may be considerably faster, thanks to facilitation by proline
isomerase enzymes.
The solution of the protein folding problem will have wide ramifications. For
example, the completion of the Human Genome Project, with the full DNA sequence
of the human genome, has generated the sequences of many genes whose protein
products are as yet uncharacterized. It would be a great help in deducing the function
of each of these gene products if it were possible to predict the overall three-
dimensional folding of the protein, and thus relate it to possible enzymatic, structural,
or signalling functions in the cell. Eventually, improved diagnostic tools or therapies
would result from this. Also, molecular biology laboratories are developing novel
artificial proteins, not derived from any preexisting gene, in attempts to obtain more
efficient enzymes, enzyme inhibitors, and signalling factors, for use in medicine and
in industrial chemical processes. An effective way of predicting the final folded
structure from a peptide’s sequence would greatly increase the efficiency of these
research and development efforts.

FURTHER READING
Berman, H.M., Goodsell, D.S., and Bourne, P.E. (2002). Protein structures: From famine to
feast. American Scientist, 90, 350–359.
Dill, K.A. and Chan, H.S. (1997). From Levinthal to pathways to funnels. Nature Structural
Biol., 4, 10–19.
King, J., Haase-Pettingell, C., and Gossard, D. (2002). Protein folding and misfolding. Amer-
ican Scientist, 90, 445–453.
Privalov, P.L. and Gill, S.J. (1988). Stability of protein structure and hydrophobic interaction.
Adv. Protein Chem., 39, 191–234.
Stryer, L. (1995). Biochemistry, 4th ed. W. H. Freeman, New York.
