Chapter 4 - Alberts - Lessons 1+2


PART

BASIC GENETIC MECHANISMS

CHAPTER 4
DNA, Chromosomes, and Genomes
IN THIS CHAPTER
THE STRUCTURE AND FUNCTION OF DNA
CHROMOSOMAL DNA AND ITS PACKAGING IN THE CHROMATIN FIBER
CHROMATIN STRUCTURE AND FUNCTION
THE GLOBAL STRUCTURE OF CHROMOSOMES
HOW GENOMES EVOLVE

Life depends on the ability of cells to store, retrieve, and translate the genetic
instructions required to make and maintain a living organism. This hereditary
information is passed on from a cell to its daughter cells at cell division, and from
one generation of an organism to the next through the organism’s reproductive
cells. The instructions are stored within every living cell as its genes, the
information-containing elements that determine the characteristics of a species as a
whole and of the individuals within it.
As soon as genetics emerged as a science at the beginning of the twentieth
century, scientists became intrigued by the chemical structure of genes. The
information in genes is copied and transmitted from cell to daughter cell millions of times
during the life of a multicellular organism, and it survives the process essentially
unchanged. What form of molecule could be capable of such accurate and almost
unlimited replication and also be able to exert precise control, directing
multicellular development as well as the daily life of every cell? What kind of
instructions does the genetic information contain? And how can the enormous amount
of information required for the development and maintenance of an organism fit
within the tiny space of a cell?
The answers to several of these questions began to emerge in the 1940s. At
this time researchers discovered, from studies in simple fungi, that genetic infor-
mation consists largely of instructions for making proteins. Proteins are phenom-
enally versatile macromolecules that perform most cell functions. As we saw in
Chapter 3, they serve as building blocks for cell structures and form the enzymes
that catalyze most of the cell’s chemical reactions. They also regulate gene expres-
sion (Chapter 7), and they enable cells to communicate with each other (Chapter
15) and to move (Chapter 16). The properties and functions of cells and organisms
are determined to a great extent by the proteins that they are able to make.
Painstaking observations of cells and embryos in the late nineteenth century
had led to the recognition that the hereditary information is carried on chro-
mosomes—threadlike structures in the nucleus of a eukaryotic cell that become
visible by light microscopy as the cell begins to divide (Figure 4–1). Later, when
biochemical analysis became possible, chromosomes were found to consist of
deoxyribonucleic acid (DNA) and protein, with both being present in roughly the
same amounts. For many decades, the DNA was thought to be merely a structural
element. However, the other crucial advance made in the 1940s was the
identification of DNA as the likely carrier of genetic information. This breakthrough in our
understanding of cells came from studies of inheritance in bacteria (Figure 4–2).
But still, as the 1950s began, both how proteins could be specified by instructions
in the DNA and how this information might be copied for transmission from cell
to cell seemed completely mysterious. The puzzle was suddenly solved in 1953,
when James Watson and Francis Crick derived the mechanism from their model
of DNA structure. As outlined in Chapter 1, the determination of the
double-helical structure of DNA immediately solved the problem of how the information
in this molecule might be copied, or replicated. It also provided the first clues as
to how a molecule of DNA might use the sequence of its subunits to encode the
instructions for making proteins. Today, the fact that DNA is the genetic material
is so fundamental to biological thought that it is difficult to appreciate the
enormous intellectual gap that was filled by this breakthrough discovery.

Figure 4–1 Chromosomes in cells. (A) Two adjacent plant cells
photographed through a light microscope. The DNA has been stained with a
fluorescent dye (DAPI) that binds to it. The DNA is present in chromosomes,
which become visible as distinct structures in the light microscope only when
they become compact, sausage-shaped structures in preparation for cell
division, as shown on the left. The cell on the right, which is not dividing,
contains identical chromosomes, but they cannot be clearly distinguished
at this phase in the cell’s life cycle, because they are in a more extended
conformation. (B) Schematic diagram of the outlines of the two cells along
with their chromosomes. (A, courtesy of Peter Shaw.)
[Panel labels: dividing cell; nondividing cell. Scale bar: 10 μm.]

We begin this chapter by describing the structure of DNA. We see how, despite
its chemical simplicity, the structure and chemical properties of DNA make it
ideally suited as the raw material of genes. We then consider how the many pro-
teins in chromosomes arrange and package this DNA. The packing has to be done
in an orderly fashion so that the chromosomes can be replicated and apportioned
correctly between the two daughter cells at each cell division. And it must also
allow access to chromosomal DNA, both for the enzymes that repair DNA damage
and for the specialized proteins that direct the expression of its many genes.
Figure 4–2 The first experimental demonstration that DNA is the genetic
material. These experiments, carried out in the 1920s (A) and 1940s (B), showed
that adding purified DNA to a bacterium changed the bacterium’s properties and
that this change was faithfully passed on to subsequent generations. Two
closely related strains of the bacterium Streptococcus pneumoniae differ from
each other in both their appearance under the microscope and their pathogenicity.
One strain appears smooth (S) and causes death when injected into mice, and the
other appears rough (R) and is nonlethal. (A) An initial experiment shows that some
substance present in the S strain can change (or transform) the R strain into the
S strain and that this change is inherited by subsequent generations of bacteria.
(B) This experiment, in which the R strain has been incubated with various classes
of biological molecules purified from the S strain, identifies the active substance
as DNA.
[Panel (A): the smooth, pathogenic S strain causes pneumonia; random mutation gives
the rough, nonpathogenic R strain. Live R strain cells are grown in the presence of
either heat-killed S strain cells or a cell-free extract of S strain cells. Some R
strain cells are transformed to S strain cells, whose daughters are pathogenic and
cause pneumonia. CONCLUSION: Molecules that can carry heritable information are
present in S strain cells.]
[Panel (B): a cell-free extract of S strain cells is fractionated into classes of
purified molecules (RNA, protein, DNA, lipid, carbohydrate), and each class is tested
for transformation of R strain cells. Only the DNA fraction yields S strain cells.
CONCLUSION: The molecule that carries the heritable information is DNA.]

In the past two decades, there has been a revolution in our ability to
determine the exact order of subunits in DNA molecules. As a result, we now know the
sequence of the 3.2 billion nucleotide pairs that provide the information for
producing a human adult from a fertilized egg, as well as having the DNA sequences
for thousands of other organisms. Detailed analyses of these sequences are
providing exciting insights into the process of evolution, and it is with this subject that
the chapter ends.
This is the first of four chapters that deal with basic genetic mechanisms—the
ways in which the cell maintains, replicates, and expresses the genetic informa-
tion carried in its DNA. In the next chapter (Chapter 5), we shall discuss the mech-
anisms by which the cell accurately replicates and repairs DNA; we also describe
how DNA sequences can be rearranged through the process of genetic recombi-
nation. Gene expression—the process through which the information encoded in
DNA is interpreted by the cell to guide the synthesis of proteins—is the main topic
of Chapter 6. In Chapter 7, we describe how this gene expression is controlled by
the cell to ensure that each of the many thousands of proteins and RNA molecules
encrypted in its DNA is manufactured only at the proper time and place in the life
of a cell.

THE STRUCTURE AND FUNCTION OF DNA


Biologists in the 1940s had difficulty in conceiving how DNA could be the genetic
material. The molecule seemed too simple: a long polymer composed of only four
types of nucleotide subunits, which resemble one another chemically. Early in the
1950s, DNA was examined by x-ray diffraction analysis, a technique for determin-
ing the three-dimensional atomic structure of a molecule (discussed in Chapter
8). The early x-ray diffraction results indicated that DNA was composed of two
strands of the polymer wound into a helix. The observation that DNA was dou-
ble-stranded provided one of the major clues that led to the Watson–Crick model
for DNA structure that, as soon as it was proposed in 1953, made DNA’s potential
for replication and information storage apparent.

A DNA Molecule Consists of Two Complementary Chains of Nucleotides

A deoxyribonucleic acid (DNA) molecule consists of two long polynucleotide
chains composed of four types of nucleotide subunits. Each of these chains is
known as a DNA chain, or a DNA strand. The chains run antiparallel to each other,
and hydrogen bonds between the base portions of the nucleotides hold the two
chains together (Figure 4–3). As we saw in Chapter 2 (Panel 2–6, pp. 100–101),
nucleotides are composed of a five-carbon sugar to which are attached one or
more phosphate groups and a nitrogen-containing base. In the case of the nucle-
otides in DNA, the sugar is deoxyribose attached to a single phosphate group
(hence the name deoxyribonucleic acid), and the base may be either adenine (A),
cytosine (C), guanine (G), or thymine (T). The nucleotides are covalently linked
together in a chain through the sugars and phosphates, which thus form a “back-
bone” of alternating sugar–phosphate–sugar–phosphate. Because only the base
differs in each of the four types of nucleotide subunit, each polynucleotide chain
in DNA is analogous to a sugar-phosphate necklace (the backbone), from which
hang the four types of beads (the bases A, C, G, and T). These same symbols (A,
C, G, and T) are commonly used to denote either the four bases or the four entire
nucleotides—that is, the bases with their attached sugar and phosphate groups.
The way in which the nucleotides are linked together gives a DNA strand a
chemical polarity. If we think of each sugar as a block with a protruding knob (the
5ʹ phosphate) on one side and a hole (the 3ʹ hydroxyl) on the other (see Figure
4–3), each completed chain, formed by interlocking knobs with holes, will have
all of its subunits lined up in the same orientation. Moreover, the two ends of the
chain will be easily distinguishable, as one has a hole (the 3ʹ hydroxyl) and the
other a knob (the 5ʹ phosphate) at its terminus. This polarity in a DNA chain is
indicated by referring to one end as the 3ʹ end and the other as the 5ʹ end, names
derived from the orientation of the deoxyribose sugar. With respect to DNA’s
information-carrying capacity, the chain of nucleotides in a DNA strand, being
both directional and linear, can be read in much the same way as the letters on
this page.

Figure 4–3 DNA and its building blocks. DNA is made of four types of nucleotides,
which are linked covalently into a polynucleotide chain (a DNA strand) with
a sugar-phosphate backbone from which the bases (A, C, G, and T) extend. A DNA
molecule is composed of two antiparallel DNA strands held together by hydrogen
bonds between the paired bases. The arrowheads at the ends of the DNA strands
indicate the polarities of the two strands. In the diagram at the bottom left of the
figure, the DNA molecule is shown straightened out; in reality, it is twisted into a
double helix, as shown on the right. For details, see Figure 4–5 and Movie 4.1.
[Panel titles: building blocks of DNA (sugar phosphate + base = nucleotide); DNA
strand; double-stranded DNA; DNA double helix. Labels: sugar-phosphate backbone;
hydrogen-bonded base pairs; 5ʹ and 3ʹ ends.]

The three-dimensional structure of DNA—the DNA double helix—arises from
the chemical and structural features of its two polynucleotide chains. Because
these two chains are held together by hydrogen-bonding between the bases on
the different strands, all the bases are on the inside of the double helix, and the
sugar-phosphate backbones are on the outside (see Figure 4–3). In each case, a
bulkier two-ring base (a purine; see Panel 2–6, pp. 100–101) is paired with a sin-
gle-ring base (a pyrimidine): A always pairs with T, and G with C (Figure 4–4).
This complementary base-pairing enables the base pairs to be packed in the ener-
getically most favorable arrangement in the interior of the double helix. In this
arrangement, each base pair is of similar width, thus holding the sugar-phosphate
backbones a constant distance apart along the DNA molecule. To maximize the
efficiency of base-pair packing, the two sugar-phosphate backbones wind around
each other to form a right-handed double helix, with one complete turn every ten
base pairs (Figure 4–5).
The members of each base pair can fit together within the double helix only
if the two strands of the helix are antiparallel—that is, only if the polarity of one
strand is oriented opposite to that of the other strand (see Figures 4–3 and 4–4).
A consequence of DNA’s structure and base-pairing requirements is that each
strand of a DNA molecule contains a sequence of nucleotides that is exactly com-
plementary to the nucleotide sequence of its partner strand.
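
The pairing rules and the antiparallel arrangement described above can be made concrete with a small sketch. The Python below is illustrative only: the function names `partner_strand` and `hydrogen_bonds` are hypothetical, sequences are assumed to be written 5ʹ to 3ʹ using only the letters A, C, G, and T, and the hydrogen-bond counts (two for A-T, three for G-C) are the ones given in the caption of Figure 4–4.

```python
# Watson-Crick pairing: A pairs with T, and G pairs with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hydrogen bonds contributed by the pair each base forms
# (two for A-T, three for G-C).
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def partner_strand(strand: str) -> str:
    """Return the complementary strand, also written 5' to 3'.

    The two strands are antiparallel, so the partner is the complement
    of the input read in the opposite direction.
    """
    return "".join(PAIRING[base] for base in reversed(strand))


def hydrogen_bonds(strand: str) -> int:
    """Count the hydrogen bonds between a strand and its partner."""
    return sum(H_BONDS[base] for base in strand)


if __name__ == "__main__":
    s = "GGATCA"
    print(partner_strand(s))                  # partner, written 5' to 3'
    print(partner_strand(partner_strand(s)))  # recovers the original strand
    print(hydrogen_bonds(s))
```

Applying `partner_strand` twice returns the starting sequence, which is one way of stating that each strand fully determines the other.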

Figure 4–4 Complementary base pairs in the DNA double helix. The shapes
and chemical structures of the bases allow hydrogen bonds to form efficiently
only between A and T and between G and C, because atoms that are able to
form hydrogen bonds (see Panel 2–3, pp. 94–95) can then be brought close
together without distorting the double helix. As indicated, two hydrogen bonds form
between A and T, while three form between G and C. The bases can pair in this way
only if the two polynucleotide chains that contain them are antiparallel to each other.
[Diagram labels: adenine paired with thymine; guanine paired with cytosine; hydrogen
bond; sugar-phosphate backbone; 5ʹ and 3ʹ ends. Scale bar: 1 nm.]

The Structure of DNA Provides a Mechanism for Heredity


The discovery of the structure of DNA immediately suggested answers to the two
most fundamental questions about heredity. First, how could the information to
specify an organism be carried in a chemical form? And second, how could this
information be duplicated and copied from generation to generation?
The answer to the first question came from the realization that DNA is a linear
polymer of four different kinds of monomer, strung out in a defined sequence like
the letters of a document written in an alphabetic script.
The answer to the second question came from the double-stranded nature of
the structure: because each strand of DNA contains a sequence of nucleotides
that is exactly complementary to the nucleotide sequence of its partner strand,
each strand can act as a template, or mold, for the synthesis of a new
complementary strand. In other words, if we designate the two DNA strands as S and Sʹ, strand
S can serve as a template for making a new strand Sʹ, while strand Sʹ can serve as a
template for making a new strand S (Figure 4–6). Thus, the genetic information in
DNA can be accurately copied by the beautifully simple process in which strand
S separates from strand Sʹ, and each separated strand then serves as a template
for the production of a new complementary partner strand that is identical to its
former partner.

Figure 4–5 The DNA double helix. (A) A space-filling model of 1.5 turns of
the DNA double helix. Each turn of DNA is made up of 10.4 nucleotide pairs, and the
center-to-center distance between adjacent nucleotide pairs is 0.34 nm. The coiling of
the two strands around each other creates two grooves in the double helix: the wider
groove is called the major groove, and the smaller the minor groove, as indicated.
(B) A short section of the double helix viewed from its side, showing four base
pairs. The nucleotides are linked together covalently by phosphodiester bonds that
join the 3ʹ-hydroxyl (–OH) group of one sugar to the 5ʹ-hydroxyl group of the next
sugar. Thus, each polynucleotide strand has a chemical polarity; that is, its two
ends are chemically different. The 5ʹ end of the DNA polymer is by convention often
illustrated carrying a phosphate group, while the 3ʹ end is shown with a hydroxyl.
[Diagram labels: major groove; minor groove; bases; sugar; hydrogen bond;
phosphodiester bond; 5ʹ end; 3ʹ end. Dimensions: 0.34 nm between adjacent pairs;
2 nm helix width.]

Figure 4–6 DNA as a template for its own duplication. Because the nucleotide
A successfully pairs only with T, and G pairs with C, each strand of DNA can act
as a template to specify the sequence of nucleotides in its complementary strand.
In this way, double-helical DNA can be copied precisely, with each parental DNA
helix producing two identical daughter DNA helices.
[Diagram labels: parental DNA double helix (S strand, Sʹ strand); template S strand
with new Sʹ strand; template Sʹ strand with new S strand; daughter DNA double helices.]

The ability of each strand of a DNA molecule to act as a template for producing
a complementary strand enables a cell to copy, or replicate, its genome before
passing it on to its descendants. We shall describe the elegant machinery that the
cell uses to perform this task in Chapter 5.
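
The template idea can be sketched in a few lines of code. This is a toy model under stated assumptions, not a description of the cellular machinery covered in Chapter 5: the names `synthesize_partner` and `replicate` are hypothetical, both strands are written 5ʹ to 3ʹ, and copying is treated as error-free.

```python
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize_partner(template: str) -> str:
    """Build the strand specified by a template (both written 5' to 3')."""
    return "".join(PAIRING[base] for base in reversed(template))


def replicate(s: str, s_prime: str):
    """Separate strands S and S' and use each one as a template.

    Returns two daughter helices as (S, S') tuples. Each daughter keeps
    one parental strand and gains one newly made strand.
    """
    new_s_prime = synthesize_partner(s)   # old S templates a new S'
    new_s = synthesize_partner(s_prime)   # old S' templates a new S
    return (s, new_s_prime), (new_s, s_prime)


if __name__ == "__main__":
    s = "ATGCCG"
    parent = (s, synthesize_partner(s))
    daughter_1, daughter_2 = replicate(*parent)
    # Both daughters carry the same sequence information as the parent.
    print(daughter_1 == parent and daughter_2 == parent)
```

In this sketch both daughter helices compare equal to the parental helix, mirroring the statement that each new partner strand is identical to the former partner.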
Organisms differ from one another because their respective DNA molecules
have different nucleotide sequences and, consequently, carry different biological
messages. But how is the nucleotide alphabet used to make messages, and what
do they spell out?
As discussed above, it was known well before the structure of DNA was deter-
mined that genes contain the instructions for producing proteins. If genes are
made of DNA, the DNA must therefore somehow encode proteins (Figure 4–7).
As discussed in Chapter 3, the properties of a protein, which are responsible for its
biological function, are determined by its three-dimensional structure. This struc-
ture is determined in turn by the linear sequence of the amino acids of which it is
composed. The linear sequence of nucleotides in a gene must therefore somehow
spell out the linear sequence of amino acids in a protein. The exact correspon-
dence between the four-letter nucleotide alphabet of DNA and the twenty-letter
amino acid alphabet of proteins—the genetic code—is not at all obvious from the
DNA structure, and it took over a decade after the discovery of the double helix
before it was worked out. In Chapter 6, we will describe this code in detail in the
course of elaborating the process of gene expression, through which a cell converts
the nucleotide sequence of a gene first into the nucleotide sequence of an RNA
molecule, and then into the amino acid sequence of a protein.
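
One small part of this correspondence can be checked by counting alone: how long a nucleotide "word" must be for a four-letter alphabet to distinguish twenty amino acids. The sketch below assumes fixed-length words and says nothing else about the actual code described in Chapter 6; `min_word_length` is a hypothetical helper.

```python
def min_word_length(alphabet_size: int, meanings: int) -> int:
    """Smallest n such that alphabet_size ** n >= meanings.

    Assumes alphabet_size >= 2; a one-letter alphabet could never
    distinguish more than one meaning.
    """
    n = 1
    while alphabet_size ** n < meanings:
        n += 1
    return n


if __name__ == "__main__":
    for n in (1, 2, 3):
        # Number of distinct words of length n over A, C, G, T.
        print(n, 4 ** n)
    print(min_word_length(4, 20))
```

Words of one or two nucleotides give only 4 or 16 combinations, fewer than the twenty amino acids, whereas three nucleotides give 64.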
The complete store of information in an organism’s DNA is called its genome,
and it specifies all the RNA molecules and proteins that the organism will ever
synthesize. (The term genome is also used to describe the DNA that carries this
information.) The amount of information contained in genomes is staggering. The
nucleotide sequence of a very small human gene, written out in the four-letter
nucleotide alphabet, occupies a quarter of a page of text (Figure 4–8), while the
complete sequence of nucleotides in the human genome would fill more than a
thousand books the size of this one. In addition to other critical information, it
includes roughly 21,000 protein-coding genes, which (through alternative
splicing; see p. 415) give rise to a much greater number of distinct proteins.

Figure 4–7 The relationship between genetic information carried in DNA and
proteins. (Discussed in Chapter 1.)
[Diagram: genes A, B, and C on the DNA double helix give rise, through gene
expression, to proteins A, B, and C.]

Figure 4–8 The nucleotide sequence of the human β-globin gene. By
convention, a nucleotide sequence is written from its 5ʹ end to its 3ʹ end,
and it should be read from left to right in successive lines down the page as
though it were normal English text. This gene carries the information for the
amino acid sequence of one of the two types of subunits of the hemoglobin
molecule; a different gene, the α-globin gene, carries the information for the
other. (Hemoglobin, the protein that carries oxygen in the blood, has four
subunits, two of each type.) Only one of the two strands of the DNA double
helix containing the β-globin gene is shown; the other strand has the exact
complementary sequence. The DNA sequences highlighted in yellow show
the three regions of the gene that specify the amino acid sequence for the
β-globin protein. We shall see in Chapter 6 how the cell splices these three
sequences together at the level of messenger RNA in order to synthesize a
full-length β-globin protein.

In Eukaryotes, DNA Is Enclosed in a Cell Nucleus

As described in Chapter 1, nearly all the DNA in a eukaryotic cell is sequestered in
a nucleus, which in many cells occupies about 10% of the total cell volume. This
compartment is delimited by a nuclear envelope formed by two concentric lipid
bilayer membranes (Figure 4–9). These membranes are punctured at intervals
by large nuclear pores, through which molecules move between the nucleus and
the cytosol. The nuclear envelope is directly connected to the extensive system
of intracellular membranes called the endoplasmic reticulum, which extend out
from it into the cytoplasm. And it is mechanically supported by a network of inter-
mediate filaments called the nuclear lamina—a thin feltlike mesh just beneath
the inner nuclear membrane (see Figure 4–9B).
The nuclear envelope allows the many proteins that act on DNA to be concen-
trated where they are needed in the cell, and, as we see in subsequent chapters, it
also keeps nuclear and cytosolic enzymes separate, a feature that is crucial for the
proper functioning of eukaryotic cells.

Summary
Genetic information is carried in the linear sequence of nucleotides in DNA. Each
molecule of DNA is a double helix formed from two complementary antiparallel
strands of nucleotides held together by hydrogen bonds between G-C and A-T base
pairs. Duplication of the genetic information occurs by the use of one DNA strand
as a template for the formation of a complementary strand. The genetic information
stored in an organism’s DNA contains the instructions for all the RNA molecules and
proteins that the organism will ever synthesize and is said to comprise its genome.
In eukaryotes, DNA is contained in the cell nucleus, a large membrane-bound com-
partment.

CHROMOSOMAL DNA AND ITS PACKAGING IN THE CHROMATIN FIBER

The most important function of DNA is to carry genes, the information that spec-
ifies all the RNA molecules and proteins that make up an organism—including
information about when, in what types of cells, and in what quantity each RNA
molecule and protein is to be made. The nuclear DNA of eukaryotes is divided up
into chromosomes, and in this section we see how genes are typically arranged on
each chromosome. In addition, we describe the specialized DNA sequences that
are required for a chromosome to be accurately duplicated as a separate entity
and passed on from one generation to the next.
We also confront the serious challenge of DNA packaging. If the double helices
comprising all 46 chromosomes in a human cell could be laid end to end, they
would reach approximately 2 meters; yet the nucleus, which contains the DNA, is
only about 6 μm in diameter. This is geometrically equivalent to packing 40 km (24
miles) of extremely fine thread into a tennis ball. The complex task of packaging
DNA is accomplished by specialized proteins that bind to the DNA and fold it,
generating a series of organized coils and loops that provide increasingly higher
levels of organization, and prevent the DNA from becoming an unmanageable
tangle. Amazingly, although the DNA is very tightly compacted, it nevertheless
remains accessible to the many enzymes in the cell that replicate it, repair it, and
use its genes to produce RNA molecules and proteins.
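
The figure of approximately 2 meters can be checked, roughly, from numbers quoted elsewhere in this chapter: 3.2 billion nucleotide pairs per copy of the genome, two copies of each chromosome per nucleus, and 0.34 nm between adjacent nucleotide pairs (Figure 4–5). The sketch below is back-of-the-envelope arithmetic only; the constant and function names are illustrative, and treating all the DNA as one fully extended double helix is an idealization.

```python
PAIRS_PER_GENOME_COPY = 3.2e9  # nucleotide pairs, as quoted in this chapter
COPIES_PER_NUCLEUS = 2         # one maternal and one paternal set
RISE_PER_PAIR_NM = 0.34        # nm between adjacent pairs (Figure 4-5)
NUCLEUS_DIAMETER_UM = 6.0      # approximate diameter quoted in the text


def total_dna_length_m() -> float:
    """End-to-end length of all the nuclear DNA, in meters."""
    total_pairs = PAIRS_PER_GENOME_COPY * COPIES_PER_NUCLEUS
    return total_pairs * RISE_PER_PAIR_NM * 1e-9  # nm -> m


def length_to_diameter_ratio() -> float:
    """How many nucleus diameters the extended DNA would span."""
    return total_dna_length_m() / (NUCLEUS_DIAMETER_UM * 1e-6)  # um -> m


if __name__ == "__main__":
    print(f"{total_dna_length_m():.2f} m")
    print(f"{length_to_diameter_ratio():.1e} nucleus diameters")
```

Under these assumptions the extended length comes out a little over 2 m, several hundred thousand times the diameter of the nucleus that contains it.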

Figure 4–9 A cross-sectional view of a typical cell nucleus. (A) Electron micrograph of a thin section through the nucleus of
a human fibroblast. (B) Schematic drawing, showing that the nuclear envelope consists of two membranes, the outer one being
continuous with the endoplasmic reticulum (ER) membrane (see also Figure 12–7). The space inside the endoplasmic reticulum
(the ER lumen) is colored yellow; it is continuous with the space between the two nuclear membranes. The lipid bilayers of the
inner and outer nuclear membranes are connected at each nuclear pore. A sheetlike network of intermediate filaments (brown)
inside the nucleus forms the nuclear lamina, providing mechanical support for the nuclear envelope (for details, see
Chapter 12). The dark-staining heterochromatin contains specially condensed regions of DNA that will be discussed later. (A,
courtesy of E.G. Jordan and J. McGovern.)
[Diagram labels: heterochromatin; endoplasmic reticulum; nuclear envelope (outer and inner nuclear membranes); DNA and
associated proteins (chromatin), plus many RNA and protein molecules; nucleolus; centrosome; microtubule; nuclear lamina;
nuclear pore. Scale bar: 2 µm.]

Eukaryotic DNA Is Packaged into a Set of Chromosomes

Each chromosome in a eukaryotic cell consists of a single, enormously long linear
DNA molecule along with the proteins that fold and pack the fine DNA thread into
a more compact structure. In addition to the proteins involved in packaging, chro-
mosomes are also associated with many other proteins (as well as numerous RNA
molecules). These are required for the processes of gene expression, DNA repli-
cation, and DNA repair. The complex of DNA and tightly bound protein is called
chromatin (from the Greek chroma, “color,” because of its staining properties).
Bacteria lack a special nuclear compartment, and they generally carry their
genes on a single DNA molecule, which is often circular (see Figure 1–24). This
DNA is also associated with proteins that package and condense it, but they are
different from the proteins that perform these functions in eukaryotes. Although
the bacterial DNA with its attendant proteins is often called the bacterial “chro-
mosome,” it does not have the same structure as eukaryotic chromosomes, and
less is known about how the bacterial DNA is packaged. Therefore, our discussion
of chromosome structure will focus almost entirely on eukaryotic chromosomes.
With the exception of the gametes (eggs and sperm) and a few highly special-
ized cell types that cannot multiply and either lack DNA altogether (for example,
red blood cells) or have replicated their DNA without completing cell division (for
example, megakaryocytes), each human cell nucleus contains two copies of each
chromosome, one inherited from the mother and one from the father. The mater-
nal and paternal chromosomes of a pair are called homologous chromosomes
(homologs). The only nonhomologous chromosome pairs are the sex chromo-
somes in males, where a Y chromosome is inherited from the father and an X
chromosome from the mother. Thus, each human cell contains a total of 46 chro-
mosomes—22 pairs common to both males and females, plus two so-called sex
chromosomes (X and Y in males, two Xs in females). These human chromosomes
can be readily distinguished by “painting” each one a different color using a tech-
nique based on DNA hybridization (Figure 4–10). In this method (described in
detail in Chapter 8), a short strand of nucleic acid tagged with a fluorescent dye
serves as a “probe” that picks out its complementary DNA sequence, lighting up
the target chromosome at any site where it binds. Chromosome painting is most
frequently done at the stage in the cell cycle called mitosis, when chromosomes
are especially compacted and easy to visualize (see below).
Another more traditional way to distinguish one chromosome from another
is to stain them with dyes that reveal a striking and reproducible pattern of bands
along each mitotic chromosome (Figure 4–11). These banding patterns
presumably reflect variations in chromatin structure, but their basis is not well
understood. Nevertheless, the pattern of bands on each type of chromosome is unique,
and it provided the initial means to identify and number each human
chromosome reliably.

Figure 4–10 The complete set of human chromosomes. These chromosomes,
from a female, were isolated from a cell undergoing nuclear division (mitosis)
and are therefore highly compacted. Each chromosome has been “painted” a
different color to permit its unambiguous identification under the fluorescence
microscope, using a technique called “spectral karyotyping.” Chromosome
painting can be performed by exposing the chromosomes to a large collection of
DNA molecules whose sequence matches known DNA sequences from the human
genome. The set of sequences matching each chromosome is coupled to a different
combination of fluorescent dyes. DNA molecules derived from chromosome 1 are
labeled with one specific dye combination, those from chromosome 2 with another,
and so on. Because the labeled DNA can form base pairs, or hybridize, only to the
chromosome from which it was derived, each chromosome becomes labeled
with a different combination of dyes. For such experiments, the chromosomes are
subjected to treatments that separate the two strands of double-helical DNA in
a way that permits base-pairing with the single-stranded labeled DNA, but keeps
the overall chromosome structure relatively intact. (A) The chromosomes visualized as
they originally spilled from the lysed cell. (B) The same chromosomes artificially
lined up in their numerical order. This arrangement of the full chromosome
set is called a karyotype. (Adapted from N. McNeil and T. Ried, Expert Rev. Mol.
Med. 2:1–14, 2000. With permission from Cambridge University Press.)
[Panel (B) shows chromosomes 1 to 22 and the two X chromosomes. Scale bar: 10 µm.]

Figure 4–11 The banding patterns of human chromosomes. Chromosomes 1–22 are
numbered in approximate order of size. A typical human cell contains two of each
of these chromosomes, plus two sex chromosomes: two X chromosomes in a female,
one X and one Y chromosome in a male. The chromosomes used to make these maps
were stained at an early stage in mitosis, when the chromosomes are incompletely
compacted. The horizontal red line represents the position of the centromere (see
Figure 4–19), which appears as a constriction on mitotic chromosomes. The red
knobs on chromosomes 13, 14, 15, 21, and 22 indicate the positions of genes that
code for the large ribosomal RNAs (discussed in Chapter 6). These banding patterns
are obtained by staining chromosomes with Giemsa stain, and they can be observed
under the light microscope. (Adapted from U. Francke, Cytogenet. Cell Genet.
31:24–32, 1981. With permission from the author.)
[Figure labels: chromosomes 1–22, X, and Y; scale bars: 50 million nucleotide
pairs, 1 µm]

Figure 4–12 Aberrant human chromosomes. (A) Two normal human
chromosomes, 4 and 6. (B) In an individual carrying a balanced chromosomal
translocation, the DNA double helix in one chromosome has crossed over
with the DNA double helix in the other chromosome due to an abnormal
recombination event. The chromosome painting technique used on the
chromosomes in each of the sets allows the identification of even short
pieces of chromosomes that have become translocated, a frequent event in
cancer cells. (Courtesy of Zhenya Tang and the NIGMS Human Genetic Cell
Repository at the Coriell Institute for Medical Research: GM21880.)

The display of the 46 human chromosomes at mitosis is called the human
karyotype. If parts of chromosomes are lost or are switched between chromosomes,
these changes can be detected either by changes in the banding patterns
or, with greater sensitivity, by changes in the pattern of chromosome painting
(Figure 4–12). Cytogeneticists use these alterations to detect inherited chromosome
abnormalities and to reveal the chromosome rearrangements that occur in
cancer cells as they progress to malignancy (discussed in Chapter 20).

Chromosomes Contain Long Strings of Genes


Chromosomes carry genes, the functional units of heredity. A gene is often
defined as a segment of DNA that contains the instructions for making a particular
protein (or a set of closely related proteins), but this definition is too narrow.
Genes that code for protein are indeed the majority, and most of the genes with
clear-cut mutant phenotypes fall under this heading. In addition, however, there
are many “RNA genes”—segments of DNA that generate a functionally significant
RNA molecule, instead of a protein, as their final product. We shall say more about
the RNA genes and their products later.
As might be expected, some correlation exists between the complexity of
an organism and the number of genes in its genome (see Table 1–2, p. 29). For
example, some simple bacteria have only 500 genes, compared to about 30,000
for humans. Bacteria, archaea, and some single-celled eukaryotes, such as yeast,
have concise genomes, consisting of little more than strings of closely packed
genes. However, the genomes of multicellular plants and animals, as well as many
other eukaryotes, contain, in addition to genes, a large quantity of interspersed
DNA whose function is poorly understood (Figure 4–13). Some of this additional
DNA is crucial for the proper control of gene expression, and this may in part
explain why there is so much of it in multicellular organisms, whose genes have to
be switched on and off according to complicated rules during development (dis-
cussed in Chapters 7 and 21).
Differences in the amount of DNA interspersed between genes, far more than
differences in numbers of genes, account for the astonishing variations in genome
size that we see when we compare one species with another (see Figure 1–32). For
example, the human genome is 200 times larger than that of the yeast Saccharomyces
cerevisiae, but 30 times smaller than that of some plants and amphibians
and 200 times smaller than that of a species of amoeba. Moreover, because of
differences in the amount of noncoding DNA, the genomes of closely related organisms
(bony fish, for example) can vary several hundredfold in their DNA content,
even though they contain roughly the same number of genes. Whatever the excess
DNA may do, it seems clear that it is not a great handicap for a eukaryotic cell to
carry a large amount of it.
How the genome is divided into chromosomes also differs from one eukaryotic
species to the next. For example, while the cells of humans have 46 chromosomes,
those of some small deer have only 6, while those of the common carp contain
over 100. Even closely related species with similar genome sizes can have very
different numbers and sizes of chromosomes (Figure 4–14). Thus, there is no simple
relationship between chromosome number, complexity of the organism, and
total genome size. Rather, the genomes and chromosomes of modern-day species
have each been shaped by a unique history of seemingly random genetic events,
acted on by poorly understood selection pressures over long evolutionary times.

Figure 4–13 The arrangement of genes in the genome of S. cerevisiae compared
to humans. (A) S. cerevisiae is a budding yeast widely used for brewing and
baking. The genome of this single-celled eukaryote is distributed over 16
chromosomes. A small region of one chromosome has been arbitrarily selected
to show its high density of genes. (B) A region of the human genome of equal
length to the yeast segment in (A). The human genes are much less densely
packed and the amount of interspersed DNA sequence is far greater. Not shown in
this sample of human DNA is the fact that most human genes are much longer than
yeast genes (see Figure 4–15).
[Panels: (A) Saccharomyces cerevisiae, (B) human, each on a scale of 0 to 30
kilobases; key: gene, genome-wide repeat]

Figure 4–14 Two closely related species of deer with very different chromosome
numbers. In the evolution of the Indian muntjac, initially separate chromosomes
fused, without having a major effect on the animal. These two species contain a
similar number of genes. (Chinese muntjac photo courtesy of Deborah Carreno,
Natural Wonders Photography.)
[Figure labels: Chinese muntjac, Indian muntjac; chromosome labels X, Y, Y1, Y2]
The Nucleotide Sequence of the Human Genome Shows How
Our Genes Are Arranged
With the publication of the full DNA sequence of the human genome in 2004, it
became possible to see in detail how the genes are arranged along each of our
chromosomes (Figure 4–15). It will be many decades before the information
contained in the human genome sequence is fully analyzed, but it has already
stimulated new experiments that have had major effects on the content of every
chapter in this book.

Figure 4–15 The organization of genes on a human chromosome. (A) Chromosome
22, one of the smallest human chromosomes, contains 48 × 10⁶ nucleotide pairs and
makes up approximately 1.5% of the human genome. Most of the left arm of
chromosome 22 consists of short repeated sequences of DNA that are packaged in a
particularly compact form of chromatin (heterochromatin) discussed later in this
chapter. (B) A tenfold expansion of a portion of chromosome 22, with about 40
genes indicated. Those in dark brown are known genes and those in red are
predicted genes. (C) An expanded portion of (B) showing four genes. (D) The
intron–exon arrangement of a typical gene is shown after a further tenfold
expansion. Each exon (red) codes for a portion of the protein, while the DNA
sequence of the introns (gray) is relatively unimportant, as discussed in detail
in Chapter 6.
The human genome (3.2 × 10⁹ nucleotide pairs) is the totality of genetic
information belonging to our species. Almost all of this genome is distributed
over the 22 different autosomes and 2 sex chromosomes (see Figures 4–10 and 4–11)
found within the nucleus. A minute fraction of the human genome (16,569
nucleotide pairs, in multiple copies per cell) is found in the mitochondria
(introduced in Chapter 1, and discussed in detail in Chapter 14). The term human
genome sequence refers to the complete nucleotide sequence of DNA in the 24
nuclear chromosomes and the mitochondria. Being diploid, a human somatic cell
nucleus contains roughly twice the haploid amount of DNA, or 6.4 × 10⁹ nucleotide
pairs, when not duplicating its chromosomes in preparation for division. (Adapted
from International Human Genome Sequencing Consortium, Nature 409:860–921, 2001.
With permission from Macmillan Publishers Ltd.)
[Panel labels: (A) human chromosome 22 in its mitotic conformation, composed of
two double-stranded DNA molecules, each 48 × 10⁶ nucleotide pairs long;
heterochromatin. (B) ×10: 10% of chromosome arm, ~40 genes. (C) ×10: 1% of
chromosome arm containing 4 genes. (D) ×10: one gene of 3.4 × 10⁴ nucleotide
pairs, showing exon, intron, and regulatory DNA sequences; gene expression: RNA,
protein, folded protein]

TABLE 4–1 Some Vital Statistics for the Human Genome

Human genome
DNA length                                                        3.2 × 10⁹ nucleotide pairs*
Number of genes coding for proteins                               Approximately 21,000
Largest gene coding for protein                                   2.4 × 10⁶ nucleotide pairs
Mean size for protein-coding genes                                27,000 nucleotide pairs
Smallest number of exons per gene                                 1
Largest number of exons per gene                                  178
Mean number of exons per gene                                     10.4
Largest exon size                                                 17,106 nucleotide pairs
Mean exon size                                                    145 nucleotide pairs
Number of noncoding RNA genes                                     Approximately 9000**
Number of pseudogenes***                                          More than 20,000
Percentage of DNA sequence in exons (protein-coding sequences)    1.5%
Percentage of DNA in other highly conserved sequences****         3.5%
Percentage of DNA in high-copy-number repetitive elements         Approximately 50%

* The sequence of 2.85 billion nucleotides is known precisely (error rate of only about 1 in
100,000 nucleotides). The remaining DNA primarily consists of short sequences that are
tandemly repeated many times over, with repeat numbers differing from one individual to the
next. These highly repetitive blocks are hard to sequence accurately.
** This number is only a very rough estimate.
*** A pseudogene is a DNA sequence closely resembling that of a functional gene, but
containing numerous mutations that prevent its proper expression or function. Most
pseudogenes arise from the duplication of a functional gene followed by the accumulation of
damaging mutations in one copy.
**** These conserved functional regions include DNA encoding 5ʹ and 3ʹ UTRs (untranslated
regions of mRNA), DNA specifying structural and functional RNAs, and DNA with conserved
protein-binding sites.

The first striking feature of the human genome is how little of it (only a few
percent) codes for proteins (Table 4–1 and Figure 4–16). It is also notable that
nearly half of the chromosomal DNA is made up of mobile pieces of DNA that
have gradually inserted themselves in the chromosomes over evolutionary time,
multiplying like parasites in the genome (see Figure 4–62). We discuss these
transposable elements in detail in later chapters.
A second notable feature of the human genome is the large average gene
size: about 27,000 nucleotide pairs. As discussed above, a typical gene carries in
its linear sequence of nucleotides the information for the linear sequence of the
amino acids of a protein. Only about 1300 nucleotide pairs are required to encode
a protein of average size (about 430 amino acids in humans). Most of the remaining
sequence in a gene consists of long stretches of noncoding DNA that interrupt
the relatively short segments of DNA that code for protein. As will be discussed in
detail in Chapter 6, the coding sequences are called exons; the intervening
(noncoding) sequences in genes are called introns (see Figure 4–15 and Table 4–1).
The majority of human genes thus consist of a long string of alternating exons and
introns, with most of the gene consisting of introns. In contrast, the majority of
genes from organisms with concise genomes lack introns. This accounts for the
much smaller size of their genes (about one-twentieth that of human genes), as
well as for the much higher fraction of coding DNA in their chromosomes.

Figure 4–16 Scale of the human genome. If drawn with a 1 mm space between each
nucleotide pair, as in (A), the human genome would extend 3200 km (approximately
2000 miles), far enough to stretch across the center of Africa, the site of our
human origins (red line in B). At this scale, there would be, on average, a
protein-coding gene every 150 m. An average gene would extend for 30 m, but the
coding sequences in this gene would add up to only just over a meter.
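
The scale comparison of Figure 4–16 follows directly from the numbers in Table 4–1, and it can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: the constant and function names are our own, the inputs are the rounded values quoted in the text, and the stop codon and untranslated regions are ignored when converting amino acids to nucleotide pairs.

```python
# Rounded values quoted in the text and in Table 4-1 (not independent measurements).
GENOME_NP = 3.2e9        # haploid human genome, in nucleotide pairs
PROTEIN_GENES = 21_000   # approximate number of protein-coding genes
MEAN_GENE_NP = 27_000    # mean size of a protein-coding gene, in nucleotide pairs
MEAN_PROTEIN_AA = 430    # average human protein length, in amino acids

MM_PER_NP = 1.0          # drawing scale of Figure 4-16: 1 mm per nucleotide pair


def coding_np(amino_acids: int) -> int:
    """Nucleotide pairs needed to encode a protein (3 per amino acid).

    Simplifying assumption: the stop codon and untranslated regions are ignored.
    """
    return 3 * amino_acids


def drawn_length_m(nucleotide_pairs: float) -> float:
    """Length in meters of a DNA segment drawn at 1 mm per nucleotide pair."""
    return nucleotide_pairs * MM_PER_NP / 1000.0


genome_km = drawn_length_m(GENOME_NP) / 1000.0
gene_spacing_m = drawn_length_m(GENOME_NP / PROTEIN_GENES)
gene_length_m = drawn_length_m(MEAN_GENE_NP)
coding_length_m = drawn_length_m(coding_np(MEAN_PROTEIN_AA))
coding_fraction_of_gene = coding_np(MEAN_PROTEIN_AA) / MEAN_GENE_NP

print(f"genome length at this scale:  {genome_km:.0f} km")
print(f"one protein-coding gene every {gene_spacing_m:.0f} m")
print(f"average gene length:          {gene_length_m:.0f} m")
print(f"coding sequence per gene:     {coding_length_m:.2f} m")
print(f"coding fraction of a gene:    {coding_fraction_of_gene:.1%}")
```

With these inputs the sketch gives a genome 3200 km long, one protein-coding gene roughly every 150 m, an average gene 27 m long (rounded to 30 m in the figure), and about 1.3 m of coding sequence per gene, which is less than 5% of the length of the gene.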

In addition to introns and exons, each gene is associated with regulatory DNA
sequences, which are responsible for ensuring that the gene is turned on or off at
the proper time, expressed at the appropriate level, and only in the proper type of
cell. In humans, the regulatory sequences for a typical gene are spread out over
tens of thousands of nucleotide pairs. As would be expected, these regulatory
sequences are much more compressed in organisms with concise genomes. We
discuss how regulatory DNA sequences work in Chapter 7.
Research in the last decade has surprised biologists with the discovery that,
in addition to 21,000 protein-coding genes, the human genome contains many
thousands of genes that encode RNA molecules that do not produce proteins, but
instead have a variety of other important functions. What is thus far known about
these molecules will be presented in Chapters 6 and 7. Last, but not least, the
nucleotide sequence of the human genome has revealed that the archive of infor-
mation needed to produce a human seems to be in an alarming state of chaos. As
one commentator described our genome, “In some ways it may resemble your
garage/bedroom/refrigerator/life: highly individualistic, but unkempt; little evi-
dence of organization; much accumulated clutter (referred to by the uninitiated
as ‘junk’); virtually nothing ever discarded; and the few patently valuable items
indiscriminately, apparently carelessly, scattered throughout.” We shall discuss
how this is thought to have come about in the final sections of this chapter entitled
“How Genomes Evolve.”

Each DNA Molecule That Forms a Linear Chromosome Must
Contain a Centromere, Two Telomeres, and Replication Origins
To form a functional chromosome, a DNA molecule must be able to do more than
simply carry genes: it must be able to replicate, and the replicated copies must be
separated and reliably partitioned into daughter cells at each cell division. This
process occurs through an ordered series of stages, collectively known as the cell
cycle, which provides for a temporal separation between the duplication of
chromosomes and their segregation into two daughter cells. The cell cycle is briefly
summarized in Figure 4–17, and it is discussed in detail in Chapter 17. Briefly,
during a long interphase, genes are expressed and chromosomes are replicated,
with the two replicas remaining together as a pair of sister chromatids. Throughout
this time, the chromosomes are extended and much of their chromatin exists
as long threads in the nucleus so that individual chromosomes cannot be easily
distinguished. It is only during a much briefer period of mitosis that each
chromosome condenses so that its two sister chromatids can be separated and
distributed to the two daughter nuclei. The highly condensed chromosomes in a
dividing cell are known as mitotic chromosomes (Figure 4–18). This is the form
in which chromosomes are most easily visualized; in fact, the images of
chromosomes shown so far in the chapter are of chromosomes in mitosis.

Figure 4–17 A simplified view of the eukaryotic cell cycle. During interphase,
the cell is actively expressing its genes and is therefore synthesizing proteins.
Also, during interphase and before cell division, the DNA is replicated and each
chromosome is duplicated to produce two closely paired sister DNA molecules
(called sister chromatids). A cell with only one type of chromosome, present in
maternal and paternal copies, is illustrated here. Once DNA replication is
complete, the cell can enter M phase, when mitosis occurs and the nucleus is
divided into two daughter nuclei. During this stage, the chromosomes condense,
the nuclear envelope breaks down, and the mitotic spindle forms from microtubules
and other proteins. The condensed mitotic chromosomes are captured by the mitotic
spindle, and one complete set of chromosomes is then pulled to each end of the
cell by separating the members of each sister-chromatid pair. A nuclear envelope
re-forms around each chromosome set, and in the final step of M phase, the cell
divides to produce two daughter cells. Most of the time in the cell cycle is spent
in interphase; M phase is brief in comparison, occupying only about an hour in
many mammalian cells.
[Figure labels: paternal interphase chromosome, maternal interphase chromosome,
nuclear envelope surrounding the nucleus, mitotic spindle, mitotic chromosome;
stages: INTERPHASE (GENE EXPRESSION AND CHROMOSOME DUPLICATION), M PHASE
(MITOSIS, CELL DIVISION), INTERPHASE]

Each chromosome operates as a distinct structural unit: for a copy to be passed
on to each daughter cell at division, each chromosome must be able to replicate,
and the newly replicated copies must subsequently be separated and partitioned
correctly into the two daughter cells. These basic functions are controlled by three
types of specialized nucleotide sequences in the DNA, each of which binds specific
proteins that guide the machinery that replicates and segregates chromosomes
(Figure 4–19).
Experiments in yeasts, whose chromosomes are relatively small and easy to
manipulate, have identified the minimal DNA sequence elements responsible for
each of these functions. One type of nucleotide sequence acts as a DNA repli-
cation origin, the location at which duplication of the DNA begins. Eukaryotic
chromosomes contain many origins of replication to ensure that the entire chro-
mosome can be replicated rapidly, as discussed in detail in Chapter 5.
After DNA replication, the two sister chromatids that form each chromosome
remain attached to one another and, as the cell cycle proceeds, are condensed
further to produce mitotic chromosomes. The presence of a second specialized
DNA sequence, called a centromere, allows one copy of each duplicated and con-
densed chromosome to be pulled into each daughter cell when a cell divides. A
protein complex called a kinetochore forms at the centromere and attaches the
duplicated chromosomes to the mitotic spindle, allowing them to be pulled apart
(discussed in Chapter 17).
The third specialized DNA sequence forms telomeres, the ends of a chromosome.
Telomeres contain repeated nucleotide sequences that enable the ends of
chromosomes to be efficiently replicated. Telomeres also perform another function:
the repeated telomere DNA sequences, together with the regions adjoining
them, form structures that protect the end of the chromosome from being mistaken
by the cell for a broken DNA molecule in need of repair. We discuss both this
type of repair and the structure and function of telomeres in Chapter 5.

Figure 4–18 A mitotic chromosome. A mitotic chromosome is a condensed
duplicated chromosome in which the two new chromosomes, called sister
chromatids, are still linked together (see Figure 4–17). The constricted region
indicates the position of the centromere. (Courtesy of Terry D. Allen.)
[Scale bar: 1 µm]

In yeast cells, the three types of sequences required to propagate a chromosome
are relatively short (typically less than 1000 base pairs each) and therefore
use only a tiny fraction of the information-carrying capacity of a chromosome.
Although telomere sequences are fairly simple and short in all eukaryotes, the
DNA sequences that form centromeres and replication origins in more complex
organisms are much longer than their yeast counterparts. For example, experi-
ments suggest that a human centromere can contain up to a million nucleotide
pairs and that it may not require a stretch of DNA with a defined nucleotide
sequence. Instead, as we shall discuss later in this chapter, a human centromere
is thought to consist of a large, regularly repeating protein–nucleic acid structure
that can be inherited when a chromosome replicates.

Figure 4–19 The three DNA sequences required to produce a eukaryotic
chromosome that can be replicated and then segregated accurately at mitosis.
Each chromosome has multiple origins of replication, one centromere, and two
telomeres. Shown here is the sequence of events that a typical chromosome follows
during the cell cycle. The DNA replicates in interphase, beginning at the origins of
replication and proceeding bidirectionally from the origins across the chromosome.
In M phase, the centromere attaches the duplicated chromosomes to the mitotic
spindle so that a copy of the entire genome is distributed to each daughter cell
during mitosis; the special structure that attaches the centromere to the spindle
is a protein complex called the kinetochore (dark green). The centromere also
helps to hold the duplicated chromosomes together until they are ready to be
moved apart. The telomeres form special caps at each chromosome end.
[Figure labels: INTERPHASE, MITOSIS, INTERPHASE, CELL DIVISION; telomere,
replication origin, centromere, portion of mitotic spindle, replicated chromosome,
duplicated chromosomes in separate daughter cells]

DNA Molecules Are Highly Condensed in Chromosomes


All eukaryotic organisms have special ways of packaging DNA into chromosomes.
For example, if the 48 million nucleotide pairs of DNA in human chromosome
22 could be laid out as one long perfect double helix, the molecule would extend
for about 1.5 cm if stretched out end to end. But chromosome 22 measures only
about 2 μm in length in mitosis (see Figures 4–10 and 4–11), representing an end-
to-end compaction ratio of over 7000-fold. This remarkable feat of compression
is performed by proteins that successively coil and fold the DNA into higher and
higher levels of organization. Although much less condensed than mitotic chro-
mosomes, the DNA of human interphase chromosomes is still tightly packed.
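
The compaction ratio quoted above can be reproduced with a short calculation. The Python sketch below is only an illustration: the function names are our own, and it assumes the standard rise of about 0.34 nm per nucleotide pair for the double helix, a value that is not stated in this passage.

```python
NM_PER_NP = 0.34  # assumed rise per nucleotide pair of the double helix, in nm


def extended_length_cm(nucleotide_pairs: float, nm_per_np: float = NM_PER_NP) -> float:
    """Length of a fully extended double helix, in centimeters (1 nm = 1e-7 cm)."""
    return nucleotide_pairs * nm_per_np * 1e-7


def compaction_ratio(extended_cm: float, packed_um: float) -> float:
    """Extended length divided by packed length (1 cm = 1e4 micrometers)."""
    return extended_cm * 1e4 / packed_um


CHR22_NP = 48e6        # nucleotide pairs in human chromosome 22 (from the text)
MITOTIC_LENGTH_UM = 2  # approximate length of mitotic chromosome 22 (from the text)

chr22_cm = extended_length_cm(CHR22_NP)
ratio_from_text = compaction_ratio(1.5, MITOTIC_LENGTH_UM)       # using "about 1.5 cm"
ratio_computed = compaction_ratio(chr22_cm, MITOTIC_LENGTH_UM)   # using 0.34 nm per pair

print(f"extended length of chromosome 22: {chr22_cm:.2f} cm")
print(f"compaction ratio (1.5 cm / 2 um): {ratio_from_text:.0f}")
print(f"compaction ratio (computed):      {ratio_computed:.0f}")
```

Under this assumption the extended molecule comes to about 1.6 cm, and either length estimate gives a ratio between 7000 and 8500, consistent with the "over 7000-fold" compaction stated in the text.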
In reading these sections it is important to keep in mind that chromosome
structure is dynamic. We have seen that each chromosome condenses to an
extreme degree in the M phase of the cell cycle. Much less visible, but of enormous
interest and importance, specific regions of interphase chromosomes decon-
dense to allow access to specific DNA sequences for gene expression, DNA repair,
and replication—and then recondense when these processes are completed. The
packaging of chromosomes is therefore accomplished in a way that allows rapid
localized, on-demand access to the DNA. In the next sections, we discuss the spe-
cialized proteins that make this type of packaging possible.

Nucleosomes Are a Basic Unit of Eukaryotic Chromosome
Structure
The proteins that bind to the DNA to form eukaryotic chromosomes are tradi-
tionally divided into two classes: the histones and the non-histone chromosomal
proteins, each contributing about the same mass to a chromosome as the DNA.
The complex of both classes of protein with the nuclear DNA of eukaryotic cells is
known as chromatin (Figure 4–20).
Histones are responsible for the first and most basic level of chromosome
packing, the nucleosome, a protein–DNA complex discovered in 1974. When
interphase nuclei are broken open very gently and their contents examined under
the electron microscope, most of the chromatin appears to be in the form of a
fiber with a diameter of about 30 nm (Figure 4–21A). If this chromatin is sub-
jected to treatments that cause it to unfold partially, it can be seen under the elec-
tron microscope as a series of “beads on a string” (Figure 4–21B). The string is
DNA, and each bead is a “nucleosome core particle” that consists of DNA wound
around a histone core (Movie 4.2).
The structural organization of nucleosomes was determined after first isolating
them from unfolded chromatin by digestion with particular enzymes (called
nucleases) that break down DNA by cutting between the nucleosomes. After
digestion for a short period, the exposed DNA between the nucleosome core
particles, the linker DNA, is degraded. Each individual nucleosome core particle
consists of a complex of eight histone proteins (two molecules each of histones
H2A, H2B, H3, and H4) and double-stranded DNA that is 147 nucleotide pairs long.
The histone octamer forms a protein core around which the double-stranded DNA
is wound (Figure 4–22).

Figure 4–20 Chromatin. As illustrated, chromatin consists of DNA bound to both
histone and non-histone proteins. The mass of histone protein present is about
equal to the total mass of non-histone protein, but, as schematically indicated
here, the latter class is composed of an enormous number of different species. In
total, a chromosome is about one-third DNA and two-thirds protein by mass.
[Figure labels: chromatin, DNA, histone, non-histone proteins]

Figure 4–21 Nucleosomes as seen in the electron microscope. (A) Chromatin
isolated directly from an interphase nucleus appears in the electron microscope
as a thread about 30 nm thick. (B) This electron micrograph shows a length of
chromatin that has been experimentally unpacked, or decondensed, after isolation
to show the nucleosomes. (A, courtesy of Barbara Hamkalo; B, courtesy of
Victoria Foe.)
[Scale bar: 50 nm]
The region of linker DNA that separates each nucleosome core particle from
the next can vary in length from a few nucleotide pairs up to about 80. (The term
nucleosome technically refers to a nucleosome core particle plus one of its adjacent
DNA linkers, but it is often used synonymously with nucleosome core particle.)
On average, therefore, nucleosomes repeat at intervals of about 200 nucleotide
pairs. For example, a diploid human cell with 6.4 × 10⁹ nucleotide pairs contains
approximately 30 million nucleosomes. The formation of nucleosomes converts a
DNA molecule into a chromatin thread about one-third of its initial length.
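
The estimate of about 30 million nucleosomes per diploid cell is simply the genome size divided by the average repeat length. The Python sketch below works through it; the names are our own, and the inputs are the rounded figures given in the text.

```python
DIPLOID_NP = 6.4e9  # nucleotide pairs in a diploid human cell (from the text)
REPEAT_NP = 200     # average nucleosome repeat length (from the text)
CORE_NP = 147       # DNA wrapped in one nucleosome core particle (from the text)


def nucleosome_count(genome_np: float, repeat_np: int = REPEAT_NP) -> float:
    """Approximate number of nucleosomes if one forms every `repeat_np` pairs."""
    return genome_np / repeat_np


count = nucleosome_count(DIPLOID_NP)
mean_linker_np = REPEAT_NP - CORE_NP  # average linker implied by the repeat length

print(f"nucleosomes per diploid cell: {count / 1e6:.0f} million")
print(f"implied mean linker length:   {mean_linker_np} nucleotide pairs")
```

The result, 32 million, rounds to the "approximately 30 million" of the text, and the implied average linker of 53 nucleotide pairs falls within the stated range of a few to about 80.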

The Structure of the Nucleosome Core Particle Reveals How DNA
Is Packaged
The high-resolution structure of a nucleosome core particle, solved in 1997,
revealed a disc-shaped histone core around which the DNA was tightly wrapped
in a left-handed coil of 1.7 turns (Figure 4–23). All four of the histones that make
up the core of the nucleosome are relatively small proteins (102–135 amino acids),
and they share a structural motif, known as the histone fold, formed from three α
helices connected by two loops (Figure 4–24). In assembling a nucleosome, the
histone folds first bind to each other to form H3–H4 and H2A–H2B dimers, and
the H3–H4 dimers combine to form tetramers. An H3–H4 tetramer then further
combines with two H2A–H2B dimers to form the compact octamer core, around
which the DNA is wound.

Figure 4–22 Structural organization of the nucleosome. A nucleosome
contains a protein core made of eight histone molecules. In biochemical
experiments, the nucleosome core particle can be released from isolated
chromatin by digestion of the linker DNA with a nuclease, an enzyme that
breaks down DNA. (The nuclease can degrade the exposed linker DNA but
cannot attack the DNA wound tightly around the nucleosome core.) After
dissociation of the isolated nucleosome into its protein core and DNA, the
length of the DNA that was wound around the core can be determined.
This length of 147 nucleotide pairs is sufficient to wrap 1.7 times around the
histone core.
[Figure labels: core histones of nucleosome; linker DNA; “beads-on-a-string” form
of chromatin; nucleosome includes ~200 nucleotide pairs of DNA; NUCLEASE DIGESTS
LINKER DNA; released nucleosome core particle (11 nm); DISSOCIATION WITH HIGH
CONCENTRATION OF SALT; octameric histone core; 147-nucleotide-pair DNA double
helix; DISSOCIATION; H2A, H2B, H3, H4]

The interface between DNA and histone is extensive: 142 hydrogen bonds are
formed between DNA and the histone core in each nucleosome. Nearly half of
these bonds form between the amino acid backbone of the histones and the
sugar-phosphate backbone of the DNA. Numerous hydrophobic interactions and salt
linkages also hold DNA and protein together in the nucleosome. More than
one-fifth of the amino acids in each of the core histones are either lysine or arginine
(two amino acids with basic side chains), and their positive charges can effectively
Figure 4–23 The structure of a nucleosome


core particle, as determined by x-ray
diffraction analyses of crystals. Each
histone is colored according to the scheme in
Figure 4–22, with the DNA double helix in light
gray. (Adapted from K. Luger et al., Nature
389:251–260, 1997. With permission from
Macmillan Publishers Ltd.)

DNA double helix

side view edge view

histone H2A histone H2B histone H3 histone H4

neutralize the negatively charged DNA backbone. These numerous interactions


explain in part why DNA of virtually any sequence can be bound on a histone
octamer core. The path of the DNA around the histone core is not smooth; rather,
several kinks are seen in the DNA, as expected from the nonuniform surface of the

(A)
H2A
N C

H2B N C
MBoC6 m4.24/4.22
H3 N C

H4 N C

N-terminal tail histone fold


(B) (D)
N

N C
N

histone
octamer
(C)
N

C
C N
N
N
N N

N N

Figure 4–24 The overall structural organization of the core histones. (A) Each of the core
histones contains an N-terminal tail, which is subject to several forms of covalent modification, and
a histone fold region, as indicated. (B) The structure of the histone fold, which is formed by all four
of the core histones. (C) Histones 2A and 2B form a dimer through an interaction known as the
“handshake.” Histones H3 and H4 form a dimer through the same type of interaction. (D) The final
histone octamer on DNA. Note that all eight N-terminal tails of the histones protrude from the disc-
shaped core structure. Their conformations are highly flexible, and they serve as binding sites for
sets of other proteins.
190 Chapter 4: DNA, Chromosomes, and Genomes

core. The bending requires a substantial compression of the minor groove of the DNA helix. Certain dinucleotides in the minor groove are especially easy to compress, and some nucleotide sequences bind the nucleosome more tightly than others (Figure 4–25). This probably explains some striking, but unusual, cases of very precise positioning of nucleosomes along a stretch of DNA. However, the sequence preference of nucleosomes must be weak enough to allow other factors to dominate, inasmuch as nucleosomes can occupy any one of a number of positions relative to the DNA sequence in most chromosomal regions.
In addition to its histone fold, each of the core histones has an N-terminal amino acid “tail,” which extends out from the DNA–histone core (see Figure 4–24D). These histone tails are subject to several different types of covalent modifications that in turn control critical aspects of chromatin structure and function, as we shall discuss shortly.
As a reflection of their fundamental role in DNA function through controlling chromatin structure, the histones are among the most highly conserved eukaryotic proteins. For example, the amino acid sequence of histone H4 from a pea differs from that of a cow at only 2 of the 102 positions. This strong evolutionary conservation suggests that the functions of histones involve nearly all of their amino acids, so that a change in any position is deleterious to the cell. But in addition to this remarkable conservation, eukaryotic organisms also produce smaller amounts of specialized variant core histones that differ in amino acid sequence from the main ones. As discussed later, these variants, combined with the surprisingly large number of covalent modifications that can be added to the histones in nucleosomes, give rise to a variety of chromatin structures in cells.

Figure 4–25 The bending of DNA in a nucleosome. The DNA helix makes 1.7 tight turns around the histone octamer. This diagram illustrates how the minor groove is compressed on the inside of the turn. Owing to structural features of the DNA molecule, the indicated dinucleotides are preferentially accommodated in such a narrow minor groove, which helps to explain why certain DNA sequences will bind more tightly than others to the nucleosome core. [Figure labels: G-C preferred here (minor groove outside); AA, TT, and TA dinucleotides preferred here (minor groove inside); histone core of nucleosome (histone octamer); DNA of nucleosome.]
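The dinucleotide preference summarized in Figure 4–25 can be illustrated with a small counting sketch in Python. The function names and the example sequence below are invented for illustration, and simply tallying dinucleotides is an assumption made here, not a method from the text. A real positioning analysis would also consider where each dinucleotide falls relative to the helical turn.

```python
from collections import Counter

# Dinucleotides that Figure 4-25 marks as preferred where the minor groove
# faces inward, toward the histone core.
INWARD_PREFERRED = {"AA", "TT", "TA"}

def dinucleotide_counts(seq: str) -> Counter:
    """Count every overlapping dinucleotide step on one DNA strand."""
    seq = seq.upper()
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))

def inward_preferred_fraction(seq: str) -> float:
    """Fraction of dinucleotide steps that are AA, TT, or TA."""
    counts = dinucleotide_counts(seq)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[d] for d in INWARD_PREFERRED) / total

# Hypothetical 12-nucleotide sequence, made up for this example only.
example = "AATTGCGCTAAA"
print(dinucleotide_counts(example))
print(inward_preferred_fraction(example))
```

A sequence with a higher fraction of these steps would, under this simplified view, offer more places where the minor groove is easy to compress.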

Nucleosomes Have a Dynamic Structure, and Are Frequently
Subjected to Changes Catalyzed by ATP-Dependent Chromatin
Remodeling Complexes
For many years biologists thought that, once formed in a particular position on
DNA, a nucleosome would remain fixed in place because of the very tight asso-
ciation between its core histones and DNA. If true, this would pose problems for
genetic readout mechanisms, which in principle require easy access to many
specific DNA sequences. It would also hinder the rapid passage of the DNA tran-
scription and replication machinery through chromatin. But kinetic experiments
show that the DNA in an isolated nucleosome unwraps from each end at a rate of
about four times per second, remaining exposed for 10 to 50 milliseconds before
the partially unwrapped structure recloses. Thus, most of the DNA in an isolated
nucleosome is in principle available for binding other proteins.
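The kinetic figures above allow a rough estimate of how much of the time the DNA at one end of an isolated nucleosome is exposed. The sketch below assumes a simple two-state (wrapped/unwrapped) model in which "four times per second" is the rate of leaving the wrapped state and 10 to 50 milliseconds is the mean duration of each unwrapped episode. This model and the function name are assumptions for illustration, not something stated in the text.

```python
def fraction_unwrapped(open_rate_per_s: float, mean_open_time_s: float) -> float:
    """Steady-state fraction of time spent unwrapped in a two-state model.

    Assumes the wrapped state opens at `open_rate_per_s` and that each
    unwrapped episode lasts `mean_open_time_s` on average.
    """
    mean_closed_time_s = 1.0 / open_rate_per_s
    return mean_open_time_s / (mean_open_time_s + mean_closed_time_s)

OPEN_RATE = 4.0  # unwrapping events per second (value from the text)

for open_time in (0.010, 0.050):  # 10 ms and 50 ms (values from the text)
    f = fraction_unwrapped(OPEN_RATE, open_time)
    print(f"open for {open_time * 1000:.0f} ms per event: unwrapped {f:.1%} of the time")
```

Under these assumptions, an end is unwrapped for roughly 4% to 17% of the time, which is consistent with the statement that the DNA is in principle available for binding other proteins.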
For the chromatin in a cell, a further loosening of DNA–histone contacts is
clearly required, because eukaryotic cells contain a large variety of ATP-depen-
dent chromatin remodeling complexes. These complexes include a subunit that
hydrolyzes ATP (an ATPase evolutionarily related to the DNA helicases discussed
in Chapter 5). This subunit binds both to the protein core of the nucleosome and
to the double-stranded DNA that winds around it. By using the energy of ATP
hydrolysis to move this DNA relative to the core, the protein complex changes the
structure of a nucleosome temporarily, making the DNA less tightly bound to the
histone core. Through repeated cycles of ATP hydrolysis that pull the nucleosome
core along the DNA double helix, the remodeling complexes can catalyze nucle-
osome sliding. In this way, they can reposition nucleosomes to expose specific
regions of DNA, thereby making them available to other proteins in the cell (Fig-
ure 4–26). In addition, by cooperating with a variety of other proteins that bind to
histones and serve as histone chaperones, some remodeling complexes are able to
remove either all or part of the nucleosome core from a nucleosome—catalyzing
either an exchange of its H2A–H2B histones, or the complete removal of the oct-
americ core from the DNA (Figure 4–27). As a result of such processes, measure-
ments reveal that a typical nucleosome is replaced on the DNA every one or two
hours inside the cell.
Cells contain dozens of different ATP-dependent chromatin remodeling complexes
that are specialized for different roles. Most are large protein complexes
that can contain 10 or more subunits, some of which bind to specific modifica-
tions on histones (see Figure 4–26C). The activity of these complexes is carefully
controlled by the cell. As genes are turned on and off, chromatin remodeling com-
plexes are brought to specific regions of DNA where they act locally to influence
chromatin structure (discussed in Chapter 7; see also Figure 4–40, below).
Although some DNA sequences bind more tightly than others to the nucleo-
some core (see Figure 4–25), the most important influence on nucleosome posi-
tioning appears to be the presence of other tightly bound proteins on the DNA.
Some bound proteins favor the formation of a nucleosome adjacent to them.
Others create obstacles that force the nucleosomes to move elsewhere. The exact
positions of nucleosomes along a stretch of DNA therefore depend mainly on the
presence and nature of other proteins bound to the DNA. And due to the presence
of ATP-dependent chromatin remodeling complexes, the arrangement of nucle-
osomes on DNA can be highly dynamic, changing rapidly according to the needs
of the cell.

Nucleosomes Are Usually Packed Together into a Compact
Chromatin Fiber
Although enormously long strings of nucleosomes form on the chromosomal
DNA, chromatin in a living cell probably rarely adopts the extended “beads-on-a-
string” form. Instead, the nucleosomes are packed on top of one another, gener-
ating arrays in which the DNA is even more highly condensed. Thus, when nuclei
are very gently lysed onto an electron microscope grid, much of the chromatin is
seen to be in the form of a fiber with a diameter of about 30 nm, which is consid-
erably wider than chromatin in the “beads-on-a-string” form (see Figure 4–21).

Figure 4–26 The nucleosome sliding catalyzed by ATP-dependent chromatin remodeling complexes. (A) Using the energy of ATP hydrolysis, the remodeling complex is thought to push on the DNA of its bound nucleosome and loosen its attachment to the nucleosome core. Each cycle of ATP binding, ATP hydrolysis, and release of the ADP and Pi products thereby moves the DNA with respect to the histone octamer in the direction of the arrow in this diagram. It requires many such cycles to produce the nucleosome sliding shown. (B) The structure of a nucleosome-bound dimer of the two identical ATPase subunits (green) that slide nucleosomes back and forth in the ISW1 family of chromatin remodeling complexes. (C) The structure of a large chromatin remodeling complex, showing how it is thought to wrap around a nucleosome. Modeled in green is the yeast RSC complex, which contains 15 subunits, including an ATPase and at least four subunits with domains that recognize specific covalently modified histones. (B, from L.R. Racki et al., Nature 462:1016–1021, 2009. With permission from Macmillan Publishers Ltd; C, adapted from A.E. Leschziner et al., Proc. Natl Acad. Sci. USA 104:4913–4918, 2007.) [Figure labels: ATP-dependent chromatin remodeling complex; ATP, ADP; CATALYSIS OF NUCLEOSOME SLIDING; 10 nm.]
Figure 4–27 Nucleosome removal and histone exchange catalyzed by ATP-dependent chromatin remodeling complexes. By cooperating with specific members of a large family of different histone chaperones, some chromatin remodeling complexes can remove the H2A–H2B dimers from a nucleosome (top series of reactions) and replace them with dimers that contain a variant histone, such as the H2AZ–H2B dimer (see Figure 4–35). Other remodeling complexes are attracted to specific sites on chromatin and cooperate with histone chaperones to remove the histone octamer completely and/or to replace it with a different nucleosome core (bottom series of reactions). Highly simplified views of the processes are illustrated here. [Figure labels: histone chaperone; ATP-dependent chromatin remodeling complex; ATP, ADP; EXCHANGE OF H2A–H2B DIMERS; EXCHANGE OF NUCLEOSOME CORE (HISTONE OCTAMER); DNA lacking nucleosome.]


How nucleosomes are organized into condensed arrays is unclear. The structure of a tetranucleosome (a complex of four nucleosomes) obtained by x-ray crystallography and high-resolution electron microscopy of reconstituted chromatin have been used to support a zigzag model for the stacking of nucleosomes in a 30-nm fiber (Figure 4–28). But cryoelectron microscopy of carefully prepared nuclei suggests that most regions of chromatin are less regularly structured.

Figure 4–28 A zigzag model for the 30-nm chromatin fiber. (A) The conformation of two of the four nucleosomes in a tetranucleosome, from a structure determined by x-ray crystallography. (B) Schematic of the entire tetranucleosome; the fourth nucleosome is not visible, being stacked on the bottom nucleosome and behind it in this diagram. (C) Diagrammatic illustration of a possible zigzag structure that could account for the 30-nm chromatin fiber. (A, PDB code: 1ZBB; C, adapted from C.L. Woodcock, Nat. Struct. Mol. Biol. 12:639–640, 2005. With permission from Macmillan Publishers Ltd.)

What causes nucleosomes to stack so tightly on each other? Nucleosome-to-nucleosome linkages that involve histone tails, most notably the H4 tail, constitute one important factor (Figure 4–29). Another important factor is an additional histone that is often present in a 1-to-1 ratio with nucleosome cores, known as histone H1. This so-called linker histone is larger than the individual core histones and it has been considerably less well conserved during evolution. A single histone H1 molecule binds to each nucleosome, contacting both DNA and protein, and changing the path of the DNA as it exits from the nucleosome. This change in the exit path of DNA is thought to help compact nucleosomal DNA (Figure 4–30).
Most eukaryotic organisms make several histone H1 proteins of related but quite distinct amino acid sequences. The presence of many other DNA-binding proteins, as well as proteins that bind directly to histones, is certain to add important additional features to any array of nucleosomes.

Figure 4–29 A model for the role played by histone tails in the compaction of chromatin. (A) A schematic diagram shows the approximate exit points of the eight histone tails, one from each histone protein, that extend from each nucleosome. The actual structure is shown to its right. In the high-resolution structure of the nucleosome, the tails are largely unstructured, suggesting that they are highly flexible. (B) As indicated, the histone tails are thought to be involved in interactions between nucleosomes that help to pack them together. (A, PDB code: 1KX5.) [Figure labels: H2A tail, H2B tail, H3 tail, H4 tail.]

Summary
A gene is a nucleotide sequence in a DNA molecule that acts as a functional unit for the production of a protein, a structural RNA, or a catalytic or regulatory RNA molecule. In eukaryotes, protein-coding genes are usually composed of a string of alternating introns and exons associated with regulatory regions of DNA. A chromosome is formed from a single, enormously long DNA molecule that contains a linear array of many genes, bound to a large set of proteins. The human genome
contains 3.2 × 10⁹ DNA nucleotide pairs, divided between 22 different autosomes
(present in two copies each) and 2 sex chromosomes. Only a small percentage of this
DNA codes for proteins or functional RNA molecules. A chromosomal DNA mole-
cule also contains three other types of important nucleotide sequences: replication
origins and telomeres allow the DNA molecule to be efficiently replicated, while a
centromere attaches the sister DNA molecules to the mitotic spindle, ensuring their
accurate segregation to daughter cells during the M phase of the cell cycle.
The DNA in eukaryotes is tightly bound to an equal mass of histones, which
form repeated arrays of DNA–protein particles called nucleosomes. The nucleosome
is composed of an octameric core of histone proteins around which the DNA dou-
ble helix is wrapped. Nucleosomes are spaced at intervals of about 200 nucleotide
pairs, and they are usually packed together (with the aid of histone H1 molecules)
into quasi-regular arrays to form a 30-nm chromatin fiber. Even though compact,
the structure of chromatin must be highly dynamic to allow access to the DNA.
There is some spontaneous DNA unwrapping and rewrapping in the nucleosome
itself; however, the general strategy for reversibly changing local chromatin struc-
ture features ATP-driven chromatin remodeling complexes. Cells contain a large set
of such complexes, which are targeted to specific regions of chromatin at appropri-
ate times. The remodeling complexes collaborate with histone chaperones to allow
nucleosome cores to be repositioned, reconstituted with different histones, or com-
pletely removed to expose the underlying DNA.
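The two numbers quoted in the Summary, the size of the human genome and the nucleosome repeat length, give a back-of-the-envelope estimate of how many nucleosomes a nucleus contains. The short Python sketch below assumes that the genome size refers to one copy of each chromosome and that the 200-nucleotide-pair spacing holds uniformly. The variable and function names are illustrative only.

```python
GENOME_NUCLEOTIDE_PAIRS = 3.2e9   # human genome size (from the Summary)
NUCLEOSOME_REPEAT_PAIRS = 200     # approximate nucleosome spacing (from the Summary)

def nucleosome_estimate(genome_pairs: float, repeat_pairs: float) -> float:
    """Approximate nucleosome count if the DNA is uniformly packaged."""
    return genome_pairs / repeat_pairs

haploid = nucleosome_estimate(GENOME_NUCLEOTIDE_PAIRS, NUCLEOSOME_REPEAT_PAIRS)
diploid = 2 * haploid  # assumption: two copies of the genome per cell

print(f"one genome copy: about {haploid:.1e} nucleosomes")
print(f"diploid cell:    about {diploid:.1e} nucleosomes")
```

On these assumptions, a diploid human cell would carry on the order of 30 million nucleosomes.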

Figure 4–30 How the linker histone binds to the nucleosome. The position and structure of histone H1 are shown. The H1 core region constrains an additional 20 nucleotide pairs of DNA where it exits from the nucleosome core and is important for compacting chromatin. (A) Schematic, and (B) structure inferred for a single nucleosome from a structure determined by high-resolution electron microscopy of a reconstituted chromatin fiber (C). (B and C, adapted from F. Song et al., Science 344:376–380, 2014.) [Figure labels: histone H1; nucleosome; N; C.]
CHROMATIN STRUCTURE AND FUNCTION


Having described how DNA is packaged into nucleosomes to create a chromatin
fiber, we now turn to the mechanisms that create different chromatin structures
in different regions of a cell’s genome. Mechanisms of this type have a variety of
important functions in cells. Most strikingly, certain types of chromatin structure
can be inherited; that is, the structure can be directly passed down from a cell
to its descendants. Because the cell memory that results is based on an inher-
ited chromatin structure rather than on a change in DNA sequence, this is a form
of epigenetic inheritance. The prefix epi is Greek for “on”; this is appropriate,
because epigenetics represents a form of inheritance that is superimposed on the
genetic inheritance based on DNA.
In Chapter 7, we shall introduce the many different ways in which the expres-
sion of genes is regulated. There we discuss epigenetic inheritance in detail and
present several different mechanisms that can produce it. Here, we are con-
cerned with only one, that based on chromatin structure. We begin this section by
reviewing the observations that first demonstrated that chromatin structures can
be inherited. We then describe some of the chemistry that makes this possible—
the covalent modification of histones in nucleosomes. These modifications have
many functions, inasmuch as they serve as recognition sites for protein domains
that link specific protein complexes to different regions of chromatin. Histones
thereby have effects on gene expression, as well as on many other DNA-linked
processes. Through such mechanisms, chromatin structure plays an important
role in the development, growth, and maintenance of all eukaryotic organisms,
including ourselves.

Heterochromatin Is Highly Organized and Restricts Gene
Expression
Light-microscope studies in the 1930s distinguished two types of chromatin in
the interphase nuclei of many higher eukaryotic cells: a highly condensed form,
called heterochromatin, and all the rest, which is less condensed, called euchro-
matin. Heterochromatin represents an especially compact form of chromatin
(see Figure 4–9), and we are finally beginning to understand its molecular prop-
erties. It is highly concentrated in certain specialized regions, most notably at the
centromeres and telomeres introduced previously (see Figure 4–19), but it is also
present at many other locations along chromosomes—locations that can vary
according to the physiological state of the cell. In a typical mammalian cell, more
than 10% of the genome is packaged in this way.
The DNA in heterochromatin typically contains few genes, and when euchro-
matic regions are converted to a heterochromatic state, their genes are generally
switched off as a result. However, we know now that the term heterochromatin
encompasses several distinct modes of chromatin compaction that have different
implications for gene expression. Thus, heterochromatin should not be thought
of as simply encapsulating “dead” DNA, but rather as a descriptor for compact
chromatin domains that share the common feature of being unusually resistant
to gene expression.

The Heterochromatic State Is Self-Propagating


Through chromosome breakage and rejoining, whether brought about by a nat-
ural genetic accident or by experimental artifice, a piece of chromosome that is
normally euchromatic can be translocated into the neighborhood of heteroch-
romatin. Remarkably, this often causes silencing—inactivation—of the normally
active genes. This phenomenon is referred to as a position effect. It reflects a
spreading of the heterochromatic state into the originally euchromatic region,
and it has provided important clues to the mechanisms that create and maintain
heterochromatin. First recognized in Drosophila, position effects have now been
observed in many eukaryotes, including yeasts, plants, and humans.
[Figure 4–31 panel labels: (A) heterochromatin, barrier, and euchromatin carrying genes 1 to 5; CHROMOSOME TRANSLOCATION. (B) Early in the developing embryo, heterochromatin forms and spreads into neighboring euchromatin to different extents in different cells; after cell proliferation: clone of cells with gene 1 inactive, clone of cells with genes 1, 2, and 3 inactive, clone of cells with no genes inactivated.]

Figure 4–31 The cause of position effect variegation in Drosophila. (A) Heterochromatin (green) is normally prevented from
spreading into adjacent regions of euchromatin (red) by barrier DNA sequences, which we shall discuss shortly. In flies that
inherit certain chromosomal rearrangements, however, this barrier is no longer present. (B) During the early development of such
flies, heterochromatin can spread into neighboring chromosomal DNA, proceeding for different distances in different cells. This
spreading soon stops, but the established pattern of heterochromatin is subsequently inherited, so that large clones of progeny
cells are produced that have the same neighboring genes condensed into heterochromatin and thereby inactivated (hence the
“variegated” appearance of some of these flies; see Figure 4–32). Although “spreading” is used to describe the formation of
new heterochromatin close to previously existing heterochromatin, the term may not be wholly accurate. There is evidence that
during expansion, the condensation of DNA into heterochromatin can “skip over” some regions of chromatin, sparing the genes
that lie within them from repressive effects.

In chromosome breakage-and-rejoining events of the sort just described, the zone of silencing, where euchromatin is converted to a heterochromatic state, spreads for different distances in different early cells in the fly embryo. Remarkably, these differences then are perpetuated for the rest of the animal’s life: in each cell, once the heterochromatic condition is established on a piece of chromatin, it tends to be stably inherited by all of that cell’s progeny (Figure 4–31). This remarkable phenomenon, called position effect variegation, was first recognized through a detailed genetic analysis of the mottled loss of red pigment in the fly eye (Figure 4–32). It shares features with the extensive spread of heterochromatin that inactivates one of the two X chromosomes in female mammals. There too, a random process acts in each cell of the early embryo to dictate which X chromosome will be inactivated, and that same X chromosome then remains inactive in all the cell’s progeny, creating a mosaic of different clones of cells in the adult body (see Figure 7–50).
These observations, taken together, point to a fundamental strategy of heterochromatin formation: heterochromatin begets more heterochromatin. This positive feedback can operate both in space, causing the heterochromatic state to spread along the chromosome, and in time, across cell generations, propagating the heterochromatic state of the parent cell to its daughters. The challenge is to explain the molecular mechanisms that underlie this remarkable behavior.

Figure 4–32 The discovery of position effects on gene expression. The White gene in the fruit fly Drosophila controls eye pigment production and is named after the mutation that first identified it. Wild-type flies with a normal White gene (White+) have normal pigment production, which gives them red eyes, but if the White gene is mutated and inactivated, the mutant flies (White–) make no pigment and have white eyes. In flies in which a normal White gene has been moved near a region of heterochromatin, the eyes are mottled, with both red and white patches. The white patches represent cell lineages in which the White gene has been silenced by the effects of the heterochromatin. In contrast, the red patches represent cell lineages in which the White gene is expressed. Early in development, when the heterochromatin is first formed, it spreads into neighboring euchromatin to different extents in different embryonic cells (see Figure 4–31). The presence of large patches of red and white cells reveals that the state of transcriptional activity, as determined by the packaging of this gene into chromatin in those ancestor cells, is inherited by all daughter cells. [Figure labels: White gene at normal location; barrier; heterochromatin; rare chromosome inversion; White gene near heterochromatin.]
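The logic of position effect variegation, a random choice made early followed by faithful inheritance, can be sketched as a toy simulation. The Python below is a deliberately simplified illustration. The five-gene layout follows Figure 4–31, but the uniform random spreading distance, the contiguous spreading (the text notes that real spreading can skip regions), and all function names are assumptions made here.

```python
import random

GENES = [1, 2, 3, 4, 5]  # gene order outward from the heterochromatin boundary

def establish_silencing(rng: random.Random) -> frozenset:
    """Early embryo: heterochromatin spreads over a random number of genes (0-5)."""
    spread = rng.randint(0, len(GENES))
    return frozenset(GENES[:spread])

def proliferate(silenced: frozenset, divisions: int) -> list:
    """Each division passes the parent's silenced set unchanged to both daughters."""
    clone = [silenced]
    for _ in range(divisions):
        clone = [state for state in clone for _daughter in range(2)]
    return clone

rng = random.Random(0)  # fixed seed so the sketch is repeatable
founders = [establish_silencing(rng) for _ in range(4)]
clones = [proliferate(state, divisions=3) for state in founders]

for founder, clone in zip(founders, clones):
    print(sorted(founder), "->", len(clone), "cells with the same silenced genes")
```

Different founders can end up with different silenced genes, but every cell within a clone shares its founder's pattern, which is what produces large uniform patches in the adult tissue.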
Figure 4–33 Some prominent types of covalent amino acid side-chain modifications found on nucleosomal histones. (A) Lysine acetylation and methylation are competing reactions. Three different levels of lysine methylation are shown; each can be recognized by a different binding protein and thus each can have a different significance for the cell. Note that acetylation removes the plus charge on lysine, and that, most importantly, an acetylated lysine cannot be methylated, and vice versa. (B) Serine phosphorylation adds a negative charge to a histone. Modifications of histones not shown here include the mono- or dimethylation of an arginine, the phosphorylation of a threonine, the addition of ADP-ribose to a glutamic acid, and the addition of a ubiquityl, sumoyl, or biotin group to a lysine. [Structures shown: (A) lysine, acetyl lysine, monomethyl lysine, dimethyl lysine, trimethyl lysine; (B) serine, phosphoserine.]

As a first step, one can carry out a search for the molecules that are involved.
This has been done by means of genetic screens, in which large numbers of
mutants are generated, after which one picks out those that show an abnormal-
ity of the process in question. Extensive genetic screens in Drosophila, fungi, and
mice have identified more than 100 genes whose products either enhance or sup-
press the spread of heterochromatin and its stable inheritance—in other words,
genes that serve as either enhancers or suppressors of position effect variegation.
Many of these genes turn out to code for non-histone chromosomal proteins that
interact with histones and are involved in modifying or maintaining chromatin
structure. We shall discuss how they work in the sections that follow.

The Core Histones Are Covalently Modified at Many Different Sites
The amino acid side chains of the four histones in the nucleosome core are sub-
jected to a remarkable variety of covalent modifications, including the acetylation
of lysines, the mono-, di-, and trimethylation of lysines, and the phosphorylation
of serines (Figure 4–33). A large number of these side-chain modifications occur
on the eight relatively unstructured N-terminal “histone tails” that protrude from
the nucleosome (Figure 4–34). However, there are also more than 20 specific side-
chain modifications on the nucleosome’s globular core.
All of the above types of modifications are reversible, with one enzyme serv-
ing to create a particular type of modification, and another to remove it. These
enzymes are highly specific. Thus, for example, acetyl groups are added to specific
lysines by a set of different histone acetyl transferases (HATs) and removed by a set
of histone deacetylase complexes (HDACs). Likewise, methyl groups are added to
lysine side chains by a set of different histone methyl transferases and removed
by a set of histone demethylases. Each enzyme is recruited to specific sites on
the chromatin at defined times in each cell’s life history. For the most part, the
initial recruitment depends on transcription regulator proteins (sometimes called
“transcription factors”). As we shall explain in Chapter 7, these proteins recognize
and bind to specific DNA sequences in the chromosomes. They are produced at
[Figure 4–34 panel labels: (A) side view and bottom view of the nucleosome, with the H2A, H2B, H3, and H4 tails marked. (B) N-terminal tails and globular domains, with the numbered positions of documented modifications on each tail:
H2A  SGRGKQGGKARAKAKTRSSRAGLQFPVGRV        positions 1, 5, 9, 13, 15
H2B  PEPAKSAPAPKKGSKKAVTKAQKKDGKKRK        positions 5, 12, 14, 15, 20, 23, 24
H3   ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVK  positions 2, 4, 9, 10, 14, 17, 18, 23, 26, 27, 28, 36
H4   SGRGKGGKGLGKGGAKRHRKVLRDNIQGIT        positions 1, 3, 5, 8, 12, 16, 20
KEY: M, methylation; P, phosphorylation; A, acetylation.]

Figure 4–34 The covalent modification of core histone tails. (A) The structure of the nucleosome highlighting the location of
the first 30 amino acids in each of its eight N-terminal histone tails (green). These tails are unstructured and highly mobile, and
thus will change their conformation depending on other bound proteins. (B) Well-documented modifications of the four histone
core proteins are indicated. Although only a single symbol is used here for methylation (M), each lysine (K) or arginine (R) can be
methylated in several different ways. Note also that some positions (e.g., lysine 9 of H3) can be modified either by methylation
or by acetylation, but not both. Most of the modifications shown add a relatively small molecule onto the histone tails; the
exception is ubiquitin, a 76-amino-acid protein also used for other cell processes (see Figure 3–69). Not shown are more than
20 possible modifications located in the globular core of the histones. (A, PDB: 1KX5; B, adapted from H. Santos-Rosa and
C. Caldas, Eur. J. Cancer 41:2381–2402, 2005. With permission from Elsevier.)

different times and places in the life of an organism, thereby determining where
and when the chromatin-modifying enzymes will act. In this way, the DNA
sequence ultimately determines how histones are modified. But in at least some
cases, the covalent modifications on nucleosomes can persist long after the transcription
regulator proteins that first induced them have disappeared, thereby
providing the cell with a memory of its developmental history. Most remarkably,
as in the related phenomenon of position effect variegation discussed above, this
memory can be transmitted from one cell generation to the next.
Very different patterns of covalent modification are found on different groups
of nucleosomes, depending both on their exact position in the genome and on
the history of the cell. The modifications of the histones are carefully controlled,
and they have important consequences. The acetylation of lysines on the N-ter-
minal tails loosens chromatin structure, in part because adding an acetyl group
to lysine removes its positive charge, thereby reducing the affinity of the tails for
adjacent nucleosomes. However, the most profound effects of the histone modifi-
cations lie in their ability to recruit specific other proteins to the modified stretch
of chromatin. Trimethylation of one specific lysine on the histone H3 tail, for
instance, attracts the heterochromatin-specific protein HP1 and contributes to
the establishment and spread of heterochromatin. More generally, the recruited
proteins act with the modified histones to determine how and when genes will be
expressed, as well as other chromosome functions. In this way, the precise struc-
ture of each domain of chromatin governs the readout of the genetic information
that it contains, and thereby the structure and function of the eukaryotic cell.
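The charge argument made above for lysine acetylation can be made concrete with a small model. In the Python sketch below, the five lysine states and their charges follow Figure 4–33, which shows a plus charge on every form except acetyl lysine. The four-lysine tail and all names are hypothetical choices for illustration. Using an enumeration means that each lysine holds exactly one state at a time, which reflects the statement that acetylation and methylation are mutually exclusive.

```python
from enum import Enum

class LysineState(Enum):
    UNMODIFIED = "lysine"
    ACETYL = "acetyl lysine"
    MONOMETHYL = "monomethyl lysine"
    DIMETHYL = "dimethyl lysine"
    TRIMETHYL = "trimethyl lysine"

# Side-chain charge per state: acetylation removes the plus charge,
# whereas the methylated forms keep it (as drawn in Figure 4-33).
SIDE_CHAIN_CHARGE = {
    LysineState.UNMODIFIED: +1,
    LysineState.ACETYL: 0,
    LysineState.MONOMETHYL: +1,
    LysineState.DIMETHYL: +1,
    LysineState.TRIMETHYL: +1,
}

def tail_lysine_charge(states) -> int:
    """Total charge contributed by the lysines of one histone tail."""
    return sum(SIDE_CHAIN_CHARGE[s] for s in states)

# Hypothetical tail with four lysines, before and after two are acetylated.
before = [LysineState.UNMODIFIED] * 4
after = [LysineState.ACETYL, LysineState.ACETYL,
         LysineState.UNMODIFIED, LysineState.TRIMETHYL]

print(tail_lysine_charge(before), "->", tail_lysine_charge(after))
```

In this toy tail, acetylating two lysines halves the positive charge, while trimethylating another leaves the charge unchanged. This is consistent with acetylation, rather than methylation, being the modification credited with loosening tail contacts through charge.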
Figure 4–35 The structure of some histone variants compared with the major histone that they replace. The histone variants are inserted into nucleosomes at specific sites on chromosomes by ATP-dependent chromatin remodeling enzymes that act in concert with histone chaperones (see Figure 4–27). The CENP-A (Centromere Protein-A) variant of histone H3 is discussed later in this chapter (see Figure 4–42); other variants are discussed in Chapter 7. The sequences in each variant that are colored differently (compared to the major histone above it) denote regions with an amino acid sequence different from this major histone. (Adapted from K. Sarma and D. Reinberg, Nat. Rev. Mol. Cell Biol. 6:139–149, 2005. With permission from Macmillan Publishers Ltd.)

MAJOR HISTONE   VARIANT     SPECIAL FUNCTION
H3              H3.3        transcriptional activation
H3              CENP-A      centromere function and kinetochore assembly
H2A             H2AX        DNA repair and recombination
H2A             H2AZ        gene expression, chromosome segregation
H2A             macroH2A    transcriptional repression, X-chromosome inactivation
[Other figure labels: histone fold; loop insert.]


Chromatin Acquires Additional Variety Through the Site-Specific
Insertion of a Small Set of Histone Variants
In addition to the four highly conserved standard core histones, eukaryotes also
contain a few variant histones that can also assemble into nucleosomes. These
histones are present in much smaller amounts than the major histones, and they
have been less well conserved over long evolutionary times. Variants are known
for each of the core histones with the exception of H4; some examples are shown
in Figure 4–35.
The major histones are synthesized primarily during the S phase of the cell
cycle and assembled into nucleosomes on the daughter DNA helices just behind
the replication fork (see Figure 5–32). In contrast, most histone variants are syn-
thesized throughout interphase. They are often inserted into already-formed
chromatin, which requires a histone-exchange process catalyzed by the ATP-de-
pendent chromatin remodeling complexes discussed previously. These remodel-
ing complexes contain subunits that cause them to bind both to specific sites on
chromatin and to histone chaperones that carry a particular variant. As a result,
each histone variant is inserted into chromatin in a highly selective manner (see
Figure 4–27).

Covalent Modifications and Histone Variants Act in Concert to
Control Chromosome Functions
The number of possible distinct markings on an individual nucleosome is in prin-
ciple enormous, and this potential for diversity is still greater when we allow for
nucleosomes that contain histone variants. However, the histone modifications
are known to occur in coordinated sets. More than 15 such sets can be identified
in mammalian cells. However, it is not yet clear how many different types of chro-
matin are functionally important in cells.
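To see why the number of possible markings is "in principle enormous," consider only the lysines in the H3 tail sequence shown in Figure 4–34. The Python sketch below assumes, purely as an upper-bound simplification, that every lysine can independently adopt any of the five states in Figure 4–33 (unmodified, acetyl, or mono-, di-, or trimethyl). Not every lysine is known to be modified in every way, and the count ignores arginine methylation, phosphorylation, and the other three core histones.

```python
# First 36 residues of the histone H3 tail, as listed in Figure 4-34B.
H3_TAIL = "ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVK"

# Assumed states per lysine: unmodified, acetyl, mono-, di-, trimethyl.
STATES_PER_LYSINE = 5

# 1-based positions of lysine (K) residues in the tail.
lysine_positions = [i + 1 for i, aa in enumerate(H3_TAIL) if aa == "K"]

# Combinations if each lysine is marked independently (an assumption).
combinations_one_tail = STATES_PER_LYSINE ** len(lysine_positions)

print("lysines at positions:", lysine_positions)
print("possible lysine markings on one H3 tail:", combinations_one_tail)
```

Even this single tail allows tens of thousands of combinations under the stated assumption, which helps explain why the observation that modifications occur in a limited number of coordinated sets is significant.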
Some combinations are known to have a specific meaning for the cell in the
sense that they determine how and when the DNA packaged in the nucleosomes
is to be accessed or manipulated—a fact that led to the idea of a “histone code.”
For example, one type of marking signals that a stretch of chromatin has been
newly replicated, another signals that the DNA in that chromatin has been dam-
aged and needs repair, while others signal when and how gene expression should
take place. Various regulatory proteins contain small domains that bind to
specific marks, recognizing, for example, a trimethylated lysine 4 on histone H3
(Figure 4–36). These domains are often linked together as modules in a single large
protein or protein complex, which thereby recognizes a specific combination of
histone modifications (Figure 4–37). The result is a reader complex that allows
particular combinations of markings on chromatin to attract additional proteins,
so as to execute an appropriate biological function at the right time (Figure 4–38).

Figure 4–36 How a mark on a nucleosome is read. The figure shows the structure of a protein module (called an ING PHD
domain) that specifically recognizes histone H3 trimethylated on lysine 4. (A) A trimethyl group. (B) Space-filling model of an ING
PHD domain bound to a histone tail (green, with the trimethyl group highlighted in yellow). (C) A ribbon model showing how
the N-terminal six amino acids in the H3 tail are recognized. The red lines represent hydrogen bonds. This is one of a family of
PHD domains that recognize methylated lysines on histones; different members of the family bind tightly to lysines located at
different positions, and they can discriminate between a mono-, di-, and trimethylated lysine. In a similar way, other small protein
modules recognize specific histone side chains that have been marked with acetyl groups, phosphate groups, and so on.
(Adapted from P.V. Peña et al., Nature 442:100–103, 2006. With permission from Macmillan Publishers Ltd.)

The marks on nucleosomes due to covalent additions to histones are dynamic,
being constantly removed and added at rates that depend on their chromosomal
locations. Because the histone tails extend outward from the nucleosome core
and are likely to be accessible even when chromatin is condensed, they would
seem to provide an especially suitable format for creating marks that can be
readily altered as a cell’s needs change. Although much remains to be learned
about the meaning of the different histone modifications, a few well-studied
examples of the information that can be encoded in the histone H3 tail are listed
in Figure 4–39.

A Complex of Reader and Writer Proteins Can Spread Specific Chromatin Modifications Along a Chromosome
The phenomenon of position effect variegation described previously requires that
some modified forms of chromatin have the ability to spread for substantial
distances along a chromosomal DNA molecule (see Figure 4–31). How is this possible?

Figure 4–37 Recognition of a specific combination of marks on a nucleosome. In the example shown, two adjacent
domains that are part of the NURF (Nucleosome Remodeling Factor) chromatin remodeling complex bind to the
nucleosome, with the PHD domain (red) recognizing a methylated H3 lysine 4 and another domain (a bromodomain,
blue) recognizing an acetylated H4 lysine 16. These two histone marks constitute a unique histone modification
pattern that occurs in subsets of nucleosomes in human cells. Here the two histone tails are indicated by green
dotted lines, and only half of one nucleosome is shown. (Adapted from A.J. Ruthenburg et al., Cell 145:692–706,
2011. With permission from Elsevier.)

Figure 4–38 Schematic diagram showing how a particular combination of histone modifications can be recognized
by a reader complex. A large protein complex that contains a series of protein modules, each of which recognizes
a specific histone mark, is schematically illustrated (green). This “reader complex” will bind tightly only to a
region of chromatin that contains several of the different histone marks that it recognizes. Therefore, only a
specific combination of marks will cause the complex to bind to chromatin and attract the additional protein
complexes (purple) needed to catalyze a biological function.

Figure 4–39 Some specific meanings of histone modifications. (A) The modifications on the histone H3 N-terminal
tail are shown, repeated from Figure 4–34. (B) The H3 tail can be marked by different sets of modifications that
act in combination to convey a specific meaning. Only a small number of the meanings are known, including the
three examples shown. Not illustrated is the fact that, as just implied (see Figure 4–38), reading a histone mark
generally involves the joint recognition of marks at other sites on the nucleosome along with the indicated H3
tail recognition. In addition, specific levels of methylation (mono-, di-, or trimethyl groups) are generally
required. Thus, for example, the trimethylation of lysine 9 attracts the heterochromatin-specific protein HP1,
which induces a spreading wave of further lysine 9 trimethylation followed by further HP1 binding, according to
the general scheme that will be illustrated shortly (see Figure 4–40). Also important in this process, however,
is a synergistic trimethylation of the histone H4 N-terminal tail on lysine 20.
(A) Residues shown along the H3 tail: R2, K4, K9, S10, K14, R17, K18, K23, R26, K27, S28, K36.
(B) Modification state                      “Meaning”
    trimethyl K9                            heterochromatin formation, gene silencing
    trimethyl K4 together with acetyl K9    gene expression
    trimethyl K27                           gene silencing (Polycomb repressive complex)

Figure 4–40 How the recruitment of a reader–writer complex can spread chromatin changes along a chromosome. The
writer is an enzyme that creates a specific modification on one or more of the four nucleosomal histones. After
its recruitment to a specific site on a chromosome by a transcription regulatory protein, the writer collaborates
with a reader protein to spread its mark from nucleosome to nucleosome by means of the indicated reader–writer
complex. For this mechanism to work, the reader must recognize the same histone modification mark that the writer
produces; its binding to that mark can be shown to activate the writer. In this schematic example, a spreading
wave of chromatin condensation is thereby induced. Not shown are the additional proteins involved, including an
ATP-dependent chromatin remodeling complex required to reposition the modified nucleosomes.

The enzymes that add or remove modifications to histones in nucleosomes
are part of multisubunit complexes. They can initially be brought to a particular
region of chromatin by one of the sequence-specific DNA-binding proteins
(transcription regulators) discussed in Chapters 6 and 7 (for a specific example,
see Figure 7–20). But after a modifying enzyme “writes” its mark on one or a few
neighboring nucleosomes, events that resemble a chain reaction can ensue. In
such a case, the “writer enzyme” works in concert with a “reader protein” located
in the same protein complex. The reader protein contains a module that recognizes
the mark and binds tightly to the newly modified nucleosome (see Figure
4–36), activating an attached writer enzyme and positioning it near an adjacent
nucleosome. Through many such read–write cycles, the reader protein can carry
the writer enzyme along the DNA, spreading the mark in a hand-over-hand manner
along the chromosome (Figure 4–40).
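
The read–write cycle just described can be made concrete with a toy simulation. What follows is only a minimal sketch under stated assumptions, not a model taken from the text: chromatin is reduced to a one-dimensional row of nucleosomes that are either marked or unmarked, and the function name `spread_mark`, the per-cycle writing probability `p_write`, and the seed position are all hypothetical choices made for illustration.

```python
import random


def spread_mark(n_nucleosomes, seed_index, cycles, p_write=0.5, rng=None):
    """Toy model of reader-writer spreading along a row of nucleosomes.

    The returned list holds True where a nucleosome carries the mark.
    In each cycle every marked nucleosome is "read", and the attached
    writer marks each unmarked immediate neighbor with probability p_write.
    All parameter values are illustrative assumptions, not measurements.
    """
    rng = rng or random.Random()
    marks = [False] * n_nucleosomes
    marks[seed_index] = True  # first mark, written after recruitment by a regulator
    for _ in range(cycles):
        snapshot = list(marks)  # read the old state so the mark moves one step per cycle
        for i, marked in enumerate(snapshot):
            if not marked:
                continue
            for j in (i - 1, i + 1):
                in_range = 0 <= j < n_nucleosomes
                if in_range and not snapshot[j] and rng.random() < p_write:
                    marks[j] = True
    return marks


if __name__ == "__main__":
    state = spread_mark(30, seed_index=15, cycles=10, rng=random.Random(1))
    print("".join("M" if m else "." for m in state))
```

Because a nucleosome can acquire the mark only from a marked neighbor, the marked region in this sketch always grows as one contiguous block around the seed, which is the essential feature of the hand-over-hand scheme.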
In reality, the process is more complicated than the scheme just described.
Both readers and writers are part of a protein complex that is likely to contain
multiple readers and writers, and to require multiple marks on the nucleosome to
spread. Moreover, many of these reader–writer complexes also contain an
ATP-dependent chromatin remodeling protein (see Figure 4–26C), and the reader, writer,
and remodeling proteins can work in concert to either decondense or condense
long stretches of chromatin as the reader moves progressively along the
nucleosome-packaged DNA.
A similar process is used to remove histone modifications from specific regions
of the DNA; in this case, an “eraser enzyme,” such as a histone demethylase or his-
tone deacetylase, is recruited to the complex. As for the writer complex in Figure
4–40, sequence-specific DNA-binding proteins (transcription regulators) direct
where such modifications occur (discussed in Chapter 7).
Some idea of the complexity of the above processes can be derived from the
results of genetic screens for genes that either enhance or suppress the spreading
and stability of heterochromatin, as manifest in effects on position effect varie-
gation in Drosophila (see Figure 4–32). As pointed out previously, more than 100
such genes are known, and most of them are likely to code for subunits in one or
more reader–writer–remodeling protein complexes.

Barrier DNA Sequences Block the Spread of Reader–Writer Complexes and Thereby Separate Neighboring Chromatin Domains
The above mechanism for spreading chromatin structures raises a potential prob-
lem. Inasmuch as each chromosome contains one continuous, very long DNA
molecule, what prevents a cacophony of confusing cross-talk between adjacent
chromatin domains of different structure and function? Early studies of position
effect variegation had suggested an answer: certain DNA sequences mark the
boundaries of chromatin domains and separate one such domain from another
(see Figure 4–31). Several such barrier sequences have now been identified and
characterized through the use of genetic engineering techniques that allow spe-
cific DNA segments to be deleted from, or inserted in, chromosomes.
For example, in cells that are destined to give rise to red blood cells, a sequence
called HS4 normally separates the active chromatin domain that contains the
human β-globin locus from an adjacent region of silenced, condensed chromatin.
If this sequence is deleted, the β-globin locus is invaded by condensed chromatin.
This chromatin silences the genes it covers, and it spreads to a different extent in
different cells, causing position effect variegation similar to that observed in Dro-
sophila. As described in Chapter 7, the consequences are dire: the globin genes
are poorly expressed, and individuals who carry such a deletion have a severe
form of anemia.
In genetic engineering experiments, the HS4 sequence is often added to both
ends of a gene that is to be inserted into a mammalian genome, in order to protect
that gene from the silencing caused by spreading heterochromatin. Analysis of
this barrier sequence reveals that it contains a cluster of binding sites for histone
acetylase enzymes. Since the acetylation of a lysine side chain is incompatible
with the methylation of the same side chain, and specific lysine methylations are
required to spread heterochromatin, histone acetylases are logical candidates for
the formation of DNA barriers to spreading (Figure 4–41). However, several other
types of chromatin modifications are known that can also protect genes from
silencing.
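
The effect of a barrier, and of deleting it, can be illustrated with a small deterministic sketch. It rests on loud simplifying assumptions: a barrier is modeled as a nucleosome held permanently in a state that cannot be methylated (standing in for sustained acetylation), the silencing mark advances exactly one nucleosome per cycle, and the names `spread_with_barrier` and `render` are hypothetical helpers, not anything defined in the text.

```python
def spread_with_barrier(n_nucleosomes, seed_index, barrier_indices, cycles):
    """Spread a silencing mark outward from seed_index, one nucleosome per cycle.

    Positions in barrier_indices can never acquire the mark, so the spread
    stops when it reaches them. Returns the sorted marked positions.
    Purely illustrative; positions and cycle counts are assumptions.
    """
    barriers = set(barrier_indices)
    if seed_index in barriers:
        raise ValueError("seed nucleosome cannot also be a barrier")
    methylated = {seed_index}
    for _ in range(cycles):
        frontier = set()
        for i in methylated:
            for j in (i - 1, i + 1):
                if 0 <= j < n_nucleosomes and j not in barriers:
                    frontier.add(j)
        methylated |= frontier
    return sorted(methylated)


def render(n_nucleosomes, methylated, barrier_indices):
    """One-line picture: M = methylated, B = barrier, . = unmodified."""
    marked, barriers = set(methylated), set(barrier_indices)
    return "".join(
        "B" if i in barriers else "M" if i in marked else "."
        for i in range(n_nucleosomes)
    )


if __name__ == "__main__":
    with_barrier = spread_with_barrier(12, 2, [6], cycles=20)
    print(render(12, with_barrier, [6]))   # MMMMMMB.....
    deleted = spread_with_barrier(12, 2, [], cycles=20)
    print(render(12, deleted, []))         # MMMMMMMMMMMM
```

In the second call the barrier has been removed, and the mark invades the whole region. This is the toy counterpart of the HS4 deletion described above, in which condensed chromatin spreads into the β-globin locus.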

Figure 4–41 Some mechanisms of barrier action. These models are derived from experimental analyses of barrier
action, and a combination of several of them may function at any one site. (A) The tethering of a region of
chromatin to a large fixed site, such as the nuclear pore complex illustrated here, can form a barrier that stops
the spread of heterochromatin. (B) The tight binding of barrier proteins to a group of nucleosomes can make this
chromatin resistant to heterochromatin spreading. (C) By recruiting a group of highly active histone-modifying
enzymes, barriers can erase the histone marks that are required for heterochromatin to spread. For example, a
potent acetylation of lysine 9 on histone H3 will compete with lysine 9 methylation, thereby preventing the
binding of the HP1 protein needed to form a major form of heterochromatin. (Based on A.G. West and P. Fraser,
Hum. Mol. Genet. 14:R101–R111, 2005. With permission from Oxford University Press.)

Figure 4–42 A model for the structure of a simple centromere. (A) In the yeast Saccharomyces cerevisiae, a special
centromeric DNA sequence assembles a single nucleosome in which two copies of an H3 variant histone (called CENP-A
in most organisms) replace the normal H3. (B) How peptide sequences unique to this variant histone (see Figure
4–35) help to assemble additional proteins, some of which form a kinetochore. The yeast kinetochore is unusual in
capturing only a single microtubule; humans have much larger centromeres and form kinetochores that can capture
20 or more microtubules (see Figure 4–43). The kinetochore is discussed in detail in Chapter 17. (Adapted from
A. Joglekar et al., Nat. Cell Biol. 8:581–585, 2006. With permission from Macmillan Publishers Ltd.)

The Chromatin in Centromeres Reveals How Histone Variants Can Create Special Structures
Nucleosomes carrying histone variants have a distinctive character and are
thought to be able to produce marks in chromatin that are unusually long-lasting.
An important example is seen in the formation and inheritance of the specialized
chromatin structure at the centromere, the region of each chromosome required
for attachment to the mitotic spindle and orderly segregation of the duplicated
copies of the genome into daughter cells each time a cell divides. In many com-
plex organisms, including humans, each centromere is embedded in a stretch of
special centromeric chromatin that persists throughout interphase, even though
the centromere-mediated attachment to the spindle and movement of DNA occur
only during mitosis. This chromatin contains a centromere-specific variant H3
histone, known as CENP-A (Centromere Protein-A; see Figure 4–35), plus addi-
tional proteins that pack the nucleosomes into particularly dense arrangements
and form the kinetochore, the special structure required for attachment of the
mitotic spindle (see Figure 4–19).
A specific DNA sequence of approximately 125 nucleotide pairs is sufficient to
serve as a centromere in the yeast S. cerevisiae. Despite its small size, more than
a dozen different proteins assemble on this DNA sequence; the proteins include
the CENP-A histone H3 variant, which, along with the three other core histones,
forms a centromere-specific nucleosome. The additional proteins at the yeast
centromere attach this nucleosome to a single microtubule from the yeast mitotic
spindle (Figure 4–42).
The centromeres in more complex organisms are considerably larger than
those in budding yeasts. For example, fly and human centromeres extend over
hundreds of thousands of nucleotide pairs and, while they contain CENP-A, they
do not seem to contain a centromere-specific DNA sequence. These centromeres
largely consist of short, repeated DNA sequences, known as alpha satellite DNA
in humans. But the same repeat sequences are also found at other (non-centro-
meric) positions on chromosomes, indicating that they are not sufficient to direct
centromere formation. Most strikingly, in some unusual cases, new human cen-
tromeres (called neocentromeres) have been observed to form spontaneously on
fragmented chromosomes. Some of these new positions were originally euchro-
matic and lack alpha satellite DNA altogether (Figure 4–43). It seems that cen-
tromeres in complex organisms are defined by an assembly of proteins, rather
than by a specific DNA sequence.
Inactivation of some centromeres and genesis of others de novo seem to have
played an essential part in evolution. Different species, even when quite closely
related, often have different numbers of chromosomes; see Figure 4–14 for an
extreme example. As we shall discuss below, detailed genome comparisons show
that in many cases the changes in chromosome numbers have arisen through
chromosome breakage-and-rejoining events, creating novel chromosomes, some
of which must initially have contained abnormal numbers of centromeres: either
more than one, or none at all. Yet stable inheritance requires that each
chromosome should contain one centromere, and one only. It seems that surplus
centromeres must have been inactivated, and/or new centromeres created, so as to
allow the rearranged chromosome sets to be stably maintained.

Figure 4–43 Evidence for the plasticity of human centromere formation. (A) A series of A-T-rich alpha satellite DNA
sequences is repeated many thousands of times at each human centromere (red), and is surrounded by pericentric
heterochromatin (brown). However, due to an ancient chromosome breakage-and-rejoining event, some human chromosomes
contain two blocks of alpha satellite DNA, each of which presumably functioned as a centromere in its original chromosome.
Usually, chromosomes with two functional centromeres are not stably propagated because they attach improperly to the
spindle and are broken apart during mitosis. In chromosomes that do survive, however, one of the centromeres has somehow
become inactivated, even though it contains all the necessary DNA sequences. This allows the chromosome to be stably
propagated. (B) In a small fraction (1/2000) of human births, extra chromosomes are observed in cells of the offspring. Some of
these extra chromosomes, which have formed from a breakage event, lack alpha satellite DNA altogether, yet new centromeres
(neocentromeres) have arisen from what was originally euchromatic DNA.
The complexity of centromeric chromatin is not illustrated in these diagrams. The alpha satellite DNA that forms centromeric
chromatin in humans is packaged into alternating blocks of chromatin. One block is formed from a long string of nucleosomes
containing the CENP-A H3 variant histone; the other block contains nucleosomes that are specially marked with dimethyl lysine
4 on the normal H3 histone. Each block is more than a thousand nucleosomes long. This centromeric chromatin is flanked by
pericentric heterochromatin, as shown. The pericentric chromatin contains methylated lysine 9 on its H3 histones, along with
HP1 protein, and it is an example of “classical” heterochromatin (see Figure 4–39). The alpha satellite DNA monomer is
171 nucleotide pairs long, and monomers are organized into higher-order repeats.

Some Chromatin Structures Can Be Directly Inherited
The changes in centromere activity just discussed, once established, need to be
perpetuated through subsequent cell generations. What could be the mechanism
of this type of epigenetic inheritance?
It has been proposed that de novo centromere formation requires an initial
seeding event, involving the formation of a specialized DNA–protein structure that
contains nucleosomes formed with the CENP-A variant of histone H3. In humans,
this seeding event happens more readily on arrays of alpha satellite DNA than
on other DNA sequences. The H3–H4 tetramers from each nucleosome on the
parental DNA helix are directly inherited by the sister DNA helices at a replication
fork (see Figure 5–32). Therefore, once a set of CENP-A-containing nucleosomes
has been assembled on a stretch of DNA, it is easy to understand how a new cen-
tromere could be generated in the same place on both daughter chromosomes
following each round of cell division. One need only assume that the presence of
the CENP-A histone in an inherited nucleosome selectively recruits more CENP-A
histone to its newly formed neighbors.
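
The inheritance argument above can be sketched in a few lines of code. This is a deliberately crude illustration under explicit assumptions: each parental nucleosome passes at random to one of the two daughter helices, every gap is filled by a newly assembled nucleosome, and a new nucleosome receives CENP-A only if an immediately adjacent inherited nucleosome already contains it. The function names `replicate` and `restore` are hypothetical, and the sketch does not address what keeps the centromeric domain from growing or shrinking over many generations.

```python
import random


def replicate(parent, rng):
    """Distribute parental nucleosomes at random between two daughter helices.

    parent is a string of 'C' (CENP-A nucleosome) and 'H' (normal H3 nucleosome).
    Each parental nucleosome goes to exactly one daughter; the same position
    on the other daughter is None, meaning "newly assembled, not yet specified".
    """
    d1, d2 = [], []
    for nuc in parent:
        if rng.random() < 0.5:
            d1.append(nuc)
            d2.append(None)
        else:
            d1.append(None)
            d2.append(nuc)
    return d1, d2


def restore(daughter):
    """Fill new positions: CENP-A if an inherited neighbor has CENP-A, else normal H3."""
    out = []
    for i, nuc in enumerate(daughter):
        if nuc is not None:
            out.append(nuc)  # inherited nucleosomes keep their identity
            continue
        neighbors = [daughter[j] for j in (i - 1, i + 1) if 0 <= j < len(daughter)]
        out.append("C" if "C" in neighbors else "H")
    return "".join(out)


if __name__ == "__main__":
    parent = "HHHHCCCCCCHHHH"
    for daughter in replicate(parent, random.Random(0)):
        print(restore(daughter))
```

In this sketch a stretch of DNA with no CENP-A nucleosomes never acquires any, whereas a stretch that already has them tends to recover them in the same place on both daughters, which is the point of the seeding argument.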
There are some striking similarities between the formation and maintenance
of centromeres and the formation and maintenance of some other regions of
heterochromatin. In particular, the entire centromere forms as an all-or-none
entity, suggesting that the creation of centromeric chromatin is a highly
cooperative process, spreading out from an initial seed in a manner reminiscent of
the phenomenon of position effect variegation that we discussed earlier. In both
cases, a particular chromatin structure, once formed, seems to be directly
inherited on the DNA following each round of chromosome replication. A cooperative
recruitment of proteins, along with the action of reader–writer complexes, can
thus account not only for the spreading of specific forms of chromatin in space
along the chromosome, but also for its propagation across cell generations, from
parent cell to daughter cell (Figure 4–44).

Figure 4–44 How the packaging of DNA in chromatin can be inherited following chromosome replication. In this model,
some of the specialized chromatin components are distributed to each sister chromosome after DNA duplication, along
with the specially marked nucleosomes that they bind. After DNA replication, the inherited nucleosomes that are
specially modified, acting in concert with the inherited chromatin components, change the pattern of histone
modification on the newly formed nucleosomes nearby. This creates new binding sites for the same chromatin
components, which then assemble to complete the structure. The latter process is likely to involve
reader–writer–remodeling complexes operating in a manner similar to that previously illustrated in Figure 4–40.

Experiments with Frog Embryos Suggest that Both Activating and Repressive Chromatin Structures Can Be Inherited Epigenetically
Epigenetic inheritance plays a central part in the creation of multicellular organ-
isms. Their differentiated cell types become established during development, and
persist thereafter even through repeated cell-division cycles. The daughters of a
liver cell persist as liver cells, those of an epidermal cell as epidermal cells, and so
on, even though they all contain the same genome; and this is because distinctive
patterns of gene expression are passed on faithfully from parent cell to daughter
cell. Chromatin structure has a role in this epigenetic transmission of information
from one cell generation to the next.
One type of evidence comes from studies in which the nucleus of a cell from
a frog or tadpole is transplanted into a frog egg whose own nucleus has been
removed (an enucleated egg). In a classic set of experiments performed in 1968,
it was shown that a nucleus taken from a differentiated donor cell can be repro-
grammed in this way to support development of a whole new tadpole (see Figure
7–2). But this reprogramming occurs only with difficulty, and it becomes less and
less efficient as nuclei from older animals are used. Thus, for example, less than
2% of the enucleated eggs injected with a nucleus from a tadpole epithelial cell
developed to the swimming tadpole stage, compared with 35% when the donor
nuclei were taken instead from an early (gastrula-stage) embryo. With new exper-
imental tools, the cause of this resistance to reprogramming can now be traced.
It arises, at least in part, because specific chromatin structures in the original dif-
ferentiated nucleus tend to persist and be transmitted through the many cell-di-
vision cycles required for embryonic development. In experiments with Xenopus
embryos, specific forms of either repressive or active chromatin structures could
be demonstrated to persist through as many as 24 cell divisions, causing the
misplaced expression of genes. Figure 4–45 briefly describes one such experiment,
focused on chromatin containing the histone variant, H3.3. We shall return to
these phenomena in the final section of Chapter 22, where we discuss stem cells
and the ways in which one cell type can be converted into another.

Figure 4–45 Evidence for the inheritance of a gene-activating chromatin state. The well-characterized MyoD gene
encodes a master transcription regulatory protein for muscle, MyoD (see p. 399). This gene is normally turned on
in the indicated region of the young embryo where somites form. When a nucleus from this region is injected into
an enucleated egg as shown, many of the progeny cell nuclei abnormally express the MyoD protein in non-muscle
regions of the “nuclear transplant embryo” that forms. This abnormal expression can be attributed to maintenance
of the MyoD promoter region in its active chromatin state through the many cycles of cell division that produce
the blastula-stage embryo, a so-called “epigenetic memory” that persists in this case in the absence of
transcription. The active chromatin surrounding the MyoD promoter contains the variant histone H3.3 (see Figure
4–35) in a Lys4 methylated form. As indicated, an overproduction of this histone caused by injecting excess mRNA
encoding the normal H3.3 protein increases both H3.3 occupancy on the MyoD promoter and the epigenetic MyoD
production, whereas injection of an mRNA producing a mutant form of H3.3 that cannot be methylated at Lys4 reduces
the epigenetic MyoD production. Such experiments provide evidence that an inherited chromatin state underlies the
epigenetic memory observed. (Adapted from R.K. Ng and J.B. Gurdon, Nat. Cell Biol. 10:102–109, 2008. With
permission from Macmillan Publishers Ltd.)
Outcomes shown in the diagram, for two-cell-stage nuclear transplant embryos analyzed at the blastula stage:
    inject normal H3.3 mRNA     high MyoD epigenetic memory (much MyoD protein produced)
    no injection (control)      moderate MyoD epigenetic memory
    inject mutant H3.3 mRNA     low MyoD epigenetic memory (little MyoD protein produced)

Chromatin Structures Are Important for Eukaryotic Chromosome Function
Although a great deal remains to be learned about the functions of different chro-
matin structures, the packaging of DNA into nucleosomes was probably crucial
for the evolution of eukaryotes like ourselves. To form a complex multicellular
organism, the cells in different lineages must specialize by changing the acces-
sibility and activity of many hundreds of genes. As described in Chapter 21, this
process depends on cell memory: each cell holds a record of its past developmen-
tal history in the regulatory circuits that control its many genes. That record, it
seems, is partly stored in the structure of the chromatin.
Although bacteria also have cell memory mechanisms, the complexity of the
memory circuits in higher eukaryotes is unparalleled. Strategies based on local
variations in chromatin structure, unique to eukaryotes, can enable individual
genes, once they are switched on or switched off, to stay in that state until some
new factor acts to reverse it. At one extreme are structures like centromeric chro-
matin that, once established, are stably inherited from one cell generation to the
next. Likewise, the major “classical” type of heterochromatin, which contains long
arrays of the HP1 protein (see Figure 4–39), can persist stably throughout life. In
contrast, a form of condensed chromatin that is created by the Polycomb group of
proteins serves to silence genes that must be kept inactive in some conditions, but
are active in others. The latter mechanism governs the expression of a large num-
ber of genes that encode transcription regulators important in early embryonic
development, as discussed in Chapter 21. There are many other variant forms of
chromatin, some with much shorter lifetimes, often less than the division time of
the cell. We shall say more about the variety of chromatin types in the next section.

Summary
In the chromosomes of eukaryotes, DNA is uniformly assembled into nucleosomes,
but a variety of different chromatin structures is possible. This variety is based on a
large set of reversible covalent modifications of the four histones in the nucleosome
core. These modifications include the mono-, di-, and trimethylation of many differ-
ent lysine side chains, an important reaction that is incompatible with the acetyla-
tion that can occur on the same lysines. Specific combinations of the modifications
mark many nucleosomes, governing their interactions with other proteins. These
marks are read when protein modules that are part of a larger protein complex
bind to the modified nucleosomes in a region of chromatin. These reader proteins
then attract additional proteins that perform various functions.
Some reader protein complexes contain a histone-modifying enzyme, such as a
histone lysine methylase, that “writes” the same mark that the reader recognizes. A
reader–writer–remodeling complex of this type can spread a specific form of chro-
matin along a chromosome. In particular, large regions of condensed heterochro-
matin are thought to be formed in this way. Heterochromatin is commonly found
around centromeres and near telomeres, but it is also present at many other posi-
tions in chromosomes. The tight packaging of DNA into heterochromatin usually
silences the genes within it.
The phenomenon of position effect variegation provides strong evidence for the
inheritance of condensed states of chromatin from one cell generation to the next. A
similar mechanism appears to be responsible for maintaining the specialized chro-
matin at centromeres. More generally, the ability to propagate specific chromatin
structures across cell generations makes possible an epigenetic cell memory process
that plays a role in maintaining the set of different cell states required by complex
multicellular organisms.

THE GLOBAL STRUCTURE OF CHROMOSOMES

Having discussed the DNA and protein molecules from which the chromatin fiber
is made, we now turn to the organization of the chromosome on a more global
scale and the way in which its various domains are arranged in space. As a 30-nm
fiber, a typical human chromosome would still be 0.1 cm in length and able to
span the nucleus more than 100 times. Clearly, there must be a still higher level
of folding, even in interphase chromosomes. Although the molecular details are
still largely a mystery, this higher-order packaging almost certainly involves the
folding of the chromatin into a series of loops and coils. This chromatin packing is
fluid, frequently changing in response to the needs of the cell.
We begin this section by describing some unusual interphase chromosomes
that can be easily visualized. Exceptional though they are, these special cases
reveal features that are thought to be representative of all interphase
chromosomes. Moreover, they provide ways to investigate some fundamental aspects of
chromatin structure that we have touched on in the previous section. Next, we
describe how a typical interphase chromosome is arranged in the mammalian
cell nucleus. Finally, we shall discuss the additional tenfold compaction that
chromosomes undergo in the passage from interphase to mitosis.

Chromosomes Are Folded into Large Loops of Chromatin
Insight into the structure of the chromosomes in interphase cells has come from
studies of the stiff and enormously extended chromosomes in growing amphibian
oocytes (immature eggs). These very unusual lampbrush chromosomes (the
largest chromosomes known), paired in preparation for meiosis, are clearly
visible even in the light microscope, where they are seen to be organized into a series
of large chromatin loops emanating from a linear chromosomal axis (Figure 4–46
and Figure 4–47).

Figure 4–46 A model for the chromatin domains in a lampbrush chromosome. Shown is a small portion of one pair of
sister chromatids. Here, two identical DNA double helices are aligned side by side, packaged into different types
of chromatin. The set of lampbrush chromosomes in many amphibians contains a total of about 10,000 loops
resembling those shown here. The rest of the DNA in each chromosome (the great majority) remains highly condensed.
Four copies of each loop are present in the cell, since each lampbrush chromosome consists of two aligned sets of
paired chromatids. This four-stranded structure is characteristic of this stage of development of the oocyte,
which has arrested at the diplotene stage of meiosis; see Figure 17–56.

In these chromosomes, a given loop always contains the same DNA sequence
that remains extended in the same manner as the oocyte grows. These
chromosomes are producing large amounts of RNA for the oocyte, and most of the genes
present in the DNA loops are being actively expressed. The majority of the DNA,
however, is not in loops but remains highly condensed on the chromosome axis,
where genes are generally not expressed.
It is thought that the interphase chromosomes of all eukaryotes are similarly
arranged in loops. Although these loops are normally too small and fragile to be
easily observed in a light microscope, other methods can be used to infer their
presence. For example, modern DNA technologies have made it possible to assess
the frequency with which any two loci along an interphase chromosome are held
together, thus revealing likely candidates for the sites on chromatin that form the
bases of loop structures (Figure 4–48). These experiments and others suggest that
the DNA in human chromosomes is likely to be organized into loops of various
lengths. A typical loop might contain between 50,000 and 200,000 nucleotide
pairs of DNA, although loops of a million nucleotide pairs have also been sug-
gested (Figure 4–49).
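
The idea of scoring how often two loci are held together can be illustrated with a toy calculation. The following sketch assumes a hypothetical list of ligation junctions, each recorded as a pair of locus names, and simply tallies how often each pair was joined. The locus names, the junction list, and the function `contact_counts` are all invented for illustration; they do not come from real data or from any existing library.

```python
from collections import Counter
from typing import Iterable, Tuple

Pair = Tuple[str, str]


def contact_counts(junctions: Iterable[Pair]) -> Counter:
    """Tally ligation junctions per unordered pair of loci."""
    counts: Counter = Counter()
    for a, b in junctions:
        if a == b:
            continue  # a fragment joined to itself says nothing about looping
        counts[tuple(sorted((a, b)))] += 1
    return counts


# Hypothetical junctions recovered after cross-linking, cutting, and ligation.
observed = [
    ("locusA", "locusD"), ("locusD", "locusA"), ("locusA", "locusD"),
    ("locusB", "locusC"), ("locusA", "locusB"), ("locusC", "locusC"),
]

counts = contact_counts(observed)
# The most frequently joined pair is a candidate for the base of a loop.
best_pair, best_count = counts.most_common(1)[0]
print(best_pair, best_count)
```

A real analysis would also have to allow for the fact that loci lying close together along the DNA are joined more often by chance; this sketch ignores that correction.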

Figure 4–47 Lampbrush chromosomes. (A) A light micrograph of lampbrush
chromosomes in an amphibian oocyte. Early in oocyte differentiation, each
chromosome replicates to begin meiosis, and the homologous replicated
chromosomes pair to form this highly extended structure containing a total of
four replicated DNA double helices, or chromatids. The lampbrush chromosome
stage persists for months or years, while the oocyte builds up a supply of materials
required for its ultimate development into a new individual. (B) An enlarged region
of a similar chromosome, stained with a fluorescent reagent that makes the loops
active in RNA synthesis clearly visible. Scale bars: (A) 100 µm; (B) 20 µm.
(Courtesy of Joseph G. Gall.)

Figure 4–48 A method for determining the position of loops in interphase
chromosomes. In this technique, known as the chromosome conformation
capture (3C) method, cells are treated with formaldehyde to create the indicated
covalent DNA–protein and DNA–DNA cross-links. The DNA is then treated with
an enzyme (a restriction nuclease) that chops the DNA into many pieces, cutting
at strictly defined nucleotide sequences and forming sets of identical “cohesive
ends” (see Figure 8–28). The cohesive ends can be made to join through their
complementary base-pairing. Importantly, prior to the ligation step shown, the DNA
is diluted so that the fragments that have been kept in close proximity to each other
(through cross-linking) are the ones most likely to join. Finally, the cross-links are
reversed and the newly ligated fragments of DNA are identified and quantified by
PCR (the polymerase chain reaction, described in Chapter 8). From the results,
combined with DNA sequence information, one can derive models for the interphase
conformation of chromosomes.
Steps shown in the diagram: treat with formaldehyde (cross-links formed with
DNA-binding proteins); cut DNA with restriction nuclease; ligation; remove
cross-links by heat treatment and proteolysis; test for joined segments by PCR
(DNA probes used for PCR). A DNA product is obtained only if proteins hold the
two DNA sequences close together in the cell.

Figure 4–49 A model for the organization of an interphase chromosome. A section
of an interphase chromosome is shown folded into a series of looped domains, each
containing perhaps 50,000–200,000 or more nucleotide pairs of double-helical DNA
condensed into a chromatin fiber. The chromatin in each individual loop is further
condensed through poorly understood folding processes that are reversed when the
cell requires direct access to the DNA packaged in the loop. Neither the composition
of the postulated chromosomal axis nor how the folded chromatin fiber is anchored
to it is clear. However, in mitotic chromosomes, the bases of the chromosomal loops
are enriched both in condensins (discussed below) and in DNA topoisomerase II
enzymes (discussed in Chapter 5), two proteins that may form much of the axis at
metaphase.

Polytene Chromosomes Are Uniquely Useful for Visualizing Chromatin Structures
Further insight has come from another unusual class of cells: the polytene cells of
flies, such as the fruit fly Drosophila. Some types of cells, in many organisms, grow
abnormally large through multiple cycles of DNA synthesis without cell division.
Such cells, containing increased numbers of standard chromosomes, are said to
be polyploid. In the salivary glands of fly larvae, this process is taken to an extreme
degree, creating huge cells that contain hundreds or thousands of copies of the
genome. Moreover, in this case, all the copies of each chromosome are aligned
side by side in exact register, like drinking straws in a box, to create giant polytene
chromosomes. These allow features to be detected that are thought to be shared
with ordinary interphase chromosomes, but are normally hard to see.
When polytene chromosomes from a fly’s salivary glands are viewed in the
light microscope, distinct alternating dark bands and light interbands are visible
(Figure 4–50), each formed from a thousand identical DNA sequences arranged
side by side in register. About 95% of the DNA in polytene chromosomes is in
bands, and 5% is in interbands. A very thin band can contain 3000 nucleotide
pairs, while a thick band may contain 200,000 nucleotide pairs in each of its
chromatin strands. The chromatin in each band appears dark because the DNA is more
condensed than the DNA in interbands; it may also contain a higher concentration
of proteins (Figure 4–51). This banding pattern seems to reflect the same sort
of organization detected in the amphibian lampbrush chromosomes described
earlier.
There are approximately 3700 bands and 3700 interbands in the complete set
of Drosophila polytene chromosomes. The bands can be recognized by their
different thicknesses and spacings, and each one has been given a number to generate
a chromosome “map” that has been indexed to the finished genome sequence
of this fly.
The Drosophila polytene chromosomes provide a good starting point for examining
how chromatin is organized on a large scale. In the previous section, we
saw that there are many forms of chromatin, each of which contains nucleosomes
with a different combination of modified histones. Specific sets of non-histone
proteins assemble on these nucleosomes to affect biological function in different
ways. Recruitment of some of these non-histone proteins can spread for long
distances along the DNA, imparting a similar chromatin structure to broad tracts
of the genome (see Figure 4–40). Such regions, where all of the chromatin has
a similar structure, are separated from neighboring domains by barrier proteins
(see Figure 4–41). At low resolution, the interphase chromosome can therefore
be considered as a mosaic of chromatin structures, each containing particular
nucleosome modifications associated with a particular set of non-histone proteins.
Polytene chromosomes allow us to see details of this mosaic of domains in
the light microscope, as well as to observe some of the changes associated with
gene expression.

Figure 4–50 The entire set of polytene chromosomes in one Drosophila salivary
cell. In this drawing of a light micrograph, the giant chromosomes have been
spread out for viewing by squashing them against a microscope slide. Drosophila
has four chromosomes, and there are four different chromosome pairs present. But
each chromosome is tightly paired with its homolog (so that each pair appears
as a single structure), which is not true in most nuclei (except in meiosis). Each
chromosome has undergone multiple rounds of replication, and the homologs
and all their duplicates have remained in exact register with each other, resulting
in huge chromatin cables many DNA strands thick.
The four polytene chromosomes are normally linked together by
heterochromatic regions near their centromeres that aggregate to create
a single large chromocenter (pink region). In this preparation, however, the
chromocenter has been split into two halves by the squashing procedure used.
Normal mitotic chromosomes are drawn at the same scale for comparison.
Scale bar: 20 µm. (Adapted from T.S. Painter, J. Hered. 25:465–476, 1934.
With permission from Oxford University Press.)

There Are Multiple Forms of Chromatin


By staining Drosophila polytene chromosomes with antibodies, or by using a
more recent technique called ChIP (chromatin immunoprecipitation) analysis
(see Chapter 8), the locations of the histone and non-histone proteins in chromatin
can be mapped across the entire DNA sequence of an organism’s genome.
Such an analysis in Drosophila has thus far localized more than 50 different
chromatin proteins and histone modifications. The results suggest that three major
types of repressive chromatin predominate in this organism, along with two major
types of chromatin on actively transcribed genes, and that each type is associated
with a different complex of non-histone proteins. Thus, classical heterochromatin
contains more than six such proteins, including heterochromatin protein 1 (HP1),
whereas the so-called Polycomb form of heterochromatin contains a similar
number of proteins of a different set (PcG proteins). In addition to the five major
chromatin types, other more minor forms of chromatin appear to be present, each
of which may be differently regulated and have distinct roles in the cell.
The set of proteins bound as part of the chromatin at a given locus varies
depending on the cell type and its stage of development. These variations make
the accessibility of specific genes different in different tissues, helping to generate
the cell diversification that accompanies embryonic development (described in
Chapter 21).

Figure 4–51 Micrographs of polytene chromosomes from Drosophila salivary
glands. (A) Light micrograph of a portion of a chromosome. The DNA has been stained
with a fluorescent dye, but a reverse image is presented here that renders the DNA
black rather than white; the bands are clearly seen to be regions of increased
DNA concentration. This chromosome has been processed by a high-pressure
treatment so as to show its distinct pattern of bands and interbands more clearly.
(B) An electron micrograph of a small section of a Drosophila polytene
chromosome seen in thin section. Bands of very different thickness can be readily
distinguished, separated by interbands, which contain less condensed chromatin.
Scale bars: (A) 2 µm; (B) 1 µm. (A, adapted from D.V. Novikov, I. Kireev
and A.S. Belmont, Nat. Methods 4:483–485, 2007. With permission from
Macmillan Publishers Ltd; B, courtesy of Veikko Sorsa.)

Figure 4–52 RNA synthesis in polytene chromosome puffs.
An autoradiograph of a single puff in a polytene chromosome from the
salivary glands of the freshwater midge Chironomus tentans. As outlined
in Chapter 1 and described in detail in Chapter 6, the first step in gene
expression is the synthesis of an RNA molecule using the DNA as a template.
The decondensed portion of the chromosome is undergoing RNA synthesis
and has become labeled with ³H-uridine, an RNA precursor molecule that is
incorporated into growing RNA chains. Scale bar: 10 µm. (Courtesy of José Bonner.)

Chromatin Loops Decondense When the Genes Within Them Are Expressed
When an insect progresses from one developmental stage to another, distinc-
tive chromosome puffs arise and old puffs recede in its polytene chromosomes
as new genes become expressed and old ones are turned off (Figure 4–52). From
inspection of each puff when it is relatively small and the banding pattern is still
discernible, it seems that most puffs arise from the decondensation of a single
chromosome band.
The individual chromatin fibers that make up a puff can be visualized with
an electron microscope. In favorable cases, loops are seen, much like those
observed in amphibian lampbrush chromosomes. When genes in the loop are
not expressed, the loop assumes a thickened structure, possibly that of a folded
30-nm fiber, but when gene expression is occurring, the loop becomes more
extended. In electron micrographs, the chromatin located on either side of the
decondensed loop appears considerably more compact, suggesting that a loop
constitutes a distinct functional domain of chromatin structure.
Observations in human cells also suggest that highly folded loops of chromatin
expand to occupy an increased volume when a gene within them is expressed. For
example, quiescent chromosome regions from 0.4 to 2 million nucleotide pairs in
length appear as compact dots in an interphase nucleus when visualized by fluo-
rescence microscopy. However, the same DNA is seen to occupy a larger territory
when its genes are expressed, with elongated, punctate structures replacing the
original dot.
New ways of visualizing individual chromosomes have shown that each of the
46 interphase chromosomes in a human cell tends to occupy its own discrete territory
within the nucleus: that is, the chromosomes are not extensively entangled
with one another (Figure 4–53). However, pictures such as these present only
an average view of the DNA in each chromosome. Experiments that specifically
localize the heterochromatic regions of a chromosome reveal that they are often
closely associated with the nuclear lamina, regardless of the chromosome examined.
And DNA probes that preferentially stain gene-rich regions of human chromosomes
produce a striking picture of the interphase nucleus that presumably
reflects different average positions for active and inactive genes (Figure 4–54).

Figure 4–53 Simultaneous visualization of the chromosome territories
for all of the human chromosomes in a single interphase nucleus. Here,
a mixture of DNA probes for each chromosome has been labeled so that each
fluoresces with a different spectrum; this allows DNA–DNA hybridization to be
used to detect each chromosome, as in Figure 4–10. Three-dimensional
reconstructions were then produced. Below the micrograph, each
chromosome is identified in a schematic of the actual image. Note that
homologous chromosomes (e.g., the two copies of chromosome 9) are not
in general co-located. Scale bar: 10 µm. (From M.R. Speicher and N.P. Carter,
Nat. Rev. Genet. 6:782–792, 2005. With permission from Macmillan Publishers Ltd.)

Figure 4–54 The distribution of gene-rich regions of the human genome
in an interphase nucleus. Gene-rich regions have been visualized with a
fluorescent probe that hybridizes to the Alu interspersed repeat, which is
present in more than a million copies in the human genome (see page 292).
For unknown reasons, these sequences cluster in chromosomal regions
rich in genes. In this representation, regions enriched for the Alu sequence
are green, regions depleted for these sequences are red, while the average
regions are yellow. The gene-rich regions are seen to be largely absent in
the DNA near the nuclear envelope. Scale bar: 5 µm. (From A. Bolzer et al.,
PLoS Biol. 3:826–842, 2005.)

How is most of the chromatin in each interphase chromosome condensed
when its genes are not being expressed? A powerful extension of the chromosome
conformation capture method described previously (see Figure 4–48), which
exploits a high-throughput DNA sequencing technology called massively parallel
sequencing (see Panel 8–1, pp. 478–481), allows the connections between all of
the different one-megabase (1 Mb) segments of the human genome to be mapped
in human interphase chromosomes. The results reveal that most regions of our
chromosomes are folded into a conformation referred to as a fractal globule: a
knot-free arrangement that facilitates maximally dense packing while, at the same
time, preserving the ability of the chromatin fiber to unfold and fold (Figure 4–55).
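
To get a feel for the scale of such a genome-wide experiment, the sketch below counts how many distinct pairs of segments have to be scored. It assumes a genome divided into roughly three thousand 1 Mb segments, the figure quoted for the Hi-C analysis in Figure 4–55; the function name `segment_pairs` is a hypothetical helper introduced only for this illustration.

```python
def segment_pairs(n_segments: int) -> int:
    """Number of distinct unordered pairs among n_segments genome segments."""
    if n_segments < 0:
        raise ValueError("n_segments must be non-negative")
    return n_segments * (n_segments - 1) // 2


N_SEGMENTS = 3000  # approximate number of 1 Mb segments in the human genome

pairs = segment_pairs(N_SEGMENTS)
print(f"{N_SEGMENTS} segments give {pairs:,} possible pairwise contacts")
```

Several million possible pairings is far more than could be tested one at a time by PCR, which is why the genome-wide version of the method relies on high-throughput sequencing.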

Chromatin Can Move to Specific Sites Within the Nucleus to Alter Gene Expression
A variety of different types of experiments has led to the conclusion that the
position of a gene in the interior of the nucleus changes when it becomes highly
expressed. Thus, a region that becomes very actively transcribed is sometimes
found to extend out of its chromosome territory, as if in an extended loop (Figure
4–56). We will see in Chapter 6 that the initiation of transcription—the first step in
gene expression—requires the assembly of over 100 proteins, and it makes sense
that this would be facilitated in regions of the nucleus enriched in these proteins.
More generally, it is clear that the nucleus is very heterogeneous, with func-
tionally different regions to which portions of chromosomes can move as they are
subjected to different biochemical processes—such as when their gene expres-
sion changes. It is this issue that we discuss next.

Figure 4–55 A fractal globule model for interphase chromatin. An extension of
the 3C method in Figure 4–48, called Hi-C, was used to measure the extent to which
each of the three thousand 1 Mb segments in the human genome was located adjacent
to any other of these segments. The results support the type of model shown.
In the enlarged fractal globule illustrated, a region of 5 million base pairs is seen to
fold in a way that keeps regions that are neighbors along the one-dimensional DNA
helix as neighbors in three dimensions; this gives rise to monochromatic blocks in
this representation that are obvious both on the surface and in cross section. The
fractal globule is a knot-free conformation of the DNA that permits dense packing, yet
retains an ability to easily fold and unfold any genomic locus. (Adapted from
E. Lieberman-Aiden et al., Science 326:289–293, 2009. With permission
from AAAS.)

Figure 4–56 An effect of high levels of gene expression on the intranuclear
location of chromatin. (A) Fluorescence micrographs of human nuclei showing
how the position of a gene changes when it becomes highly transcribed. The region
of the chromosome adjacent to the gene (red) is seen to leave its chromosomal
territory (green) only when it is highly active. (B) Schematic representation of
a large loop of chromatin that expands when the gene is on, and contracts when
the gene is off. Other genes that are less actively expressed can be shown by the
same methods to remain inside their chromosomal territory when transcribed.
Scale bar in (A): 5 µm. (From J.R. Chubb and W.A. Bickmore, Cell
112:403–406, 2003. With permission from Elsevier.)

Networks of Macromolecules Form a Set of Distinct Biochemical Environments inside the Nucleus
In Chapter 6, we shall describe the function of a variety of subcompartments that
are present within the nucleus. The largest and most obvious of these is the nucle-
olus, a structure well known to microscopists even in the nineteenth century (see
Figure 4–9). The nucleolus is the cell’s site of ribosome subunit formation, as well
as the place where many other specialized reactions occur (see Figure 6–42): it
consists of a network of RNAs and proteins concentrated around ribosomal RNA
genes that are being actively transcribed. In eukaryotes, the genome contains
multiple copies of the ribosomal RNA genes, and although they are typically clus-
tered together in a single nucleolus, they are often located on several separate
chromosomes.
A variety of less obvious organelles are also present inside the nucleus. For
example, spherical structures called Cajal bodies and interchromatin granule
clusters are present in most plant and animal cells (Figure 4–57). Like the nucle-
olus, these organelles are composed of selected protein and RNA molecules that
bind together to create networks that are highly permeable to other protein and
RNA molecules in the surrounding nucleoplasm.
Structures such as these can create distinct biochemical environments by
immobilizing select groups of macromolecules, as can other networks of proteins
and RNA molecules associated with nuclear pores and with the nuclear envelope.
In principle, this allows other molecules that enter these spaces to be processed
with great efficiency through complex reaction pathways. Highly permeable,
fibrous networks of this sort can thereby impart many of the kinetic advantages of
compartmentalization (see p. 164) to reactions that take place in subregions of the
nucleus (Figure 4–58A). However, unlike the membrane-bound compartments in
the cytoplasm (discussed in Chapter 12), these nuclear subcompartments, lacking
a lipid bilayer membrane, can neither concentrate nor exclude specific small
molecules.
The cell has a remarkable ability to construct distinct environments to perform
complex biochemical tasks efficiently. Those that we have mentioned in the
nucleus facilitate various aspects of gene expression, and will be further discussed
in Chapter 6. These subcompartments, including the nucleolus, appear to form
only as needed, and they create a high local concentration of the many different
enzymes and RNA molecules needed for a particular process. In an analogous
way, when DNA is damaged by irradiation, the set of enzymes needed to carry out
DNA repair are observed to congregate in discrete foci inside the nucleus, creating
“repair factories” (see Figure 5–52). And nuclei often contain hundreds of discrete
foci representing factories for DNA or RNA synthesis (see Figure 6–47).
It seems likely that all of these entities make use of the type of tethering illustrated
in Figure 4–58B, where long flexible lengths of polypeptide chain and/or
long noncoding RNA molecules are interspersed with specific binding sites that
concentrate the multiple proteins and other molecules that are needed to catalyze
a particular process. Not surprisingly, tethers are similarly used to help to speed
biological processes in the cytoplasm, increasing specific reaction rates there (for
example, see Figure 16–18).
Is there also an intranuclear framework, analogous to the cytoskeleton, on
which chromosomes and other components of the nucleus are organized? The
nuclear matrix, or scaffold, has been defined as the insoluble material left in the
nucleus after a series of biochemical extraction steps. Many of the proteins and
RNA molecules that form this insoluble material are likely to be derived from the
fibrous subcompartments of the nucleus just discussed, while others may be proteins
that help to form the base of chromosomal loops or to attach chromosomes
to other structures in the nucleus.

Figure 4–57 Electron micrograph showing two very common fibrous
nuclear subcompartments. The large sphere here is a Cajal body. The smaller
darker sphere is an interchromatin granule cluster, also known as a speckle (see
also Figure 6–46). These “subnuclear organelles” are from the nucleus of a
Xenopus oocyte. Scale bar: 1 µm. (From K.E. Handwerger and J.G. Gall,
Trends Cell Biol. 16:19–26, 2006. With permission from Elsevier.)

Figure 4–58 Effective compartmentalization without a bilayer membrane.
(A) Schematic illustration of the organization of a spherical
subnuclear organelle (left) and of a postulated similarly organized
subcompartment just beneath the nuclear envelope (right). In
both cases, RNAs and/or proteins (gray) associate to form highly porous,
gel-like structures that contain binding sites for other specific proteins
and RNA molecules (colored objects). (B) How the tethering of a selected
set of proteins and RNA molecules to long flexible polymer chains, as in (A),
can create “staging areas” that greatly speed the rates of
reactions in subcompartments of the nucleus. The reactions catalyzed will
depend on the particular macromolecules that are localized
by the tethering. The same strategy for accelerating complex sets of
reactions is also employed in subcompartments elsewhere in
the cell (see also Figure 3–78).

Mitotic Chromosomes Are Especially Highly Condensed
Having discussed the dynamic structure of interphase chromosomes, we now
turn to mitotic chromosomes. The chromosomes from nearly all eukaryotic cells
become readily visible by light microscopy during mitosis, when they coil up to
form highly condensed structures. This condensation reduces the length of a
typical interphase chromosome only about tenfold, but it produces a dramatic
change in chromosome appearance.
Figure 4–59 depicts a typical mitotic chromosome at the metaphase stage
of mitosis (for the stages of mitosis, see Figure 17–3). The two DNA molecules
produced by DNA replication during interphase of the cell-division cycle are
separately folded to produce two sister chromosomes, or sister chromatids, held
together at their centromeres, as mentioned earlier. These chromosomes are normally
covered with a variety of molecules, including large amounts of RNA–protein
complexes. Once this covering has been stripped away, each chromatid can be
seen in electron micrographs to be organized into loops of chromatin emanating
from a central scaffolding (Figure 4–60). Experiments using DNA hybridization
to detect specific DNA sequences demonstrate that the order of visible features
along a mitotic chromosome at least roughly reflects the order of genes along the
DNA molecule. Mitotic chromosome condensation can thus be thought of as the
final level in the hierarchy of chromosome packaging (Figure 4–61).
The compaction of chromosomes during mitosis is a highly organized and
dynamic process that serves at least two important purposes. First, when condensation
is complete (in metaphase), sister chromatids have been disentangled from
each other and lie side by side. Thus, the sister chromatids can easily separate
when the mitotic apparatus begins pulling them apart. Second, the compaction
of chromosomes protects the relatively fragile DNA molecules from being broken
as they are pulled to separate daughter cells.
The condensation of interphase chromosomes into mitotic chromosomes
begins in early M phase, and it is intimately connected with the progression of
the cell cycle. During M phase, gene expression shuts down, and specific modifications
are made to histones that help to reorganize the chromatin as it compacts.
Two classes of ring-shaped proteins, called cohesins and condensins, aid
this compaction. How they help to produce the two separately folded chromatids
of a mitotic chromosome will be discussed in Chapter 17, along with the details
of the cell cycle.

Figure 4–59 A typical mitotic chromosome at metaphase. Each sister
chromatid contains one of two identical sister DNA molecules generated earlier in
the cell cycle by DNA replication (see also Figure 17–21).

Figure 4–60 A scanning electron micrograph of a region near one end
of a typical mitotic chromosome. Each knoblike projection is believed to
represent the tip of a separate looped domain. Note that the two identical
paired chromatids (drawn in Figure 4–59) can be clearly distinguished.
Scale bar: 0.1 µm. (From M.P. Marsden and U.K. Laemmli, Cell 17:849–858, 1979.
With permission from Elsevier.)

Figure 4–61 Chromatin packing. This model shows some of the many levels
of chromatin packing postulated to give rise to the highly condensed mitotic
chromosome. Levels shown, with the dimension indicated for each:
  short region of DNA double helix                 2 nm
  “beads-on-a-string” form of chromatin            11 nm
  chromatin fiber of packed nucleosomes            30 nm
  chromatin fiber folded into loops                700 nm
  entire mitotic chromosome (with centromere)      1400 nm
NET RESULT: EACH DNA MOLECULE HAS BEEN PACKAGED INTO A MITOTIC CHROMOSOME THAT
IS 10,000-FOLD SHORTER THAN ITS FULLY EXTENDED LENGTH

Summary
Chromosomes are generally decondensed during interphase, so that the details
of their structure are difficult to visualize. Notable exceptions are the specialized
lampbrush chromosomes of vertebrate oocytes and the polytene chromosomes in
the giant secretory cells of insects. Studies of these two types of interphase chromo-
somes suggest that each long DNA molecule in a chromosome is divided into a large
number of discrete domains organized as loops of chromatin that are compacted by
further folding. When genes contained in a loop are expressed, the loop unfolds and
allows the cell’s machinery access to the DNA.
Interphase chromosomes occupy discrete territories in the cell nucleus; that is,
they are not extensively intertwined. Euchromatin makes up most of interphase
chromosomes and, when not being transcribed, it probably exists as tightly folded
fibers of compacted nucleosomes. However, euchromatin is interrupted by stretches
of heterochromatin, in which the nucleosomes are subjected to additional packing
that usually renders the DNA resistant to gene expression. Heterochromatin exists in
several forms, some of which are found in large blocks in and around centromeres
and near telomeres. But heterochromatin is also present at many other positions on
chromosomes, where it can serve to help regulate developmentally important genes.
The interior of the nucleus is highly dynamic, with heterochromatin often posi-
tioned near the nuclear envelope and loops of chromatin moving away from their
chromosome territory when genes are very highly expressed. This reflects the exis-
tence of nuclear subcompartments, where different sets of biochemical reactions
are facilitated by an increased concentration of selected proteins and RNAs. The
components involved in forming a subcompartment can self-assemble into discrete
organelles such as nucleoli or Cajal bodies; they can also be tethered to fixed struc-
tures such as the nuclear envelope.
During mitosis, gene expression shuts down and all chromosomes adopt a
highly condensed conformation in a process that begins early in M phase to pack-
age the two DNA molecules of each replicated chromosome as two separately folded
chromatids. The condensation is accompanied by histone modifications that facil-
itate chromatin packing, but satisfactory completion of this orderly process, which
reduces the end-to-end distance of each DNA molecule from its interphase length by
an additional factor of ten, requires additional proteins.

HOW GENOMES EVOLVE


In this final section of the chapter, we provide an overview of some of the ways
that genes and genomes have evolved over time to produce the vast diversity of
modern-day life-forms on our planet. The sequencing of the genomes of thousands
of organisms is revolutionizing our view of the process of evolution, uncovering
an astonishing wealth of information not only about family relationships
among organisms, but also about the molecular mechanisms by which evolution
has proceeded.
It is perhaps not surprising that genes with similar functions can be found in
a diverse range of living things. But the great revelation of the past 30 years has
been the extent to which the actual nucleotide sequences of many genes have
been conserved. Homologous genes—that is, genes that are similar in both their
nucleotide sequence and function because of a common ancestry—can often be
recognized across vast phylogenetic distances. Unmistakable homologs of many
human genes are present in organisms as diverse as nematode worms, fruit flies,
yeasts, and even bacteria. In many cases, the resemblance is so close that, for
example, the protein-coding portion of a yeast gene can be substituted with its
human homolog—even though humans and yeast are separated by more than a
billion years of evolutionary history.
As emphasized in Chapter 3, the recognition of sequence similarity has
become a major tool for inferring gene and protein function. Although a sequence
match does not guarantee similarity in function, it has proved to be an excellent
clue. Thus, it is often possible to predict the function of genes in humans for which
no biochemical or genetic information is available simply by comparing their
nucleotide sequences with the sequences of genes that have been characterized
in other more readily studied organisms.
In general, the sequences of individual genes are much more tightly con-
served than is overall genome structure. Features of genome organization such
as genome size, number of chromosomes, order of genes along chromosomes,
abundance and size of introns, and amount of repetitive DNA are found to differ
greatly when comparing distant organisms, as does the number of genes that each
organism contains.

Genome Comparisons Reveal Functional DNA Sequences by their
Conservation Throughout Evolution
A first obstacle in interpreting the sequence of the 3.2 billion nucleotide pairs in
the human genome is the fact that much of it is probably functionally unimport-
ant. The regions of the genome that code for the amino acid sequences of proteins
(the exons) are typically found in short segments (average size about 145 nucle-
otide pairs), small islands in a sea of DNA whose exact nucleotide sequence is
thought to be mostly of little consequence. This arrangement makes it difficult
to identify all the exons in a stretch of DNA, and it is often hard too to determine
exactly where a gene begins and ends.
One very important approach to deciphering our genome is to search for DNA
sequences that are closely similar between different species, on the principle
that DNA sequences that have a function are much more likely to be conserved
than those without a function. For example, humans and mice are thought to
have diverged from a common mammalian ancestor about 80 × 10⁶ years ago,
which is long enough for the majority of nucleotides in their genomes to have
been changed by random mutational events. Consequently, the only regions that
will have remained closely similar in the two genomes are those in which muta-
tions would have impaired function and put the animals carrying them at a dis-
advantage, resulting in their elimination from the population by natural selection.
Such closely similar pieces of DNA sequence are known as conserved regions. In
addition to revealing those DNA sequences that encode functionally important
exons and RNA molecules, these conserved regions will include regulatory DNA
sequences as well as DNA sequences with functions that are not yet known. In
contrast, most nonconserved regions will reflect DNA whose sequence is much
less likely to be critical for function.
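
The idea of finding conserved regions by comparison can be sketched in a few lines of code. The example below is only an illustration: it assumes that two sequences have already been aligned, and the sequences, the window size, and the 80% identity threshold are all invented for the purpose. Real genome comparisons use far more sophisticated alignment and scoring methods.

```python
def window_identity(seq_a, seq_b, window):
    """Fraction of identical positions in consecutive windows of an alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for start in range(0, len(seq_a) - window + 1, window):
        a = seq_a[start:start + window]
        b = seq_b[start:start + window]
        # Count matching positions; alignment gaps ("-") never count as matches.
        matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
        scores.append((start, matches / window))
    return scores


def conserved_windows(seq_a, seq_b, window=10, threshold=0.8):
    """Start positions of windows whose identity meets the (assumed) threshold."""
    return [start for start, score in window_identity(seq_a, seq_b, window)
            if score >= threshold]


# Invented toy alignment: the outer windows are nearly identical, the middle one is not.
species_1 = "ATGGCCAAGT" + "TTACGGATCA" + "GGCTAACCTG"
species_2 = "ATGGCCAAGT" + "CGTCAGTTAC" + "GGCTAACGTG"

print(conserved_windows(species_1, species_2))
```

Windows that pass the threshold are candidates for functionally important DNA; windows that fail are more likely to be sequence whose exact order of nucleotides matters little.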
The power of this method can be increased by including in such comparisons
the genomes of large numbers of species whose genomes have been sequenced,
such as rat, chicken, fish, dog, and chimpanzee, as well as mouse and human.
By revealing in this way the results of a very long natural “experiment,” lasting
for hundreds of millions of years, such comparative DNA sequencing studies
have highlighted the most interesting regions in our genome. The comparisons
reveal that roughly 5% of the human genome consists of “multispecies conserved
sequences.” To our great surprise, only about one-third of these sequences code
for proteins (see Table 4–1, p. 184). Many of the remaining conserved sequences
consist of DNA containing clusters of protein-binding sites that are involved in
gene regulation, while others produce RNA molecules that are not translated
into protein but are important for other known purposes. But, even in the most
intensively studied species, the function of the majority of these highly conserved
sequences remains unknown. This remarkable discovery has led scientists to con-
clude that we understand much less about the cell biology of vertebrates than we
had thought. Certainly, there are enormous opportunities for new discoveries,
and we should expect many more surprises ahead.

Genome Alterations Are Caused by Failures of the Normal
Mechanisms for Copying and Maintaining DNA, as well as by
Transposable DNA Elements
Evolution depends on accidents and mistakes followed by nonrandom survival.
Most of the genetic changes that occur result simply from failures in the normal
mechanisms by which genomes are copied or repaired when damaged, although
the movement of transposable DNA elements (discussed below) also plays an
important part. As we will explain in Chapter 5, the mechanisms that maintain
DNA sequences are remarkably precise—but they are not perfect. DNA sequences
are inherited with such extraordinary fidelity that typically, along a given line of
descent, only about one nucleotide pair in a thousand is randomly changed in the
germ line every million years. Even so, in a population of 10,000 diploid individu-
als, every possible nucleotide substitution will have been “tried out” on about 20
occasions in the course of a million years—a short span of time in relation to the
evolution of species.
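
The arithmetic behind this estimate can be checked with a short calculation. The sketch below uses only the numbers quoted above; the variable names are our own, and the calculation counts changes per nucleotide position without distinguishing which of the three alternative nucleotides is substituted.

```python
# Back-of-the-envelope check of the "about 20 occasions" estimate.
# Names are illustrative; the numbers come from the paragraph above.

changes_per_site_per_myr = 1 / 1000   # one nucleotide pair in a thousand per million years
individuals = 10_000                  # population size
genome_copies = 2 * individuals       # diploid: two copies of every site per individual

# Expected number of times a given site is changed somewhere in the
# population during one million years.
trials_per_site = genome_copies * changes_per_site_per_myr
print(round(trials_per_site))
```

With 20,000 genome copies and a change at one site in a thousand per copy, each position is expected to be altered roughly 20 times somewhere in the population in a million years.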
Errors in DNA replication, DNA recombination, or DNA repair can lead either
to simple local changes in DNA sequence—so-called point mutations such as the
substitution of one base pair for another—or to large-scale genome rearrange-
ments such as deletions, duplications, inversions, and translocations of DNA from
one chromosome to another. In addition to these failures of the genetic machin-
ery, genomes contain mobile DNA elements that are an important source of
genomic change (see Table 5–3, p. 267). These transposable DNA elements (trans-
posons) are parasitic DNA sequences that can spread within the genomes they
colonize. In the process, they often disrupt the function or alter the regulation
of existing genes. On occasion, they have created altogether novel genes through
fusions between transposon sequences and segments of existing genes. Over long
periods of evolutionary time, DNA transposition events have profoundly affected
genomes, so much so that nearly half of the DNA in the human genome consists
of recognizable relics of past transposition events (Figure 4–62). Even more of our
genome is thought to have been derived from transpositions that occurred so long
ago (>10⁸ years) that the sequences can no longer be traced to transposons.

The Genome Sequences of Two Species Differ in Proportion to the
Length of Time Since They Have Separately Evolved
The differences between the genomes of species alive today have accumulated
over more than 3 billion years. Although we lack a direct record of changes over
time, scientists can reconstruct the process of genome evolution from detailed
comparisons of the genomes of contemporary organisms.
The basic organizing framework for comparative genomics is the phyloge-
netic tree. A simple example is the tree describing the divergence of humans from
the great apes (Figure 4–63). The primary support for this tree comes from com-
parisons of gene or protein sequences. For example, comparisons between the
sequences of human genes or proteins and those of the great apes typically reveal
the fewest differences between human and chimpanzee and the most between
human and orangutan.

Figure 4–62 A representation of the nucleotide sequence content of the
sequenced human genome. The LINEs (long interspersed nuclear elements), SINEs
(short interspersed nuclear elements), retroviral-like elements, and DNA-only
transposons are mobile genetic elements that have multiplied in our genome by
replicating themselves and inserting the new copies in different positions. These
mobile genetic elements are discussed in Chapter 5 (see Table 5–3, p. 267). Simple
sequence repeats are short nucleotide sequences (less than 14 nucleotide pairs)
that are repeated again and again for long stretches. Segmental duplications are
large blocks of DNA sequence (1000–200,000 nucleotide pairs) that are present at
two or more locations in the genome. The most highly repeated blocks of DNA
in heterochromatin have not yet been completely sequenced; therefore about
10% of human DNA sequences are not represented in this diagram. (Data courtesy
of E. Margulies.)
[Bar diagram; horizontal axis: percentage, 0 to 100. REPEATED SEQUENCES:
TRANSPOSONS (LINEs, SINEs, retroviral-like elements, DNA-only transposon
“fossils”), simple sequence repeats, segmental duplications. UNIQUE SEQUENCES:
GENES (introns, protein-coding regions), nonrepetitive DNA that is neither in
introns nor codons.]

Figure 4–63 A phylogenetic tree showing the relationship between
humans and the great apes based on nucleotide sequence data. As indicated,
the sequences of the genomes of all four species are estimated to differ from the
sequence of the genome of their last common ancestor by a little over 1.5%.
Because changes occur independently on both diverging lineages, pairwise
comparisons reveal twice the sequence divergence from the last common
ancestor. For example, human–orangutan comparisons typically show sequence
divergences of a little over 3%, while human–chimpanzee comparisons show
divergences of approximately 1.2%. (Modified from F.C. Chen and W.H. Li,
Am. J. Hum. Genet. 68:444–456, 2001.)
[Tree diagram; tips: human, chimpanzee, gorilla, orangutan; root: last common
ancestor. Vertical axes: millions of years before present (0 to 15) and percent
nucleotide substitution (0.0 to 1.5).]

For closely related organisms such as humans and chimpanzees, it is relatively
easy to reconstruct the gene sequences of the extinct, last common ancestor of the
two species (Figure 4–64). The close similarity between human and chimpanzee
genes is mainly due to the short time that has been available for the accumulation
of mutations in the two diverging lineages, rather than to functional constraints
that have kept the sequences the same. Evidence for this view comes from the
observation that the human and chimpanzee genomes are nearly identical even
where there is no functional constraint on the nucleotide sequence, such as in
the third position of “synonymous” codons (codons specifying the same amino
acid but differing in their third nucleotide).
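
The reconstruction of an ancestral sequence, and the distinction between synonymous and nonsynonymous changes, can be illustrated with the five codons that differ between the human and chimpanzee leptin sequences in Figure 4–64. The sketch below is a simplified illustration, not a general method: the function name is our own, the codon table covers only the codons needed here, and the rule applied is the minimum-mutation rule described for that figure, with the gorilla sequence serving as the tiebreaker.

```python
# Partial codon table (standard genetic code), limited to the codons used below.
CODON_TO_AA = {
    "CAA": "Q", "CAG": "Q", "AAA": "K", "AAG": "K", "CCC": "P",
    "CCT": "P", "GTG": "V", "ATG": "M", "GAT": "D", "GAC": "D",
}

# (human, chimpanzee, gorilla) codons at the five variant positions of Figure 4-64.
VARIANT_CODONS = [
    ("CAA", "CAG", "CAA"),
    ("AAA", "AAG", "AAG"),
    ("CCC", "CCT", "CCC"),
    ("GTG", "ATG", "ATG"),
    ("GAT", "GAC", "GAC"),
]


def infer_ancestor(human, chimp, outgroup):
    """Most economical ancestral codon, or None if this simple rule cannot decide."""
    if human == chimp:
        return human                # no change needed on either lineage
    if outgroup in (human, chimp):
        return outgroup             # the outgroup breaks the tie
    return None                     # the outgroup matches neither sequence


for human, chimp, gorilla in VARIANT_CODONS:
    ancestor = infer_ancestor(human, chimp, gorilla)
    synonymous = CODON_TO_AA[human] == CODON_TO_AA[chimp]
    print(human, chimp, gorilla, "ancestor:", ancestor, "synonymous:", synonymous)
```

Under these assumptions the inferred ancestor matches the human codon in two cases and the chimpanzee codon in three, and only the GTG/ATG pair changes the encoded amino acid.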
For much less closely related organisms, such as humans and chickens (which
have evolved separately for about 300 million years), the sequence conservation
found in genes is almost entirely due to purifying selection (that is, selection that
eliminates individuals carrying mutations that interfere with important genetic
functions), rather than to an inadequate time for mutations to occur.

Figure 4–64 Tracing the ancestral sequence from a sequence comparison
of the coding regions of human and chimpanzee leptin genes. Reading left
to right and top to bottom, a continuous 300-nucleotide segment of a leptin-coding
gene is illustrated. Leptin is a hormone that regulates food intake and energy
utilization in response to the adequacy of fat reserves. As indicated by the codons
boxed in green, only 5 nucleotides (of 441 total) differ between the two species.
Moreover, in only one of the five positions does the difference in nucleotide lead to
a difference in the encoded amino acid. For each of the five variant nucleotide
positions, the corresponding sequence in the gorilla is also indicated. In two cases,
the gorilla sequence agrees with the human sequence, while in three cases it agrees
with the chimpanzee sequence. What was the sequence of the leptin gene
in the last common ancestor? The most economical assumption is that evolution
has followed a pathway requiring the minimum number of mutations consistent
with the data. Thus, it seems likely that the leptin sequence of the last common
ancestor was the same as the human and chimpanzee sequences when they agree;
when they disagree, the gorilla sequence would be used as a tiebreaker. For
convenience, only the first 300 nucleotides of the leptin-coding sequences are given.
The remaining 141 are identical between humans and chimpanzees.

nucleotides 1–60
human   GTGCCCATCCAAAAAGTCCAAGATGACACCAAAACCCTCATCAAGACAATTGTCACCAGG
chimp   GTGCCCATCCAAAAAGTCCAGGATGACACCAAAACCCTCATCAAGACAATTGTCACCAGG
protein V P I Q K V Q D D T K T L I K T I V T R
variant codon: human amino acid Q; gorilla codon CAA

nucleotides 61–120
human   ATCAATGACATTTCACACACGCAGTCAGTCTCCTCCAAACAGAAAGTCACCGGTTTGGAC
chimp   ATCAATGACATTTCACACACGCAGTCAGTCTCCTCCAAACAGAAGGTCACCGGTTTGGAC
protein I N D I S H T Q S V S S K Q K V T G L D
variant codon: human amino acid K; gorilla codon AAG

nucleotides 121–180
human   TTCATTCCTGGGCTCCACCCCATCCTGACCTTATCCAAGATGGACCAGACACTGGCAGTC
chimp   TTCATTCCTGGGCTCCACCCTATCCTGACCTTATCCAAGATGGACCAGACACTGGCAGTC
protein F I P G L H P I L T L S K M D Q T L A V
variant codon: human amino acid P; gorilla codon CCC

nucleotides 181–240
human   TACCAACAGATCCTCACCAGTATGCCTTCCAGAAACGTGATCCAAATATCCAACGACCTG
chimp   TACCAACAGATCCTCACCAGTATGCCTTCCAGAAACATGATCCAAATATCCAACGACCTG
protein Y Q Q I L T S M P S R N M I Q I S N D L
variant codon: human amino acid V; gorilla codon ATG

nucleotides 241–300
human   GAGAACCTCCGGGATCTTCTTCAGGTGCTGGCCTTCTCTAAGAGCTGCCACTTGCCCTGG
chimp   GAGAACCTCCGGGACCTTCTTCAGGTGCTGGCCTTCTCTAAGAGCTGCCACTTGCCCTGG
protein E N L R D L L H V L A F S K S C H L P W
variant codon: human amino acid D; gorilla codon GAC

Phylogenetic Trees Constructed from a Comparison of DNA
Sequences Trace the Relationships of All Organisms
Phylogenetic trees based on molecular sequence data can be compared with
the fossil record, and we get our best view of evolution by integrating the two
approaches. The fossil record remains essential as a source of absolute dates,
based on radioisotope decay in the rock formations in which fossils are found.
Because the fossil record has many gaps, however, precise divergence times
between species are difficult to establish, even for species that leave good fossils
with distinctive morphology.

Figure 4–65 The very different rates of evolution of exons and introns, as illustrated by comparing a portion of the
mouse and human leptin genes. Positions where the sequences differ by a single nucleotide substitution are boxed in green,
and positions that differ by the addition or deletion of nucleotides are boxed in yellow. Note that, thanks to purifying selection,
the coding sequence of the exon is much more conserved than is the adjacent intron sequence.
[Regions labeled in the figure, reading left to right: exon, intron.]
mouse GTGCCTATCCAGAAAGTCCAGGATGACACCAAAACCCTCATCAAGACCATTGTCACCAGGATCAATGACATTTCACACACGGTA-GGAGTCTCATGGGGGGACAAAGATGTAGGACTAGA
human GTGCCCATCCAAAAAGTCCAAGATGACACCAAAACCCTCATCAAGACAATTGTCACCAGGATCAATGACATTTCACACACGGTAAGGAGAGT-ATGCGGGGACAAA---GTAGAACTGCA

mouse ACCAGAGTCTGAGAAACATGTCATGCACCTCCTAGAAGCTGAGAGTTTAT-AAGCCTCGAGTGTACAT-TATTTCTGGTCATGGCTCTTGTCACTGCTGCCTGCTGAAATACAGGGCTGA
human GCCAG--CCC-AGCACTGGCTCCTAGTGGCACTGGACCCAGATAGTCCAAGAAACATTTATTGAACGCCTCCTGAATGCCAGGCACCTACTGGAAGCTGA--GAAGGATTTGAAAGCACA

Phylogenetic trees whose timing has been calibrated according to the fos-
sil record suggest that changes in the sequences of particular genes or proteins
tend to occur at a nearly constant rate, although rates that differ from the norm
by as much as twofold are observed in particular lineages. This provides us with a
molecular clock for evolution—or rather a set of molecular clocks corresponding
to different categories of DNA sequence. As in the example in Figure 4–65, the
clock runs most rapidly and regularly in sequences that are not subject to purifying
selection. These include portions of introns that lack splicing or regulatory signals,
the third position in synonymous codons, and genes that have been irreversibly
inactivated by mutation (the so-called pseudogenes). The clock runs most slowly
for sequences that are subject to strong functional constraints—for example, the
amino acid sequences of proteins that engage in specific interactions with large
numbers of other proteins and whose structure is therefore highly constrained,
or the nucleotide sequences that encode the RNA subunits of the ribosome, on
which all protein synthesis depends.
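
How a molecular clock converts sequence divergence into time can be shown with a small calculation. The sketch below is an illustration under stated assumptions: the function name is our own, and the calibration (about 1.5% nucleotide substitution along one lineage in about 15 million years) is read from the axes of Figure 4–63. Because rates can differ by as much as twofold between lineages, the result should be treated as a rough estimate.

```python
def divergence_time_myr(pairwise_percent, rate_percent_per_myr):
    """Time since two lineages split, assuming the same constant clock on both branches."""
    per_lineage = pairwise_percent / 2    # changes accumulate independently on each lineage
    return per_lineage / rate_percent_per_myr


# Assumed calibration from Figure 4-63: about 1.5% change per lineage in about 15 million years.
rate = 1.5 / 15   # percent nucleotide substitution per million years, along one lineage

# Human-chimpanzee pairwise divergence of approximately 1.2%.
print(round(divergence_time_myr(1.2, rate), 1))
```

With these inputs the estimate comes out at about 6 million years, in line with the divergence time for humans and chimpanzees quoted elsewhere in this chapter.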
Occasionally, rapid change is seen in a previously highly conserved sequence.
As discussed later in this chapter, such episodes are especially interesting because
they are thought to reflect periods of strong positive selection for mutations that
have conferred a selective advantage in the particular lineage where the rapid
change occurred.
The pace at which molecular clocks run during evolution is determined not
only by the degree of purifying selection, but also by the mutation rate. Most
notably, in animals, although not in plants, clocks based on functionally uncon-
strained mitochondrial DNA sequences run much faster than clocks based on
functionally unconstrained nuclear sequences, because the mutation rate in ani-
mal mitochondria is exceptionally high.
Categories of DNA for which the clock runs fast are most informative for recent
evolutionary events; the mitochondrial DNA clock has been used, for example, to
chronicle the divergence of the Neanderthal lineage from that of modern Homo
sapiens. To study more ancient evolutionary events, one must examine DNA for
which the clock runs more slowly; thus the divergence of the major branches of
the tree of life—bacteria, archaea, and eukaryotes—has been deduced from study
of the sequences specifying ribosomal RNA.
In general, molecular clocks, appropriately chosen, have a finer time resolu-
tion than the fossil record, and they are a more reliable guide to the detailed struc-
ture of phylogenetic trees than are classical methods of tree construction, which
are based on family resemblances in anatomy and embryonic development. For
example, the precise family tree of great apes and humans was not settled until
sufficient molecular sequence data accumulated in the 1980s to produce the ped-
igree shown previously in Figure 4–63. And with huge amounts of DNA sequence
now determined from a wide variety of mammals, much better estimates of our
relationship to them are being obtained (Figure 4–66).

Figure 4–66 A phylogenetic tree showing the evolutionary relationships of some
present-day mammals. The length of each line is proportional to the number of
“neutral substitutions,” that is, nucleotide changes at sites where there is assumed
to be no purifying selection. (Adapted from G.M. Cooper et al., Genome Res.
15:901–913, 2005. With permission from Cold Spring Harbor Laboratory Press.)
[Tree diagram; root labeled “ancestor.” Tips, top to bottom: opossum, wallaby,
armadillo, hedgehog, bat, cat, dog, horse, cow, sheep, Indian muntjac, pig, rabbit,
rat, mouse, galago, lemur, marmoset, squirrel monkey, vervet, baboon, macaque,
orangutan, gorilla, chimpanzee, human.]

A Comparison of Human and Mouse Chromosomes Shows How
the Structures of Genomes Diverge
As would be expected, the human and chimpanzee genomes are much more
alike than are the human and mouse genomes, even though all three genomes
are roughly the same size and contain nearly identical sets of genes. Mouse and
human lineages have had approximately 80 million years to diverge through accu-
mulated mutations, versus 6 million years for humans and chimpanzees. In addi-
tion, as indicated in Figure 4–66, rodent lineages (represented by the rat and the
mouse) have unusually fast molecular clocks, and have diverged from the human
lineage more rapidly than otherwise expected.
While the way that the genome is organized into chromosomes is almost iden-
tical between humans and chimpanzees, this organization has diverged greatly
between humans and mice. According to rough estimates, a total of about 180
breakage-and-rejoining events have occurred in the human and mouse lineages
since these two species last shared a common ancestor. In the process, although
the number of chromosomes is similar in the two species (23 per haploid genome
in the human versus 20 in the mouse), their overall structures differ greatly. None-
theless, even after the extensive genomic shuffling, there are many large blocks
of DNA in which the gene order is the same in the human and the mouse. These
stretches of conserved gene order in chromosomes are referred to as regions of
synteny. Figure 4–67 illustrates how segments of the different mouse chromo-
somes map onto the human chromosome set. For much more distantly related
vertebrates, such as chicken and human, the number of breakage-and-rejoining
events has been much greater and the regions of synteny are much shorter; in
addition, they are often hard to discern because of the divergence of the DNA
sequences that they contain.
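
The notion of a region of synteny can be made concrete with a small sketch that searches two gene orders for runs of genes that are neighbors in both, either in the same or in the reversed orientation. Everything here is hypothetical: the gene names, the two gene orders, and the function itself are invented for illustration, and real synteny maps are built from aligned genome sequences rather than simple gene lists.

```python
def synteny_blocks(order_a, order_b, min_genes=2):
    """Runs of genes that are adjacent, in the same or reversed order, in both lists."""
    position_b = {gene: i for i, gene in enumerate(order_b)}
    blocks, current, step = [], [], 0
    for gene in order_a:
        if gene not in position_b:
            # Gene absent from the second genome: close any open block.
            if len(current) >= min_genes:
                blocks.append(current)
            current, step = [], 0
            continue
        if current:
            delta = position_b[gene] - position_b[current[-1]]
            # Extend the block only if the gene is the next neighbor in a consistent direction.
            if delta in (1, -1) and (step == 0 or delta == step):
                current.append(gene)
                step = delta
                continue
            if len(current) >= min_genes:
                blocks.append(current)
        current, step = [gene], 0
    if len(current) >= min_genes:
        blocks.append(current)
    return blocks


# Hypothetical gene orders: one segment is inverted and another has moved.
genome_a = ["g1", "g2", "g3", "g4", "g5", "g6", "g7"]
genome_b = ["g5", "g6", "g7", "g3", "g2", "g1", "g4"]

print(synteny_blocks(genome_a, genome_b))
```

Each breakage-and-rejoining event splits an existing block, which is why more distantly related species share shorter regions of synteny.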
An unexpected conclusion from a detailed comparison of the complete mouse
and human genome sequences, confirmed by subsequent comparisons between
the genomes of other vertebrates, is that small blocks of DNA sequence are being
deleted from and added to genomes at a surprisingly rapid rate. Thus, if we
assume that our common ancestor had a genome of human size (about 3.2 billion
nucleotide pairs), mice would have lost a total of about 45% of that genome from
accumulated deletions during the past 80 million years, while humans would
have lost about 25%. However, substantial sequence gains from many small chro-
mosome duplications and from the multiplication of transposons have compen-
sated for these deletions. As a result, our genome size is thought to be practically
unchanged from that of the last common ancestor of humans and mice, while the
mouse genome is smaller by only about 0.3 billion nucleotides.
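
The bookkeeping in this argument can be followed with a short calculation. The sketch below uses only the figures given above and the same assumption that the common ancestor had a genome of human size; the variable names are our own, and sizes are in billions of nucleotide pairs.

```python
ancestor_size = 3.2                   # assumed equal to the present human genome
human_size = 3.2                      # practically unchanged from the ancestor
mouse_size = ancestor_size - 0.3      # smaller by about 0.3 billion

# Ancestral DNA still present after the accumulated deletions.
human_retained = ancestor_size * (1 - 0.25)
mouse_retained = ancestor_size * (1 - 0.45)

# DNA that must have been added to reach the present genome sizes.
human_gained = human_size - human_retained
mouse_gained = mouse_size - mouse_retained

print(round(human_gained, 2), round(mouse_gained, 2))
```

On these numbers the human lineage would have gained roughly 0.8 billion nucleotide pairs and the mouse lineage a little over 1.1 billion, which is how losses of 25% and 45% can leave both genomes close to their ancestral size.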

Figure 4–67 Synteny between human and mouse chromosomes. In this
diagram, the human chromosome set is shown above, with each part of each
chromosome colored according to the mouse chromosome with which it is
syntenic. The color coding used for each mouse chromosome is shown below.
Heterochromatic highly repetitive regions (such as centromeres) that are difficult to
sequence cannot be mapped in this way; these are colored black. (Adapted from
E.E. Eichler and D. Sankoff, Science 301:793–797, 2003. With permission
from AAAS.)
[Diagram; human chromosomes 1 to 22 and X, colored according to a mouse
chromosome index covering mouse chromosomes 1 to 19 and X.]

Good evidence for the loss of DNA sequences in small blocks during evolution
can be obtained from a detailed comparison of regions of synteny in the human
and mouse genomes. The comparative shrinkage of the mouse genome can be
clearly seen from such comparisons, with the net loss of sequences scattered
throughout the long stretches of DNA that are otherwise homologous (Figure
4–68).
DNA is added to genomes both by the spontaneous duplication of chromo-
somal segments that are typically tens of thousands of nucleotide pairs long
(as will be discussed shortly) and by insertion of new copies of active transposons.
Most transposition events are duplicative, because the original copy of the
transposon stays where it was when a copy inserts at the new site; see, for exam-
ple, Figure 5–63. Comparison of the DNA sequences derived from transposons
in the human and the mouse readily reveals some of the sequence additions
(Figure 4–69).
It remains a mystery why all mammals have maintained genome sizes of
roughly 3 billion nucleotide pairs that contain nearly identical sets of genes,
even though only approximately 150 million nucleotide pairs appear to be under
sequence-specific functional constraints.

Figure 4–68 Comparison of a syntenic portion of mouse and human genomes.
About 90% of the two genomes can be aligned in this way. Note that while there
is an identical order of the matched index sequences (red marks), there has been a
net loss of DNA in the mouse lineage that is interspersed throughout the entire region.
This type of net loss is typical for all such regions, and it accounts for the fact that the
mouse genome contains 14% less DNA than does the human genome. (Adapted from
Mouse Genome Sequencing Consortium, Nature 420:520–562, 2002. With permission
from Macmillan Publishers Ltd.)
[Diagram; human chromosome 14 above, mouse chromosome 12 below; scale bar:
200,000 bases.]

Figure 4–69 A comparison of the β-globin gene cluster in the human
and mouse genomes, showing the locations of transposable elements. This
stretch of the human genome contains five functional β-globin-like genes (orange);
the comparable region from the mouse genome has only four. The positions of
the human Alu sequences are indicated by green circles, and the human L1
sequences by red circles. The mouse genome contains different but related
transposable elements: the positions of B1 elements (which are related to the
human Alu sequences) are indicated by blue triangles, and the positions of the
mouse L1 elements (which are related to the human L1 sequences) are indicated
by orange triangles. The absence of transposable elements from the globin
structural genes can be attributed to purifying selection, which would have
eliminated any insertion that compromised gene function. (Courtesy of Ross Hardison
and Webb Miller.)
[Diagram; human β-globin gene cluster: ε, Gγ, Aγ, δ, β. Mouse β-globin gene
cluster: ε, γ, βmajor, βminor. Scale bar: 10,000 nucleotide pairs.]

The Size of a Vertebrate Genome Reflects the Relative Rates of
DNA Addition and DNA Loss in a Lineage
In more distantly related vertebrates, genome size can vary considerably, apparently
without a drastic effect on the organism or its number of genes. Thus, the
chicken genome, at one billion nucleotide pairs, is only about one-third the size
of the mammalian genome. An extreme example is the puffer fish, Fugu rubripes
(Figure 4–70), which has a tiny genome for a vertebrate (0.4 billion nucleotide
pairs compared to 1 billion or more for many other fish). The small size of the Fugu
genome is largely due to the small size of its introns. Specifically, Fugu introns, as
well as other noncoding segments of the Fugu genome, lack the repetitive DNA
that makes up a large portion of the genomes of most well-studied vertebrates.
Nevertheless, the positions of the Fugu introns between the exons of each gene
are almost the same as in mammalian genomes (Figure 4–71).
Although such large differences in genome size between similar organisms were
initially a mystery, we now have a simple explanation: because all vertebrates
experience a continuous process of DNA loss and DNA addition, the size of a genome
merely depends on the balance between these opposing processes acting over
millions of years. Suppose, for example, that in the lineage leading to Fugu, the
rate of DNA addition happened to slow greatly. Over long periods of time, this
would result in a major “cleansing” from this fish genome of those DNA sequences
whose loss could be tolerated. The result is an unusually compact genome, rela-
tively free of junk and clutter, but retaining through purifying selection the ver-
tebrate DNA sequences that are functionally important. This makes Fugu, with
its 400 million nucleotide pairs of DNA, a valuable resource for genome research
aimed at understanding humans.
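
The balance between DNA loss and DNA addition can be explored with a toy model. The sketch below is purely illustrative and is not taken from any measurement: it assumes that a fixed fraction of the dispensable DNA is lost in each time step, that a fixed amount of new DNA is added, and that a functionally essential core is never lost. All parameter values are arbitrary and were chosen only to make the behavior easy to see.

```python
def simulate_genome_size(start, essential, loss_fraction, addition, steps):
    """Toy model: each step removes a fixed fraction of dispensable DNA and adds a fixed amount."""
    size = start
    for _ in range(steps):
        dispensable = size - essential    # DNA whose loss could be tolerated
        size = essential + dispensable * (1 - loss_fraction) + addition
    return size


# Arbitrary illustrative values, in billions of nucleotide pairs; not measured rates.
balanced = simulate_genome_size(3.0, 0.15, 0.01, 0.0285, 2000)
slow_addition = simulate_genome_size(3.0, 0.15, 0.01, 0.0025, 2000)

print(round(balanced, 2), round(slow_addition, 2))
```

In this model the dispensable DNA settles at the ratio of the addition rate to the loss fraction, so a lineage in which addition slows ends up with a much more compact genome while keeping its essential core.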

We Can Infer the Sequence of Some Ancient Genomes


The genomes of ancestral organisms can be inferred, but most can never be
directly observed. DNA is very stable compared with most organic molecules,
but it is not perfectly stable, and its progressive degradation, even under the best
circumstances, means that it is virtually impossible to extract sequence infor-
mation from fossils that are more than a million years old. Although a modern
organism such as the horseshoe crab looks remarkably similar to fossil ancestors
that lived 200 million years ago, there is every reason to believe that the horse-
shoe-crab genome has been changing during all that time in much the same way
as in other evolutionary lineages, and at a similar rate. Selection must have main-
tained key functional properties of the horseshoe-crab genome to account for the
morphological stability of the lineage. However, comparisons between different
present-day organisms show that the fraction of the genome subject to purifying
selection is small; hence, it is fair to assume that the genome of the modern horse-
shoe crab, while preserving features critical for function, must differ greatly from
that of its extinct ancestors, known to us only through the fossil record.
It is possible to get direct sequence information by examining DNA samples
from ancient materials if these are not too old. In recent years, technical advances
have allowed DNA sequencing from exceptionally well-preserved bone fragments
that date from more than 100,000 years ago. Although any DNA this old will be
imperfectly preserved, a sequence of the Neanderthal genome has been reconstructed
from many millions of short DNA sequences, revealing, among other
things, that our human ancestors interbred with Neanderthals in Europe and
that modern humans have inherited specific genes from them (Figure 4–72). The
average difference in DNA sequence between humans and Neanderthals shows
that our two lineages diverged somewhere between 270,000 and 440,000 years
ago, well before the time that humans are believed to have migrated out of Africa.

Figure 4–70 The puffer fish, Fugu rubripes. (Courtesy of Byrappa Venkatesh.)

Figure 4–71 Comparison of the genomic sequences of the human
and Fugu genes encoding the protein huntingtin. Both genes (indicated in red)
contain 67 short exons that align in 1:1 correspondence to one another; these
exons are connected by curved lines. The human gene is 7.5 times larger than
the Fugu gene (180,000 versus 24,000 nucleotide pairs). The size difference is
entirely due to larger introns in the human gene. The larger size of the human
introns is due in part to the presence of retrotransposons (discussed in Chapter
5), whose positions are represented by green vertical lines; the Fugu introns lack
retrotransposons. In humans, mutation of the huntingtin gene causes Huntington’s
disease, an inherited neurodegenerative disorder. (Adapted from S. Baxendale et
al., Nat. Genet. 10:67–76, 1995. With permission from Macmillan Publishers Ltd.)
[Diagram labels: human gene, Fugu gene; scale: 0.0 to 180.0 thousands of
nucleotide pairs.]

But what about deciphering the genomes of much older ancestors, those for
which no useful DNA samples can be isolated? For organisms that are as closely
related as human and chimpanzee, we saw that this may not be difficult: reference
to the gorilla sequence can be used to sort out which of the few sequence differ-
ences between human and chimpanzee are inherited from our common ancestor
some 6 million years ago (see Figure 4–64). And for an ancestor that has produced
a large number of different organisms alive today, the DNA sequences of many
species can be compared simultaneously to unscramble much of the ancestral
sequence, allowing scientists to derive DNA sequences much farther back in time.
For example, from the genome sequences currently being obtained for dozens of
modern placental mammals, it should be possible to infer much of the genome
sequence of their 100 million-year-old common ancestor, the precursor of species
as diverse as dog, mouse, rabbit, armadillo, and human (see Figure 4–66).

Figure 4–72 The Neanderthals. (A) Map of Europe showing the location of the
cave in Croatia where most of the bones used to isolate the DNA used to derive
the Neanderthal genome sequence were discovered. (B) Photograph of the Vindija
cave. (C) Photograph of the 38,000-year-old bones from Vindija. More recent
studies have succeeded in extracting DNA sequence information from hominid
remains that are considerably older (see Movie 8.3). (B, courtesy of Johannes
Krause; C, from R.E. Green et al., Science 328:710–722, 2010. Reprinted with
permission from AAAS.)
[Figure labels: cave in Vindija, Croatia; panels (A), (B), (C); 5 cm.]

Multispecies Sequence Comparisons Identify Conserved DNA Sequences of Unknown Function
The mass of DNA sequence now in databases (hundreds of billions of nucleotide
pairs) provides a rich resource that scientists can mine for many purposes. This
information can be used not only to unscramble the evolutionary pathways that
have led to modern organisms, but also to provide insights into how cells and
organisms function. Perhaps the most remarkable discovery in this realm comes
from the observation that a striking amount of DNA sequence that does not code
for protein has been conserved during mammalian evolution (see Table 4–1,
p. 184). This is most clearly revealed when we align and compare DNA synteny
blocks from many different species, thereby identifying large numbers of so-called
multispecies conserved sequences: some of these code for protein, but most of
them do not (Figure 4–73).
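The block-by-block percent identity plotted in Figure 4–73 can be illustrated with a short sketch. It assumes two sequences that have already been aligned to the same length, with gaps written as "-". The function name, the toy sequences, and the choice to count gaps as mismatches are illustrative assumptions, not the algorithm used to produce the figure.

```python
def block_identity(ref, other, block=25):
    """Percent identity of `other` to `ref` for each consecutive block.

    Both inputs must be pre-aligned strings of equal length; a gap ("-")
    in either sequence is counted as a mismatch (an assumption).
    """
    if len(ref) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    for start in range(0, len(ref), block):
        r = ref[start:start + block]
        o = other[start:start + block]
        matches = sum(1 for a, b in zip(r, o) if a == b and a != "-")
        profile.append(100.0 * matches / len(r))
    return profile


# Toy 50-nucleotide alignment: the first block is identical, the second
# block differs at its last 5 positions.
human = "ACGTACGTACGTACGTACGTACGTA" + "GGGGGCCCCCAAAAATTTTTGGGGG"
other = "ACGTACGTACGTACGTACGTACGTA" + "GGGGGCCCCCAAAAATTTTTCCCCC"
profile = block_identity(human, other)
print(profile)
```

Repeating such a comparison for each species against the human sequence gives one identity profile per organism; regions that stay high in every profile are candidates for multispecies conserved sequences.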
Most of the noncoding conserved sequences discovered in this way turn out
to be relatively short, containing between 50 and 200 nucleotide pairs. Among the
most mysterious are the so-called “ultraconserved” noncoding sequences, exem-
plified by more than 5000 DNA segments over 100 nucleotides long that are exactly
the same in human, mouse, and rat. Most have undergone little or no change
since mammalian and bird ancestors diverged about 300 million years ago. The
strict conservation implies that even though the sequences do not encode pro-
teins, each nevertheless has an important function maintained by purifying selec-
tion. The puzzle is to unravel what those functions are.
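One simple way to picture the search for ultraconserved segments is to scan a set of aligned sequences for long runs of positions that are identical in every species. The sketch below does this under the assumption that the sequences are already aligned to equal length; the function name and the `min_len` parameter are hypothetical, and real genome comparisons rely on far more elaborate alignment methods.

```python
def conserved_runs(aligned, min_len=100):
    """Return (start, end) index pairs (end exclusive) of runs at least
    `min_len` long in which every aligned sequence carries the same
    nucleotide and none has a gap ("-")."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    runs, start = [], None
    for i in range(length + 1):
        same = (
            i < length
            and aligned[0][i] != "-"
            and all(s[i] == aligned[0][i] for s in aligned)
        )
        if same and start is None:
            start = i                      # a conserved run begins here
        elif not same and start is not None:
            if i - start >= min_len:       # keep only sufficiently long runs
                runs.append((start, i))
            start = None
    return runs


# Toy example with a short threshold so the result is easy to check by eye.
human = "AAACGTACGTTT"
mouse = "CAACGTACGTTA"
rat = "GAACGTACGTTC"
runs = conserved_runs([human, mouse, rat], min_len=5)
print(runs)
```

With the default threshold of 100 positions, the same scan applied to aligned human, mouse, and rat sequences would report segments of the kind described above.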
Many of the conserved sequences that do not code for protein are now known
to produce untranslated RNA molecules, such as the thousands of long noncoding
RNAs (lncRNAs) that are thought to have important functions in regulating gene
transcription. As we shall also see in Chapter 7, others are short regions of DNA
scattered throughout the genome that directly bind proteins involved in gene reg-
ulation. But it is uncertain how much of the conserved noncoding DNA can be
accounted for in these ways, and the function of most of it remains a mystery. This
enigma highlights how much more we need to learn about the fundamental bio-
logical mechanisms that operate in animals and other complex organisms, and its
solution is certain to have profound consequences for medicine.
How can cell biologists tackle the mystery of noncoding conserved DNA? Tra-
ditionally, attempts to determine the function of a puzzling DNA sequence begin
by looking at the consequences of its experimental disruption. But many DNA
sequences that are crucial for an organism in the wild can be expected to have no
noticeable effect on its phenotype under laboratory conditions: what is required
for a mouse to survive in a laboratory cage is very much less than what is required
for it to succeed in nature. Moreover, calculations based on population genetics
reveal that just a tiny selective advantage (less than a 0.1% difference in
survival) can be enough to strongly favor retaining a particular DNA sequence over
evolutionary time spans. One should therefore not be surprised to find that many
DNA sequences that are ultraconserved can be deleted from the mouse genome
without any noticeable effect on that mouse in a laboratory.

Figure 4–73 The detection of multispecies conserved sequences. In this example,
genome sequences for each of the organisms shown have been compared with the
indicated region of the human CFTR (cystic fibrosis transmembrane conductance
regulator) gene; this region contains one exon plus a large amount of intronic
DNA. For each organism, the percent identity with human for each 25-nucleotide
block is plotted in green. In addition, a computational algorithm has been used
to detect the sequences within this region that are most highly conserved when
the sequences from all of the organisms are taken into account. Besides the exon
(dark blue on the line at the top of the figure), the positions of three other
blocks of multispecies conserved sequences are indicated (pale blue). The
function of most such sequences in the human genome is not known. (Courtesy of
Eric D. Green.)
[Figure labels: human CFTR gene, 190,000 nucleotide pairs, 5′ to 3′; intron,
exon, multispecies conserved sequences; percent identity (50% to 100%) for
chimpanzee, orangutan, baboon, marmoset, lemur, rabbit, horse, cat, dog, mouse,
opossum, chicken, and Fugu; scale bars, 100 and 10,000 nucleotide pairs.]

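The population-genetics argument about a tiny selective advantage can be made concrete with a small calculation. The sketch below uses Kimura's diffusion approximation for the probability that a single new mutation is eventually fixed. This formula is not given in the text and is brought in here as an assumption, as are the population size of 10,000 and the use of a selection coefficient s = 0.001 to stand in for a 0.1% advantage.

```python
import math


def fixation_probability(n, s):
    """Kimura's diffusion approximation for one new mutation in a diploid
    population of constant size n (effective size assumed equal to n),
    with selective advantage s per copy. s = 0 is the neutral case."""
    if s == 0:
        return 1.0 / (2 * n)
    # (1 - exp(-2s)) / (1 - exp(-4ns)), written with expm1 for accuracy
    return math.expm1(-2 * s) / math.expm1(-4 * n * s)


N = 10_000                      # illustrative population size (assumption)
neutral = fixation_probability(N, 0.0)
favored = fixation_probability(N, 0.001)
print(neutral, favored, favored / neutral)
```

Under these assumptions the favored variant is roughly 40 times more likely to be fixed than a neutral one, even though an advantage this small would be very hard to detect in a laboratory experiment.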
A second important approach for discovering the function of a mysterious
noncoding DNA sequence uses biochemical techniques to identify proteins or
RNA molecules that bind to it—and/or to any RNA molecules that it produces.
Most of this task still lies before us, but a start has been made (see p. 435).

Changes in Previously Conserved Sequences Can Help Decipher Critical Steps in Evolution
Given genome sequence information, we can tackle another intriguing question:
What alterations in our DNA have made humans so different from other ani-
mals—or for that matter, what makes any individual species so different from its
relatives? For example, as soon as both the human and the chimpanzee genome
sequences became available, scientists began searching for DNA sequence
changes that might account for the striking differences between us and chim-
panzees. With 3.2 billion nucleotide pairs to compare in the two species, this
might seem an impossible task. But the job was made much easier by confining
the search to 35,000 clearly defined multispecies conserved sequences (a total
of about 5 million nucleotide pairs), representing parts of the genome that are
most likely to be functionally important. Though these sequences are conserved
strongly, they are not conserved perfectly, and when the version in one species is
compared with that in another they are generally found to have drifted apart by
a small amount corresponding simply to the time elapsed since the last common
ancestor. In a small proportion of cases, however, one sees signs of a sudden evo-
lutionary spurt. For example, some DNA sequences that have been highly con-
served in other mammalian species are found to have accumulated nucleotide
changes exceptionally rapidly during the 6 million years of human evolution since
we diverged from the chimpanzees. These human accelerated regions (HARs) are
thought to reflect functions that have been especially important in making us dif-
ferent in some useful way.
About 50 such sites were identified in one study, one-fourth of which were
located near genes associated with neural development. The sequence exhibiting
the most rapid change (18 changes between human and chimpanzee, compared
to only two changes between chimpanzee and chicken) was examined further
and found to encode a 118-nucleotide noncoding RNA molecule, HAR1F (human
accelerated region 1F), that is produced in the human cerebral cortex at a critical
time during brain development. The function of this HAR1F RNA is not yet known,
but findings of this type are stimulating research studies that may shed light on
crucial features of the human brain.
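The contrast between the two comparisons quoted for this sequence can be expressed as a rough rate calculation. The sketch below divides the number of observed changes by the total time separating each pair of species. It assumes, for illustration only, that changes accumulate along both lineages after a split, that no site has changed more than once, and that chimpanzee and chicken lineages separated about 300 million years ago (the mammal and bird divergence time given earlier in this section).

```python
def changes_per_million_years(changes, divergence_mya):
    """Observed changes divided by the total time separating two species.
    Both lineages evolve independently after the split, hence the factor 2."""
    return changes / (2 * divergence_mya)


human_chimp = changes_per_million_years(18, 6)     # split about 6 million years ago
chimp_chicken = changes_per_million_years(2, 300)  # split about 300 million years ago
fold = human_chimp / chimp_chicken
print(human_chimp, chimp_chicken, fold)
```

On this crude estimate the sequence has changed several hundred times faster between human and chimpanzee than between chimpanzee and chicken, which is the kind of signal used to flag a human accelerated region.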
A related approach in the search for the important mutations that contributed
to human evolution likewise begins with DNA sequences that have been con-
served during mammalian evolution, but rather than screening for accelerated
changes in individual nucleotides, it focuses instead on chromosome sites that
have experienced deletions in the 6 million years since our lineage diverged from
that of chimpanzees. More than 500 such sequences—conserved among other
species but deleted in humans—have been discovered. Each deletion removes an
average of 95 nucleotides of DNA sequence. Only one of these deletions affects a
protein-coding region: the rest are thought to alter regions that affect how nearby
genes are expressed, an expectation that has been experimentally confirmed
in a few cases. A large proportion of the presumed regulatory regions identified
in this way lie near genes that affect neural function and/or near genes involved in
steroid signaling, suggesting that changes in the nervous system and in immune
or reproductive functions have played an especially important role in human
evolution.

Mutations in the DNA Sequences That Control Gene Expression Have Driven Many of the Evolutionary Changes in Vertebrates
The vast hoard of genomic sequence data now being accumulated can be explored
in many other ways to reveal events that happened even hundreds of millions of
years ago. For example, one can attempt to trace the origins of the regulatory ele-
ments in DNA that have played critical parts in vertebrate evolution. One such
study began with the identification of nearly 3 million noncoding sequences, aver-
aging 28 base pairs in length, that have been conserved in recent vertebrate evolu-
tion while being absent in more ancient ancestors. Each of these special non-cod-
ing sequences is likely to represent a functional innovation peculiar to a particular
branch of the vertebrate family tree, and most of them are thought to consist of
regulatory DNA that governs the expression of a neighboring gene. Given full
genome sequences, one can identify the genes that lie closest and thus appear
most likely to have fallen under the sway of these novel regulatory elements. By
comparing many different species, with known divergence times, one can also
estimate when each such regulatory element came into existence as a conserved
feature. The findings suggest remarkable evolutionary differences between the
various functional classes of genes (Figure 4–74). Conserved regulatory elements
that originated early in vertebrate evolution—that is, more than about 300 million
years ago, which is when the mammalian lineage split from the lineage leading to
birds and reptiles—seem to be mostly associated with genes that code for tran-
scription regulator proteins and for proteins with roles in organizing embryonic
development. Then came an era when the regulatory DNA innovations arose next
to genes coding for receptors for extracellular signals. Finally, over the course of
the past 100 million years, the regulatory innovations seem to have been concen-
trated in the neighborhood of genes coding for proteins (such as protein kinases)
that function to modify other proteins post-translationally.
Many questions remain to be answered about these phenomena and what
they mean. One possible interpretation is that the logic—the circuit diagram—of
the gene regulatory network in vertebrates was established early, and that more
recent evolutionary change has mainly occurred through the tuning of quantita-
tive parameters. This could help to explain why, among the mammals, for exam-
ple, the basic body plan—the topology of the tissues and organs—has been largely
conserved.

Gene Duplication Also Provides an Important Source of Genetic Novelty During Evolution
Evolution depends on the creation of new genes, as well as on the modification
of those that already exist. How does this occur? When we compare organisms
that seem very different (a primate with a rodent, for example, or a mouse with
a fish), we rarely encounter genes in the one species that have no homolog in the
other. Genes without homologous counterparts are relatively scarce even when
we compare such divergent organisms as a mammal and a worm. On the other
hand, we frequently find gene families that have different numbers of members in
different species. To create such families, genes have been repeatedly duplicated,
and the copies have then diverged to take on new functions that often vary from
one species to another.

Figure 4–74 The types of changes in gene regulation inferred to have
predominated during the evolution of our vertebrate ancestors. To produce the
information summarized in this plot, wherever possible the type of gene
regulated by each conserved noncoding sequence was inferred from the identity of
its closest protein-coding gene. The fixation time for each conserved sequence
was then used to derive the conclusions shown. (Based on C.B. Lowe et al.,
Science 333:1019–1024, 2011. With permission from AAAS.)
[Figure labels: development and transcription regulation; reception of
extracellular signals; post-translational protein modification; lineages shown
for human, mouse, cow, platypus, chicken, frog, and fish; horizontal axis,
millions of years before present (500 to 0).]

Gene duplication occurs at high rates in all evolutionary lineages, contributing
to the vigorous process of DNA addition discussed previously. In a detailed study
of spontaneous duplications in yeast, duplications of 50,000 to 250,000 nucleotide
pairs were commonly observed, most of which were tandemly repeated. These
appeared to result from DNA replication errors that led to the inexact repair of
double-strand chromosome breaks. A comparison of the human and chimpanzee
genomes reveals that, since the time that these two organisms diverged, such seg-
mental duplications have added about 5 million nucleotide pairs to each genome
every million years, with an average duplication size of about 50,000 nucleotide
pairs (although there are some duplications five times larger). In fact, if one
counts nucleotides, duplication events have created more differences between
our two species than have single-nucleotide substitutions.
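The closing comparison can be checked with simple arithmetic using figures quoted in this chapter: about 5 million nucleotide pairs added to each genome per million years, about 6 million years since the two lineages diverged, a genome of 3.2 billion nucleotide pairs, and a single-nucleotide difference of about 1% between typical human and chimpanzee sequences. The sketch is only a rough tally; treating these round numbers as exact is an assumption.

```python
GENOME_SIZE = 3.2e9            # nucleotide pairs
DIVERGENCE_MY = 6              # million years since the human-chimpanzee split
DUP_RATE = 5e6                 # nucleotide pairs added per genome per million years
SUBSTITUTION_FRACTION = 0.01   # about 1% single-nucleotide difference

# Nucleotides added by segmental duplication, in one genome and in both.
duplicated_per_genome = DUP_RATE * DIVERGENCE_MY
duplicated_both = 2 * duplicated_per_genome

# Nucleotides that differ because of single-nucleotide substitutions.
substituted = SUBSTITUTION_FRACTION * GENOME_SIZE

print(duplicated_per_genome, duplicated_both, substituted)
```

On these numbers, duplications account for about 60 million nucleotide pairs of difference across the two genomes, compared with about 32 million from substitutions, which is consistent with the statement above.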

Duplicated Genes Diverge


What is the fate of newly duplicated genes? In most cases, there is presumed to
be little or no selection—at least initially—to maintain the duplicated state since
either copy can provide an equivalent function. Hence, many duplication events
are likely to be followed by loss-of-function mutations in one or the other gene.
This cycle would functionally restore the one-gene state that preceded the duplica-
tion. Indeed, there are many examples in contemporary genomes where one copy
of a duplicated gene can be seen to have become irreversibly inactivated by mul-
tiple mutations. Over time, the sequence similarity between such a pseudogene
and the functional gene whose duplication produced it would be expected to be
eroded by the accumulation of many mutations in the pseudogene—the homolo-
gous relationship eventually becoming undetectable.
An alternative fate for gene duplications is for both copies to remain func-
tional, while diverging in their sequence and pattern of expression, thus taking
on different roles. This process of “duplication and divergence” almost certainly
explains the presence of large families of genes with related functions in biolog-
ically complex organisms, and it is thought to play a critical role in the evolution
of increased biological complexity. An examination of many different eukaryotic
genomes suggests that the probability that any particular gene will undergo a
duplication event that spreads to most or all individuals in a species is approxi-
mately 1 percent every million years.
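As a rough illustration of what a rate of about 1 percent per million years implies, the sketch below computes the chance that a given gene has undergone at least one such fixed duplication over a stretch of evolutionary time. It assumes, purely for illustration, that the rate is constant and that successive million-year intervals are independent; neither assumption comes from the text.

```python
def prob_duplicated(million_years, rate_per_my=0.01):
    """Chance of at least one fixed duplication of a given gene, assuming a
    constant, independent probability per million years (an assumption)."""
    return 1 - (1 - rate_per_my) ** million_years


six_my = prob_duplicated(6)        # roughly the time since the chimpanzee split
hundred_my = prob_duplicated(100)  # roughly the age of placental mammals
print(round(six_my, 3), round(hundred_my, 3))
```

Under these assumptions only a few percent of genes would be expected to have a fixed duplicate after 6 million years, but well over half would after 100 million years.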
Whole-genome duplications offer particularly dramatic examples of the dupli-
cation–divergence cycle. A whole-genome duplication can occur quite simply: all
that is required is one round of genome replication in a germ-line cell lineage
without a corresponding cell division. Initially, the chromosome number simply
doubles. Such abrupt increases in the ploidy of an organism are common, par-
ticularly in fungi and plants. After a whole-genome duplication, all genes exist
as duplicate copies. However, unless the duplication event occurred so recently
that there has been little time for subsequent alterations in genome structure,
the results of a series of segmental duplications—occurring at different times—
are hard to distinguish from the end product of a whole-genome duplication. In
mammals, for example, the role of whole-genome duplications versus a series of
piecemeal duplications of DNA segments is quite uncertain. Nevertheless, it is
clear that a great deal of gene duplication has occurred in the distant past.
Analysis of the genome of the zebrafish, in which at least one whole-genome
duplication is thought to have occurred hundreds of millions of years ago, has cast
some light on the process of gene duplication and divergence. Although many
duplicates of zebrafish genes appear to have been lost by mutation, a significant
fraction, perhaps as many as 30–50%, have diverged functionally while both
copies have remained active. In many cases, the most obvious functional difference
between the duplicated genes is that they are expressed in different tissues
or at different stages of development. One attractive theory to explain such an end
result imagines that different, mildly deleterious mutations occur quickly in both
copies of a duplicated gene set. For example, one copy might lose expression in
a particular tissue as a result of a regulatory mutation, while the other copy loses
expression in a second tissue. Following such an occurrence, both gene copies
would be required to provide the full range of functions that were once supplied
by a single gene; hence, both copies would now be protected from loss through
inactivating mutations. Over a longer period, each copy could then undergo further
changes through which it could acquire new, specialized features.

Figure 4–75 A comparison of the structure of one-chain and four-chain
globins. The four-chain globin shown is hemoglobin, which is a complex of
two α-globin and two β-globin chains. The one-chain globin present in some
primitive vertebrates represents an intermediate in the evolution of the four-chain
globin. With oxygen bound it exists as a monomer; without oxygen it dimerizes.
[Figure labels: single-chain globin binds one oxygen molecule; oxygen-binding
site on heme; evolution of a second globin chain by gene duplication followed by
mutation; four-chain globin (two α and two β chains) binds four oxygen molecules
in a cooperative manner.]

The Evolution of the Globin Gene Family Shows How DNA Duplications Contribute to the Evolution of Organisms
The globin gene family provides an especially good example of how DNA duplication
generates new proteins, because its evolutionary history has been worked
out particularly well. The unmistakable similarities in amino acid sequence and
structure among the present-day globins indicate that they all must derive from a
common ancestral gene, even though some are now encoded by widely separated
genes in the mammalian genome.
We can reconstruct some of the past events that produced the various types
of oxygen-carrying hemoglobin molecules by considering the different forms of
the protein in organisms at different positions on the tree of life. A molecule like
hemoglobin was necessary to allow multicellular animals to grow to a large size,
since large animals cannot simply rely on the diffusion of oxygen through the
body surface to oxygenate their tissues adequately. But oxygen plays a vital part in
the life of nearly all living organisms, and oxygen-binding proteins homologous to
hemoglobin can be recognized even in plants, fungi, and bacteria. In animals, the
most primitive oxygen-carrying molecule is a globin polypeptide chain of about
150 amino acids that is found in many marine worms, insects, and primitive fish.
The hemoglobin molecule in more complex vertebrates, however, is composed of
two kinds of globin chains. It appears that about 500 million years ago, during the
continuing evolution of fish, a series of gene mutations and duplications occurred.
These events established two slightly different globin genes in the genome of each
individual, coding for α- and β-globin chains that associate to form a hemoglobin
molecule consisting of two α chains and two β chains (Figure 4–75). The four
oxygen-binding sites in the α2β2 molecule interact, allowing a cooperative allosteric
change in the molecule as it binds and releases oxygen, which enables hemoglobin
to take up and release oxygen more efficiently than the single-chain version.
Still later, during the evolution of mammals, the β-chain gene apparently
underwent duplication and mutation to give rise to a second β-like chain that
is synthesized specifically in the fetus. The resulting hemoglobin molecule has a
higher affinity for oxygen than adult hemoglobin and thus helps in the transfer
of oxygen from the mother to the fetus. The gene for the new β-like chain
subsequently duplicated and mutated again to produce two new genes, ε and γ, the ε
chain being produced earlier in development (to form α2ε2) than the fetal γ chain,
which forms α2γ2. A duplication of the adult β-chain gene occurred still later,
during primate evolution, to give rise to a δ-globin gene and thus to a minor form
of hemoglobin (α2δ2) that is found only in adult primates (Figure 4–76).

Figure 4–76 An evolutionary scheme for the globin chains that carry oxygen
in the blood of animals. The scheme emphasizes the β-like globin gene family.
A relatively recent gene duplication of the γ-chain gene produced γG and γA, which
are fetal β-like chains of identical function. The location of the globin genes in the
human genome is shown at the top of the figure.
[Figure labels: chromosome 16, various α genes; chromosome 11, ε, γG, γA, δ, β;
fetal β; adult β; translocation separating α and β genes; single-chain globin;
vertical axis, millions of years ago (100, 300, 500, 700).]

Each of these duplicated genes has been modified by point mutations that
affect the properties of the final hemoglobin molecule, as well as by changes in
regulatory regions that determine the timing and level of expression of the gene.
As a result, each globin is made in different amounts at different times of human
development.
The history of these gene duplications is reflected in the arrangement of hemo-
globin genes in the genome. In the human genome, the genes that arose from the
original β gene are arranged as a series of homologous DNA sequences located
within 50,000 nucleotide pairs of one another on a single chromosome. A similar
cluster of human α-globin genes is located on a separate chromosome. Not only
other mammals, but birds too have their α- and β-globin gene clusters on sepa-
rate chromosomes. In the frog Xenopus, however, they are together, suggesting
that a chromosome translocation event in the lineage of birds and mammals sep-
arated the two gene clusters about 300 million years ago, soon after our ancestors
diverged from amphibians (see Figure 4–76).
There are several duplicated globin DNA sequences in the α- and β-globin
gene clusters that are not functional genes but pseudogenes. These have a close
sequence similarity to the functional genes but have been disabled by muta-
tions that prevent their expression as functional proteins. The existence of such
pseudogenes makes it clear that, as expected, not every DNA duplication leads to
a new functional gene.

Genes Encoding New Proteins Can Be Created by the Recombination of Exons
The role of DNA duplication in evolution is not confined to the expansion of
gene families. It can also act on a smaller scale to create single genes by string-
ing together short duplicated segments of DNA. The proteins encoded by genes
generated in this way can be recognized by the presence of repeating similar pro-
tein domains, which are covalently linked to one another in series. The immu-
noglobulins (Figure 4–77), for example, as well as most fibrous proteins (such as
collagens) are encoded by genes that have evolved by repeated duplications of a
primordial DNA sequence.
In genes that have evolved in this way, as well as in many other genes, each
separate exon often encodes an individual protein folding unit, or domain. It is
believed that the organization of DNA coding sequences as a series of such exons
separated by long introns has greatly facilitated the evolution of new proteins. The
duplications necessary to form a single gene coding for a protein with repeating
domains, for example, can easily occur by breaking and rejoining the DNA any-
where in the long introns on either side of an exon; without introns there would be
only a few sites in the original gene at which a recombinational exchange between
DNA molecules could duplicate the domain and not disrupt it. By enabling the
duplication to occur by recombination at many potential sites rather than just a
few, introns increase the probability of a favorable duplication event.
More generally, we know from genome sequences that the various parts of
genes—both their individual exons and their regulatory elements—have served
as modular elements that have been duplicated and moved about the genome
to create the great diversity of living things. Thus, for example, many present-day
proteins are formed as a patchwork of domains from different origins, reflecting
their complex evolutionary history (see Figure 3–17).

Figure 4–77 Schematic view of an antibody (immunoglobulin) molecule.
This molecule is a complex of two identical heavy chains and two identical
light chains. Each heavy chain contains four similar, covalently linked
domains, while each light chain contains two such domains. Each domain
is encoded by a separate exon, and all of the exons are thought to have
evolved by the serial duplication of a single ancestral exon.
[Figure labels: heavy chain; light chain; H2N and NH2 at the amino-terminal
ends; HOOC and COOH at the carboxyl-terminal ends.]

Neutral Mutations Often Spread to Become Fixed in a Population, with a Probability That Depends on Population Size
In comparisons between two species that have diverged from one another by
millions of years, it makes little difference which individuals from each species are
compared. For example, typical human and chimpanzee DNA sequences differ
from one another by about 1%. In contrast, when the same region of the genome
is sampled from two randomly chosen humans, the differences are typically about
0.1%. For more distantly related organisms, the interspecies differences outshine
intraspecies variation even more dramatically. However, each “fixed difference”
between the human and the chimpanzee (in other words, each difference that is
now characteristic of all or nearly all individuals of each species) started out as a
new mutation in a single individual. If the size of the interbreeding population in
which the mutation occurred is N, the initial allele frequency for a new mutation
would be 1/(2N) for a diploid organism. How does such a rare mutation become
fixed in the population, and hence become a characteristic of the species rather
than of a few scattered individuals?
The answer to this question depends on the functional consequences of the
mutation. If the mutation has a significantly deleterious effect, it will simply be
eliminated by purifying selection and will not become fixed. (In the most extreme
case, the individual carrying the mutation will die without producing progeny.)
Conversely, the rare mutations that confer a major reproductive advantage on
individuals who inherit them can spread rapidly in the population. Because
humans reproduce sexually and genetic recombination occurs each time a gam-
ete is formed (discussed in Chapter 5), the genome of each individual who has
inherited the mutation will be a unique recombinational mosaic of segments
inherited from a large number of ancestors. The selected mutation along with a
modest amount of neighboring sequence—ultimately inherited from the individ-
ual in which the mutation occurred—will simply be one piece of this huge mosaic.
The great majority of mutations that are not harmful are not beneficial either.
These selectively neutral mutations can also spread and become fixed in a pop-
ulation, and they make a large contribution to evolutionary change in genomes.
For example, as we saw earlier, they account for most of the DNA sequence dif-
ferences between apes and humans. The spread of neutral mutations is not as
rapid as the spread of the rare strongly advantageous mutations. It depends on
a random variation in the number of mutation-bearing progeny produced by
each mutation-bearing individual, causing changes in the relative frequency of
the mutant allele in the population. Through a sort of “random walk” process, the
mutant allele may eventually become extinct, or it may become commonplace.
This can be modeled mathematically for an idealized interbreeding population,
on the assumption of constant population size and random mating, as well as
selective neutrality for the mutations. While neither of the first two assumptions
is a good description of human population history, study of this idealized case
reveals the general principles in a clear and simple way.
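The "random walk" of a neutral allele in such an idealized population can be simulated directly. The sketch below follows one new mutation through successive generations, drawing each gene copy of the next generation at random from the current gene pool. The function name, the fixed random seed, and the deliberately tiny population size are illustrative assumptions chosen so that the simulation runs quickly.

```python
import random


def fixes(n, rng):
    """Follow one new neutral mutation in an idealized diploid population of
    constant size n with random mating; return True if it becomes fixed."""
    total = 2 * n                   # number of gene copies in the population
    copies = 1                      # the mutation starts as a single copy
    while 0 < copies < total:
        p = copies / total
        # Each gene copy in the next generation is drawn at random from the
        # current gene pool (binomial sampling).
        copies = sum(1 for _ in range(total) if rng.random() < p)
    return copies == total


rng = random.Random(1)              # fixed seed so the run is repeatable
N = 10                              # tiny population, for speed only
trials = 20_000
fixed = sum(fixes(N, rng) for _ in range(trials))
estimate = fixed / trials
print(estimate, 1 / (2 * N))
```

In most trials the mutant allele is lost within a few generations; the fraction of trials in which it spreads to every gene copy should come out close to 1/(2N).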
When a new neutral mutation occurs in a population of constant size N that
is undergoing random mating, the probability that it will ultimately become fixed
is approximately 1/(2N). This is because there are 2N copies of the gene in the
diploid population, and each of them has an equal chance of becoming the pre-
dominant version in the long run. For those mutations that do become fixed, the
mathematics shows that the average time to fixation is approximately 4N gener-
ations. Detailed analyses of data on human genetic variation have suggested an
ancestral population size of approximately 10,000 at the time when the current
pattern of genetic variation was largely established. With a population that has
reached this size, the probability that a new, selectively neutral mutation would
become fixed is small (1/20,000), while the average time to fixation would be on
the order of 800,000 years (assuming a 20-year generation time). Thus, while we
know that the human population has grown enormously since the development
of agriculture approximately 15,000 years ago, most of the present-day set of com-
mon human genetic variants reflects the mixture of variants that was already pres-
ent long before this time, when the human population was still small.
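The figures quoted in this paragraph follow directly from the two approximations given above, a fixation probability of 1/(2N) and an average time to fixation of 4N generations. The sketch below simply reproduces that arithmetic; the function name is hypothetical.

```python
def neutral_fixation(n, generation_years):
    """Fixation probability and mean time to fixation for a new neutral
    mutation, using the approximations 1/(2N) and 4N generations."""
    probability = 1 / (2 * n)
    mean_generations = 4 * n
    mean_years = mean_generations * generation_years
    return probability, mean_generations, mean_years


# Ancestral population of about 10,000 and a 20-year generation time.
prob, gens, years = neutral_fixation(10_000, 20)
print(prob, gens, years)
```

For N = 10,000 this gives a probability of 1 in 20,000 and a mean fixation time of 40,000 generations, or 800,000 years, matching the values in the text.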
Similar arguments explain another phenomenon with important practical
implications for genetic counseling. In an isolated community descended from
a small group of founders, such as the people of Iceland or the Jews of Eastern
Europe, genetic variants that are rare in the human population as a whole can
often be present at a high frequency, even if those variants are mildly deleterious
(Figure 4–78).

Figure 4–78 How founder effects determine the set of genetic variants in
a population of individuals belonging to the same species. This example
illustrates how a rare allele (red) can become established in an isolated
population, even though the mutation that produced it has no selective
advantage, or is mildly deleterious.
[Figure labels: original population; individual with rare allele; disease
survivors or migrants; founder group; new population.]

A Great Deal Can Be Learned from Analyses of the Variation Among Humans
Even though the common variant gene alleles among modern humans originate
from variants present in a comparatively tiny group of ancestors, the total number
of variants now encountered, including those that are individually rare, is very
large. New neutral mutations are constantly occurring and accumulating, even
though no single one of them has had enough time to become fixed in the vast
modern human population.
From detailed comparisons of the DNA sequences of a large number of mod-
ern humans located around the globe, scientists can estimate how many gener-
ations have elapsed since the origin of a particular neutral mutation. From such
data, it has been possible to map the routes of ancient human migrations. For
example, by combining this type of genetic analysis with archaeological findings,
scientists have been able to deduce the most probable routes that our ancestors
took when they left Africa 60,000 to 80,000 years ago (Figure 4–79).
We have been focusing on mutations that affect a single gene, but these are not
the only source of variation. Another source, perhaps even more important but
missed for many years, lies in the many duplications and deletions of large blocks
of human DNA. When one compares any individual human with the standard
reference genome in the database, one will generally find roughly 100 differences
involving gain or loss of long sequence blocks, totaling perhaps 3 million nucleo-
tide pairs. Some of these copy number variations (CNVs) will be very common,
presumably reflecting relatively ancient origins, while others will be present in
only a small minority of people (Figure 4–80). On average, nearly half of the CNVs
contain known genes. CNVs have been implicated in many human traits, including
color blindness, infertility, hypertension, and a wide variety of disease
susceptibilities. In retrospect, this type of variation is not surprising, given the
prominent role of DNA addition and DNA loss in vertebrate evolution.

Figure 4–79 Tracing the course of human history by analyses of genome
sequences. The map shows the routes of the earliest successful human
migrations. Dotted lines indicate two alternative routes that our ancestors are
thought to have taken out of Africa. DNA sequence comparisons suggest that
modern Europeans descended from a small ancestral population that existed
about 30,000 to 50,000 years ago. In agreement, archaeological findings
suggest that the ancestors of modern native Australians (solid red arrows), and
of modern European and Middle Eastern populations, reached their destinations
about 45,000 years ago. Even more recent studies, comparing the genome
sequences of living humans with those of Neanderthals and another extinct
population from southern Siberia (the Denisovans), suggest that our exit from
Africa was a bit more convoluted, while also revealing that a number of our
ancestors interbred with these hominid neighbors as they made their way across
the globe. (Modified from P. Forster and S. Matsumura, Science 308:965–966,
2005.)

Figure 4–80 Detection of copy number variations on human chromosome 17.
When 100 individuals were tested by a DNA microarray analysis that detects
the copy number of DNA sequences throughout the entire length of this
chromosome, the indicated distributions of DNA additions (green bars) and DNA
losses (red bars) were observed compared with an arbitrary human reference
sequence. The shortest red and green bars represent a single occurrence among
the 200 chromosomes examined, whereas the longer bars indicate that the
addition or loss was correspondingly more frequent. The results show preferred
regions where the variations occur, and these tend to be in or near regions that
already contain blocks of segmental duplications. Many of the changes include
known genes. (Diagram labels: human chromosome 17; scale bar, 10,000,000
nucleotide pairs; density of known genes; DNA additions in individual humans;
DNA losses in individual humans.) (Adapted from J.L. Freeman et al., Genome
Res. 16:949–961, 2006. With permission from Cold Spring Harbor Laboratory
Press.)

The intraspecies variations that have been most extensively characterized,
however, are single-nucleotide polymorphisms (SNPs). These are simply points
in the genome sequence where one large fraction of the human population has
one nucleotide, while another substantial fraction has another. To qualify as
a polymorphism, the variants must be common enough to give a reasonably
high probability that the genomes of two randomly chosen individuals will differ
at the given site; a probability of 1% is commonly chosen as the cutoff. Two
human genomes sampled from the modern world population at random will differ
at approximately 2.5 × 10^6 such sites (1 per 1300 nucleotide pairs). As will be
described in the overview of genetics in Chapter 8, SNPs in the human genome
can be extremely useful for genetic mapping analyses, in which one attempts to
associate specific traits (phenotypes) with specific DNA sequences for medical or
scientific purposes (see p. 493). But while these SNPs are useful as genetic markers,
there is good evidence that most of them have little or no effect on human fitness. This
is as expected, since deleterious variants will have been selected against during
human evolution and, unlike SNPs, should therefore be rare.
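As a quick consistency check on the two numbers quoted for SNP differences, the following Python sketch converts the density of about 1 difference per 1300 nucleotide pairs into a genome-wide count. The genome length of 3.2 × 10^9 nucleotide pairs is an assumption supplied here for illustration; it is not stated in this passage.

```python
GENOME_LENGTH_BP = 3.2e9   # assumed haploid human genome size, in nucleotide pairs
BP_PER_DIFFERENCE = 1300   # from the text: about 1 difference per 1300 pairs

# Expected number of sites at which two randomly chosen genomes differ.
expected_differences = GENOME_LENGTH_BP / BP_PER_DIFFERENCE

print(f"expected differing sites: {expected_differences:.2e}")
```

With the assumed genome length, the result is close to the figure of roughly 2.5 million sites given in the text.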
Against the background of ordinary SNPs inherited from our prehistoric
ancestors, certain sequences with exceptionally high mutation rates stand out. A
dramatic example is provided by CA repeats, which are ubiquitous in the human
genome and in the genomes of other eukaryotes. Sequences with the motif (CA)n
are replicated with relatively low fidelity because of a slippage that occurs between
the template and the newly synthesized strands during DNA replication; hence,
the precise value of n can vary over a considerable range from one genome to the
next. These repeats make ideal DNA-based genetic markers, since most humans
are heterozygous, having inherited one repeat length (n) from their mother and a
different repeat length from their father. While the value of n changes sufficiently
rarely that most parent–child transmissions propagate CA repeats faithfully, the
changes are sufficiently frequent to maintain high levels of heterozygosity in the
human population. These and some other simple repeats that display exception-
ally high variability therefore provide the basis for identifying individuals by DNA
analysis in crime investigations, paternity suits, and other forensic applications
(see Figure 8–39).
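To make the idea of a repeat-length genotype concrete, here is a minimal Python sketch that measures n for a (CA)n run and compares the two alleles of one individual. The function name, the flanking sequences, and the repeat lengths are all hypothetical and chosen only for illustration; real forensic typing measures amplified fragment lengths rather than scanning raw sequence in this way.

```python
import re


def longest_ca_repeat(sequence: str) -> int:
    """Return n for the longest uninterrupted (CA)n run in the sequence."""
    runs = re.findall(r"(?:CA)+", sequence.upper())
    return max((len(run) // 2 for run in runs), default=0)


# Hypothetical alleles at one locus: same flanking sequence, different n.
maternal = "GGTT" + "CA" * 17 + "TGAC"
paternal = "GGTT" + "CA" * 21 + "TGAC"

# The genotype is the pair of repeat lengths, one inherited from each parent.
genotype = (longest_ca_repeat(maternal), longest_ca_repeat(paternal))
is_heterozygous = genotype[0] != genotype[1]

print(genotype, is_heterozygous)
```

Because many different values of n circulate in the population, two unrelated people are unlikely to share the same pair of lengths at a locus, and combining several such loci makes a match by chance even less likely.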
While most of the SNPs and CNVs in the human genome sequence are thought
to have little or no effect on phenotype, a subset of the genome sequence varia-
tions must be responsible for the heritable aspects of human individuality. We
know that even a single nucleotide change that alters one amino acid in a protein
can cause a serious disease, as for example in sickle-cell anemia, which is caused
by such a mutation in hemoglobin (Movie 4.3). We also know that gene dosage—a
doubling or halving of the copy number of some genes—can have a profound
effect on development by altering the level of gene product, as can changes in
regulatory DNA sequences. There is therefore every reason to suppose that some
of the many differences between any two human beings will have substantial
effects on human health, physiology, behavior, and physique. A major challenge
in human genetics is to recognize those relatively few variations that are function-
ally important against a large background of variation that is neutral and of no
consequence.

Summary
Comparisons of the nucleotide sequences of present-day genomes have
revolutionized our understanding of gene and genome evolution. Because of the
extremely high fidelity of DNA replication and DNA repair processes, random errors
in maintaining the nucleotide sequences in genomes occur so rarely that only about
one nucleotide in a thousand is altered in every million years in any particular
eukaryotic line of descent. Not surprisingly, therefore, a comparison of human and
chimpanzee chromosomes, which are separated by about 6 million years of evolution,
reveals very few changes. Not only are our genes essentially the same, but their order
on each chromosome is almost identical. Although a substantial number of
segmental duplications and segmental deletions have occurred in the past 6 million
years, even the positions of the transposable elements that make up a major portion
of our noncoding DNA are mostly unchanged.
When one compares the genomes of two more distantly related organisms (such
as a human and a mouse, separated by about 80 million years), one finds many
more changes. Now the effects of natural selection can be clearly seen: through
purifying selection, essential nucleotide sequences, both in regulatory regions and in
coding sequences (exons), have been highly conserved. In contrast, nonessential
sequences (for example, much of the DNA in introns) have been altered to such an
extent that one can no longer see any family resemblance.
Because of purifying selection, the comparison of the genome sequences of
multiple related species is an especially powerful way to find DNA sequences with
important functions. Although about 5% of the human genome has been conserved
as a result of purifying selection, the function of the majority of this DNA (tens of
thousands of multispecies conserved sequences) remains mysterious. Future
experiments characterizing its functions should teach us many new lessons about
vertebrate biology.
Other sequence comparisons show that a great deal of the genetic complexity of
present-day organisms is due to the expansion of ancient gene families. DNA
duplication followed by sequence divergence has clearly been a major source of genetic
novelty during evolution. On a more recent time scale, the genomes of any two
humans will differ from each other both because of nucleotide substitutions
(single-nucleotide polymorphisms, or SNPs) and because of inherited DNA gains and
DNA losses that cause copy number variations (CNVs). Understanding the effects
of these differences will improve both medicine and our understanding of human
biology.

WHAT WE DON’T KNOW
• How many different types of chromatin structure are important for cells? How
is each of these structures established and maintained, and which ones tend to
be inherited following DNA replication?
• Why are there so many different chromatin remodeling complexes in cells?
What are their essential roles, and how do they get loaded onto chromatin at
specific places and at specific times?
• How do chromosomal loops form during interphase, and what happens to
these loops in condensed mitotic chromosomes?
• What genetic changes made us uniquely human? What further aspects of our
recent evolutionary development can be reconstructed by sequencing DNA from
remains of ancient hominids?
• How much of the enormous complexity that we find in cell biology is
unnecessary, having evolved by random drift?

PROBLEMS
Which statements are true? Explain why or why not.
4–1 Human females have 23 different chromosomes, whereas human males
have 24.
4–2 The four core histones are relatively small proteins with a very high
proportion of positively charged amino acids; the positive charge helps the
histones bind tightly to DNA, regardless of its nucleotide sequence.
4–3 Nucleosomes bind DNA so tightly that they cannot move from the
positions where they are first assembled.
4–4 In a comparison between the DNAs of related organisms such as humans
and mice, identifying the conserved DNA sequences facilitates the search for
functionally important regions.
4–5 Gene duplication and divergence is thought to have played a critical role
in the evolution of increased biological complexity.
Discuss the following problems.
4–6 DNA isolated from the bacterial virus M13 contains 25% A, 33% T, 22% C,
and 20% G. Do these results strike you as peculiar? Why or why not? How might
you explain these values?
4–7 A segment of DNA from the interior of a single strand is shown in
Figure Q4–1. What is the polarity of this DNA from top to bottom?

Figure Q4–1 Three nucleotides from the interior of a single strand of DNA
(Problem 4–7). Arrows at the ends of the DNA strand indicate that the
structure continues in both directions. (Chemical structure not reproduced;
the bases shown from top to bottom are A, C, and T, joined by phosphate
groups.)

4–8 Human DNA contains 20% C on a molar basis. What are the mole
percents of A, G, and T?
4–9 Chromosome 3 in orangutans differs from chromosome 3 in humans
by two inversion events that occurred in the human lineage (Figure Q4–2).
Draw the intermediate chromosome that resulted from the first inversion
and explicitly indicate the segments included in each inversion.

Figure Q4–2 Chromosome 3 in orangutans and humans (Problem 4–9).
Differently colored blocks indicate segments of the chromosomes that are
homologous in DNA sequence. (Diagram labels: orangutan; two inversions;
human.)

4–10 Assuming that the 30-nm chromatin fiber contains about 20
nucleosomes (200 bp/nucleosome) per 50 nm of length, calculate the degree of
compaction of DNA associated with this level of chromatin structure. What
fraction of the 10,000-fold condensation that occurs at mitosis does this level
of DNA packing represent?
4–11 In contrast to histone acetylation, which always correlates with gene
activation, histone methylation can lead to either transcriptional activation or
repression. How do you suppose that the same modification (methylation) can
mediate different biological outcomes?
4–12 Why is a chromosome with two centromeres (a dicentric chromosome)
unstable? Would a backup centromere not be a good thing for a chromosome,
giving it two chances to form a kinetochore and attach to microtubules during
mitosis? Would that not help to ensure that the chromosome did not get left
behind at mitosis?
4–13 Look at the two yeast colonies in Figure Q4–3. Each of these colonies
contains about 100,000 cells descended from a single yeast cell, originally
somewhere in the middle of the clump. A white colony arises when the Ade2 gene
is expressed from its normal chromosomal location. When the Ade2 gene is
moved to a location near a telomere, it is packed into heterochromatin and
inactivated in most cells, giving rise to colonies that are mostly red. In these
largely red colonies, white sectors fan out from the middle of the colony. In both
the red and white sectors, the Ade2 gene is still located near telomeres. Explain
why white sectors have formed near the rim of the red colony. Based on the
patterns observed, what can you conclude about the propagation of the
transcriptional state of the Ade2 gene from mother to daughter cells in this
experiment?

Figure Q4–3 Position effect on expression of the yeast Ade2 gene
(Problem 4–13). The Ade2 gene codes for one of the enzymes of adenosine
biosynthesis, and the absence of the Ade2 gene product leads to the
accumulation of a red pigment. Therefore a colony of cells that express Ade2 is
white, and one composed of cells in which the Ade2 gene is not expressed is
red. (Diagram labels: telomere; Ade2 gene at normal location, white colony of
yeast cells; Ade2 gene moved near telomere, red colony of yeast cells with
white sectors.)

4–14 Mobile pieces of DNA (transposable elements) that insert themselves
into chromosomes and accumulate during evolution make up more than 40% of
the human genome. Transposable elements of four types, namely long
interspersed nuclear elements (LINEs), short interspersed nuclear elements
(SINEs), long terminal repeat (LTR) retrotransposons, and DNA transposons,
are inserted more-or-less randomly throughout the human genome. These
elements are conspicuously rare at the four homeobox gene clusters, HoxA,
HoxB, HoxC, and HoxD, as illustrated for HoxD in Figure Q4–4, along with an
equivalent region of chromosome 22, which lacks a Hox cluster. Each Hox
cluster is about 100 kb in length and contains 9 to 11 genes, whose differential
expression along the anteroposterior axis of the developing embryo establishes
the basic body plan for humans (and for other animals). Why do you suppose
that transposable elements are so rare in the Hox clusters?

Figure Q4–4 Transposable elements and genes in 1-Mb regions of
chromosomes 2 and 22 (Problem 4–14). Blue lines that project upward indicate
exons of known genes. Red lines that project downward indicate transposable
elements; they are so numerous (constituting more than 40% of the human
genome) that they merge into nearly a solid block outside the Hox clusters.
(Diagram labels: chromosome 22; chromosome 2; scale bar, 100 kb; HoxD
cluster.) (Adapted from E. Lander et al., Nature 409:860–921, 2001. With
permission from Macmillan Publishers Ltd.)
