Transcription
U. N. Dwivedi
Department of Biochemistry
University of Lucknow, Lucknow-226007
and
Smita Rastogi
Department of Biotechnology, Integral University, Lucknow
CONTENTS
Introduction
Transcription in prokaryotes (Synthesis of mRNA/rRNA/tRNA)
Prokaryotic transcription apparatus
RNA polymerase (RNA Pol) or DNA dependent RNA Polymerase
Structure of RNA polymerase
Synthesis of RNA in 5'→3' direction
Requirement of Mg++
Significance of σ subunit of RNA Pol
Functions of RNA polymerase
Fidelity of RNA synthesis
Promoters
Overall process of prokaryotic transcription
Initiation
Elongation
Termination
Transcription in eukaryotes
Eukaryotic transcription apparatus
RNA polymerase or DNA dependent RNA Polymerase (RNA Pol)
Eukaryotic promoters
Enhancers
Transcription Factors
Elongation factors
Overall process of eukaryotic transcription
Post transcriptional processing
Post transcriptional processing of mRNA (maturation of mRNA)
Post transcriptional processing of mRNA in prokaryotes
Post transcriptional processing of mRNA in Eukaryotes
Alternative mRNA processing
Post transcriptional processing of tRNA and rRNA (maturation of tRNA and rRNA)
Post transcriptional processing of tRNA
Post transcriptional processing of rRNA
Inhibitors of transcription
RNA Pol binding inhibitors
DNA specific inhibitors
Reverse transcriptase (RT) (RNA directed DNA polymerase)
Key words
Synthesis of mRNA, rRNA and tRNA; Prokaryotic and eukaryotic RNA polymerases; Promoters; Transcription
factors; Enhancers; Post transcriptional RNA processing: Capping; Splicing; Polyadenylation; Inhibition of
transcription; Reverse transcriptase
Introduction
DNA stores genetic information in a stable form that can be readily replicated. However, the
expression of this genetic information requires its flow from DNA to RNA to protein. The first
step i.e. the conversion of DNA sequence information into RNA sequence information or more
precisely the process of RNA synthesis according to the instructions of DNA template is called
transcription.
Before studying the details of transcription, a few points need mention:
• The two strands of double stranded DNA are the coding strand and the template strand. The coding
strand of DNA has the same sequence as the RNA transcript except that it has thymine (T) in
place of uracil (U). The coding strand is also called the sense or (+) strand. The template strand
is also called the antisense or (-) strand. The sequence of the template strand is the complement
of the RNA transcript (Fig. 1).
• The first nucleotide of a transcribed DNA sequence is denoted as +1 and is called the start site.
The sequences towards the 5' side of the start site are referred to as upstream sequences and
denoted with a minus sign. The sequences towards the 3' side are downstream sequences and
denoted with a plus sign. Thus, the nucleotide immediately downstream of the +1 site is +2 and so
on. The nucleotide preceding the start site is denoted as -1 and so on. There is no 0 (zero)
nucleotide. These designations refer to the coding strand of DNA. The coding strand for a
particular gene may be located in either strand of a given DNA.
• Different parts of the genome can be transcribed to different extents; the choice of which part
to transcribe, and how extensively, can be regulated by regulatory elements.
• RNA synthesis occurs in the 5'→3' direction.
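The strand and numbering conventions above can be illustrated with a short sketch. The sequence used here is an arbitrary example and the function names are hypothetical; the code only encodes the base-pairing and numbering rules stated in the points above.

```python
# Watson-Crick partners for DNA bases (used to derive the template strand).
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# RNA bases that pair with each template DNA base (U replaces T in RNA).
RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}


def template_from_coding(coding_5to3: str) -> str:
    """Return the template (antisense) strand, written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(coding_5to3))


def transcribe(template_5to3: str) -> str:
    """Return the RNA transcript, written 5'->3'.

    The template is read 3'->5' (hence the reversal) while the RNA
    grows in the 5'->3' direction.
    """
    return "".join(RNA_PARTNER[base] for base in reversed(template_5to3))


def position_label(offset_from_start: int) -> str:
    """Label a coding-strand position; offset 0 is the start site (+1).

    There is no position 0: the base before +1 is labelled -1.
    """
    if offset_from_start >= 0:
        return f"+{offset_from_start + 1}"
    return str(offset_from_start)


coding = "ATGGCATTC"                      # arbitrary example, 5'->3'
template = template_from_coding(coding)   # antisense strand, 5'->3'
rna = transcribe(template)                # transcript, 5'->3'
labels = [position_label(i) for i in (-2, -1, 0, 1)]
```

Here `rna` equals the coding strand with every T replaced by U, and `labels` runs -2, -1, +1, +2 with no zero position.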
Prokaryotic transcription apparatus
(1) RNA polymerase (RNA Pol) or DNA dependent RNA polymerase
RNA Pol is present in all prokaryotic cells and was first discovered in 1960 by Samuel Weiss
and Jerard Hurwitz. In E. coli (eubacteria), a single type of RNA Pol appears to be responsible
for almost all the synthesis of RNA such as mRNA, rRNA and tRNA. Various bacteriophages
also encode RNA Pol that synthesizes only phage-specific RNAs.
The RNA Pol moves along the template, synthesizing RNA starting from the promoter
(described below) until it reaches a sequence called the terminator. This action defines a
transcription unit that extends from the promoter to the terminator, and the immediate product of
transcription is called the primary transcript. The primary transcript is, however, almost always
unstable, and is either degraded or cleaved to give the mature products, viz. mRNA, rRNA and
tRNA.
Another subunit, called the σ subunit, binds only transiently to the core enzyme, forming the
holoenzyme α2ββ'σ. E. coli has several σ factors, which are summarized in Table 2. σ70 is used
for general transcription while other σ factors are activated by specific environmental conditions.
Thus, σ32, σ54, σ28 (or σF), σH, etc. are induced in response to conditions such as heat shock,
nitrogen starvation, flagellar synthesis and other shock conditions.
E. coli RNA Pol has the overall shape of a crab claw, where the two pincers are made up
predominantly of the two large subunits, namely β and β' (Fig. 2).
Further structural analysis shows that RNA Pol has channels or grooves that allow DNA,
RNA and ribonucleotides into and out of the enzyme's active center cleft (Fig. 3). The channel
for DNA lies at the interface of the β and β' subunits. The NTP-uptake channel allows
ribonucleotides to enter the active center. The RNA exit channel allows the growing RNA chain
to leave the enzyme as it is synthesized during elongation. The downstream DNA (i.e. DNA
ahead of the enzyme, yet to be transcribed) enters the active center cleft in double stranded form
through the downstream DNA channel (between the pincers). Within the active center cleft, the
DNA strands separate from position +3. The non-template strand exits the active center cleft
through the non-template strand (NT) channel and travels across the surface of the enzyme. The
template strand, in contrast, follows a path through the active center cleft and exits through the
template strand (T) channel. RNA Pol surrounds the DNA. The groove can hold about 16
bp in the bacterial enzyme and ~25 bp in the eukaryotic enzyme.
Table 1: Properties and functions of subunits of E. coli RNA pol
Table 2: Types of E. coli σ factors
[Figs. 2 and 3: schematic views of E. coli RNA Pol. Labels: upstream DNA, downstream DNA,
direction of RNA Pol movement, DNA entering the jaws, β and β' pincers, flap, rudder, wall, bridge,
nucleotide entry, active site, RNA exit channel, T channel, NT channel, and positions -35, -10 and +1.]
The map of the E. coli σ70 factor identifies four conserved regions, namely regions 1-4, which are
further subdivided into sub-regions (Fig. 4). These sub-regions have different functions. Sub-region
2.4 (also called -10 region or unwinding domain) confers specificity by recognizing the -10 region of
the promoter, while sub-region 4.2 (also called -35 region or recognition domain) provides binding
energy by recognizing the -35 region of the promoter. The details of other sub-regions are tabulated
in Table 3.
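[Fig. 4: map of the σ70 factor showing conserved regions 1-4 between the N- and C-termini,
including the region responsible for melting.]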
Table 3: Functions of regions and sub-regions of σ factor
incoming rNTP.
For transcription, RNA Pol requires a DNA template, ribonucleoside triphosphates (rNTPs; viz.
rATP, rGTP, rCTP and rUTP) and Mg++. There is no requirement for a primer. The enzyme is most
active when bound to double stranded DNA, but only one of the two strands serves as a template.
The 3'-OH group of the growing RNA chain attacks the α-phosphate of the incoming NTP and releases
pyrophosphate (PPi). This reaction is thermodynamically favorable, and the subsequent hydrolysis of
the pyrophosphate to orthophosphate locks the reaction in the direction of RNA synthesis. The 5'
triphosphate group of the first residue in a nascent (newly formed) RNA molecule is not cleaved
to release PPi, but remains intact throughout the transcription process. Thus, the reaction is
driven by the release and subsequent hydrolysis of PPi as summarized in Scheme 1.
Scheme 1:
(NMP)n + NTP → (NMP)n+1 + PPi (catalyzed by RNA Pol)
PPi (pyrophosphate) + H2O → 2 Pi (orthophosphate) (catalyzed by pyrophosphatase)
where,
NMP signifies ribonucleoside monophosphate;
n represents number of NMPs
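The bookkeeping implied by Scheme 1 can be sketched as follows. This is a minimal tally under the assumptions stated in the text (no primer, one PPi released per phosphodiester bond, the 5' triphosphate of the first residue retained); the function name and dictionary keys are hypothetical.

```python
def polymerization_balance(n: int) -> dict:
    """Tally substrates and products for an RNA chain of n nucleotides.

    Each nucleotide comes from one NTP. The first NTP keeps its
    5' triphosphate, so PPi is released only when each of the
    n - 1 phosphodiester bonds is formed.
    """
    if n < 1:
        raise ValueError("chain length must be at least 1")
    bonds = n - 1
    return {
        "ntp_consumed": n,
        "phosphodiester_bonds": bonds,
        "ppi_released": bonds,
        # Pyrophosphatase converts each PPi into two Pi.
        "pi_after_pyrophosphatase": 2 * bonds,
        "five_prime_triphosphate_retained": True,
    }


balance = polymerization_balance(9)
```

For a 9-nucleotide transcript this gives 9 NTPs consumed, 8 phosphodiester bonds, 8 PPi released and 16 Pi after pyrophosphatase action.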
RNA Pol requires that the initiating NTP be brought into its active site and held stably on the
template while the next NTP is presented with the correct geometry for the chemistry of
polymerization to occur. This is particularly difficult because RNA Pol starts most transcripts
with A, and that ribonucleotide binds the template nucleotide T with only two H-bonds. Thus,
the enzyme has to make specific interactions with the initiating NTP, holding it rigidly in the
correct orientation to allow chemical attack on the incoming NTP. The requirement for such
specific interactions between the enzyme and the initiating NTP probably explains why most
transcripts start with the same nucleotide. The interactions are specific for that nucleotide (A), and
thus only chains beginning with A are held in a manner suitable for efficient initiation. It is
believed that the interactions are provided by various parts of the RNA Pol holoenzyme,
including part of σ. Consistent with this, in experiments using an RNA Pol containing a σ70
derivative lacking this part of σ, initiation requires much higher than normal concentrations of
the initiating NTP.
(Mg++) in its active form, consistent with the two metal ion catalytic mechanism for
nucleotide addition proposed for all types of polymerases. One metal ion remains bound to the
enzyme whereas the other appears to come in with the nucleoside triphosphate and leave with
the pyrophosphate. The β and β' subunits interact extensively with one another, particularly at
the base of the channel where the active site Mg++ ion is located. The β' subunit binds a Zn++ ion
via four cysteine residues that are invariant in prokaryotes but not in eukaryotes. Three
conserved aspartate (Asp) residues of the enzyme participate in binding these metal ions.
Comparisons of the crystal structures of core enzyme and holoenzyme show that σ factor lies
largely on the surface of the core enzyme. It has an elongated structure that extends past the
DNA binding site. The σ subunit binds transiently to the core enzyme and directs the RNA Pol
holoenzyme to specific binding sites on DNA where transcription begins.
The σ factor dissociates from the rest of the RNA Pol when the RNA chain reaches 8-9 nucleotides
in length. It is not necessary for the elongation phase. When σ factor is released, the core enzyme
reverts to a general affinity for all DNA, irrespective of sequence, which suits it to continue
transcription. The released σ factor becomes immediately available for use by another core enzyme.
A change in the association between σ and core enzyme changes the binding affinity for DNA so that
core enzyme can move along DNA. RNA Pol encounters a dilemma in reconciling its needs for
initiation with those for elongation. Initiation requires tight binding only to particular sequences
(promoters), while elongation requires close association with all sequences that the enzyme
encounters during transcription. This dilemma is solved by the reversible association between
σ factor and core enzyme. σ factor is either released following initiation or changes its association
with core enzyme so that it no longer participates in DNA binding. The cell contains only about 30%
as much σ factor as core enzyme. Therefore, at most about one-third of the polymerase complexes
can exist as holoenzyme at any one time. Because there are
fewer molecules of σ than of core enzyme, the utilization of core enzyme requires that σ
recycles. Release occurs immediately after initiation in about one third of cases; presumably σ and
core dissociate at some later point in the other cases. Irrespective of the exact timing of its
release from core enzyme, σ factor is involved only in initiation.
After the release of σ factor from the RNA Pol, the core enzyme moves along the DNA
synthesizing the growing RNA strand. The σ factor can then complex with a further core enzyme
and reinitiate transcription.
• Melting of DNA
Melting occurs between positions -11 and +3 relative to the transcription start
site. The double helix reforms at -11 in the upstream DNA behind the enzyme. The β and β'
subunits contact DNA at many points downstream of the active site. They make several
contacts with the coding strand in the region of the transcription bubble, thus stabilizing
the separated single strands. The RNA is contacted largely in the region of the
transcription bubble.
As the enzyme moves along DNA, the base in the template strand at the start of the turn
will be flipped to face the nucleotide entry site. The RNA-DNA hybrid is 9 bp long and the
5' end of RNA is forced to leave the DNA when it hits a protein structure called the rudder.
Once DNA has been melted, the individual strands have a flexible structure in the
transcription bubble. This enables DNA to take its turn in the active site. But before
transcription starts, the DNA double helix is a relatively rigid straight structure. This
straight structure enters the polymerase without being blocked by the wall because of a
conformational shift that occurs in the enzyme. Adjacent to the wall is a clamp. In the free form
of RNA Pol, this clamp swings away from the wall to allow DNA to follow a straight path
through the enzyme. After DNA has been melted to create the transcription bubble, the
clamp must swing back into position against the wall.
• Selection of correct ribonucleotides
RNA Pol selects the correct ribonucleoside triphosphate and catalyzes the formation of a
phosphodiester bond. This process is repeated many times as the enzyme moves
unidirectionally along the DNA template. RNA Pol is completely processive, i.e., a
transcript is synthesized from start to end by a single RNA Pol molecule.
• Elongation
RNA Pol carries out elongation. When RNA Pol forms the initial elongation complex after the first
10 nucleotides have been synthesized, it may lose σ factor and lose its contacts from -35
to -55. At 15-20 nucleotides, the general elongation complex is formed and covers 30-40 bp. The
elongating RNA Pol is a processive machine that synthesizes and proofreads RNA. DNA
passes through the elongating enzyme in a manner very similar to its passage through the
open complex. Thus, double stranded DNA enters the front of the enzyme between the
pincers. At the opening of the catalytic cleft, the strands separate to follow different paths
through the enzyme before exiting via their respective channels and reforming a double
helix behind the elongating polymerase. Ribonucleotides enter the active site through their
defined channel and are added to the growing RNA chain under the guidance of the
template DNA strand. Only eight or nine nucleotides of the growing RNA chain remain
base paired to the DNA template at any given time; the remainder of the RNA chain is
peeled off and directed out of the enzyme through the RNA exit channel.
RNA chain elongation requires that the double stranded DNA template be opened up at the
point of RNA synthesis so that the template strand can be transcribed to its complementary
RNA strand. In doing so, the RNA chain only transiently forms a short length of RNA-DNA
hybrid duplex, as is indicated by the observation that transcription leaves the
template duplex intact and yields single stranded RNA. The unpaired bubble of DNA in
the open initiation complex apparently travels along the DNA with RNA Pol. There are
two ways this might occur: (i) If the RNA Pol followed the template strand in its helical
path around the DNA, the DNA would build up little supercoiling because the DNA
duplex would never be unwound by more than about a turn. However, the RNA transcript
would wrap around the DNA, once per duplex turn. This model is implausible since it is
unlikely that the DNA and RNA could be readily untangled. The RNA would not
spontaneously unwind from the long and often circular DNA in any reasonable time and
no known topoisomerase can accelerate this process. (ii) If the RNA Pol moves in a
straight line while the DNA rotates, the RNA and DNA will not become entangled. Rather,
the DNA's helical turns are pushed ahead of the advancing transcription bubble so as to
more tightly wind the DNA ahead of the bubble (which promotes positive supercoiling),
while the DNA behind the bubble becomes correspondingly unwound (which promotes negative
supercoiling, although the linking number of the entire DNA remains unchanged). This model is
supported by the observations that the transcription of plasmids in E. coli causes their positive
supercoiling in gyrase mutants (which cannot relax positive supercoils) and their negative
supercoiling in topoisomerase I mutants (which cannot relax negative supercoils). In fact,
by tethering RNA Pol to a glass surface and allowing it to transcribe DNA that had been
fluorescently labeled at one end, Kazuhiko Kinosita demonstrated, through fluorescence
microscopy (using techniques similar to those showing that the F1F0-ATPase is a rotary
engine), that single DNA molecules rotated in the expected direction during transcription.
• Proofreading
In addition, RNA Pol carries out two proofreading functions. The first of these is
called pyrophosphorolytic editing. In this, the enzyme uses its active site, in a simple back
reaction, to catalyze the removal of an incorrectly inserted ribonucleotide by
reincorporation of PPi. The enzyme can then incorporate another ribonucleotide in its place
in the growing RNA chain. Note that the enzyme can remove either correct or incorrect
bases in this manner, but spends longer hovering over mismatches than matches and so
removes the former more frequently. In the second proofreading mechanism, called
hydrolytic editing, the polymerase backtracks by one or more nucleotides and cleaves the
RNA product, removing the error containing sequence. Hydrolytic editing is stimulated by
Gre factors, which, as well as enhancing the hydrolytic editing function, also serve as
elongation stimulating factors. That is, they ensure that the polymerase elongates efficiently
and help it overcome arrest at sequences that are difficult to transcribe. This combination
of functions is comparable to those imposed on the eukaryotic RNA Pol II by the
transcription factor TFIIS. Another group of proteins, the Nus proteins, joins the polymerase in
the elongation phase and promotes, in still rather undefined ways, the processes of
elongation and termination.
• Termination
RNA Pol detects termination signals that specify where a transcript ends. The length of the RNA-DNA
hybrid is determined by a structure within the enzyme that forces the RNA-DNA
hybrid to separate, allowing the RNA chain to exit from the enzyme and the DNA chain to
rejoin its DNA partner. The RNA product does not remain base paired to the template
DNA strand; rather, the enzyme displaces the growing chain only a few nucleotides behind
where each ribonucleotide is added. Because this release follows so closely behind the site
of polymerization, multiple RNA Pol molecules can transcribe the same gene at the same
time, each following closely behind another. Thus, a cell can synthesize large numbers
of transcripts from a single gene (or other DNA sequence) in a short time.
Thus, RNA Pol has the capacity to unwind and rewind DNA, to hold the separated strands of
DNA and the RNA product, to catalyze the addition of ribonucleotides to the growing RNA
chain and to overcome difficulties in progressing by cleaving the RNA product and restarting
RNA synthesis (with the assistance of some accessory factors).
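The rotating-DNA model of elongation (model ii under Elongation) can be put into rough numbers. The sketch below assumes about 10.5 bp per helical turn of B-DNA, a value not given in the text but consistent with its statement that a ~17 bp bubble corresponds to about 1.6 turns. The function names are hypothetical, and the model ignores any relaxation by topoisomerases.

```python
def twin_domain_turns(bp_transcribed: float, bp_per_turn: float = 10.5):
    """Helical turns displaced when RNA Pol advances without rotating.

    Returns (turns_ahead, turns_behind): overwinding ahead of the
    bubble is counted as positive and the matching underwinding
    behind it as negative, so the two always sum to zero.
    """
    turns = bp_transcribed / bp_per_turn
    return turns, -turns


def bubble_turns(bubble_bp: float = 17, bp_per_turn: float = 10.5) -> float:
    """Number of helical turns unwound within the transcription bubble."""
    return bubble_bp / bp_per_turn


ahead, behind = twin_domain_turns(1050)   # a hypothetical 1050 bp gene
net_change = ahead + behind               # stays zero in this model
```

With these assumptions, transcribing 1050 bp pushes 100 turns ahead of the polymerase and leaves 100 turns of underwinding behind it, which suggests why gyrase and topoisomerase I mutants accumulate supercoils of opposite sign.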
(2) Promoters
The promoter is the region of DNA where RNA polymerase binds to initiate transcription. The
information for promoter function is provided directly by the DNA sequence; its structure is the
signal for transcription. The promoter surrounds the first base pair that is transcribed into RNA,
the start point. As promoters are present on the same DNA molecule as the genes being
transcribed or regulated, they are called cis-acting elements. E. coli has about 2000 promoter
sites in its 4.6 × 10^6 bp genome. There are different types of promoters in E. coli, but the most
prevalent one is the σ70 promoter (standard promoter), which is dealt with in detail in the following
discussion.
Promoter sequences in E. coli lack any extensive conservation over the 60 bp
associated with RNA Pol. The sequence of much of the binding site is irrelevant. But some short
stretches within the promoter are conserved and they are critical for its function.
Bacterial promoters have the following features:
(i) Start point (+1 position): The initiating (+1) nucleotide is usually (>90% of the time) a
purine nucleotide (A or G; A occurs more often than G). It is common for the start point to be the
central base in the poorly conserved CAT or CGT sequence, but the conservation of the base
triplet is not great enough to regard it as an obligatory signal.
(ii) -10 sequence or Pribnow Box: The most conserved sequence recognizable in almost all
promoters is a 6 bp long AT rich motif centered ~10 nucleotides upstream of the start site.
Because of its position, it is named the -10 sequence. It is also known as the Pribnow Box (named
after David Pribnow, who pointed out its existence in 1975). The center of the hexamer generally
is close to 10 bp upstream of the start point; the distance varies in known promoters from -18 to -9.
Its consensus is 5'-TATAAT-3' and its average can be summarized in the form
5'-T80A95T45A60A50T96-3', where the number following each base denotes the % occurrence of the
most frequently found base, which in this case varies from 45-96%.
If the frequency of occurrence indicates likely importance in binding RNA Pol, we would expect
the initial highly conserved TA and the final almost completely conserved T in the -10 region to
be the most important bases. The region is AT rich and hence little energy is required for strand
separation here. Mutations in this region have been found to affect the melting reaction.
(iii) -35 sequence: A 6 bp long sequence centered ~35 nucleotides upstream of the start site.
The consensus is 5'-TTGACA-3'. In more detailed form the conservation is
5'-T82T84G78A65C54A45-3', where the number following each base denotes the % occurrence of the
most frequently found base, which in this case varies from 45-84%.
(iv) Distance between -10 and -35 sequences: The distance between these conserved
sequences (-10 and -35 regions) is also very critical. It is between 16-19 bp in 90% of the
promoters (a separation of 17 nucleotides is optimal). In the exceptions it is as little as 15
nucleotides and as large as 20 nucleotides. However, the actual sequence of this intervening
DNA is unimportant. The distance represents a single turn of the helix, thereby providing
appropriate separation for simultaneous interaction of σ factor with the two motifs (-10 and -35
sequences).
The promoters with the -10 and -35 sequences as 5'-TATAAT-3' and 5'-TTGACA-3' respectively are
called standard promoters. These are recognized by the σ70 subunit of RNA Pol. Individual
promoters usually differ from the consensus at one or more positions. A typical bacterial
promoter is represented in Fig. 5.
5'-TTGACA ... 16-18 bp ... TATAAT ... Purine ... -3'
   -35 region              -10 region     +1
   [Recognition Domain]    [Pribnow Box]  [Start site]
                           [Unwinding Domain]
(v) Some other conserved sequences of σ70 promoters: σ70 promoters of some genes have
additional consensus sequences such as:
(b) Extended -10 element: Another class of σ70 promoters lacks a -35 region and instead has
a so called extended -10 element. This comprises a standard -10 region with an additional short
sequence element at its upstream end. These elements are recognized by a region of the σ subunit
of RNA Pol. Extra contacts made between polymerase and this additional sequence element compensate
for the absence of a -35 region; e.g., the gal genes of E. coli use such a promoter.
Various combinations of bacterial promoter elements are shown in Fig. 6.
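[Fig. 6: combinations of bacterial promoter elements: -35 and -10 elements separated by ~17 bp
upstream of +1; an UP element together with -35 and -10 elements; an extended -10 element
without a -35 element.]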
Mutation of a single base in either -10 or -35 sequences can alter promoter activity. Mutations in
the -35 region usually affect initial binding of RNA Pol and mutations in the -10 region usually
affect the melting reaction.
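The promoter features listed above (consensus hexamers and their spacing) can be turned into a simple scanning sketch. It counts matches to the consensus 5'-TTGACA-3' and 5'-TATAAT-3' and accepts the 16-19 bp spacing reported for about 90% of promoters. The threshold of 9 matches out of 12, the function names and the test sequence are arbitrary illustrative choices, not a validated promoter-prediction method.

```python
MINUS_35 = "TTGACA"   # -35 consensus from the text
MINUS_10 = "TATAAT"   # -10 consensus (Pribnow box) from the text


def matches(site: str, consensus: str) -> int:
    """Count positions at which a hexamer agrees with the consensus."""
    return sum(a == b for a, b in zip(site, consensus))


def find_sigma70_promoters(coding, min_spacer=16, max_spacer=19,
                           min_matches=9):
    """List candidate promoters on a coding strand written 5'->3'.

    Each hit is (index of -35 box, spacer length, index of -10 box,
    total matches out of 12). min_matches is a hypothetical cutoff.
    """
    hits = []
    for i in range(len(coding) - 12 - min_spacer + 1):
        box35 = coding[i:i + 6]
        for spacer in range(min_spacer, max_spacer + 1):
            j = i + 6 + spacer
            box10 = coding[j:j + 6]
            if len(box10) < 6:
                break  # the -10 window would run off the sequence
            score = matches(box35, MINUS_35) + matches(box10, MINUS_10)
            if score >= min_matches:
                hits.append((i, spacer, j, score))
    return hits


# Made-up sequence: perfect -35 and -10 boxes separated by 17 bp.
demo = "GC" + MINUS_35 + "G" * 17 + MINUS_10 + "GCGCGCA"
hits = find_sigma70_promoters(demo)

# A single-base change in the -10 box (T -> C at its first position).
mutant = demo[:25] + "C" + demo[26:]
mutant_hits = find_sigma70_promoters(mutant)
```

The single-base mutant still passes the cutoff but with one match fewer, mirroring the statement that a point mutation in either hexamer alters promoter activity. A frequency-weighted score using the percentages quoted for each position would be a natural refinement.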
front of and behind the polymerase, respectively. Inappropriate superhelicity in the DNA being
transcribed halts transcription. Quite possibly the torsional tension in the DNA generated by
negative superhelicity behind the transcription bubble is required to help drive the transcriptional
process, whereas too much such tension prevents the opening and maintenance of the
transcription bubble.
The dependence of a promoter on supercoiling is determined by its sequence. This would predict
that some promoters have sequences that are easier to melt and are therefore less dependent on
supercoiling, while others have more difficult sequences and have a greater need to be
supercoiled. An alternative is that the location of the promoter might be important if different
regions of the bacterial chromosome have different degrees of supercoiling.
A typical promoter relies on its -35 and -10 sequences to be recognized by RNA Pol, but one or
the other of these sequences can be absent from some (exceptional) promoters. In at least some
of these cases, RNA Pol alone cannot recognize the promoter, and the reaction also requires
ancillary proteins, which overcome the deficiency in intrinsic interaction between RNA Pol and
the promoter.
(3) Overall process of prokaryotic transcription
The process of transcription can be divided into three steps, namely initiation, elongation and
termination (Fig. 8).
[Fig. 8: overall process of prokaryotic transcription. Initiation: promoter recognition, promoter
binding (closed complex), promoter melting (open complex) and initial transcription. Elongation:
RNA elongation after abortive initiations and promoter clearance. Termination: release of RNA and
RNA Pol from the DNA.]
(A) Initiation
Transcription begins with the insertion of the first ribonucleotide (usually a purine). The end of
initiation is signified by promoter clearance, where the RNA Pol moves ahead (along the DNA
template) from the promoter site without dissociating, freeing the promoter for further initiation
events. Promoter clearance occurs only if the open promoter complex is stable, and it usually
follows a number of abortive initiations in which short transcripts are generated. This is a general
property of RNA Pol and appears to be required for de novo strand synthesis. Initiation is usually
the rate-limiting step in transcription and is the primary level of gene regulation in both
prokaryotes and eukaryotes.
The pathway of transcription initiation consists of two major parts, binding and initiation, and
each part has multiple steps, which are summarized below. RNA Pol recognizes the promoter
region, locally unwinds the DNA at the site where it is bound and carries out some abortive
initiations. During this phase the RNA Pol remains stationary at the site of binding (i.e. the
promoter) and its conformation remains essentially the same. During this phase, the first ~8-9
nucleotides are added. The initiation phase ends when the enzyme succeeds in extending the
RNA chain and clears the promoter. Regulatory proteins that bind to specific sequences near
promoter sites and interact with RNA polymerase also markedly influence the frequency of
transcription of many genes.
The initiating reaction is simply the coupling of two NTPs in the reaction given below:
pppN1 + pppN2 → pppN1pN2 + PPi
Initiation in transcription is further divided into discrete phases of DNA binding and initiation of
RNA synthesis, which are described below:
(i) Template and promoter recognition and formation of closed binary complex: The
holoenzyme-promoter reaction starts by forming a closed binary complex. "Closed" means that
the DNA remains duplex. Initially, RNA Pol, through its σ subunit (which is involved
in promoter selection), binds loosely and reversibly to duplex DNA and searches for the promoter
sequence. This is the closed binary complex, closed promoter complex or closed
promoter-polymerase complex. In E. coli, RNA Pol binding occurs within a region stretching ~50 bp
before the transcription start site to ~20 bp beyond it. Because the formation of the closed binary
complex is reversible, it is usually described by an equilibrium constant (KB). There is a wide range
in values of the equilibrium constant for forming the closed complex. Formation of the closed
complex is readily reversible and RNA Pol can as easily dissociate from the promoter as make
the transition to the open complex.
(ii) Formation of open binary complex or isomerization: The transition from the closed
promoter complex (in which DNA is double helical) to the open promoter complex (in which
a DNA segment is unwound) is an essential event in transcription. In the bacterial enzyme
bearing σ70, this transition, often termed isomerization, does not require energy derived from
ATP hydrolysis and is instead the result of a spontaneous conformational change in the
DNA-enzyme complex to a more energetically favorable form. Isomerization is essentially irreversible
and, once complete, typically guarantees that transcription will subsequently initiate (though
regulation can still be imposed after this point in some cases).
Although RNA Pol can search for promoter sites when bound to double helical DNA, a segment
of the helix must be unwound before synthesis can begin. A region of duplex DNA must be
unpaired so that the nucleotides on one of its strands become accessible for base pairing with
incoming ribonucleotides. When the correct sequence is recognized by RNA Pol holoenzyme,
the DNA at the promoter site is locally unwound (DNA melting). The series of events
leading to formation of an open complex is called tight binding. Due to tight binding, the
interaction between the RNA Pol holoenzyme and DNA becomes irreversible and the closed
complex undergoes a transition to the open complex. Thus, the closed complex is converted into an
open complex by melting of a short region of DNA within the sequence bound by the enzyme.
This characterizes the open binary complex, open promoter complex or open
promoter-polymerase complex. Here, DNA strands separate locally over a distance of ~17 bp of DNA
(from within the -10 region to position +2 or +3), which corresponds to 1.6 turns of the B-DNA
helix. This opening frees the template strand to be available for base pairing with
ribonucleotides. Unwinding increases the negative supercoiling of DNA. Negative supercoiling
of circular DNA favors transcription of genes because it facilitates unwinding.
For strong promoters, conversion into an open binary complex is irreversible, so this reaction is
described by a rate constant (k2). This reaction is fast. σ factor is involved in the DNA melting
reaction.
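The two-step description above (reversible closed-complex formation described by KB, followed by irreversible isomerization described by k2) can be sketched numerically. The expression below is a common rapid-equilibrium approximation and is an assumption, not something stated in the text; the numerical values of KB, k2 and the RNA Pol concentration are purely hypothetical.

```python
def closed_complex_occupancy(K_B: float, rnap_conc: float) -> float:
    """Fraction of promoters in closed complex at binding equilibrium.

    K_B is the equilibrium (association) constant in 1/M and
    rnap_conc is the free holoenzyme concentration in M.
    """
    return K_B * rnap_conc / (1.0 + K_B * rnap_conc)


def open_complex_rate(K_B: float, k2: float, rnap_conc: float) -> float:
    """Approximate rate of open complex formation per promoter (1/s).

    Assumes binding equilibrates much faster than isomerization, so
    the rate is k2 times the closed-complex occupancy.
    """
    return k2 * closed_complex_occupancy(K_B, rnap_conc)


# Hypothetical promoter: K_B = 1e7 per M, k2 = 0.1 per s, 100 nM enzyme.
rate = open_complex_rate(K_B=1e7, k2=0.1, rnap_conc=1e-7)
# At saturating enzyme the rate approaches k2 itself.
saturated = open_complex_rate(K_B=1e7, k2=0.1, rnap_conc=1e-3)
```

In this picture a promoter can be strengthened either by tighter initial binding (larger KB) or by faster melting (larger k2), which parallels the observation that -35 mutations mainly affect binding and -10 mutations mainly affect melting.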
(iii) Formation of ternary complex (unstable) and Abortive initiations: The next step is to
incorporate the first two nucleotides and then catalyze phosphodiester bond formation between
them. This generates a ternary complex that contains RNA as well as DNA and enzyme. The
ribonucleotides are aligned on the template strand and joined together. The initiating
ribonucleotide is usually a purine (A or G). RNA Pol makes specific interactions with the
initiating purine, holding it rigidly in the correct orientation to allow chemical attack on the incoming
NTP. The requirement for such specific interactions between the enzyme and the initiating NTP
probably explains why most transcripts start with the same nucleotide. The interactions are specific
for that nucleotide (A) and thus only chains beginning with A are held in a manner suitable
for efficient initiation. It is believed that the interactions are provided by various parts of the
RNA Pol holoenzyme, including part of σ. Consistent with this, in experiments using an RNA
Pol containing a σ70 derivative lacking this part of σ, initiation requires much higher than normal
concentrations of the initiating NTP. The region containing RNA Pol, DNA and nascent RNA is called
a transcription bubble (so called because it contains a locally melted bubble of DNA) or
transcription complex. Formation of the ternary complex is described by the rate constant ki; this is
even faster than the rate constant k2.
Further nucleotides can be added without any enzyme movement to generate an RNA chain of
up to 9 bases. Thus, RNA Pol forms an unstable ternary complex comprising a DNA-RNA
hybrid helix (i.e. DNA template and short RNA) and RNA Pol holoenzyme. This RNA-DNA
helix is thus ~8 bp long, which corresponds to about one complete turn of the double helix. The
RNA-DNA hybrid also rotates each time a nucleotide is added so that the 3'-OH end of the RNA stays at
the catalytic site of RNA Pol. Incorporation of the first 9-10 ribonucleotides is a rather inefficient
process. After each base is added, there is a certain probability that the enzyme will release the
chain. At this stage the enzyme often releases short transcripts (each of fewer than ~10
ribonucleotides) and then starts synthesis of RNA again. Abortive initiations (i.e. synthesis of
short RNA) probably involve synthesizing an RNA chain that fills the active site. If the RNA is
released, the initiation is aborted and must start again. A cycle of abortive initiations usually
occurs, generating a series of very short oligonucleotides.
Initiation is accomplished when the enzyme manages to move along the template and bring the next
region of the DNA into the active site. The occurrence of a cycle of abortive initiations before
the enzyme moves to the next phase is a general property of RNA Pol and appears to be required
for de novo strand synthesis.
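The statement that there is "a certain probability" of chain release after each added base can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that the release probability is the same and independent at every step and that about 9 additions are needed before the complex becomes stable; the real probabilities are promoter dependent and are not given in this text. The function names and the example probabilities are hypothetical.

```python
def escape_probability(p_release, steps=9):
    """Chance that a nascent chain survives `steps` nucleotide additions
    without being released. Assumes a constant, independent release
    probability per added base (an illustrative simplification)."""
    if not 0.0 <= p_release < 1.0:
        raise ValueError("p_release must be in [0, 1)")
    return (1.0 - p_release) ** steps


def expected_abortive_cycles(p_release, steps=9):
    """Mean number of aborted attempts before one attempt succeeds
    (mean number of failures in a geometric distribution)."""
    p_escape = escape_probability(p_release, steps)
    return (1.0 - p_escape) / p_escape


# Hypothetical per-step release probabilities, chosen only to show the trend.
for p in (0.05, 0.2, 0.4):
    print(p, round(escape_probability(p), 3), round(expected_abortive_cycles(p), 1))
```

Under these assumptions even a modest per-step release probability leads to several abortive cycles per productive initiation, which is consistent with the series of short oligonucleotides described above.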
(iv) Formation of ternary complex (stable) and Promoter clearance: Once an RNA Pol
holoenzyme succeeds in synthesizing a nascent RNA chain of ~9-10 bases, i.e. when initiation
succeeds, σ is no longer necessary. The enzyme makes the transition to the elongation ternary
complex of core polymerase, DNA and nascent RNA. This involves a conformational change in
the polymerase that helps it to grip the template more firmly, converting the ternary complex to the
elongation form. This conformational change is followed by movement of the RNA Pol away
from the promoter site, without dissociating, thereby freeing the promoter (i.e. promoter
clearance) for further initiation events. Thus, promoter clearance occurs only if the open complex
is stable (stable ternary complex) and usually follows a number of abortive initiations. This
signifies the end of the initiation phase and the transition to the elongation phase, leading to the
extension of the RNA chain beyond 10 bases. The efficiency of promoter clearance is modulated by
the nature of the first fifty or so bases in the transcribed region. The minimum value of the
promoter clearance time (i.e. the time taken by the RNA Pol to leave the promoter so that
another RNA Pol can initiate) is 1-2 sec, which sets the maximum
frequency of initiation at <1 event per sec.
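The link between clearance time and initiation frequency is simple arithmetic: if a second polymerase cannot start until the first has cleared the promoter, the initiation frequency cannot exceed the reciprocal of the clearance time. A minimal sketch, with a hypothetical function name:

```python
def max_initiation_frequency(clearance_time_s):
    """Upper bound on initiations per second when each polymerase must
    clear the promoter before the next one can begin."""
    if clearance_time_s <= 0:
        raise ValueError("clearance time must be positive")
    return 1.0 / clearance_time_s


# Clearance times of 1 s and 2 s, the range quoted in the text.
for t in (1.0, 2.0):
    print(f"clearance {t:.0f} s: at most {max_initiation_frequency(t):.1f} initiations per second")
```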
(B) Elongation
When the first ~9 nucleotides have been added, the transcribed template strand is scrunched in
the active site. The active site can hold a transcript of 6-9 nucleotides. The transcription bubble
moves along DNA and the RNA chain is extended in the 5'→3' direction (Fig. 9).
As the RNA Pol holoenzyme clears the initiation site and enters the elongation phase of
transcription, the σ subunit may either dissociate or remain associated with the core enzyme. It
was originally concluded that the σ factor is released after initiation. However, this may not be strictly true.
Direct measurements of elongating RNA Pol complexes show that ~70% of them retain the σ factor.
Since about a third of elongating polymerases lack σ, the original conclusion that it
is not necessary for elongation is certainly correct. In those cases where it remains associated with the core enzyme, the
nature of the association has almost certainly changed.
The core enzyme without σ binds more strongly to the DNA template. From this point onwards,
the core enzyme undertakes RNA chain elongation beyond 10 bases. The core enzyme then
moves along the template strand, opening (or unwinding) the DNA helix ahead of the site of
polymerization (i.e. front or leading edge) so as to expose a new segment of the template in
single stranded condition. During this time, subsequent ribonucleotides are added to the 3' end of
the growing RNA chain. Elongation involves the movement of the transcription bubble (a
distance of 170 Å / second, corresponding to a rate of elongation of ~50 nucleotides / sec) by a
disruption of DNA structure, in which the template strand of the transiently unwound region is
paired with the nascent RNA at the growing point. As in the initiation phase, about 17 bp of
DNA are unwound at a time throughout the elongation phase. It has been found that the lengths of the RNA-DNA
hybrid and the unwound region of DNA stay rather constant as RNA Pol moves along the
DNA template, thereby indicating that the unwound DNA reseals (or rewinds) at the same rate
behind (i.e. rear or trailing edge) the RNA Pol. The RNA-DNA hybrid must also rotate each time
a nucleotide is added so that the 3'-OH end of the RNA stays at the catalytic site. When the RNA
chain extends to 15-20 bases, the enzyme makes a further transition to form the complex that
undertakes elongation and now it covers 30-40 bp (depending on the stage in the elongation cycle).
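The two figures quoted for bubble movement agree with each other if the distance is read as 170 Å per second and the standard B-DNA rise of 3.4 Å per base pair is assumed (the rise value is not stated in this text). The sketch below shows the conversion; the 5000-nucleotide transcript length is a hypothetical example.

```python
RISE_PER_BP_ANGSTROM = 3.4  # assumed axial rise per base pair of B-DNA


def nucleotides_per_second(bubble_speed_angstrom_per_s, rise=RISE_PER_BP_ANGSTROM):
    """Convert bubble movement along the DNA (Å/s) into nucleotides added per second."""
    return bubble_speed_angstrom_per_s / rise


def transcription_time_seconds(transcript_length_nt, rate_nt_per_s):
    """Time needed to synthesize a transcript at a constant elongation rate."""
    return transcript_length_nt / rate_nt_per_s


rate = nucleotides_per_second(170.0)
print(round(rate))                                      # about 50 nucleotides per second
print(round(transcription_time_seconds(5000, rate)))    # about 100 s for a 5000 nt transcript
```

The constant-rate assumption ignores the pausing that the text describes as important for termination, so the second number is only a lower bound on the real time.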
[Fig. 9: Transcription bubble during elongation. Labels: RNA polymerase; double helical DNA with coding strand and template strand; unwound DNA (17 bp opened); RNA-DNA hybrid; rewinding and unwinding edges; 3' elongation site; nascent RNA with 5'-ppp end.]
(C) Termination
Termination involves the following steps:
• Cessation of formation of phosphodiester bonds
• Dissociation of RNA-DNA hybrid
• Rewinding of melted region of DNA
• Release of RNA Pol from DNA
Sequences called terminators trigger the elongating polymerase to dissociate from the DNA
and release the RNA chain it has made. E. coli has at least two classes of termination signals: one
class relies on a protein factor called ρ (rho) and the other is ρ-independent. Both ρ-dependent
and ρ-independent terminators respond to a functional signal that lies within the newly
synthesized RNA rather than in the template DNA. In both types of termination, pausing by RNA
Pol is important in order to allow time for the actual termination event to occur.
(i) ρ-independent (intrinsic) termination: Many terminators require a hairpin to form in the
secondary structure of the RNA being transcribed. This indicates that termination depends on the
RNA product and is not determined simply by scrutiny of the DNA sequence during transcription.
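A hairpin can form when a stretch of the RNA is followed, after a short loop, by its own reverse complement. The sketch below is a deliberately simple illustration of that idea and of the pattern described in this section for intrinsic terminators (a hairpin followed by several U residues). It only finds perfect Watson-Crick stems of a fixed length, ignores G-U wobble pairs and folding energies, and uses hypothetical parameter values and a made-up test sequence.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return rna.translate(COMPLEMENT)[::-1]


def find_hairpins(rna, stem=5, min_loop=3, max_loop=8):
    """Return (start, stem_sequence, loop_sequence) for every perfect stem
    of length `stem` whose two arms are separated by a loop of allowed size."""
    hits = []
    for i in range(len(rna) - 2 * stem - min_loop + 1):
        left = rna[i:i + stem]
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            right = rna[j:j + stem]
            if len(right) < stem:
                break  # ran off the end of the sequence
            if right == reverse_complement(left):
                hits.append((i, left, rna[i + stem:j]))
    return hits


def looks_like_intrinsic_terminator(rna, min_u=4):
    """True if some hairpin is immediately followed by a run of U residues."""
    for start, stem_seq, loop_seq in find_hairpins(rna):
        end = start + 2 * len(stem_seq) + len(loop_seq)
        if rna[end:end + min_u] == "U" * min_u:
            return True
    return False


# Made-up RNA: GC-rich stem (GCCGC ... GCGGC), GAAA loop, then a U run.
toy_rna = "AA" + "GCCGC" + "GAAA" + "GCGGC" + "UUUUUUU" + "AG"
print(find_hairpins(toy_rna))
print(looks_like_intrinsic_terminator(toy_rna))
```

Real terminator prediction relies on thermodynamic folding rather than exact string matching; the string search is used here only because it makes the dependence on the RNA sequence itself easy to see.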
[Figure: ρ-independent terminator. Labels: coding strand; GC-rich region; run of U residues (UUUUUUU).]
(ii) ρ-dependent termination: As already discussed, RNA Pol needs no help to terminate
transcription at a hairpin followed by several U residues. At other sites, however, termination
requires the participation of an additional factor. This discovery was prompted by the observation
that some RNA molecules synthesized in vitro by RNA Pol acting alone are longer than those
made in vivo. The missing factor, a protein that caused the correct termination, was isolated and
named rho (ρ), also called rho transcription terminator factor. Additional information about the
action of rho was obtained by adding this termination factor to an incubation mixture at
various times after the initiation of RNA synthesis. RNAs with sedimentation coefficients of
10S, 13S and 17S were obtained when rho was added at initiation, a few seconds after initiation
and 2 minutes after initiation, respectively. If no rho was added, transcription yielded a 23S RNA
product. It is evident that the template contains at least three termination sites that respond to rho
(yielding 10S, 13S and 17S RNA) and one termination site that does not (yielding 23S RNA).
Thus, specific termination at a site producing 23S RNA can occur in the absence of rho.
However, ρ detects additional termination signals that are not recognized by RNA Pol alone
(Fig. 11).
[Fig. 11: ρ (rho) termination sites, indicated by arrows.]
The ρ-dependent terminators lack the sequence of repeated A residues in the template strand
but usually include a CA rich sequence called a rut (rho utilization) element. Optimally these
sites consist of stretches of about 40 nucleotides that do not fold into a secondary structure, i.e.
they remain largely single stranded. They are also C rich. The second level of specificity is that
rho fails to bind any transcript that is being translated, i.e. a transcript bound to ribosomes. In
bacteria, transcription and translation are tightly coupled: translation initiates on growing RNA
transcripts as soon as they start exiting the polymerase, while they are still being synthesized. Thus,
rho typically terminates only those transcripts still being transcribed beyond the end of a gene or
operon.
ρ is a homo-hexameric terminator protein with a size of ~275 kD (each subunit has 419
residues). The X-ray structure of the ρ protein reveals that the six monomers form an open ring. The
ring is not flat. The sixth subunit is further down in the plane of the page than the first. Its first
and sixth subunits are separated by a gap of 12 Å and the helical pitch (rise along the helix axis)
between them is 45 Å. The RNA transcript on which ρ acts is believed to bind along the bottom
of each subunit and then thread through the middle of the ring. Each subunit consists of two
domains that can be separated by proteolysis: its N-terminal domain or RNA binding domain
binds single stranded polynucleotides and its C-terminal domain or ATP-hydrolysis domain,
which is homologous to the α and β subunits of the F1-ATPase, binds an NTP. It hydrolyzes
ATP in the presence of single stranded RNA, probably through recognition of a specific
structural feature rather than a consensus sequence. The RNA, which is only partially visible in
the structure, binds to the so-called primary RNA binding sites on the N-terminal domains that
face the interior of the helix and to the so-called secondary RNA binding sites on the C-terminal
domain that have been implicated in mRNA translocation and unwinding.
The ρ protein has an ATP-dependent RNA-DNA helicase activity. It binds to nascent RNA at
specific binding sites or recognition sequences (Fig. 12). It then uses its RNA-dependent ATPase
activity to provide the energy to translocate along the RNA in the 5'→3' direction to a
sequence that is rich in C and poor in G residues preceding the actual termination site. C is by far
the most common base (41%) and G is the least common base (14%). As a general rule, the
efficiency of ρ-dependent terminators increases with the length of the C-rich or G-poor region.
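The C-rich, G-poor character of the region recognized by rho can be checked on any RNA sequence by counting bases in a sliding window. The sketch below is illustrative only: the 40-nucleotide window follows the approximate length of rut sites quoted in this section, while the thresholds (at least 35% C, at most 20% G) and the test sequence are assumptions chosen for the example, not values from the text.

```python
def base_fractions(rna):
    """Fraction of each base (A, C, G, U) in an RNA string."""
    rna = rna.upper()
    return {base: rna.count(base) / len(rna) for base in "ACGU"}


def c_rich_g_poor_windows(rna, window=40, min_c=0.35, max_g=0.20):
    """Start positions of windows that are C rich and G poor.
    The thresholds are illustrative assumptions."""
    hits = []
    for start in range(len(rna) - window + 1):
        fractions = base_fractions(rna[start:start + window])
        if fractions["C"] >= min_c and fractions["G"] <= max_g:
            hits.append(start)
    return hits


# Made-up 40 nt segment: 16 C, 10 A, 9 U, 5 G (40% C, 12.5% G).
toy_rut = "C" * 16 + "A" * 10 + "U" * 9 + "G" * 5
print(base_fractions(toy_rut))
print(c_rich_g_poor_windows(toy_rut))
```

Base composition alone does not establish a rut site, since the text also requires the region to remain largely single stranded and free of ribosomes.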
[Fig. 12: ρ-dependent termination. Labels: coding strand; template strand; RNA-DNA hybrid; RNA polymerase; Rho protein bound to the RNA (5'-ppp end); ATP + H2O → ADP + Pi.]
Proteins, in addition to ρ, mediate and modulate termination. For example, the NusA protein enables RNA
Pol in E. coli to recognize a characteristic class of termination sites. In E. coli, specialized
termination signals called attenuators are regulated to meet the nutritional needs of the cell.
Transcription in eukaryotes
Robert Roeder and William Rutter discovered that the eukaryotic transcription machinery is much
more complex than that of prokaryotes, as a large number of polypeptides are
associated with the eukaryotic transcription machinery. The mechanism of eukaryotic
transcription is, however, similar to that in prokaryotes.
Unlike in bacteria, the eukaryotic genome is packaged into chromatin (nucleosomal
structure) and is therefore inaccessible to the transcription machinery. Prior to transcription of a
specific gene, its chromatin structure is modified to become more accessible to the transcription
apparatus. The two best understood mechanisms of chromatin modification are:
(i) Specific modifying complexes: Many eukaryotic gene activator proteins modify chromatin
structures by recruiting histone acetyltransferases.
(ii) Nucleosome remodeling by chromatin remodeling complexes.
Acetylation and remodeling prepare the gene promoter for the assembly of RNA Pol, other
accessory proteins and gene specific transcription factors that initiate the transcription process.
In transcription, only some regions of the genome are transcribed and the regions chosen vary in
different cells or in the same cell at different times; anywhere from one to several thousand transcripts can be
made of a given region in a single cell.
among the polymerases lies in their responses to the fungal toxin α-amanitin, a cyclic
octapeptide that contains several modified amino acids. The activities of different RNA Pols are
distinguished by their different sensitivities to the toxin. Properties of different eukaryotic RNA
polymerases have been summarized in Table 4.
In addition to these three different nuclear RNA Pols, eukaryotic cells contain separate
polymerases in mitochondria and chloroplasts. These small (~100 kD) single subunit RNA Pols,
which resemble those encoded by certain bacteriophages, are much simpler than the nuclear RNA
Pols, although they catalyze the same reaction.
synthesis, the pre-rRNA transcripts are packed along the rRNA genes and may be visualized in the
electron microscope as Christmas tree structures. In these structures, the RNA transcripts are
densely packed along the DNA and stick out perpendicularly from the DNA. The 45S
pre-rRNA transcript is cleaved to give one copy each of the 28S, 18S and 5.8S rRNAs, which are 5000, 2000 and
160 nucleotides long, respectively.
(i) Structure: RNA Pol II is somewhat larger than the Thermus aquaticus (Taq) / bacterial RNA Pol and has
several subunits that have no counterpart in the bacterial enzyme. Pol II is a huge enzyme with a molecular
mass of up to 600 kD. The enzyme contains two nonidentical large (>120 kD) subunits
comprising ~65% of its mass that are homologs of the prokaryotic RNA Pol β' and β subunits
and up to 12 additional small (<50 kD) subunits, two of which are homologs of the prokaryotic
RNA Pol α subunit and one of which is a homolog of the prokaryotic RNA Pol ω subunit. Of these
small subunits, five are identical in all three eukaryotic RNA Pols and two others (the RNA Pol
α homologs) are identical in RNA Pol I and III. Thus, 10 of the 12 RNA Pol II subunits are
either identical or closely similar to subunits of RNA Pol I and III. Moreover, the sequences of
these subunits are highly conserved (~50% identical) across species from yeast to humans (and
to a lesser extent between eukaryotes and bacteria). In fact, in all ten cases tested, a human RNA
Pol II subunit could replace its counterpart in yeast without loss of cell viability.
Roger Kornberg determined the X-ray crystallographic structure of RNA Pol II in yeast. Overall,
the shape of the yeast RNA Pol II enzyme resembles a crab claw, which is similar to bacterial Taq
RNA Pol. The subunits of the yeast enzyme have positions and core folds similar to those of their homologous subunits in
bacterial RNA Pol. The two pincers of the crab claw (RNA Pol II) are made up predominantly
of RPB1 and RPB2. The active site, which is made up of regions from both these subunits, is
found at the base of the pincers within a region called the active center cleft. The highly
conserved helical segment of RPB1 called the bridge spans the two pincers forming the enzyme's
cleft. This helix is straight in all X-ray structures of RNA Pol II yet determined, but it is bent in
that of Taq RNA Pol. A massive (~59 kD) portion of RPB1 and RPB2 named the clamp swings
down over the DNA to trap it in the cleft. A portion of RPB2 called the wall directs the
template strand out of the cleft in a ~90° turn. A loop called the rudder extends from the clamp.
There are various channels that allow DNA, RNA and ribonucleotides into and out of the
enzyme's active center cleft.
(a) RPB1 having C-terminal Domain (CTD) and RPB2: RPB1 is the largest subunit and
exhibits a high degree of homology to the β' subunit of bacterial RNA Pol. It contains the
active site of the enzyme RNA Pol II.
It has an unusual feature, a long carboxyl terminal domain (CTD) called the tail. The tail consists
of many highly conserved repeats of the heptad amino acid sequence Tyr-Ser-Pro-Thr-Ser-Pro-Ser
(YSPTSPS). There are 27 repeats in the yeast enzyme (18 exactly matching the consensus),
52 (21 exact) in the mouse enzyme and 53 in the human enzyme. This CTD is separated from the
main body of the enzyme by an unstructured linker sequence. These repeats are essential for
viability. The CTD sequence may be subjected to phosphorylation at Ser and Tyr. Five of the 7
residues in these particularly hydrophilic repeats bear OH groups and at least 50 of them,
predominantly those on Ser residues, are subject to reversible phosphorylation by CTD kinases
and CTD phosphatases. In vitro studies have shown that RNA Pol II initiates transcription only
when the CTD is unphosphorylated. Phosphorylation of the CTD occurs during transcription
elongation as RNA Pol leaves the promoter. Charge-charge repulsions between nearby phosphate
groups probably cause a highly phosphorylated CTD to project as far as 500 Å from the globular
portion of RNA Pol II. The phosphorylated CTD provides the binding sites for numerous
auxiliary factors that have essential roles in the transcription process. The CTD has been shown
to be an important target for differential activation of transcription elongation. Such a
tail is absent in the bacterial enzyme.
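The repeat structure of the CTD lends itself to a short illustration. The sketch below splits a CTD-like sequence into heptads, counts those that match the YSPTSPS consensus exactly, and lists the hydroxyl-bearing residues (Ser, Thr, Tyr) of a heptad, which confirms the "five of the 7 residues" statement for the consensus. The toy sequence is made up for the example and is not a real CTD; the function names are hypothetical.

```python
CONSENSUS = "YSPTSPS"
HYDROXYL_RESIDUES = set("STY")  # Ser, Thr and Tyr carry side-chain OH groups


def split_heptads(ctd):
    """Cut a sequence into consecutive 7-residue blocks (incomplete tail dropped)."""
    return [ctd[i:i + 7] for i in range(0, len(ctd) - 6, 7)]


def count_exact_repeats(ctd):
    """Number of heptads identical to the consensus."""
    return sum(heptad == CONSENSUS for heptad in split_heptads(ctd))


def hydroxyl_positions(heptad):
    """1-based positions of OH-bearing residues within a heptad."""
    return [i + 1 for i, aa in enumerate(heptad) if aa in HYDROXYL_RESIDUES]


# Made-up sequence: three consensus repeats plus one variant repeat.
toy_ctd = CONSENSUS * 3 + "YSPTSPA"
print(len(split_heptads(toy_ctd)), count_exact_repeats(toy_ctd))
print(hydroxyl_positions(CONSENSUS))   # positions of Tyr1, Ser2, Thr4, Ser5, Ser7
```

By the same count, a tail of 52 perfect consensus repeats would carry 52 × 5 = 260 hydroxyl groups, so the "at least 50" phosphorylated sites mentioned above represent only a fraction of the possible positions.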
(b) RPB3 and RPB11: These two subunits show some structural homology to the bacterial α
subunits.
(c) RPB4 and RPB7: Genetic studies have demonstrated that some of the Pol II specific
subunits are dispensable. Thus, two subunits, RPB4 and RPB7, are not essential for activity and
are present in RNA Pol II in less than stoichiometric amounts. RPB7 has a 102-residue segment
that is 30% identical to a portion of σ70 of E. coli. These subunits are absent in yeast
(Saccharomyces cerevisiae) RNA Pol II.
Although Pol II has the smallest number of subunits, it transcribes the largest and most diverse
array of promoters. A number of other proteins, which are not part of the Pol II complex, are
used by RNA Pol II as subsidiary proteins, thereby contributing to its functional diversity.
(ii) Nucleotide addition and RNA Pol II translocation: RNA Pol II binds two Mg++ ions at
its active site in the vicinity of 5 conserved acidic residues, which suggests that RNA Pol
catalyzes RNA elongation via a two-metal ion catalytic mechanism for nucleotide addition similar
to that proposed for all types of polymerases. As is the case with Taq RNA Pol, the surface of
RNA Pol II is almost entirely negatively charged except for the DNA binding cleft and the
region about the active site, which are positively charged.
RNA Pol III transcribes the 5S rRNA component of large ribosomal subunit. This is the only
rRNA subunit to be transcribed separately. Like the other rRNA genes, which are transcribed by
RNA Pol I, the 5S rRNA genes are tandemly arranged in a gene cluster. In humans, there is a
single cluster of around 2000 genes. Less is known about signals and ancillary factors involved
in termination for eukaryotic polymerases. Each class of polymerase uses a different mechanism.
Genetic studies have demonstrated that in contrast to Pol I and Pol II, all subunits of Pol III are
essential. Table 5 summarizes various prokaryotic and eukaryotic RNA polymerase subunits.
(i) Core promoter element: It refers to minimal set of sequence element required for accurate
transcription initiation. It spans positions -31 to +6. It includes transcription start site and hence
overlaps the transcribed region. It has a short conserved sequence element, a short AT rich
sequence around start point called initiator sequence (Inr). This sequence is essential for
transcription (Fig. 13).
(ii) Upstream control element (UCE) or Upstream promoter element (UPE): It is located
between residues -187 and -107 bp upstream from the start site (Fig. 13). The element is GC rich.
The UCEs are ~85% identical and ~50-80 bp long. The sequence is bound by specific
transcription factors, which then recruit RNA Pol I to the transcription start site. The UCE is thus
responsible for an increase in efficiency of transcription by 10- to 100-fold compared to that
from the core element alone.
(i) Core promoter (Basal elements): The eukaryotic core promoter refers to the minimal
set of sequence elements required for accurate transcription initiation by the Pol II
machinery. A core promoter is ~40 nucleotides long, extending either upstream or
downstream of the transcription start site. Four elements found in Pol II core promoters
are TATA box, BRE, Inr and DPE. Typically, a promoter includes only two or three of
these four elements. Many Pol II promoters have a few sequence features in common,
including a TATA box (eukaryotic consensus sequence TATAAA) near base pair -30 and
an Inr sequence (initiator) near the RNA start site at +1. However, a few Pol II promoters
lack a TATA box or a consensus Inr element or both. The sequence elements summarized
here are more variable among the Pol II promoters of eukaryotes than among E. coli
promoters.
(a) TATA box or Hogness box: An A/T rich sequence, TATA(A/T)A(A/T), called the TATA
box is located 25 to 30 bp upstream of the transcription start site. The consensus
sequence (homologous segment, TATA box) is T(82) A(97) T(93) A(85) A(63)/T(37) A(83) A(50)/T(37), where
the number following each base indicates the % occurrence of that base. This TATA box
resembles the -10 region of prokaryotic promoters (TATAAT), although they differ in
their locations relative to the transcription start site (-27 vs -10). This conserved region
was first discovered by Goldberg and Hogness and is also called the Goldberg-Hogness (GH) box or Hogness box.
The TATA box is the major assembly point for the proteins of the preinitiation
complexes of Pol II. The deletion of the TATA box does not necessarily eliminate
transcription; rather it generates heterogeneities in the transcriptional start site,
thereby indicating that the TATA box participates in selecting this site.
(b) TFIIB recognition element (BRE): Immediately upstream of the TATA box is the
TFIIB recognition element, which is targeted by TFIIB. The consensus sequence is:
(G/C)(G/C)(G/A)CGCCC.
(c) Initiator sequence (Inr): The initiator element (Inr) is located around the
transcription start site (+1). The consensus sequence of Inr is:
(C/T)(C/T)AN(T/A)(C/T)(C/T). Many initiator elements have a C at position -1 and an A at
+1. The DNA is unwound at the initiator sequence and the transcription start site is
usually within or very near this sequence.
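The consensus sequences of the core promoter elements can be turned into simple search patterns. The sketch below rewrites the three consensus sequences given above in IUPAC ambiguity codes (W = A/T, S = G/C, R = G/A, Y = C/T, N = any base; this notation is an addition, not used in the text) and converts them to regular expressions. The toy promoter is made up so that each element occurs once; real promoters match these consensus sequences only approximately, so exact matching is a simplification.

```python
import re

# IUPAC ambiguity codes needed for the consensus sequences in this section.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}

# Consensus sequences as quoted in the text, rewritten in IUPAC codes.
CORE_ELEMENTS = {
    "BRE": "SSRCGCCC",   # (G/C)(G/C)(G/A)CGCCC
    "TATA": "TATAWAW",   # TATA(A/T)A(A/T)
    "Inr": "YYANWYY",    # (C/T)(C/T)AN(T/A)(C/T)(C/T)
}


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus into a lookahead regex so that
    overlapping matches are also reported."""
    body = "".join(IUPAC[code] for code in consensus)
    return re.compile("(?=" + body + ")")


def find_element(dna, consensus):
    """0-based start positions of every exact match to the consensus."""
    return [m.start() for m in consensus_to_regex(consensus).finditer(dna.upper())]


# Made-up promoter fragment: BRE, TATA box, short spacer, Inr.
toy_promoter = "GGACGCCC" + "TATAAAA" + "GGGG" + "TCAGTCC"
for name, consensus in CORE_ELEMENTS.items():
    print(name, find_element(toy_promoter, consensus))
```

A position weight approach based on the % occurrence values quoted for the TATA box would tolerate mismatches better than exact matching, at the cost of needing a score threshold.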
(ii) Upstream regulatory elements (URE): The basal elements primarily determine the
location of the start point, but sponsor initiation only at a rather low level. Thus, the
basal elements are not sufficient for strong promoter activity. Additional elements called
upstream regulatory elements, located between -40 and -200 bp (present on the template
strand) upstream of the transcription start site, are important in order to increase the low
activity of basal promoters. These sequences are important in regulating Pol II promoters
and vary greatly in type and number. They serve as binding sites for a wide variety of
proteins that affect the activity of Pol II. These elements are found in many genes, which
vary widely in their levels of expression in different tissues. Examples are:
(a) GC box: The structural genes expressed in all tissues, e.g. housekeeping genes or
constitutive genes (genes that are continuously expressed rather than regulated), have
one or more copies of the sequence 5'-GGGCGG-3' located upstream from their
transcription start sites. They are located at about the -90 position; however, the positions
of these upstream sequences vary from one promoter to another. Often multiple
copies are present in the promoter and they occur in either orientation. The structural
genes that are selectively expressed in one or a few types of cells often lack these GC
rich sequences.
(b) CAAT box: The gene region extending between -50 and -110 also contains promoter
elements. They can occur in either orientation. For instance, many eukaryotic
structural genes, including those encoding the various globins, have a conserved
sequence of consensus 5'-GGNCAATCT-3' (the CAAT box) located between about
-70 and -90, whose alteration greatly reduces the transcription rate of the gene. Globin
genes have, in addition, a conserved CACCC box upstream from the CAAT box that
has also been implicated in transcriptional initiation.
The CAAT and GC boxes in eukaryotes differ from the similar regions in
prokaryotes. The positions of these upstream sequences vary from one promoter to
another, in contrast with the quite constant location of the -35 region in prokaryotes.
The CAAT box and the GC box can be effective when present on the template strand,
unlike the -35 region, which must be present on the coding strand. These differences
between prokaryotes and eukaryotes reflect fundamentally different mechanisms for
the recognition of cis acting elements. The -10 and -35 sequences in prokaryotic
promoters correspond to binding sites for RNA Pol and its associated σ factor. In
contrast, the TATA, CAAT, GC boxes and other cis acting elements in eukaryotic
promoters are recognized by proteins other than RNA Pol itself.
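Because the GC box and the CAAT box work in either orientation, a search for them has to examine both strands. A minimal sketch of that idea follows: a motif found on the opposite strand appears in the given sequence as its reverse complement. The test sequence is made up, and the function names are hypothetical; the approach reports a palindromic motif twice at the same position, once per strand.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of a DNA string."""
    return dna.translate(DNA_COMPLEMENT)[::-1]


def find_all(dna, motif):
    """All 0-based start positions of `motif` in `dna`, including overlaps."""
    hits = []
    pos = dna.find(motif)
    while pos != -1:
        hits.append(pos)
        pos = dna.find(motif, pos + 1)
    return hits


def find_both_orientations(dna, motif):
    """(position, strand) for the motif on the given strand ('+') and
    on the opposite strand ('-'), with positions on the given strand."""
    forward = [(pos, "+") for pos in find_all(dna, motif)]
    reverse = [(pos, "-") for pos in find_all(dna, reverse_complement(motif))]
    return sorted(forward + reverse)


GC_BOX = "GGGCGG"
# Made-up upstream region with one GC box in each orientation.
toy_upstream = "AAGGGCGGTTTTCCGCCCAA"
print(find_both_orientations(toy_upstream, GC_BOX))
```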
(C) RNA Pol III promoter
The promoters recognized by RNA Pol III are well characterized. Interestingly, some of the
sequences required for the regulated initiation of Pol III are located within the gene itself,
whereas others are in more conventional locations upstream of the RNA start site (Fig. 15).
(i) 5S rRNA genes: The genes for 5S rRNA are organized in a tandem cluster. The
promoters of genes transcribed by RNA Pol III can be located entirely within the
transcribed region (i.e. internal) of the gene. These sequences are therefore conserved
sequences in both 5S rRNA and DNA.
Donald Brown established this through the construction of a series of deletion mutants of
a Xenopus borealis 5S RNA gene. The 5S rRNA promoter contains the following
conserved sequences, which are depicted in Fig. 15.
(a) C box: It is located 81-99 bases downstream from the transcription start site.
(b) A box: It is located at around 50-65 bases downstream of the transcription start site.
The sequence of Box A is: 5'-TGGCNNAGTGG-3'.
[Fig. 15: 5S rRNA gene promoter with internal Box A and Box C. Conserved sequence (Box A): TGGCNNAGTGG.]
(ii) tRNA genes: RNA Pol III promoters of tRNA genes contain two highly conserved
sequences within the DNA encoding the tRNA (internal transcription control regions),
namely Box A and Box B. These regions lie downstream from the transcription start site,
i.e. after the transcription start site and within the transcription unit (Fig. 16).
(a) Box A: It is located around 50-65 bases downstream of the transcription start site. The
sequence of Box A is: 5'-TGGCNNAGTGG-3'.
(b) Box B: It is located downstream of the transcription start site. The sequence of Box B is:
5'-GGTTCGANNCC-3'.
As both of these sequences lie within the gene, they are conserved in both tRNA and
DNA. Thus, these sequences also encode important sequences in the tRNA itself,
called the D-loop and the TΨC loop.
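[Fig. 16: tRNA gene promoter with internal Box A and Box B downstream of the transcription start site (+1); position +55 is marked. Conserved sequences: Box A, TGGCNNAGTGG; Box B, GGTTCGANNCC.]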
(iii) Alternative RNA Pol III promoters: A number of RNA Pol III promoters are regulated
by upstream as well as downstream promoter sequences.
Further studies have shown, however, that the promoters of other RNA Pol III-
transcribed genes lie entirely upstream of their start sites. These upstream sites also bind
transcription factors that recruit RNA Pol III. These promoters require only upstream
sequences including the TATA box and other sequences found in RNA Pol II promoters.
Some promoters such as the U6 small nuclear RNA (U6 snRNA) and small RNA genes
from the Epstein-Barr virus use only regulatory sequences upstream from their
transcription start sites. The coding region of the U6 snRNA has a characteristic A box.
However, this sequence is not required for transcription. The U6 snRNA upstream
sequence contains sequences typical of RNA Pol II promoters, including a TATA box at
bases -30 to -23. These promoters also share several other upstream transcription factor
binding sequences with many U snRNA genes, which are transcribed by RNA Pol II. These
observations suggest that common transcription factors can regulate both RNA Pol II and
RNA Pol III genes.
(3) Enhancers
Promoters are not the only types of cis acting sequences. Transcription from many eukaryotic
promoters can be stimulated by control elements that are located many thousands of base pairs
away from the transcription start site. This was first observed in the genome of the DNA virus
SV40. A sequence of around 100 bp from SV40 DNA can significantly increase transcription
from a basal promoter even when it is placed far upstream or downstream. Such distal sequences
are called enhancers. The enhancer elements thus constitute the distal part of the promoter and
can be located either upstream or downstream of the transcription start site. Enhancers are
common in eukaryotes and rare in prokaryotes (exception: present with the σ54 factor).
total activity of the enhancer. They consist of sets of elements, similar to upstream
promoter, but density of sequences is more i.e. these are more compactly organized as
compared to upstream promoter.
• Like promoters, they are cis-acting regulatory elements.
• They are able to function over long distances of more than 1000 bp, whether from an
upstream or downstream position relative to the start site. They are therefore also called long-range
regulatory elements. In contrast, promoters are short-range elements.
• They can modulate (activate) transcription of the cognate genes when placed in either
orientation with respect to linked genes. They are active even when placed in reverse
orientation. They thus contain bidirectional elements and are orientation-independent (Fig.
17).
[Fig. 17: Enhancer (E) and promoter (P); transcription occurs with the enhancer in either orientation.]
• Interestingly, the positions of enhancers relative to promoters are not fixed and they can
vary substantially. They can modulate (activate) transcription of the cognate genes even
when moved away from their original location, either upstream or downstream of the coding
sequence. Thus, in natural genomes, enhancers can be located within genes also. They are
thus position-independent.
• Enhancers contain the same sequence elements that are found at promoters. The density of
sequence components is greater in the enhancer than in the promoter.
• They may be ubiquitous or tissue / cell type-specific. They may be active in only certain
cells. Enhancers play key roles in regulating gene expression in a specific tissue or
developmental stage.
• A given enhancer binds regulators at a given time and place. Alternative enhancers bind
different groups of regulators and control expression of the same gene at different times
and places in response to different signals.
• They exert strong activation of transcription of a linked gene from the correct start site.
They exert preferential stimulation of the closest of two tandem promoters. These DNA
sequences, although not promoters themselves, can enormously increase the effectiveness of
promoters.
• Enhancer sequences are targeted by a number of sequence-specific DNA binding proteins
called gene specific transcription factors and activators. The assembled or clustered group of
activators at the enhancer region is called an enhancon. It is believed that enhancers can regulate
transcription of a specific gene from a distant location by bending or looping out of the
intervening DNA sequence (interstitial DNA between promoter and enhancer regions) so
that the transcription factors bound to the enhancer can directly interact with the RNA Pol II
machinery bound at the promoter and influence its action.
• Activation at a distance raises a problem. When an activator binds at an enhancer, there
may be several genes within its range, yet a given enhancer typically regulates only one
gene. Other regulatory sequences called insulators or boundary elements are found between
enhancers and some promoters. Insulators block activation of the promoter by activators
bound at the enhancer. These elements, although still poorly understood, ensure that activators
do not work indiscriminately.
• Elements analogous to enhancers in yeast are called Upstream Activator Sequences
(UASs). They, however, work only upstream of the promoter and cannot function when
located downstream.
The general transcription factors collectively perform functions similar to those performed by
σ factor in bacterial transcription. However, these factors do not show any significant sequence
homology to σ factor. They have been shown to assemble on basal promoters in a specific order
and they may be subject to multiple levels of regulation. They help the polymerase to bind to the
promoter.
The binding of a transcription factor to its cognate DNA sequence enables the RNA Pol to locate
the proper initiation site. Such highly complex assembly of RNA Pols and associated proteins is
absent in prokaryotes. The binding of the TFs to the promoter leads to the melting of DNA
(comparable to the transition from closed to open complex in bacteria). They also help
polymerase escape from the promoter and embark on elongation phase.
The general transcription factors, TFIIs, required at every Pol II promoter are highly conserved
in all eukaryotes. The properties of various GTFs required by RNA Pols are summarized in
Table 6.
Many RNA Pol II promoters that do not contain a TATA box have an initiator element
overlapping their start site. It seems that at these promoters, TBP is recruited to the promoter by
a further DNA binding protein, which binds to the initiator element. TBP then recruits the other
transcription factors and RNA Pol in a manner similar to that seen at TATA box
promoters.
Similarly, transcription factors TFI and TFIII are required to stimulate transcription by RNA
Pol I and III, respectively.
One of the characteristics of the gene specific transcription factors is that they possess distinct
structural motifs essential for DNA recognition and transactivation. They are often
classified on the basis of structural features such as the homeodomain, helix turn helix, helix
loop helix and Zn finger. Quite often two gene specific transcription factors belonging to the
same structural family dimerize and bind to the target sequence in a bipartite manner. One such
example is the transcription factor AP-1, which is a dimer of Jun (39 kD) and Fos (65 kD) proteins.
They belong to the leucine zipper family and target the sequence TGACTCA. Gene specific
transcription factors are often targeted by various signal-transducing kinases such as MAP
kinase, which phosphorylate them to induce their activities. Some gene specific transcription
factors are localized in the cytoplasm in an inactive form and upon activation are
translocated to the nucleus. For example, transcription factor NF kappa B remains bound to
an inhibitory protein called I kappa B, which retains it in the cytoplasm. Upon receiving an
appropriate signal, I kappa B is ubiquitinated and degraded, resulting in the release of NF
kappa B, which is then translocated to the nucleus.
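Recognition of a target sequence such as the AP-1 site TGACTCA can be illustrated with a short sequence scan. The following Python sketch is only an illustration: the function names and the promoter fragment are invented for this example, and a real factor tolerates some sequence variation that an exact match ignores. Because DNA is double stranded, the scan also looks for the reverse complement of the motif.

```python
def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(dna.upper()))


def find_sites(dna: str, motif: str = "TGACTCA") -> list[tuple[int, str]]:
    """List (0-based position, strand) for every exact motif match on either strand."""
    dna = dna.upper()
    motif_rc = reverse_complement(motif)
    hits = []
    for i in range(len(dna) - len(motif) + 1):
        window = dna[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        elif window == motif_rc:
            hits.append((i, "-"))
    return hits


# Hypothetical promoter fragment, made up for illustration only.
fragment = "GGCATGACTCAGCTTTTGAGTCATCC"
ap1_hits = find_sites(fragment)
print(ap1_hits)
```

The scan reports one site on each strand of the toy fragment; exact matching is the simplest possible model of sequence-specific binding.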
(i) Assembly and Initiation: Eukaryotic transcription involves the assembly of RNA Pol II
and transcription factors at a promoter. The step-by-step pathway described below leads to active
transcription in vitro. In the cell, many of the proteins may be present in larger, preassembled
complexes, simplifying the pathways for assembly on promoters. The initiation phase of
transcription in eukaryotes differs from that in prokaryotes in two major ways: first, melting
requires ATP hydrolysis; second, promoter escape occurs after phosphorylation of the
polymerase.
The formation of the preinitiation complex or basal transcription apparatus thus involves the following
steps:
$ Binding of TBP: In the first step, TBP, a component of the TFIID transcription factor, binds
the TATA box 10^5 times as tightly as it binds noncognate sequences. Both DPE
and initiator sequences are also targeted by TFIID. TBP bound to the TATA box is the center
point of the initiation complex. This binding induces large conformational changes in the
bound DNA. When TBP binds to the TATA box, it distorts the DNA using a β-sheet inserted
into the minor groove. This distortion generates a binding site for TFIIB, which in turn
provides a platform for the recruitment of Pol II and TFIIF. This complex is distinctly
asymmetric. The asymmetry is crucial for specifying a unique start site and ensuring that
transcription proceeds unidirectionally.
Table 8: Elongation factors involved in eukaryotic transcription
[Figure: Overall process of transcription by RNA Pol II. Promoter recognition, binding, melting and clearance (TBP/TFIID on the TATA box, TFIIA, TFIIB, TFIIF, RNA Pol II with CTD tail, TFIIE, TFIIH); initiation; elongation (RNA Pol II movement with phosphorylated CTD, RNA synthesis); termination, releasing DNA, RNA and RNA Pol II with CTD tail.]
$ Binding of TFIIA: In the next step, TFIIA binds directly to TBP and stabilizes its
interaction with DNA and thereby enhances transcription. TFIIA binding, although not
always essential, can be important at non-consensus promoters where TBP binding is
relatively weak.
$ Binding of TFIIB: The formation of a closed complex begins when the TBP binds to the
factor TFIIB, which also binds to DNA on either side of TBP.
$ Recruitment of TFIIF-Pol II: The TFIIB-TBP complex is next bound by another complex
consisting of TFIIF and Pol II. TFIIF helps target Pol II to its promoters, both by
interacting with TFIIB and by reducing the binding of the polymerase to nonspecific sites
on the DNA.
$ Binding of TFIIE and TFIIH: Following recruitment of Pol II-TFIIF, two more
transcription factors viz TFIIE and TFIIH are recruited to complete the assembly of the
closed preinitiation complex. They bind upstream of Pol II.
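The ordered assembly described in the steps above can be summarized as a small dependency table. The Python sketch below is a teaching aid under stated assumptions, not a model of real binding kinetics: the dictionary `REQUIRES`, the function `assemble` and the factor labels are names chosen for this example, and TFIIA is treated as optional, as the text notes.

```python
# Each factor maps to the components that must already be on the promoter,
# following the step-by-step in vitro pathway described in the text.
REQUIRES = {
    "TBP/TFIID": set(),
    "TFIIA": {"TBP/TFIID"},
    "TFIIB": {"TBP/TFIID"},
    "TFIIF-Pol II": {"TFIIB"},
    "TFIIE": {"TFIIF-Pol II"},
    "TFIIH": {"TFIIF-Pol II"},
}


def assemble(order: list[str]) -> list[str]:
    """Add factors in the given order; raise ValueError if a prerequisite is missing."""
    bound: list[str] = []
    for factor in order:
        missing = REQUIRES[factor] - set(bound)
        if missing:
            raise ValueError(f"{factor} cannot bind before {sorted(missing)}")
        bound.append(factor)
    return bound


full_order = ["TBP/TFIID", "TFIIA", "TFIIB", "TFIIF-Pol II", "TFIIE", "TFIIH"]
print(assemble(full_order))
```

Calling `assemble(["TFIIB"])` raises an error, reflecting the point that TFIIB needs the TBP-induced distortion of the TATA box before it can bind.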
TFIIH is a complex factor having multiple enzymatic activities including ATPase, helicase,
kinase and DNA repair activities. The DNA helicase activity of TFIIH promotes the unwinding
of DNA near the RNA start site (i.e. Inr), thereby creating an open complex. This process
requires the hydrolysis of ATP. The DNA repair activity presumably couples transcription
with DNA repair to avoid transcription of
any faulty gene. TFIIH has an additional function during the initiation phase. A kinase activity in
one of its subunits phosphorylates Pol II at many places in the CTD. Several other protein
kinases, including CDK9, which is part of the complex pTEFb, also phosphorylate the CTD. In
the preinitiation complex, TFIIE stimulates the kinase activity of TFIIH, resulting in the
hyperphosphorylation of the carboxyl terminal domain (CTD) of Pol II. During the
formation of this complex, the CTD of the polymerase is phosphorylated on
serine and threonine residues and Pol II then escapes the promoter to begin transcription.
The importance of the CTD is highlighted by the finding that yeast cells containing mutant Pol II
with fewer than 10 repeats are not viable. Phosphorylation of the CTD causes a conformational
change in the overall complex that weakens the interaction of Pol II with TBP, thereby aiding
initiation of transcription. Most of the factors are released before the Pol leaves the promoter and
can then participate in another round of initiation.
Transcription regulatory proteins called activators help recruit polymerase to the promoter,
stabilizing its binding there. This recruitment is mediated through interactions between DNA
bound activators and parts of the transcription machinery. Often the interaction is with the
mediator complex, which associates with the CTD
tail of the large polymerase subunit through one surface, while presenting other surfaces for
interaction with DNA-bound activators. This explains the need for mediator to achieve
significant transcription in vivo. Despite this central role in transcriptional activation, deletion of
individual subunits of mediator (it is made up of many subunits) often leads to loss of expression
of only a small subset of genes, different for each subunit. This result likely reflects the fact that
different activators are believed to interact with different mediator subunits to bring polymerase
to different genes. In addition, mediator aids initiation by regulating the CTD kinase in TFIIH.
The need for nucleosome modifiers and remodellers also differs at different promoters or even at
the same promoter under different circumstances. When and where required, these complexes are
also recruited by the DNA-bound activators. Nucleosome modifying enzymes include histone
acetyltransferase, histone deacetylase and histone methylase.
(ii) Elongation: Once RNA Pol has initiated transcription, it shifts into the elongation phase.
This transition involves the Pol II enzyme shedding most of its initiation factors, for example the general
transcription factors and mediator. During synthesis of the initial 60-70 nucleotides of RNA,
TFIIE is released. Subsequently, TFIIH is released. However, TFIIF remains associated with Pol
II throughout elongation. In the place of the
transcription factors and mediator, another set of factors is recruited. This new set of factors
stimulates Pol II elongation and RNA proofreading. The proteins that greatly enhance the
activity of Pol II are called elongation factors. Examples include TFIIS, pTEFb, hSPT5,
Elongin and ELL. The elongation factors suppress pausing or arrest of transcription by the Pol II-
TFIIF complex and also coordinate interactions between protein complexes involved in
posttranscriptional processing of mRNAs. The enzymes involved in all these processes are, like
several of the initiation factors, recruited to the C-terminal tail of the large subunit of Pol II, the
CTD. In this case, however, the factors favor the phosphorylated form of the CTD. Thus,
phosphorylation of the CTD leads to an exchange of initiation factors for those factors required
for elongation and RNA processing. As is evident from the crystal structure of yeast Pol II, the
polymerase CTD lies directly adjacent to the channel through which the newly synthesized RNA
exits the enzyme. This, together with its length (it can extend some 800 Å from the body of the
enzyme), allows the tail to bind several components of the elongation and processing machinery
and to deliver them to the emerging RNA. Some other elongation factors are required for RNA
processing.
(iii) Termination and release: Once the RNA transcript is completed, transcription is
terminated. RNA Pol II does not, however, terminate immediately. Rather, it continues to
move along the template, generating a second RNA molecule that can become as long as several
hundred nucleotides before terminating. Termination of mRNA synthesis is thus coupled with
polyadenylation (hence the details of the termination step are described after polyadenylation). Pol II
is dephosphorylated, dissociates from the template, is recycled and is then ready to initiate another
transcript. In the process, the new RNA is released, which may be degraded without ever leaving the
nucleus.
The synthesis of rRNA (5.8S, 18S and 28S) involves transcription factors and complexes, for example
upstream binding factor (UBF) and the eukaryotic transcription complex called selectivity factor
(SL-1) (similar complexes in other species are called TIF-IB and Rib1). UBF is a specific DNA
binding protein, which binds to the UCE. It greatly stimulates the transcription rate. In its absence, a
low rate of basal transcription is seen. SL-1 contains four subunits: one TBP (TATA binding
protein) and three TAFIs (TBP associated factors for RNA Pol I).
The process of transcription of rRNA (5.8S, 18S and 28S) is outlined below and depicted in Fig.
20.
$ UBF binding: UBF binds to the upstream control element (UCE) of the RNA Pol I
promoter. A second UBF molecule binds to the
upstream region of the core element (core promoter). The sequences in the two UBF
binding sites have no obvious similarity. One molecule of UBF is thought to bind to
each sequence element. The two UBF molecules then bind each other by protein-protein
interaction, causing the intervening DNA to form a loop between the two binding sites.
(Some are of the view that a single UBF
binds to two different sites, viz UCE and the upstream part of the core element).
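[Figure: Transcription initiation at the RNA Pol I promoter (UCE and core element relative to +1). UBF binds the UCE and the core element; SL1 (TBP and TAFIs) joins the UBF-DNA complex; RNA Pol I then binds.]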
$ Selectivity factor binding: Selectivity factor (SL-1) binds to and stabilizes the UBF-DNA
complex. It interacts with the free downstream part of the core element. Binding of UBF
increases transcription initiation activity by SL-1. Acanthamoeba has a simple
transcription control system. This has a single control element and a single factor TIF-1,
which are required for RNA Pol I binding and initiation at the rRNA promoter.
$ RNA Pol I binding: SL-1 binding allows RNA Pol I to bind the complex and initiate
transcription and is essential for rRNA transcription.
[Figure: Transcription initiation at a eukaryotic tRNA promoter (A Box and B Box downstream of +1). TFIIIC binds the A and B boxes; TFIIIB (TBP, BRF and B'') binds upstream of TFIIIC.]
$ TFIIIC binding: TFIIIC binds to both Box A and Box B of the tRNA promoter.
$ TFIIIB binding: TFIIIB binds TFIIIC-DNA complex and interacts with DNA upstream
from TFIIIC binding site (TFIIIB binds 50 bp upstream from A box).
$ RNA Pol III binding: TFIIIB helps in recruitment of RNA Pol III. The enzyme RNA Pol
III then initiates transcription, presumably displacing TFIIIC from DNA template as it
goes.
Termination of transcription occurs without accessory factors. A cluster of dA residues is often
sufficient for termination and the termination efficiency depends on the surrounding sequence. An
example of an efficient termination signal in somatic 5S rRNA genes of Xenopus borealis is
5′-GCAAAAGC-3′.
(ii) 5S rRNA: The promoter of 5S rRNA genes has two consensus sequences downstream of the
transcription start site, namely Box A and Box C, as described in an earlier section. The process
of transcription of 5S rRNA genes involves the transcription factors TFIIIA, TFIIIC and TFIIIB.
TFIIIA is an assembly factor for positioning TFIIIB at the right location. TFIIIB is the true initiation factor
for Pol III. TFIIIB has no sequence specificity and therefore its binding site appears to be
determined by the position at which TFIIIC binds to DNA.
The process of transcription involves following steps.
$ TFIIIA binding: TFIIIA binds strongly to Box C promoter sequence.
$ TFIIIC binding: TFIIIC then binds to TFIIIA-DNA complex interacting also with Box
A sequence.
$ TFIIIB binding: Once TFIIIC has bound, TFIIIB can interact with the complex.
$ RNA Pol III binding: TFIIIB then recruits RNA Pol III to initiate transcription.
In eukaryotes, RNA Pol II synthesizes mRNA as longer precursors (pre-mRNA), the population
of different pre-mRNAs being called heterogeneous nuclear RNA (hnRNA). Once transcribed,
eukaryotic precursor mRNA has to be processed in various ways before being exported from the
nucleus to the cytoplasm, where it can be translated (Table 7).
[Figure: Transcription initiation at eukaryotic 5S rRNA promoter (A Box and C Box downstream of +1). TFIIIA binds first, followed by TFIIIC and then TFIIIB (TBP, BRF and B'').]
1. End modification: It occurs during the synthesis of eukaryotic and archaeal mRNAs. This
involves addition of nucleotides to the 5′ or 3′ ends of the primary
transcripts or their cleavage products. Such events do not occur in
prokaryotes. These include:
(i) Capping of the 5′ end of mRNA
(ii) Polyadenylation of the 3′ end of mRNA
2. Splicing: It is the removal of introns (non-coding sequences in the genes) from the
precursor RNAs (i.e. eukaryotic mRNAs, and some eukaryotic rRNAs
and tRNAs). It leads to a physical change in the length of the transcript.
3. Cutting events: These involve cutting of primary transcripts (or removal of nucleotides)
of rRNA and tRNA with endonucleases or exonucleases to produce
mature transcripts in both prokaryotes and eukaryotes. It leads to a physical
change in the length of the transcript.
4. Chemical modifications: These modifications are made within the rRNAs, tRNAs and mRNAs.
The rRNAs and tRNAs of all organisms are modified by addition of new
chemical groups. These groups are added on either the base or the sugar
moiety of specific nucleotides in RNAs. It occurs to a much lesser extent
with pre-mRNA in eukaryotes. Equivalent events in archaea are poorly
understood. Chemical modification of mRNA called RNA editing is seen
in a diverse group of eukaryotes.
Strikingly, there is an overlap in proteins involved in elongation and those required for RNA
processing. As mentioned earlier, the transcription elongation factor, pTEFb activates another
elongation factor hSPT5, which helps in recruitment and stimulation of the 5′ capping enzyme.
Another example is the pTEFb-induced recruitment of the elongation factor TAT-SF1, which
further recruits the components of the splicing machinery. Thus, it seems that transcription and
RNA processing are interconnected, presumably to ensure their proper coordination, allowing
cotranscriptional processing of primary transcript.
(a) Capping of 5′ end: Capping involves the addition of a modified G base (m7G) to the 5′ end
of mRNA. Specifically it is a methylated G and it is joined to the RNA transcript by an unusual
5′-5′ linkage involving three phosphates. The cap is added in reverse polarity (5′ to 5′), thus
acting as a barrier to 5′ exonuclease attack, but it also promotes splicing, transport and
translation.
In eukaryotes, each intron in nuclear pre-mRNA, also called a GU-AG intron, is characterized by
signature sequences at both its 5′ and 3′ ends that are recognized by the spliceosome. The borders
between introns and exons are thus marked by specific nucleotide sequences within the
pre-mRNAs. For splicing, an intron should have a 5′-GU, an AG-3′ and a branch point
sequence. Thus, the sequences within the RNA delineate where splicing will occur (Scheme 3).
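The GU-AG rule lends itself to a simple check. The Python sketch below is a simplified illustration under stated assumptions: the function names and the toy pre-mRNA are invented here, and the branch point test (any internal A) is a deliberately loose stand-in for a real branch point sequence, which the text does not specify.

```python
def is_gu_ag_intron(intron: str) -> bool:
    """True if the intron starts with GU, ends with AG and has an internal A.

    The internal A is only a crude placeholder for a branch point sequence.
    """
    rna = intron.upper().replace("T", "U")
    interior = rna[2:-2]
    return rna.startswith("GU") and rna.endswith("AG") and "A" in interior


def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns given as (start, end) 0-based half-open intervals and join the exons."""
    mature, cursor = [], 0
    for start, end in sorted(introns):
        if not is_gu_ag_intron(pre_mrna[start:end]):
            raise ValueError(f"segment {start}-{end} lacks GU-AG boundaries")
        mature.append(pre_mrna[cursor:start])
        cursor = end
    mature.append(pre_mrna[cursor:])
    return "".join(mature)


# Toy pre-mRNA (hypothetical): exon 1 + one 17-nucleotide intron + exon 2.
pre = "AUGGCC" + "GUAAGUCUAACUUUCAG" + "GAUUAA"
mature_mrna = splice(pre, [(6, 23)])
print(mature_mrna)
```

In the cell the boundaries are located by the spliceosome rather than supplied as coordinates; the sketch only shows that the sequence itself carries the information needed to validate a proposed intron.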
[Scheme 3: Splicing of pre-mRNA. Exons 1 to 4 of the pre-mRNA (5′ to 3′) are separated by introns 1 to 3; splicing removes the introns to give mRNA. Each intron carries a 5′ splice site, a branch site and a 3′ splice site.]
Various snRNPs involved in the process and their sizes and functions are summarized in the
Table 10.
Table 10: Small nuclear ribonucleoprotein particles (snRNPs) in the splicing of nuclear
mRNA precursors
[Figure: Splicing reaction, showing exon 1, exon 2, the branch point A, the pyrimidine tract (Yn), the 3′ splice site and the free 3′-OH of exon 1.]
stage of spliceosome assembly. Presumably this strategy lessens the chance of aberrant splicing.
Linking the formation of the active site to the successful completion of the earlier steps in
spliceosome assembly makes it highly likely that the active site is available only at legitimate
splice sites.
$ Joining of exons and release of mature mRNA: The juxtaposition of the 5′ splice
site of the pre-mRNA and the branch site facilitates the first transesterification reaction. The
second reaction, between the 5′ and 3′ splice sites, is aided by the U5 snRNP, which
helps to bring the two exons together. The final steps involve release of the mRNA
product and the snRNPs. The snRNPs are initially bound to the lariat, but get recycled
after rapid degradation of that piece of RNA.
Components of the splicing machinery arrive or leave the complex at each step due to changes
associated with structural rearrangements necessary for the splicing reaction to proceed. There is
evidence to suggest that some of the components shown do not arrive or leave precisely when
indicated in the figure; they may, for example, remain present but weaken their association with the
complex rather than dissociating completely. It is also not possible to be sure of the order of
some changes shown, particularly the two steps involving changes in U6 pairing: when it takes
over from U1 snRNP at the 5′ splice site, compared to when it takes over from U4 snRNP in
binding U2 snRNP. Despite these uncertainties, the critical involvement of different components
of the machinery at different stages of the splicing reaction and the general dynamic nature of the
spliceosome are as shown in Fig. 24.
Some eukaryotic pre-mRNAs do not fall into the GU-AG intron category. They have different
consensus sequences at their splice sites. These are AU-AC introns, which have been found in
approximately 20 genes in organisms as diverse as humans, plants and Drosophila. These introns
require U11 / U12 snRNPs.
Eukaryotic primary transcripts extend beyond the 3′ end of the mature mRNA. Indeed, the
nucleotide preceding the poly (A) is not the last nucleotide to be transcribed.
Polyadenylation was once looked on as a post transcriptional event but it is now recognized
that the process is an inherent part of the mechanism for termination of transcription by RNA Pol
II.
[Figure: Spliceosome-mediated splicing reaction. U1 snRNP, BBP and U2AF (65 and 35) bind the pre-mRNA; U2 snRNP replaces BBP at the branch point A; the tri-snRNP particle (U4, U5 and U6 snRNPs) joins; U1 and U4 snRNPs leave; the lariat form of the intron and the spliced exons are released.]
by a GU rich region. Both the poly (A) signal sequence and the GU rich region are binding sites
for multisubunit protein complexes.
& Cleavage and polyadenylation specificity factor (CPSF) binds poly (A) signal sequence.
& Cleavage stimulation factor (CstF) binds GU rich region.
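[Figure: Cleavage and polyadenylation during ongoing transcription. The transcript is cut at the cleavage site and a poly (A) tail is added to its new 3′ end while Pol continues along the template.]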
Besides, poly (A) polymerase and at least two other protein factors must associate with bound
CPSF and CstF in order for polyadenylation to occur.
After cleavage by the endonuclease, a template-independent RNA polymerase called poly (A)
polymerase adds about 250 adenylate residues to the 3′ end of the transcript. Virtually all
eukaryotic mRNAs have a series of up to 250 adenosines at their 3′ ends. This enzyme uses ATP
as a precursor and adds A residues using the same chemistry as RNA polymerase. These A
residues are not specified by the DNA sequence, i.e. they are added without a template. Thus,
the long tail of A(s) is found in the RNA but not the DNA. It is not clear what determines the
length of the poly A tail, but that process involves other proteins that bind specifically to the poly
A sequence (described later). The polymerase does not act at the extreme 3′ end of the transcript,
but at an internal site, which is cleaved to create a new 3′ end to which the poly (A) tail is added.
The reaction catalyzed is as follows:
RNA + n ATP → RNA-(AMP)n + n PPi
The additional factors required include polyadenylate-binding protein (PABP). PABPs
perform the following functions:
& They help the polymerase to add the adenosines
& They possibly influence the length of the poly (A) tail that is synthesized
& They appear to play a role in maintenance of the tail after synthesis
& They also play a role in translation
In yeast, the signal sequences in the transcript are slightly different, but the protein complexes
are similar to those in mammals and polyadenylation is thought to occur by more or less the
same mechanism.
CPSF is known to interact with TFIID and is recruited into the polymerase complex during the
initiation stage. By riding along the template with RNA Pol II, CPSF is able to bind to the poly
(A) signal sequence as soon as it is transcribed, initiating the polyadenylation reaction. Both
CPSF and CstF contact the CTD of the polymerase. It has been suggested that the nature of
these contacts changes when the poly (A) signal sequence is located and that this change alters
the properties of the elongation complex so that termination becomes favored over continued
RNA synthesis. As a result, transcription stops soon after the poly (A) signal sequence has been
transcribed. The details of the termination step linking cleavage and polyadenylation to
termination of transcription are outlined in Fig. 26.
It is noteworthy that the long tail of A(s) is unique to transcripts made by RNA Pol II, a feature
that allows experimental isolation of protein coding mRNAs by affinity chromatography. The
mature mRNA is then transported from the nucleus.
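The cleavage and tailing steps described above can be sketched in a few lines of Python. This is an illustration under assumptions rather than a description of the enzymes: the function name, the toy transcript and the fixed `offset` between the AAUAAA signal and the cut are choices made for this example (the text states only that cleavage occurs at an internal site), while the tail length of 250 follows the figure given in the text.

```python
POLY_A_SIGNAL = "AAUAAA"


def cleave_and_polyadenylate(pre_mrna: str, offset: int = 15, tail_length: int = 250) -> str:
    """Cut `offset` nucleotides past the first poly (A) signal and append the tail.

    `offset` is an illustrative assumption, not a value taken from the text.
    """
    rna = pre_mrna.upper()
    signal_start = rna.find(POLY_A_SIGNAL)
    if signal_start == -1:
        raise ValueError("no poly (A) signal found")
    cut = min(signal_start + len(POLY_A_SIGNAL) + offset, len(rna))
    # The tail is added without a template, so it is simply appended here.
    return rna[:cut] + "A" * tail_length


# Hypothetical transcript: body, signal, spacer, then a GU rich downstream region.
transcript = "GGCUAC" + "AAUAAA" + "C" * 15 + "GUGUGUGUUU"
processed = cleave_and_polyadenylate(transcript)
print(len(processed), processed[:30])
```

Everything downstream of the cut, including the GU rich region, is absent from the processed molecule, mirroring the point that the poly (A) tail is attached to a new 3′ end created by cleavage.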
It is not known what links polyadenylation to termination, but it is clear that the polyadenylation
signal is required for termination (interestingly, RNA cleavage is not). Two basic models have
been proposed to explain the link between polyadenylation and termination:
& The first model proposes that the transfer of 3′ processing enzymes from the polymerase CTD tail to the RNA
triggers a conformational change in the polymerase that reduces processivity of the
enzyme, leading to spontaneous termination soon afterward.
& The second model proposes that the absence of a 5′ cap on the second RNA molecule is
sensed by the polymerase, which, as a result, recognizes the transcript as improper and
terminates. The absence of the cap reflects the absence of the capping enzymes on the CTD
at this stage of the transcription cycle (these enzymes are loaded onto the CTD at the point
where initiation turns to elongation and are then displaced in favor of the splicing
machinery).
The role of poly (A) tail is still not firmly established despite much effort. Even though
polyadenylation can be identified as an inherent part of the termination process, this does not
explain the necessity to add a poly (A) tail to the transcript. Evidence that it enhances translation
efficiency and the stability of mRNA is accumulating. The poly (A) tail on pre-mRNA is thought
to help stabilize the molecule since a poly (A)-binding protein binds to it, which should act to
resist 3′ exonuclease action. In addition, the poly (A) tail may help in the translation of the
mature mRNA in the cytoplasm. Blocking the synthesis of the poly (A) tail by exposure to
3′-deoxyadenosine (cordycepin) does not interfere with the synthesis of the primary transcript. The
mRNA devoid of a poly (A) tail can be transported out of the nucleus. However, an mRNA
molecule devoid of a poly (A) tail is usually a much less effective template for protein synthesis
than is one with a poly (A) tail. Thus, poly (A) tail has a role in initiation of translation. It is
further supported by research showing that poly (A) polymerase is repressed during those
periods of the cell cycle when relatively little protein synthesis occurs. Indeed some mRNAs are
stored in an unadenylated form and receive the poly (A) tail only when translation is imminent.
The half-life of an mRNA molecule may also be determined in part by the rate of degradation of
its poly (A) tail. Histone pre-mRNAs do not get polyadenylated, but are cleaved at a special
sequence to generate their mature 3′ ends.
[Figure: Pre-mRNA 5′..AAUAAA..CAAAAAAAAAAAAAA-3′ with the polyadenylation signal sequence (AAUAAA); the cleavage proteins CPSF and CstF attach at the signal sequence on the RNA as it emerges from the DNA template.]
Fig. 26: Termination signal and the link between polyadenylation and termination of transcription by
RNA Pol II
(d) Pre-mRNA methylation: The final modification or processing event that many
pre-mRNAs undergo is specific methylation of certain bases. In vertebrates, the most common
methylation event is on the N6 position of A residues, particularly when these A residues occur
in the sequence 5′-RRACX-3′, where X is rarely G. Up to 0.1% of pre-mRNA A residues are
methylated and the methylations seem to be largely conserved in the mature mRNA, though their
function is unknown.
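Candidate methylation sites matching the 5′-RRACX-3′ context can be located with a pattern search. The Python sketch below is illustrative only: the function name and the example RNA are invented, R is taken to mean a purine (A or G), and matching the motif does not imply that a given A is actually methylated.

```python
import re

# R = purine (A or G); the A that may be methylated is the third base of RRACX.
# The lookahead lets overlapping matches be reported.
M6A_MOTIF = re.compile(r"(?=([AG][AG]AC[ACGU]))")


def candidate_m6a_sites(rna: str) -> list[tuple[int, str]]:
    """Return (0-based index of the central A, matched 5-mer) for each RRACX site."""
    return [(m.start() + 2, m.group(1)) for m in M6A_MOTIF.finditer(rna.upper())]


# Hypothetical RNA fragment used only for illustration.
example = "CUGGACUAAACGCC"
sites = candidate_m6a_sites(example)
print(sites)
```

Sites in which X is G are still reported by this pattern; since the text notes that G is rare at that position, a stricter search could filter them out.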
(a) Alternative poly (A) sites: Some pre-mRNAs contain more than one poly (A) site and these
may be used under different circumstances (eg. in different cell types) to generate different
mature mRNAs. The cell or organism has a choice of which one to use. It is possible that if the
upstream site is used then sequences that control mRNA stability or location are removed in the
portion that is cleaved off. Thus mature mRNAs with the same coding region, but differing
stabilities or locations, could be used in the same cell at a frequency that reflects their relative
efficiencies (strengths) and the cell would contain both types of mRNA. The efficiency of a poly
(A) site may reflect how well it matches the consensus sequences. In other situations, one cell
may exclusively use one poly (A) site, while a different cell uses another. The most likely
explanation is that in one cell the stronger site is used by default, but in the other cell a factor is
present that activates the weaker site so it is used exclusively, or that prevents the stronger site
from being used. In some cases, the use of alternative poly (A) sites causes different patterns of
splicing to occur. In some cases, factors will bind near to and activate or repress a particular site.
(b) Alternative promoters: The use of different promoters in different cell types and at different
developmental stages leads to the generation of different mature mRNAs.
(c) Alternative splicing: In many cases, the generation of different mature mRNAs from a
particular type of gene transcript can occur by varying the use of 5′ and 3′ splice sites. This is
called alternative splicing. Hence, a single transcript can be spliced in multiple ways resulting in
a number of protein coding sequences.
[Figure: Alternative splicing. DNA containing exons 1 to 3 and introns 1 and 2 is transcribed into a primary RNA transcript, which is spliced into different mRNAs, for example exon 1-exon 2-exon 3 or exon 1-exon 3.]
By this strategy, a gene can give rise to more than one polypeptide product with partially
overlapping sequences; the strategy is more common in higher eukaryotes. Some pre-mRNAs can be
spliced in more than one way, generating alternative mRNAs. It is estimated that 30% of the
genes in the human genome are spliced in alternative ways to generate more than one protein per
gene. Some examples of alternatively spliced pre-mRNAs are: troponin, tropomyosin, myosin,
actin, fibronectin, fibrinogen, nerve growth factor, aldolase, alcohol dehydrogenase, calcitonin,
SV40 T-antigen, and the Drosophila sxl, tra and dsx pre-mRNAs for sex determination.
As these splicing events occur differently in different cell types, it is likely that cell type-specific
factors are responsible for activating or repressing the use of processing sites near where they
bind. Thus, the involvement of SR proteins (serine-arginine rich) and hnRNPs in guiding the alternative
splicing mechanism has been suggested.
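The combinatorial potential of alternative splicing can be illustrated by enumerating exon-skipping products. The Python sketch below makes simplifying assumptions: it considers only skipping of internal exons (the first and last exons are always kept and exon order is preserved), and the function name and exon labels are chosen for this example.

```python
from itertools import combinations


def exon_skipping_isoforms(exons: list[str]) -> list[list[str]]:
    """Enumerate isoforms keeping the first and last exon plus any subset of internal exons."""
    first, *internal, last = exons
    isoforms = []
    # Start with all internal exons retained, then progressively skip more of them.
    for k in range(len(internal), -1, -1):
        for kept in combinations(internal, k):
            isoforms.append([first, *kept, last])
    return isoforms


isoforms = exon_skipping_isoforms(["Exon 1", "Exon 2", "Exon 3"])
for isoform in isoforms:
    print(" - ".join(isoform))
```

With n internal exons this simple model yields 2^n possible mRNAs, which gives a feel for how a single gene can encode several related polypeptides; real genes use only some of these combinations and also employ alternative 5′ and 3′ splice sites.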
(d) RNA editing: An unusual form of RNA processing in which the sequence of the primary
transcript is altered is called RNA editing. RNA editing, like RNA splicing, is a process in which
sequence of RNA changes after or during its transcription i.e. at the level of mRNA. In this form
of RNA processing, the nucleotide sequence of the primary transcript is altered by changing /
inserting / deleting residues at specific points along the molecule. Thus, the protein produced
upon translation is different from that predicted from the gene sequence i.e. coding sequence in
RNA differs from the sequence of DNA from which it was transcribed. This is thus a method for
increasing protein diversity, similar to alternative splicing. RNA editing occurs in two different
situations, with different causes.
The mammalian genome contains a single (interrupted) apolipoprotein B gene whose sequence is
identical in all tissues, with a coding region of 4563 codons. In the liver, this gene is transcribed into an
mRNA that is translated into a protein of 512 kD, called apo B100, representing the full coding
sequence. A shorter form of the protein, called apo B48, of ~250 kD is synthesized in the
intestine. This protein consists of the N-terminal half of the full-length protein. It is translated
from an mRNA whose sequence is identical with that of liver except for a change (deamination
by cytidine deaminase) from C to U at codon 2153 in the 26th exon. This substitution changes the
codon CAA for glutamine into the ochre codon UAA for termination. The two proteins, though
translated from the same gene, have different functions. Apo B48, which is formed only in the small
intestine, functions in chylomicrons to transport triacylglycerols from the intestine to the liver. On
the other hand, apo B100, which is formed only in the liver, functions in VLDL, IDL and LDL to
transport cholesterol from the liver to peripheral tissues.
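The effect of a single C to U edit on the reading frame can be demonstrated with a toy message. The Python sketch below uses an invented seven-codon mRNA, not the real apolipoprotein B sequence, and the function names are chosen for this example; it shows how converting CAA to UAA truncates translation.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def edit_c_to_u(mrna: str, codon_index: int) -> str:
    """Deaminate the first base of the given codon (0-based index) from C to U."""
    pos = codon_index * 3
    if mrna[pos] != "C":
        raise ValueError("edited position is not a C")
    return mrna[:pos] + "U" + mrna[pos + 1:]


def codons_before_stop(mrna: str) -> int:
    """Count sense codons read from position 0 until the first in-frame stop codon."""
    count = 0
    for i in range(0, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            break
        count += 1
    return count


# Toy message (hypothetical): six sense codons followed by UGA; codon 3 is CAA.
unedited = "AUG" "GCU" "GAC" "CAA" "UUC" "GGA" "UGA"
edited = edit_c_to_u(unedited, 3)
print(codons_before_stop(unedited), codons_before_stop(edited))
```

The unedited message is read through to its normal stop codon, whereas the edited message terminates at the new UAA, just as the liver and intestinal forms of the mRNA give full-length and shortened proteins.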
Another example is provided by glutamate receptors in rat brain. Editing at one position changes
a glutamine codon in DNA into a codon for arginine in RNA; the change affects the conductivity
of the channel and therefore has an important effect on controlling ion flow through the
receptor channel. At another position in the receptor, an arginine codon is converted to a glycine
codon.
Besides the above-mentioned types, two other terms are associated with RNA editing:
• Insertional editing: This type of editing occurs with some RNAs, e.g. that of the
paramyxovirus P gene, which gives rise to at least two different proteins because of the
insertion of Gs at specific positions in the mRNA. Guide RNAs do not specify these
insertions; instead, they are added by the RNA Pol as the mRNA is being synthesized.
• Polyadenylation editing: This type of editing is seen in many animal mitochondrial
mRNAs. Five of the mRNAs transcribed from the human mitochondrial genome end with
just a U or UA, rather than with one of the three termination codons. Polyadenylation
converts the terminal U or UA into UAAAA..., and so creates a termination codon. This is
one of several features that appear to have evolved in order to make the vertebrate
mitochondrial genome as small as possible.
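Polyadenylation editing can be illustrated with a short sketch. The transcripts below are hypothetical, the reading frame is assumed to start at position 0, and the function names are made up for the example.

```python
# Toy model: a transcript ending in a lone U or UA gains a UAA codon
# once a poly(A) tail is added. Sequences are hypothetical.

def polyadenylate(transcript: str, tail_length: int = 20) -> str:
    """Append a poly(A) tail to the 3' end of the transcript."""
    return transcript + "A" * tail_length

def codon_completed_by_tail(transcript: str, tail_length: int = 20) -> str:
    """Return the in-frame codon that the tail completes at the 3' end."""
    leftover = len(transcript) % 3  # 1 for a lone U, 2 for UA
    if leftover == 0:
        raise ValueError("transcript already ends on a complete codon")
    start = len(transcript) - leftover
    return polyadenylate(transcript, tail_length)[start:start + 3]

ends_with_u = "AUGGCUCCAU"    # AUG GCU CCA + U
ends_with_ua = "AUGGCUCCAUA"  # AUG GCU CCA + UA

print(codon_completed_by_tail(ends_with_u))
print(codon_completed_by_tail(ends_with_ua))
```

In both cases the codon completed by the tail is UAA, so the gene itself does not need to encode a complete termination codon.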
(B) Post transcriptional processing of tRNA and rRNA (maturation of tRNA and rRNA)
tRNA and rRNA are synthesized as precursors that are processed, i.e. cleaved by nucleases, and
are not translated. In E. coli, three kinds of rRNA molecules and a tRNA molecule are excised
from a single primary RNA transcript that also contains spacer regions. Other transcripts contain
arrays of several kinds of tRNA or of several copies of the same tRNA. Mature rRNAs and tRNAs
are generated by cleavage and other modifications of nascent RNA chains.
In eukaryotes, rRNA and tRNA molecules, in contrast with mRNAs and small RNAs that
participate in splicing, do not have caps. Because rRNAs and tRNAs are non-coding, chemical
modifications to their nucleotides affect only the structural features and possibly, catalytic
activities of the molecules.
Similar to prokaryotes, eukaryotic pre-tRNA contains extra nucleotides at the 5′ and 3′ ends and
also modified bases. Besides, some eukaryotic pre-tRNAs and archaeal transcripts also contain
introns, which are different from pre-mRNA introns. Such introns are rare in bacteria. The
primary transcript forms a secondary structure with characteristic stems and loops, which allows
endonucleases to recognize and cleave off the 5′ leader and the two 3′ nucleotides. Unlike in
prokaryotes, the 5′-CCA-3′ at the 3′ end of mature eukaryotic tRNAs is added by separate
enzymatic reactions and is not encoded by the genes.
(b) Cleavage of a 5′-leader sequence: The primary transcript of tRNA contains extra
nucleotides at the 5′ end in both prokaryotes and eukaryotes. These nucleotides are cleaved by an
endonuclease, RNase P, which generates the correct 5′ terminus of all tRNA molecules in E. coli.
This enzyme is a ribozyme: it contains a catalytically active RNA molecule capable of catalyzing
the reaction in the absence of protein. It is therefore a very simple ribonucleoprotein
(RNP). RNase P enzymes are found in both prokaryotes and eukaryotes, being located in the
nucleus of the latter, where they are therefore small nuclear RNPs (snRNPs). In E. coli, the
endonuclease is composed of a 377-nucleotide RNA and a small basic protein of 13.7 kD. The in
vitro RNase P ribozyme reaction requires a higher Mg++ concentration than occurs in vivo, so the
protein component probably helps to catalyze the reaction in cells.
(c) Attachment of CCA at the 3′ end: The enzyme tRNA nucleotidyl transferase then adds CCA at
the 3′ end in eukaryotes. tRNA nucleotidyl transferase is an unusual enzyme that binds three
ribonucleoside triphosphate precursors in separate active sites and catalyzes the formation of
phosphodiester bonds to produce the 3′-CCA sequence. The addition of this sequence is therefore
not directed by a DNA or RNA template; the binding site of the enzyme itself serves as the
template. A major difference between prokaryotes and eukaryotes is that, in the former, the
5′-CCA-3′ at the 3′ end of the mature tRNAs is encoded by the genes. In eukaryotic
nuclear-encoded tRNAs, this is not the case.
(d) Chemical modifications of several bases and ribose units: Another processing event is
the modification of bases and ribose units of tRNAs in both prokaryotes and eukaryotes. Such
unusual bases are found in all tRNA molecules. They are formed by the enzymic modification of
a standard ribonucleotide in a tRNA precursor. Modification involves methylation, acetylation,
deamination, reduction, rearrangement, or attachment of an isopentenyl or SH group to the bases.
Many of these modifications were first identified in tRNAs, within which approximately one in
ten nucleotides becomes altered. For example, uridylate residues are modified after transcription
to form ribothymidylate and pseudouridylate. These modifications generate diversity, allowing
greater structural and functional versatility. They are thought to mediate the recognition
of individual tRNAs by the enzymes that attach amino acids to these molecules and to increase
the range of the interactions that can occur between tRNAs and codons during translation,
enabling a single tRNA to recognize more than one codon.
Most of these modifications are carried out directly on an existing nucleotide within the
transcript, but two modified nucleotides, queuosine and wyosine, are put in place by cutting out
an entire nucleotide and replacing it with the modified version.
Different pre-tRNAs are processed in a similar way, but the base modifications are unique to
each particular tRNA type.
(e) Removal of introns: Some eukaryotic pre-tRNAs and archaeal transcripts also contain
introns. In eukaryotes and archaea, therefore, the next step in tRNA processing is the removal of
the intron, which occurs by endonucleolytic cleavage at each end of the intron followed by
ligation of the half molecules of tRNA. The introns of yeast pre-tRNA can be processed in
vertebrates, and therefore the eukaryotic tRNA processing machinery seems to have been highly
conserved during evolution. Fig. 28 shows various processing events of pre-tRNA in E. coli.
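The sequence-level steps of tRNA maturation described in (b), (c) and (e) can be summarized as a minimal sketch. The class, the function name, the toy sequence and all coordinates are hypothetical, and the fixed order of steps is a simplification; base and ribose modifications are not modelled.

```python
# Toy pipeline for pre-tRNA processing: 5' leader removal, 3' trimming,
# CCA addition and intron removal. All sequences are hypothetical.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class PreTRNA:
    sequence: str      # primary transcript, written 5' to 3'
    leader_len: int    # extra nucleotides at the 5' end
    trailer_len: int   # extra nucleotides at the 3' end
    # intron as (start, end) after leader removal, end exclusive
    intron: Optional[Tuple[int, int]] = None

def process(pre: PreTRNA, cca_encoded: bool) -> str:
    seq = pre.sequence[pre.leader_len:]       # RNase P removes the 5' leader
    if pre.trailer_len:
        seq = seq[:-pre.trailer_len]          # trimming of the 3' end
    if not cca_encoded:
        seq += "CCA"                          # tRNA nucleotidyl transferase
    if pre.intron is not None:
        start, end = pre.intron
        seq = seq[:start] + seq[end:]         # cleavage at both ends, ligation
    return seq

exon1, intron, exon2 = "GCGGAUUUAGCUC", "AAUU", "GAGCGCACC"
pre = PreTRNA(
    sequence="GGAU" + exon1 + intron + exon2 + "UU",
    leader_len=4,
    trailer_len=2,
    intron=(len(exon1), len(exon1) + len(intron)),
)
mature = process(pre, cca_encoded=False)
print(mature)
```

Setting cca_encoded=True corresponds to the prokaryotic case described above, in which the CCA is already part of the transcript and no addition step is needed.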
[Fig. 28: pre-tRNA processing in E. coli. Labels: RNase P (5′ end); endonuclease (RNase E / F)
and RNase D (3′ end, next to CCA); D loop, anticodon loop, variable arm and T loop.]
In many eukaryotes, the precursor rRNA contains one copy of the 18S coding region and one
copy each of the 5.8S and 28S coding regions, which together are the equivalent of the 23S
rRNA in prokaryotes (Fig. 30). The eukaryotic 5S rRNA is transcribed by RNA Pol III from
unlinked genes to give a 121-nucleotide transcript, which undergoes little or no processing.
those parts of rRNAs thought to be most critical for the activity of these molecules in ribosomes.
Modified nucleotides might, for example, be involved in rRNA-catalyzed reactions such as the
synthesis of peptide bonds.
After the binding of proteins, modifications such as base and sugar (usually adenosine)
methylations take place, using S-adenosyl methionine (SAM) as the methylating agent. The
modifications of bacterial rRNAs are carried out by enzymes that directly recognize the
sequences and / or structures of the regions of RNA containing the nucleotides to be modified;
in contrast, methylation in eukaryotes requires small nucleolar RNPs (snoRNPs). Bacterial rRNAs
are also less heavily modified than eukaryotic ones. The snoRNAs are 70-100 nucleotides in
length and are located in the nucleolus. They contain segments of 10-21 nucleotides that are
precisely complementary to segments of mature rRNAs containing 2′-O-methylation sites. These
snoRNA sequences are located between the conserved sequence motifs known as box C (RUGAUGA)
and box D (CUGA), which lie on the 5′ and 3′ sides of the complementary segments, respectively.
The site of methylation in the rRNA is the nucleotide base paired with exactly the 5th
position upstream of box D. Methylation is mediated by a complex of nucleolar proteins
including a methyltransferase. For conversion of uridine to pseudouridine, snoRNAs having
conserved motifs, i.e. box H / ACA, are involved. These snoRNAs contain the sequence motif
ACANNN at the 3′ end and box H (conserved sequence ANANNA) at the 5′ end. The conserved
motifs of such snoRNAs form a specific base paired interaction with the target site containing U,
which is then recognized by the modifying enzyme.
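The box C / box D rule can be sketched as a simple motif search. The snoRNA below is an invented toy sequence, the function name is hypothetical, and real snoRNAs are longer and may contain additional motifs that this sketch ignores.

```python
# Toy search for box C (RUGAUGA, R = A or G) and box D (CUGA) in a snoRNA,
# returning the guide segment between them and the position 5 nucleotides
# upstream of box D. The sequence is hypothetical.
import re

BOX_C = re.compile(r"[AG]UGAUGA")
BOX_D = re.compile(r"CUGA")

def guide_and_target(snorna: str):
    """Return (guide_segment, index of 5th nt upstream of box D) or None."""
    c = BOX_C.search(snorna)
    if c is None:
        return None
    d = BOX_D.search(snorna, c.end())   # first box D downstream of box C
    if d is None:
        return None
    guide = snorna[c.end():d.start()]
    if len(guide) < 5:
        return None
    return guide, d.start() - 5

guide = "CCUUAGGACAUCG"                  # hypothetical rRNA-complementary segment
snorna = "GG" + "AUGAUGA" + guide + "CUGA" + "CC"

result = guide_and_target(snorna)
print(result)
```

The rRNA nucleotide that pairs with the returned snoRNA position is the one that would be methylated under the rule stated above.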
The chemical modifications occurring during maturation of rRNA and tRNA are listed in Table
11.
(b) Cleavage of precursor rRNA by nucleases: The cleavage includes two steps. The
primary cleavage event, which is mainly carried out by RNase III, releases precursors of the 5S,
16S and 23S molecules. In the secondary cleavage step, these precursors are trimmed at their
5′ and 3′ ends by RNases M5, M16 and M23, respectively, leading to the release of mature
rRNAs (Fig. 31).
For mammalian pre-rRNA, the 47S precursor (13500 nucleotides) undergoes a number of
cleavages, first in the external transcribed spacers (ETSs) 1 and 2. Cleavages in the internal
transcribed spacers (ITSs) then release the 20S pre-rRNA from the 32S pre-rRNA (Fig. 32). Both
of these precursors must be trimmed further, and the 5.8S region must base pair with the 28S
rRNA, before the mature molecules are produced. As with prokaryotic pre-rRNA, the precursor
folds and complexes with proteins as it is being transcribed. This takes place in the nucleolus.
Table 11: Examples of chemical modifications of nucleotides during rRNA and tRNA
processing
[Fig. 31: Secondary cleavage of E. coli rRNA precursors by RNases (labelled M16 and M5 in the
figure) to give the mature 16S, 23S and 5S rRNAs.]
[Fig. 32: Processing of the mammalian rRNA primary transcript. RNases successively convert the
47S pre-rRNA primary transcript (ETS1, pre-18S rRNA, ITS1, pre-5.8S rRNA, ITS2, pre-28S rRNA,
ETS2) into 45S pre-rRNA, 41S pre-rRNA, the 20S and 32S pre-rRNA precursors and finally the
mature 18S, 5.8S and 28S rRNAs.]
(c) Removal of introns: Some eukaryotic and archaeal pre-rRNAs, e.g. that of
Tetrahymena thermophila, contain an intron in the precursor for the largest rRNA. Such introns
in pre-rRNA are extremely rare in bacteria. These pre-rRNAs undergo an unusual form of
processing before they can function: the RNA folds into an enzymatically active form, or
ribozyme, and splices out its own intron. Although this process occurs in vivo in the presence of
protein, it has been shown that the intron can excise itself in the test tube in the complete
absence of protein.
Inhibitors of transcription
There are two types of inhibitors of transcription:
• RNA Pol binding inhibitors
• DNA specific inhibitors
Two related antibiotics, rifamycin B, which is produced by Streptomyces mediterranei, and its
semisynthetic derivative rifampicin, specifically inhibit transcription by prokaryotic, but not
eukaryotic, RNA polymerases. This selectivity and their high potency (bacterial RNA Pol is 50%
inhibited by 2 × 10⁻⁸ M rifamycin) have made them medically useful bactericidal agents against
Gram-positive bacteria and tuberculosis (TB). Rifamycins inhibit neither the binding of RNA Pol
to the promoter nor the formation of the first phosphodiester bond, but they prevent further chain
elongation. The inactivated RNA Pol remains bound to the promoter, thereby blocking
initiation by uninhibited enzymes.
[Figure: Chemical structures of rifampicin, rifamycin B and α-amanitin.]
The poisonous mushroom Amanita phalloides contains a series of unusual bicyclic octapeptides
such as α-amanitin, which disrupts mRNA formation in animal cells by blocking Pol II and, at
higher concentrations, Pol III. Neither Pol I nor bacterial RNA Pol is sensitive to α-amanitin, nor
is the RNA Pol II of Amanita phalloides itself.
Several other intercalating agents, including ethidium bromide and proflavin, also inhibit nucleic
acid synthesis, presumably by similar mechanisms. Acridine inhibits RNA synthesis in a fashion
similar to Actinomycin D, i.e. by intercalation and deformation of DNA. Ethidium bromide is a
DNA specific dye, which intercalates between the base pairs of DNA and binds preferentially to
supercoiled DNA. Aflatoxin (Fig. 35), obtained from the fungus Aspergillus flavus, inhibits both
replication and transcription. 2-Acetylaminofluorene is a synthetic carcinogen that also inhibits
both replication and transcription.
• Source: Genes of all cellular organisms are made of DNA. The same is true for some viruses,
but for others the genetic material is RNA. Viruses are genetic elements enclosed in protein coats
that can move from one cell to another but are not capable of independent growth. One
well-studied example of an RNA virus is tobacco mosaic virus (TMV), which infects the leaves of
tobacco plants. This virus consists of a single strand of RNA (6390 nucleotides) surrounded by a
protein coat of 2130 identical subunits. An RNA directed RNA polymerase catalyzes the
replication of this viral RNA. Another important class of RNA viruses comprises the retroviruses,
so called because the genetic information flows from RNA to DNA rather than from DNA to RNA.
This class includes HIV-1 as well as a number of RNA viruses that produce tumors in susceptible
animals. Retrovirus particles contain two copies of a single stranded RNA molecule.
[Figure: Chemical structures of ethidium bromide, aflatoxin B1, acridine and actinomycin D
(phenoxazone ring system carrying two cyclic pentapeptides of Thr, D-Val, Pro, sarcosine and
methyl-Val).]
Reverse transcriptases have been isolated and purified from several different RNA tumor viruses;
they have molecular weights ranging from 70000 to 160000. The RNA viruses containing RTs
are known as retroviruses (retro is the Latin prefix for backward). RTs that closely resemble
those of some RNA tumor viruses have also been isolated from malignant cells of some animals
and from human patients with leukemia. RTs, however, have also been found in cells of animals
and people thought to be normal and not infected by tumor viruses; they have also been found
in wild type E. coli. Telomerase is also a specialized RT.
• Reaction catalyzed and properties: On infection with an RNA virus, the single stranded
RNA viral genome (~10000 nucleotides) and the enzyme enter the host cell. The RT first
catalyzes the synthesis of a DNA strand complementary to the viral RNA, then degrades the RNA
strand of the viral RNA-DNA hybrid and replaces it with DNA. The resulting duplex DNA often
becomes incorporated into the genome of the eukaryotic host cell. These integrated (and dormant)
viral genes can be activated and transcribed, and the gene products, viz. viral proteins and the
viral RNA genome itself, are packaged as new viruses (Scheme 4).
[Scheme 4: Viral RNA is copied into DNA by RT (RNA dependent DNA Pol); transcription of the
integrated DNA is followed by packaging of new virus.]
Like many DNA and RNA Pols, RT contains Zn++. Each RT is most active with its own virus,
but each can be used experimentally to make DNA complementary to a variety of RNAs. The
DNA synthesis and RNA degradation activities use separate active sites on the protein. The
reverse transcriptases closely resemble the DNA directed DNA polymerases in that they make
DNA in the 5′→3′ direction, utilize deoxyribonucleotides as precursors and require both a
template and a primer strand, which must have a free 3′-OH terminus. RTs normally use an RNA
template for nucleic acid synthesis; they can also utilize DNA templates, but these are less
effective than RNAs. The RTs are very active on natural RNA templates, including the very large
RNAs present in the viral particles. The DNAs produced hybridize with their RNA templates. RTs,
like RNA Pols, do not have 3′→5′ proofreading exonucleases. They generally have an error rate of
about 1 per 20000 nucleotides added. An error rate this high is extremely unusual in DNA
replication and appears to be a feature of most enzymes that replicate the genomes of RNA
viruses. A consequence is a high mutation rate and a faster rate of viral evolution, which is a
factor in the frequent appearance of new strains of disease causing retroviruses.
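A rough calculation shows what this error rate means for a single copy of a viral genome. The sketch below uses the error rate of about 1 per 20000 nucleotides and the genome size of about 10000 nucleotides quoted in the text, and it assumes, purely for illustration, that errors occur independently at each position.

```python
# Back-of-the-envelope estimate of RT errors per genome copy.
# Assumption: errors are independent and equally likely at every position.

error_rate = 1 / 20000     # errors per nucleotide added (from the text)
genome_length = 10000      # approximate retroviral genome size (from the text)

# Expected number of errors in one pass over the genome
expected_errors = error_rate * genome_length

# Probability that a copy carries at least one error
p_at_least_one = 1 - (1 - error_rate) ** genome_length

print(f"expected errors per copy: {expected_errors:.2f}")
print(f"P(at least one error):    {p_at_least_one:.2f}")
```

Under these assumptions, roughly one copy in every two or three carries a new mutation, which is consistent with the rapid viral evolution described above.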
• Functions and applications: The function of RTs in normal cells is not understood; their
presence suggests that the transcription of messages from RNA into DNA is a normal process,
e.g. in the synthesis of multiple copies of certain genes. The recognition of RTs has thus opened
some new avenues of research in biochemical genetics. RTs have become important reagents in
the study of DNA-RNA relationships and in DNA cloning techniques. They make possible the
synthesis of DNA complementary to an mRNA template; synthetic DNA prepared in this
manner, called complementary DNA (cDNA), can be used to clone cellular genes.
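At the sequence level, first-strand cDNA synthesis amounts to taking the reverse complement of the mRNA and writing it in DNA letters. The sketch below uses a hypothetical mRNA and an invented function name; it does not model priming or second-strand synthesis.

```python
# Toy model of first-strand cDNA synthesis by RT: the cDNA is the
# reverse complement of the mRNA, written with T instead of U.

PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}

def first_strand_cdna(mrna: str) -> str:
    """Return the cDNA strand (5' to 3') complementary to an mRNA (5' to 3')."""
    return "".join(PAIR[base] for base in reversed(mrna))

mrna = "AUGGCUCAAUAA"           # hypothetical message
cdna = first_strand_cdna(mrna)
print(cdna)
```

Reading the cDNA from its 3′ end and pairing each base recovers the original message, which is why cDNA can stand in for the expressed gene in cloning.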