De Novo DNA Synthesis Using Single Molecule PCR
Nucleic Acids Research, 2008, Vol. 36, No. 17 e107
doi:10.1093/nar/gkn457
*To whom correspondence should be addressed. Tel: +972 8 934 4506; Email: [email protected]
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
benefit from smPCR, as demonstrated by our work reported here. The use of smPCR is described in the context of our recently introduced DNA synthesis procedure (5), which combines recursive synthesis and error-correction and operates as follows. Divide and Conquer (D&C), the quintessential recursive problem-solving technique, is used in silico to divide the target DNA sequence into fragments short enough to be synthesized by conventional oligo synthesis, albeit with errors (15). These oligos are synthesized and recursively combined in vitro, forming target DNA molecules with roughly the same error rate as the source oligos. Error-free parts of these molecules, identified by cloning and sequencing, are extracted and used as new, typically longer and more accurate inputs to another iteration of synthesis, as discussed in the Results section below.

reactions: phosphorylation, elongation, PCR and Lambda exonucleation. They are described in the order in which our protocol executes them.

Phosphorylation. Phosphorylation of all PCR primers used by the recursive construction protocol is performed beforehand, for all primers simultaneously, according to the following protocol: a total of 300 pmol of 5′ DNA termini in a 50 μl reaction containing 70 mM Tris–HCl, 10 mM MgCl2, 7 mM dithiothreitol (pH 7.6 at 37°C), 1 mM ATP and 10 U T4 polynucleotide kinase (NEB, Ipswich, MA, USA). Incubation is at 37°C for 30 min and inactivation at 65°C for 20 min.

Overlap extension elongation between two ssDNA

These included improved primer selection, computational optimization and experimental calibration of template concentration, real-time diagnosis of faulty reactions, avoiding the cloning of heteroduplexes, bar-coding molecules and creating a process with adequate fidelity.

Careful selection of adequate primers is needed to enable single molecule amplification
smPCR amplification requires extensive cycling (9–12). This often leads to the amplification of nonspecific products originating from interaction between the PCR primers (Figure 2a). This often inhibits the amplification of the single molecule template, typically resulting either in no amplification of the target molecule due to dimer formation or in amplification of the primer dimer on top of the correct PCR product (Figure 2a). Consequently, a large fraction of the smPCRs performed cannot be used for synthesis, since they either did not amplify or produced nonspecific amplification products. This has to be compensated for by performing more smPCRs than are actually needed for synthesis. To solve this problem we designed a special primer for smPCR consisting of a single sequence
(complementary to both ends of the single molecule template), which contains only cytosine and adenine bases (see Supplementary Data for sequence). We reasoned that, because cytosine and adenine are not complementary to each other, this should reduce the formation of PCR products originating from primer–primer interactions. This successfully eliminated nonspecific amplification resulting from interaction between primers, and its inhibiting effect on single molecule amplification (Figure 2b), which in turn significantly decreased the total number of PCRs needed to obtain the minimal number of smPCR clones required for synthesis of error-free DNA. The sites for the C–A primer (as well as the random bar-coding bases discussed later) are incorporated at the termini of the target molecules either by an a priori PCR (16) or during the synthesis of the molecule as part of the target sequence.

Computational optimization and experimental calibration of template DNA concentration
smPCR reactions are generally similar to regular PCR reactions in their basic biochemistry. The difference is that while PCR typically starts the amplification with multiple copies of the template molecule, the goal in smPCR is to amplify a single template molecule. This is achieved by diluting a solution of template molecules of known concentration so that each template aliquot is expected to contain about one molecule. As dilution is a stochastic process, at any such dilution some aliquots will have no template molecule and some will have multiple template molecules. As these two cases cannot be avoided, smPCR is done as a batch of multiple parallel reactions, in the hope that at least some will be true smPCRs, namely successful PCR reactions that amplify single template molecules. 'False positive' smPCRs, which amplify multiple template molecules, are identified using sequencing (Figure 3 and Supplementary Figure 6). The cost of sequencing is a major component of synthetic DNA synthesis, and sequencing false positives can render smPCR impractical if their fraction of the total number of reactions is too high. Standard gel/capillary electrophoresis (CE)/RT-PCR analyses can be used to differentiate no-template (negative) reactions from (positive) PCRs with template; however, they cannot differentiate a true smPCR from a false positive reaction.

Diluting the template to one molecule per well on average maximizes the fraction of true smPCRs out of all the reactions in the batch (Supplementary Figure 3a, blue plot). However, it does not maximize the ratio of true smPCRs to false positives (Supplementary Figure 3a, green plot), which is important for avoiding futile sequencing. For example, aiming for one molecule per well on average leads to >50% futile sequencing of false positives (Supplementary Figure 3a, green plot). Further reducing the template concentration reduces futile sequencing of PCRs with multiple template molecules; however, it increases the number of futile PCRs due to no-template reactions. The template concentration that yields an optimal ratio between true smPCRs, false positives and no-template reactions can only be determined by
associating a cost with performing sequencing and smPCR reactions. We calculated the optimal concentration to be 0.6 template molecules per smPCR well if an equal cost is associated with smPCR and sequencing (Supplementary Figure 3b), and 0.2 molecules per well if sequencing is assigned the more realistic cost of eight times that of smPCR (Supplementary Figure 3c). Performing smPCRs at the optimal template concentration reduces the overall cost of obtaining each sequenced true smPCR, and hence the overall cost of using smPCR for de novo DNA synthesis, since it reduces futile sequencing from 50% (with 1 molecule/well) to 10% (with 0.2 molecules/well) (Supplementary Figure 3a). A standard 260 nm OD measurement can be used to determine the optimal concentration.

Even though most of the smPCRs performed at 0.2 molecules/well have no template (80% of reactions), these no-template PCRs are easily identified and distinguished from 'true' smPCRs, and their sequencing is avoided. Additionally, the cost of no-template PCRs is further reduced by performing the reactions in very low volumes (down to 2 μl in standard liquid handling robots). We also found that RT-PCR can be used to accurately determine the dilution required to bring the template to the calculated optimal concentration (0.2 molecules/well). A one-time calibration (see Supplementary Data Methods for description) allows the routine use of RT-PCR to determine the required dilution before each smPCR experiment. This strategy proved as accurate and as robust as performing the dilution according to a 260 nm OD measurement and was used throughout the work presented in this paper.

RT-PCR facilitates the diagnosis of faulty reactions
We used RT-PCR to confirm that the efficiency at which our C–A primer amplifies DNA is close to 100%. Given this efficiency, we predict the number of PCR cycles required to reach PCR amplification saturation from the initial and typical final template concentrations (Supplementary Figure 4, green plot). Our RT-smPCR results confirm that this prediction is accurate all the way down to single molecule amplification, which displays an amplification curve that is detectable from approximately cycle 32 and saturates after 42 cycles (Figure 2c and Supplementary
Figure 4, blue plot). This prediction allows real-time determination of whether PCRs are true smPCRs or false positives (e.g. contaminated reactions, reactions that actually had many template molecules, or primer dimers), since the latter do not exhibit the typical amplification curve that indicates single molecule amplification (Figure 2c); such reactions are therefore excluded from further analysis.

Heteroduplexes prevent in vitro cloning of synthetic DNA
Initially, sequencing of all our true smPCR experiments resulted in shifted sequencing chromatograms that could not be read properly, even though in vivo clones of the same DNA sequenced fine. The cause turned out to be that de novo constructed DNA is double stranded (1–4,6,7), with each strand having different

polymerization and by annealing of previously elongated strands are shown in Figure 3c and d, and Figure 3a and b, respectively.

Single-molecule verification with random oligos
To facilitate the simple identification of rare smPCRs that, despite the measures reported above, were still not performed on single molecules, we integrated another feature into our procedure, previously proposed for other smPCR applications (16). We incorporated oligos with three random bases at both ends of the synthetic DNA constructs that are to be cloned, effectively bar-coding the molecules with a four-letter code at six positions (4⁶ = 4096 tags) (Figure 4a). Sequencing these molecules shows that the sequence at the location of the random bases
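The bar-coding logic described above can be sketched in Python. This is a minimal illustration, not the analysis used in this work: it assumes that each random base is drawn uniformly from A, C, G and T, and that each template molecule's tag can be read as a plain six-letter string (three bases from each end), whereas in practice a reaction containing several molecules shows up as overlapping peaks in the chromatogram. All function names are hypothetical.

```python
import random

BASES = "ACGT"
TAG_LENGTH = 6  # three random bases at each end of the construct


def n_tags(length: int = TAG_LENGTH) -> int:
    """Number of distinct tags for a four-letter code at `length` positions."""
    return len(BASES) ** length


def prob_all_same_tag(k: int, length: int = TAG_LENGTH) -> float:
    """Probability that k uniformly tagged molecules all carry the same tag.

    Under the uniform-tag assumption, a multi-template reaction escapes
    detection by bar-coding only in this case.
    """
    return (1 / n_tags(length)) ** (k - 1)


def random_tag(rng: random.Random, length: int = TAG_LENGTH) -> str:
    """Draw one uniformly random tag (hypothetical model of the random oligos)."""
    return "".join(rng.choice(BASES) for _ in range(length))


def mixed_positions(tags: list[str]) -> list[int]:
    """Tag positions at which the molecules in one reaction disagree.

    An empty list is consistent with a true single-molecule PCR; any mixed
    position flags the reaction as having amplified more than one molecule.
    """
    return [i for i, column in enumerate(zip(*tags)) if len(set(column)) > 1]


if __name__ == "__main__":
    rng = random.Random(0)
    print(n_tags())              # 4096
    print(prob_all_same_tag(2))  # 0.000244140625, i.e. 1/4096
    two_molecules = [random_tag(rng), random_tag(rng)]
    print(two_molecules, mixed_positions(two_molecules))
```

Under these assumptions, a reaction that started from two molecules is missed only when both happen to carry the same tag, which occurs with probability 1/4096; real synthesis biases in the random bases would change that figure.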
turned out to be error-prone (Supplementary Table 2), even though the segments used for their reconstruction were error-free. These segments seemed error-free in the sequencing of smPCR clones because most of the errors introduced during smPCR amplification (i.e. approximately those introduced during the last 37 of the 40 cycles required) are invisible in the sequencing chromatogram (Supplementary Figure 7). To make sure the errors originated from smPCR and not from the oligos, we repeated the exact same error-correction procedure using traditional in vivo cloning of the GFP fragments into E. coli instead of smPCR. As with the smPCR procedure, error-free segments were chosen and used for reconstruction of the target GFP molecule. This control procedure yielded error-free GFP molecules from almost every clone (Supplementary Table 2). Therefore, the entire procedure using Taq is ineffective for de novo DNA synthesis, since the error rate resulting from smPCR amplification is roughly the error rate of the synthetic molecules before any error correction. Moreover, error correction using smPCR with Taq may even increase the number of clones needed compared to construction with no error correction, depending on the error rate of the oligos used (Figure 5c, dark blue and green plots).

Nevertheless, technically the procedure was successful (i.e. there were no frame-shifting heteroduplexes, the limiting dilution was properly calculated, there were no primer–dimer problems, etc.), indicating that the remaining difficulty is indeed the error rate of the polymerase.

De novo synthesis of a 1.8-kb mitochondrial DNA using the smPCR procedure
We set out to test the procedure using Accusure, a more accurate (proofreading) DNA polymerase (Materials and Methods). This time we also attempted to build a longer synthetic construct, 1.8 kb long, since a fragment of this length would demonstrate that the procedure can be used for the complete in vitro synthesis and error correction of most synthetic genes. Its synthesis and error correction were conducted as a comparative analysis
of success of obtaining error-free clones determines the number of clones that one should sequence (Figure 5a and b). We aimed to design a process that yields error-free clones with high probability. Our results show that even with high success requirements (90% probability), the difference between our smPCR procedure and traditional cloning is negligible up to at least the 2-kb range (Figure 5a and b). For example, finding error-free 1-kb and 2-kb fragments after error correction with a probability of at least 90% requires only 4 and 8 clones, respectively, with our smPCR method, compared to 2 and 3 clones with in vivo cloning (Figure 5a and b).

DISCUSSION

colony picking does exist, it requires relatively expensive specialty equipment, while the process reported in this manuscript requires only standard lab equipment and turned out to be highly robust and reproducible. Furthermore, automation of traditional cloning is not limited to automated colony picking. It also requires inoculating bacteria into Petri dishes under sterile conditions and growing colonies overnight; the former is difficult to automate and the latter is time consuming. It should be noted that automated colony picking may be substituted by in vivo cloning-by-dilution, but this may pose difficulties of its own, such as the absence of blue/white colony selection, which helps avoid futile sequencing. In any case, all this is preceded by the process of inserting DNA into cells (the transformation itself), which may
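The clone counts quoted in the Results can be reasoned about with a simple model. Assuming (our assumption, not necessarily the model behind Figure 5) that each sequenced clone is independently error-free with the same probability p, the chance that at least one of n clones is error-free is 1 - (1 - p)^n. The sketch below finds the smallest n that meets a target such as 90%, and the inverse: the smallest p for which a given n suffices. Function names are hypothetical.

```python
def clones_needed(p_error_free: float, target: float = 0.9) -> int:
    """Smallest n with P(at least one error-free clone among n) >= target.

    Assumes clones are independent and each is error-free with probability p.
    """
    if not 0 < p_error_free <= 1:
        raise ValueError("p_error_free must be in (0, 1]")
    if not 0 < target < 1:
        raise ValueError("target must be in (0, 1)")
    n = 1
    while 1 - (1 - p_error_free) ** n < target:
        n += 1
    return n


def min_success_probability(n_clones: int, target: float = 0.9) -> float:
    """Smallest per-clone p for which n_clones reach the target probability."""
    return 1 - (1 - target) ** (1 / n_clones)


if __name__ == "__main__":
    print(clones_needed(0.5))  # 4
    print(clones_needed(0.7))  # 2
    for n in (2, 3, 4, 8):
        print(n, round(min_success_probability(n), 3))
```

The loop in `clones_needed` avoids rounding pitfalls of a closed-form ceiling of logarithms, and `min_success_probability` can be used to translate a required clone count back into the per-clone success rate it implies under the same independence assumption.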
Paris "Centre de Recherche Interdisciplinaire" (FTO/CRI). All the other authors have declared no conflicts of interest.

REFERENCES
1. Bang,D. and Church,G.M. (2008) Gene synthesis by circular assembly amplification. Nat. Methods, 5, 37–39.
2. Carr,P.A., Park,J.S., Lee,Y.J., Yu,T., Zhang,S. and Jacobson,J.M. (2004) Protein-mediated error correction for de novo DNA synthesis. Nucleic Acids Res., 32, e162.
3. Kodumal,S.J., Patel,K.G., Reid,R., Menzella,H.G., Welch,M. and Santi,D.V. (2004) Total synthesis of long DNA sequences: synthesis of a contiguous 32 kb polyketide synthase gene cluster. Proc. Natl Acad. Sci. USA, 101, 15573–15578.
4. Tian,J., Gong,H., Sheng,N., Zhou,X., Gulari,E., Gao,X. and Church,G. (2004) Accurate multiplex gene synthesis from
10. Nakano,M., Komatsu,J., Kurita,H., Yasuda,H., Katsura,S. and Mizuno,A. (2005) Adaptor polymerase chain reaction for single molecule amplification. J. Biosci. Bioeng., 100, 216–218.
11. Kraytsberg,Y. and Khrapko,K. (2005) Single-molecule PCR: an artifact-free PCR approach for the analysis of somatic mutations. Expert Rev. Mol. Diagn., 5, 809–815.
12. Lukyanov,K.A., Matz,M.V., Bogdanova,E.A., Gurskaya,N.G. and Lukyanov,S.A. (1996) Molecule by molecule PCR amplification of complex DNA mixtures for direct sequencing: an approach to in vitro cloning. Nucleic Acids Res., 24, 2194–2195.
13. Margulies,M., Egholm,M., Altman,W.E., Attiya,S., Bader,J.S., Bemben,L.A., Berka,J., Braverman,M.S., Chen,Y.J., Chen,Z. et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature, 437, 376–380.
14. Shendure,J., Porreca,G.J., Reppas,N.B., Lin,X., McCutcheon,J.P., Rosenbaum,A.M., Wang,M.D., Zhang,K., Mitra,R.D. and Church,G.M. (2005) Accurate multiplex polony sequencing of an evolved bacterial genome. Science, 309, 1728–1732.