Genetic Engineering

Genetic engineering

Genetic engineering is the act of modifying the genetic makeup of an organism. Modifications can be generated
by methods such as gene targeting, nuclear transplantation, transfection of synthetic chromosomes or viral insertion.
Selective breeding is not considered a form of genetic engineering.

Latest Research and Reviews

CRISPR engineering in organoids for gene repair and disease modelling

Organoids bridge the gap between 2D cell lines and in vivo studies. With their 3D organization and cellular
heterogeneity, adult stem cell-derived organoids closely resemble their tissue of origin. The development of CRISPR-
mediated genome engineering and the recent additions of base and prime editing to the CRISPR toolbox have greatly
simplified the generation of exact, isogenic models for Mendelian diseases. Here, we review recent developments in
CRISPR-mediated genome engineering and its application in human adult-stem-cell-derived organoids in the construction
of isogenic disease models. These models allow accurate qualification of the impact of allelic disease variants observed in
patients. Furthermore, we discuss the use of organoids as models for safety and efficacy of CRISPR for gene repair.
Although transplantation of repaired tissue remains challenging, benchmarking CRISPR tools in organoids can bring
genome engineering one step closer to patients.

Key points

• CRISPR–Cas9-mediated genome engineering acts by introducing double-stranded DNA breaks into the genome.
The damage repair process can be used for gene knockout or precise targeted introduction of exogenous DNA.
• Next-generation CRISPR tools, including base and prime editing, allow for induction of precise base changes and
small insertions and deletions, bypassing potentially deleterious double-stranded DNA breaks.
• Owing to their 3D organization, adult-stem-cell-derived organoids closely resemble the tissue of origin and are
therefore a good model system to study human health and disease.
• CRISPR–Cas9-mediated genome engineering can be used to create isogenic models to investigate the onset,
cause and treatment of human diseases.
• CRISPR tools can be benchmarked for efficiency and safety by studying gene repair ex vivo in adult-stem-cell-
derived organoids, facilitating CRISPR–Cas9 clinical translation.
• Ex vivo repaired adult-stem-cell-derived organoids can potentially be transplanted into patients to relieve disease
phenotypes.

Introduction

Variant sequences in the genome have a fundamental role in the onset, course and outcome of hundreds of
human diseases. A combination of the increasing number of genome-wide association studies and the decreasing price of
whole-exome and whole-genome sequencing has been driving the identification of genetic disease variants1,2. Although
the number of detected genetic variants increases, it remains difficult to accurately qualify their impact on disease
progression. To obtain a deeper understanding of the molecular mechanisms that underlie genetic variations and to
develop novel therapeutic strategies, isogenic human disease models hold great promise. These models consist of human
cells that are engineered to accurately model the genetic variant and are matched with wild-type controls with the same
genetic background. The development of clustered regularly interspaced short palindromic repeats (CRISPR) as an
effective and versatile tool for genome engineering has greatly advanced the generation of isogenic in vitro models of
human disease3,4,5.

Classical 2D tissue culture techniques have been used extensively to better understand the homeostasis and
pathophysiology of the human body6. 2D cell lines are easy to maintain and are amenable to CRISPR–Cas9-mediated
genome engineering. However, such cell lines — typically derived from malignancies — do not reflect the cellular
complexity of the organ they are derived from. To overcome these issues, the development of more complex in vitro
tissue culture techniques has gained traction. These efforts have ultimately led to the development of organoids. These
‘mini-organs’ exhibit faithful micro-anatomy and are grown in a matrix that allows for 3D expansion of stem cells, which
give rise to cell types present in the native tissue7. Current organoid culturing technology exploits either induced
pluripotent stem cells or adult stem cells (ASCs). Organoids derived from induced pluripotent stem cells are taken through
an extensive fate-specialization procedure mimicking the embryonic developmental trajectory of the organ of interest8.
ASC-derived organoids, the subject of this Review, model the adult homeostatic state of organs. They can be derived
from most wild-type human and murine epithelial tissues, including large and small intestine9 (Fig. 1a), stomach10,
kidney11, pancreas12, breast13, endometrium14 and cervix15, liver16,17 (Fig. 1b), upper airway and lung18,19, taste
bud20, lacrimal gland21, prostate and bladder22,23 and thyroid24,25 (Fig. 1c,d). ASC-derived organoids do not require
an extensive maturation process, are genetically stable and can be passaged indefinitely. Moreover, organoids can be
clonally expanded from a single adult stem cell, aiding the generation of CRISPR-mediated isogenic 3D cultures that
closely resemble the tissue of origin26. Besides variant impact qualification, these isogenic organoid models can be used
for drug efficacy screening and de novo drug discovery (Fig. 1d).

Confocal images of adult-stem-cell-derived organoids. a, Small intestine organoids. Nuclei are stained by DAPI
(turquoise) and actin by phalloidin (red). b, Fetal hepatocyte organoids. Nuclei are stained by DAPI (orange) and actin by
phalloidin (blue). c, Murine thyroid organoids. Nuclei are stained by DAPI (purple), actin is stained by phalloidin (green)
and the hormone carrier protein thyroglobulin (Tg) is stained in red. Scale bars in panels a to c are 50 µm. d, Organoids
can be derived from most epithelial tissues of murine and human origin. By using CRISPR engineering, putative disease
variants can be introduced into the genome. By pairing up with wild-type controls, an isogenic system is created that can
be used for drug screening, variant impact qualification and drug discovery. Part a, image courtesy of Joep Beumer. Part
b, image courtesy of Shashank Gandhi. Part c, image courtesy of Jelte van der Vaart.

In this Review we provide the rationale behind the generation of isogenic disease models in human ASC-derived
organoids. First, we review the recent advances in CRISPR-mediated genome engineering that enable efficient induction
of mutations and genetic variants in the genome. Then, we discuss the efforts made to use these technologies in
organoids for modelling and repair of genetic variants that cause disease. Next, we provide technical considerations to
generate genetically altered organoids and create complex isogenic disease models. Finally, we provide an outlook on the
combination of CRISPR–Cas9-mediated genome engineering and ASC-derived organoids.

CRISPR–Cas-mediated genome engineering

CRISPR is superior to previously developed strategies aimed at altering the genome (Box 1) and has quickly been
adopted by laboratories all over the world. To date, six types of CRISPR–Cas system, grouped into two classes, have been
described, of which the type II system (which includes Cas9) is the most studied27. In conventional CRISPR–Cas9-mediated genome
engineering, the effector protein Cas9 is guided towards a genomic target site by an RNA sequence called the guide RNA
(Fig. 2). This guide RNA consists of a CRISPR RNA (crRNA) sequence, complementary to the target site, and a trans-
activating CRISPR RNA (tracrRNA) that is needed for crRNA maturation and binding to Cas9. The guide RNA was simplified
by creating a chimeric crRNA–tracrRNA fusion, yielding a single-guide RNA (sgRNA) for target recognition3. The target site
in the genome consists of two elements, the protospacer and a short essential sequence directly downstream of the
protospacer, called the protospacer-adjacent motif (PAM). The PAM of the most frequently used Cas9 (derived from
the bacterium Streptococcus pyogenes, SpCas9) is NGG (ref. 3). Upon binding of the sgRNA, the Cas9–sgRNA complex screens
the genome for PAMs28. After a suitable PAM is found, sgRNA complementarity to the protospacer is tested by opening
the DNA around the target site in an R-loop into two single DNA strands (ssDNA). These ssDNA strands are individually
cleaved by the RuvC and HNH domains of Cas9, resulting in a double-stranded DNA break (DSB). The cell recognizes this
DSB and has two main pathways to repair the damage. In most cases, non-homologous end joining (NHEJ) is initiated. In
this error-prone process, the two DNA ends are quickly ligated together, which often results in a small deletion or
insertion (indel) at the cut site29. The CRISPR–Cas9-induced indel, if out-of-frame, results in early termination of the
targeted protein. Alternatively, the homology-directed repair (HDR) pathway is initiated by supplementing the reaction
with an exogenous DNA repair template that contains homology to the DNA adjacent to the DSB and the edit of interest.
Therefore, by hijacking the endogenous DNA repair pathways, CRISPR can be used for genome engineering in a sgRNA-
mediated manner3,4,5 (Fig. 2).
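
The target-site rule described above (a protospacer immediately followed by an NGG PAM) and the frameshift condition for indels can be illustrated with a minimal Python sketch. The function names and the 20-nucleotide default spacer length are assumptions made for this illustration; the sketch only enumerates candidate sites and says nothing about guide efficiency or specificity.

```python
import re


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_spcas9_sites(seq: str, spacer_len: int = 20):
    """List (strand, protospacer, pam) for every NGG PAM preceded by a full protospacer.

    Both strands are scanned; sequences on the '-' strand are reported 5'->3'
    as read on that strand.
    """
    sites = []
    pattern = r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len  # lookahead keeps overlapping hits
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq.upper()))):
        for m in re.finditer(pattern, s):
            sites.append((strand, m.group(1), m.group(2)))
    return sites


def is_frameshift(indel_length: int) -> bool:
    """An indel is out-of-frame when its length is not a multiple of three."""
    return indel_length % 3 != 0
```

For example, `find_spcas9_sites("GATTACAGATTACAGATTACTGG")` reports a single site on the forward strand, and `is_frameshift(1)` is true whereas `is_frameshift(3)` is not, which is why only a subset of NHEJ outcomes truncates the targeted protein.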

Fig. 2: Principles of CRISPR–Cas9-mediated genome engineering.

Upon binding of the Cas9–single-guide RNA (sgRNA) complex to the genomic target site, which consists of a
protospacer and a protospacer-adjacent motif (PAM), the genome is opened in an R-loop, resulting in two single-stranded
DNA (ssDNA) strands. These ssDNA strands are individually cleaved by the RuvC and HNH domains of Cas9, resulting in a
double-stranded DNA break (DSB). The cell has two endogenous repair mechanisms to resolve DSBs. DSB repair by non-
homologous end joining (NHEJ) results in the induction of indels at the target site, which can be used to knock out genes
of interest. Homology-directed repair (HDR) can be used to introduce exogenous DNA, containing the genetic alteration of
interest at the target site.

There are key downsides to using conventional CRISPR–Cas9 for the construction of isogenic disease models.
First, because the human genome consists of three billion base pairs, the chances are great that the sgRNA will initiate
DSBs at off-target sites30. To overcome this issue, high-fidelity variants of Cas9 such as SpCas9-HF (ref. 31) and hifi-Cas9 (ref. 32)
have been developed. Alternatively, off-target-free sgRNAs can be selected using profiling strategies such as
GUIDE-seq or CIRCLE-seq prior to use in experiments33,34. Second, even if an off-target-free sgRNA is used, on-target DSB
repair can result in undesired editing outcomes, such as the induction of large deletions, insertions and translocations35.
In extreme cases, CRISPR–Cas9 could lead to chromothripsis, a process of chromosome shattering and massive structural
variation downstream of the sgRNA target site36. Finally, HDR upon CRISPR-mediated DSB induction is required for
modelling specific genetic variants. However, HDR can only occur during the S and G2 phases of the cell cycle, when sister
chromatids are present37, so NHEJ tends to dominate DSB repair. Cell cycle synchronization and addition of
NHEJ inhibitors are two strategies to push the cell towards HDR-mediated DSB repair instead38,39. Nevertheless, because
both alleles are often cleaved, an HDR experiment often ends with correct variant introduction on one allele
whereas the second allele is knocked out, owing to the cell’s bias towards using NHEJ for DNA repair29. Because only
about 2.4% of disease-causing variants are indels, alternative strategies of genome engineering that do not require DSB
induction have been pursued40.
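
The off-target concern raised above can be made concrete with a deliberately naive Python sketch that scans a sequence for NGG-flanked sites differing from the sgRNA spacer by a few mismatches. This is an illustration only: it is not how GUIDE-seq or CIRCLE-seq work (both are experimental profiling methods), it ignores bulges and the reverse strand, and the function names and the mismatch threshold are assumptions.

```python
def count_mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def candidate_off_targets(genome: str, spacer: str, max_mismatches: int = 3):
    """Return (position, site, n_mismatches) for near-matches of the spacer followed by NGG.

    Perfect matches (the intended on-target site) are excluded.
    Only the forward strand is scanned in this sketch.
    """
    genome, spacer = genome.upper(), spacer.upper()
    n = len(spacer)
    hits = []
    for i in range(len(genome) - n - 2):
        site = genome[i:i + n]
        pam = genome[i + n:i + n + 3]
        if pam[1:] != "GG":  # SpCas9 PAM is NGG: any base, then GG
            continue
        d = count_mismatches(site, spacer)
        if 0 < d <= max_mismatches:
            hits.append((i, site, d))
    return hits
```

Because the number of near-matching sites grows with genome size and with the tolerated number of mismatches, even a well-chosen spacer can have many candidates in three billion base pairs, which is the motivation for high-fidelity Cas9 variants and empirical off-target profiling.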

Box 1 A brief history of genome engineering

Strategies to alter the genome have been around since the early 1900s. Researchers have used ultraviolet and X-
ray irradiation to induce random, unspecific mutations in the DNA132. This random mutagenesis was, and still is, used in
forward genetic screens to elucidate the function of genes. Most notably, Sydney Brenner performed forward genetic
screens to ascribe function to hundreds of genes in the nematode Caenorhabditis elegans133, whereas Christiane
Nüsslein-Volhard and Eric Wieschaus used saturation mutagenesis screens in the fruit fly Drosophila melanogaster and
discovered — among many other things — the role and function of homeobox genes134. Although efficient, these
strategies lack the possibility for targeted alterations in the genome.

Genome engineering, defined by the insertion, deletion or replacement of DNA at a specific site in the genome of
an organism or cell, really took off in 1979 with the description of targeted insertion of DNA in yeast by Scherer and
colleagues135,136. They harnessed the cell’s intrinsic process of homologous recombination to seamlessly insert
exogenous DNA into the genome of Saccharomyces cerevisiae, which was later also demonstrated in mammalian
cells135,137. However, homologous recombination is inherently inefficient; therefore, a similar process called homology-
directed repair (HDR) was explored for genome engineering. Upon targeted DSB induction in yeast, the cells initiate HDR,
resulting in seamless introduction of exogenous DNA138. This process exemplified the requirement of a system that
introduces targeted DSBs into the genome of living cells for efficient genome engineering. Soon afterwards, different
strategies that enabled this feat were developed. Meganucleases are a class of homing endonucleases that recognize
large DNA sequences in a similar fashion to restriction enzymes. In 1994, meganucleases created targeted DSBs in mouse
chromosomes139, and further engineering of chimeric versions allowed targeting of specific sites in the genome140.
Meganucleases, however, are difficult to engineer, which is why their use has now become mostly obsolete. Two easier-
to-reprogram genome-engineering tools, the zinc-finger nucleases (ZFNs) and the transcription activator-like effector
nucleases (TALENs), were developed soon afterwards141,142. Both ZFNs and TALENs are fusion proteins of a repeat of
sequence-specific DNA binding domains to a non-specific DNA-cleaving nuclease such as FokI. Because of the predictable
binding of the individual DNA-binding domains, ZFNs and TALENs are easier to reprogram than meganucleases, but the
requirement for protein design and engineering for each subsequent target site still hampered scalable application141. In
2007, Philippe Horvath and colleagues uncovered the biological function of a series of tandem repeats called clustered
regularly interspaced short palindromic repeats (CRISPR), found within bacterial genomes143. These repeats, together
with CRISPR-associated (Cas) genes, constitute a bacterial defence mechanism against foreign DNA144. Doudna,
Charpentier and colleagues then harnessed the system and developed CRISPR–Cas as a genome-editing tool3. The first
descriptions of CRISPR to alter the human genome were published soon thereafter4,5. Contrary to meganucleases, ZFNs
and TALENs, which require protein engineering for each subsequent target, CRISPR–Cas-mediated genome engineering
simply requires a new RNA molecule. Because of its easy retargeting and comparatively low cost, CRISPR–Cas9-mediated
genome engineering was quickly adopted by the research community, and it came as no surprise when Doudna and
Charpentier were awarded the Nobel Prize in Chemistry in 2020 for their groundbreaking discovery.

DSB-free genome editing

The histidine residue at position 840 in the HNH domain of SpCas9 cleaves the target strand (the strand complementary to
the sgRNA), whereas the aspartate at position 10 in the RuvC domain cleaves the opposite, PAM-containing strand3. Mutating both amino acids to alanines (D10A
and H840A) results in nuclease-inactive or ‘dead’ Cas9 (dCas9). dCas9 still recognizes its target site and opens up the
DNA in an R-loop but does not induce DSBs. The binding of dCas9 to its target site alone can function as a repressor of
transcription and is dubbed CRISPR interference (CRISPRi)41. Alternatively, dCas9 can be used as a vehicle to localize
DNA effector proteins to the genome. Examples of this strategy are CRISPR activators (CRISPRa)42 and CRISPR–DNMT3
fusion proteins for targeted methylation43. To induce genetic variants, DNA-alteration enzymes are fused to dCas9 to
overcome the limitations associated with DSB induction in genome engineering. These ongoing strategies could facilitate
clinical translation of CRISPR-based genome engineering (Box 2).

Box 2 CRISPR from bench to bedside

The biggest concern for in vivo genome engineering is safety. As the human genome is three billion base pairs in
size, there is a good chance that even a single-guide RNA (sgRNA) designed for specificity will create off-target effects.
Careful design of sgRNAs can enable genome engineering without off-target effects at other loci in the genome of
mice145, but off-target effects must be mapped for each individual genomic target site to ensure safety in the clinic.

Cas9 mRNA delivery by lipid nanoparticle packaging efficiently targets the liver, whereas other organs remain more
difficult to target. Therefore, much effort has been put into developing adeno-associated viruses (AAVs). These viruses
benefit from not integrating into the host genome and have reduced immunogenicity compared to viral vectors such as
lentivirus and adenovirus146. Nine individual AAV serotypes exist naturally, each with their own tropism towards
individual organs147. Therefore, selecting an AAV serotype that has tropism only towards the target organ can limit Cas9
exposure, thereby reducing unwanted off-target effects. By using AAVs in combination with tissue-specific promoters,
expression of CRISPR components in cells that do not need to be repaired can be further decreased148. Furthermore, by
mixing the capsid and genome of different AAV serotypes, a process called pseudotyping, the tropism of AAVs can be further refined149.

A downside of AAVs is the limited packaging space of roughly 5.2 kb (ref. 150). SpCas9 itself, without the
promoter, is already 4.1 kb, leaving little space for anything else in the AAV genome. Delivery of base editors or prime
editors in a single AAV is even more challenging, owing to their size. Split inteins can post-translationally collate proteins
and have been developed for base and prime editors for in vivo delivery in mice49,151,152. Alternatively, a completely
new delivery method describing a virus-like particle that is engineered out of proteins native to mammals has been
proposed153. It remains to be seen whether these strategies are translatable to humans.
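
The packaging constraint can be expressed as simple arithmetic. The Python sketch below uses only the two figures quoted above (a capacity of roughly 5.2 kb and 4.1 kb for SpCas9 without a promoter); the function names are hypothetical, and any other component sizes passed in are placeholders chosen by the reader, not values from this Review.

```python
AAV_CAPACITY_KB = 5.2  # approximate packaging limit quoted in the text
SPCAS9_KB = 4.1        # SpCas9 coding sequence without promoter, as quoted in the text


def remaining_capacity_kb(component_sizes_kb, capacity_kb=AAV_CAPACITY_KB):
    """Space left in the vector genome after adding all components (kb)."""
    return round(capacity_kb - sum(component_sizes_kb), 3)


def fits_in_single_aav(component_sizes_kb, capacity_kb=AAV_CAPACITY_KB):
    """True when the summed component sizes do not exceed the packaging limit."""
    return sum(component_sizes_kb) <= capacity_kb
```

With SpCas9 alone, `remaining_capacity_kb([SPCAS9_KB])` gives 1.1 kb, which has to accommodate the promoter, the sgRNA cassette and any regulatory elements. Fusing a deaminase or reverse transcriptase to Cas9 consumes that margin quickly, which is why split-intein designs distribute the editor over two vectors.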

A much safer strategy is performing CRISPR ex vivo and then transplanting the tissue back into patients. Using
CRISPR in blood, similar to strategies used in chimeric antigen receptor (CAR) T cells, could be the most straightforward
approach154. CRISPR-engineered bone-marrow stem cells have already been autologously transplanted successfully into
patients suffering from β-thalassaemia and sickle cell disease155. Because ASC-derived organoids can be grown from
most epithelial tissues, there could be similar benefits in transplanting CRISPR-engineered organoids to patients.
Moreover, organoids from most epithelial tissues can be maintained indefinitely and can be split and propagated rapidly
to yield the biomass needed for successful transplantation156. Current efforts to transplant ASC-derived organoids have
shown some success in mouse intestine124,125,157. Similarly, cholangiocyte organoids are able to reconstitute bile ducts
in the human liver. Although this was shown in human donor livers undergoing ex vivo normothermic perfusion rather than
in transplanted patients, it indicates that ASC-derived organoids readily integrate into human organs126. Nonetheless, there is a
need to develop tissue-specific transplantation strategies to unlock the autologous transplantation of CRISPR-repaired
organoids throughout the body.

Base editing

The first base editor fuses dCas9 to the rat cytidine deaminase apolipoprotein B mRNA editing catalytic
polypeptide-like (rAPOBEC1), which catalyses the conversion from cytidine to uracil. The cell repairs this uracil into
thymidine, resulting in a construct (BE1) that replaces a C•G by a T•A base pair, called a cytosine base editor (CBE)44.
First-generation CBEs were hampered by cellular uracil DNA glycosylase, which removes the uracil. Therefore, second-generation base editors (BE2) were
developed by fusing a uracil glycosylase inhibitor (UGI) to the dCas9–rAPOBEC1 fusion45. To increase editing efficiency,
dCas9 can be converted into a nickase SpCas9-D10A. In this optimized base editor architecture (BE3), the strand that is
not modified by rAPOBEC1 is cleaved. The cell detects the nick and initiates DNA repair to resolve the damage. The
strand containing the base change is then used as a template for repairing the nick, yielding stable integration of the edit
with an efficiency between 15% and 75% depending on the sgRNA44. The BE3 architecture was further improved by
fusing an additional UGI in combination with linker optimization, resulting in the fourth-generation cytosine base editor
(BE4). BE4s have improved editing efficiency (by around 50%), with two-fold reduction of unintended byproducts such as
indels and point mutations46. Subsequent codon optimization47 and ancestral reconstitution48 led to a CBE architecture
that currently enables the most robust base editing in 2D cell lines, organoids and in vivo by improving expression and
nuclear localization of the proteins49 (Fig. 3a). A similar base editor, that enables C-to-T base changes, was developed by
fusing cytidine deaminase 1 (CDA1) to SpCas9-D10A in a system called Target-AID50. This base editor has a shifted activity
window (from positions 4–8 in the sgRNA for CBE to positions 1–5 in Target-AID)50. Moreover, C-to-G and C-to-A changes
are frequently observed in Target-AID. These unwanted byproducts also occur in first-generation BE3, but have been
resolved in newer iterations of CBE such as BE4 (ref. 46).

Fig. 3: Principles of cytosine and adenine base editing.

a, A cytosine base editor consists of a nickase Cas9-D10A fused to a cytidine deaminase and a tandem repeat of
uracil glycosylase inhibitors (UGI). After binding of the Cas9–single-guide RNA (sgRNA) complex, the DNA opens up in an
R-loop, which enables cytidine deamination and conversion into uracil. After nicking of the non-edited opposite strand,
the U•G base pair is repaired into a T•A base pair, effectively resulting in C>T base editing. b, An adenine base editor
consists of a nickase Cas9-D10A fused to a heterodimer of TadA, an adenine deaminase. The asterisk indicates the
evolved TadA variant and further exemplifies the heterodimer state of the fusion protein. After R-loop generation by
binding of the Cas9–sgRNA complex to the target site, adenine is deaminated, effectively turning it into inosine. The
inosine residue is converted into guanine by nicking of the non-edited strand, after which DNA repair is guided towards
the correct A>G edit. c, The cytidine and adenine deaminases function only on single-stranded DNA. Therefore, base
editor activity is limited to a small editing window within the R-loop that spans from roughly the 4th to the 8th base from
the start of the sgRNA.

The opposite base change can be performed with the use of adenine base editors. For example, tRNA-specific
adenosine deaminase, TadA, is a protein that enables editing of adenine residues in the DNA51. Because wild-type TadA
does not act on DNA, the protein was evolved using a process called phage-assisted continuous evolution (PACE)52.
Fusion of the seventh generation of evolved TadA in a heterodimer with a wild-type TadA to SpCas9-D10A results in an
adenine base editor (ABE) with an A-to-G editing efficiency of up to 50% depending on the target site, which is
comparable to or higher than that of third-generation CBEs44,51. As opposed to CBEs, the TadA heterodimer in ABEs
deaminates adenine residues, converting them to inosine. Upon cleavage of the non-edited strand and resolution
of the DNA mismatch, the inosine residue is converted into guanine, effectively resulting in an A•T to G•C edit (Fig. 3b).
The applicability of ABEs was further improved by codon optimization and additional PACE-mediated directed evolution,
resulting in optimized eighth-generation base editors with a 1.5–3.2-fold improvement in editing efficiency for ABE8
(ref. 53), and a 9.4–24-fold increase for ABE8e54, depending on the sgRNA and nucleotide location in the editing
window49.

The deaminases fused to Cas9 in base editors function only on ssDNA. Therefore, base editors act only on a few
bases of the single-stranded R-loop that is generated upon Cas9 target recognition3,4,5. This so-called ‘editing window’
roughly spans five nucleotides, from position 4 to position 8 counted from the 5′ end of the protospacer (Fig. 3c). This ssDNA
dependence greatly reduces the sgRNA-mediated off-target effects of base editors but requires very specific localization
of Cas9 to induce the desired edits. Relaxing the PAM requirements and increasing the target space of Cas9 has resulted
in a series of evolved SpCas9 variants. For example, PACE and structure-guided evolution resulted in xCas9 and SpCas9-
NG, which recognize an NGN PAM55,56. Further structural modification led to nearly PAMless SpCas9 variants that target
NRN (where R = A or G) and NYN (where Y = C or T)57. An alternative strategy to increase the target space of base
editors relies on Cas9 homologues such as that of Staphylococcus aureus (PAM = NNGRRT). Other approaches resulting in evolved
SpCas9 variants and SpCas9 homologues with alternative PAM requirements and editing windows have also been
developed40,58.
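
The interplay between the editing window and the PAM requirement can be sketched in Python as follows. The PAM patterns (NGG, NGN, NRN, NYN, NNGRRT) and the 4–8 window are taken from the text; the function names are hypothetical, and the simplifying assumption that every target base inside the window is converted is ours (in practice, editing efficiency varies by position and neighbouring bases can be edited as bystanders).

```python
# IUPAC codes used in the PAM patterns mentioned in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "N": "ACGT"}


def pam_matches(pam_pattern: str, pam: str) -> bool:
    """Check a concrete PAM (e.g. 'TGG') against an IUPAC pattern (e.g. 'NGG')."""
    return len(pam) == len(pam_pattern) and all(
        base in IUPAC[code] for code, base in zip(pam_pattern, pam.upper())
    )


def editable_positions(protospacer: str, base: str, window=(4, 8)):
    """1-based positions (from the 5' end) where `base` falls inside the editing window."""
    lo, hi = window
    return [i for i, b in enumerate(protospacer.upper(), start=1) if b == base and lo <= i <= hi]


def apply_base_edit(protospacer: str, editor: str = "CBE", window=(4, 8)) -> str:
    """Idealized outcome: convert every C (CBE) or A (ABE) inside the window."""
    source, product = {"CBE": ("C", "T"), "ABE": ("A", "G")}[editor]
    hits = set(editable_positions(protospacer, source, window))
    return "".join(product if i in hits else b for i, b in enumerate(protospacer.upper(), start=1))
```

For the protospacer `GACCATCGATTACAGATTAC`, the cytosines at positions 4 and 7 lie inside the window whereas the cytosine at position 3 does not, so a desired edit at position 3 would require a different guide, and therefore a different PAM. Passing `window=(1, 5)` mimics the shifted window described for Target-AID.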

However, base editors have limitations. Although ABEs essentially yield zero off-target effects, genome-wide
profiling of CBEs has shown genome-wide C>T mutations owing to the overexpression of APOBEC in the cell59,60.
Evolved APOBEC domains can be used in CBE to decrease these side effects61. Moreover, CBE-induced uracil residues
sometimes yield adenine and guanine residues instead of the desired thymidine owing to unwanted cellular uracil DNA
glycosylation during the base excision DNA repair pathway44,47. Despite these editing outcomes being undesired, this
observation has led to the development of new classes of base editor. For example, removal of UGI from the CBE
architecture pushes DNA repair towards guanine instead of thymidine and has allowed the development of C>G base
editors62,63. Furthermore, not all desired point mutations can currently be generated by base editors. Finally, base
editors cannot introduce indels or larger genetic variants. Prime editors have been developed to overcome these
limitations and allow for more versatile DSB-free genome engineering.

Prime editing

The rationale behind prime editing is to bring exogenous DNA with the edit of interest close to the Cas9 binding
site. In the first generation of prime editors (PE1), a reverse transcriptase (RT) domain derived from the Moloney murine
leukaemia virus was fused to nickase SpCas9-H840A (ref. 64). The RT domain converts RNA into DNA and finds its template in
the 3′ extension of the specially designed sgRNA, called the prime-editing guide RNA (pegRNA), that guides the Cas9 in
PE1 to the target site. Upon target recognition, the PAM-containing strand is nicked by the active RuvC domain of Cas9-
H840A. Then, the pegRNA extension binds to the nicked strand at the primer-binding site (PBS), after which the RT
domain of PE1 uses the remaining pegRNA (RT template) to synthesize a 3′-DNA flap containing the edit of interest. This
DNA-flap is resolved by cellular DNA repair processes integrating the edit of interest. Efficiencies of prime editing can be
further enhanced by using a rationally evolved variant of the RT domain (PE2) and by inducing a proximal second nick in
the opposing DNA strand, guided by a second (PE3) guide RNA64 (Fig. 4). However, the use of a PE3-guide in prime
editing comes with a cost as indel numbers are substantially higher compared to PE2 (6.8% average indels for the sgRNA
with the highest editing efficiency)64. This issue can be resolved by using a PE3b-guide that matches the edited strand,
resulting in a second nick once the edit is made65. In addition to all transition and transversion mutations, the first
description of prime editing reported the induction of deletions of up to 80 and insertions of up to 44 base pairs64. For
efficient use of prime editing, extensive optimization of the pegRNA and PE3 guide is required. The lengths of both the PBS
and the RT template, as well as the distance between the pegRNA and PE3-guide nicks, influence the efficiency of prime
editing. Optimization can be easily performed in the HEK293T cell line, but is more difficult in organoids or in vivo.
However, once fully optimized, prime editing is the most versatile DSB-free genome-engineering technology to date.
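
How the pegRNA 3′ extension is assembled from a PBS and an RT template can be sketched in Python. The sketch assumes that the nick lies on the PAM-containing strand three nucleotides 5′ of the PAM (the standard SpCas9 cut position, not stated in the text) and uses a 13-nucleotide PBS purely as an illustrative default; the function and parameter names are hypothetical, and real designs require the empirical optimization described above.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def pegrna_extension(pam_strand: str, nick_index: int, edited_downstream: str, pbs_len: int = 13) -> str:
    """Return the pegRNA 3' extension (RT template followed by PBS) as RNA, 5'->3'.

    pam_strand        PAM-containing DNA strand, 5'->3'
    nick_index        0-based index of the first base 3' of the nick
    edited_downstream desired sequence of the newly synthesized 3' flap, edit included
    pbs_len           primer-binding-site length (illustrative default)
    """
    if nick_index < pbs_len:
        raise ValueError("not enough sequence 5' of the nick for the requested PBS")
    # The PBS pairs with the DNA immediately 5' of the nick.
    pbs = reverse_complement(pam_strand[nick_index - pbs_len:nick_index].upper())
    # The RT template is copied into DNA, so it is the reverse complement of the desired flap.
    rt_template = reverse_complement(edited_downstream.upper())
    return (rt_template + pbs).replace("T", "U")
```

As a usage example, for the PAM strand `GGCCCAGACTGAGCACGTGATGG` (20-nucleotide protospacer followed by TGG) the nick index is 17. Requesting the flap `AGATGG`, which changes the first base after the nick from T to A, gives the extension `CCAUCUCGUGCUCAGUCUG`: the first six bases form the RT template and the remaining thirteen the PBS.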

Fig. 4: Principles of prime editing.


The prime-editing guide RNA (pegRNA) complexes with the nickase SpCas9-H840A–RT prime-editing fusion
protein and binds to the target DNA. Upon protospacer-adjacent motif strand cleavage by SpCas9-H840A, the primer-
binding site of the pegRNA extension binds the single-stranded DNA, upon which the reverse transcriptase (RT)
synthesizes a 3′-DNA flap containing the edit of interest. This 3′-flap is resolved by cellular DNA processes, which can be
further enhanced by introducing a proximal second nick in the opposing DNA strand, guided by a second (PE3) guide
RNA. The red scissors indicate the nick site of the SpCas9-H840A. PAM, protospacer-adjacent motif; PBS, primer-binding
site.

Prime editing holds great promise owing to its versatility in potential edits; however, the need for optimizing
pegRNA and PE3-guides limits its application in organoids. To overcome this issue, three key modifications have been
made to the prime-editing system. First, the use of two pegRNAs in trans with overlapping RT domains increases prime-
editing efficiencies in human cells as well as plants66,67,68. Second, engineered pegRNAs can have evopreq or tmpknot
domains fused to the 3′ end. These domains increase the stability of the pegRNA, which can increase prime-editing
efficiency69. Finally, including the R221K and N394K amino acid changes increases the nuclease function of SpCas9,
resulting in a more efficient PE2Max70.

Isogenic organoid disease models

Because ASC-derived organoids more closely resemble their tissue of origin compared to 2D cell lines, they are
more suitable for the study of human physiology. Recent developments in CRISPR-mediated genome engineering now
allow for rapid generation of isogenic organoid models harbouring specific mutations that have a role in the onset and
course of human diseases.

Tumorigenesis and cancer

The majority of CRISPR-generated isogenic ASC-derived organoid models currently focus on tumorigenesis and
carcinogenesis. Two similar studies in human intestinal organoids recreated the Vogelstein model of sequential driver
mutation accumulation in colorectal tumorigenesis71,72,73. By removing selected growth factors or adding small molecule
inhibitors, organoids with mutations in APC (removal of Wnt), TP53 (addition of Nutlin), KRAS (removal of epidermal
growth factor, EGF), SMAD4 (removal of Noggin) and PIK3CA (addition of MEK-inhibitors) can be generated.
Subcutaneous transplantation of these growth-factor-independent organoids in mice results in metastasizing
carcinomas72,73. Inspired by these first two studies, multiplexed genome engineering was applied in ASC-derived
organoids with subsequent transplantation into mice to elucidate the minimal requirements for tumorigenesis in other
tissues. For example, subcutaneous transplantation of breast organoids carrying CRISPR-mediated knockouts of PTEN, TP53,
RB1 and NF1 results in tumour formation resembling oestrogen- and progesterone-receptor-positive and human epidermal
growth factor receptor 2 (HER2)-negative luminal B breast cancers in mice74. Furthermore, CRISPR–Cas9-mediated
knockouts of TP53, SMAD4, PTEN, NF1 and BAP1 were generated in cholangiocyte organoids to elucidate the role of the
tumour suppressor BAP1 in cholangiocarcinoma75. Loss of BAP1 results in impaired chromatin accessibility and thus gene
expression, crucial for epithelial integrity in the organoids and in mice75. Two studies generated isogenic models for
pancreatic ductal adenocarcinoma (PDAC) in human ASC-derived ductal pancreas organoids. When combined with
oncogenic KRASG12V, CRISPR-based multiplexed mutation of TP53, CDKN2A and SMAD4 results in organoids with PDAC
phenotypes76, whereby overexpression of KRAS leads to organoids mimicking PDAC precursor states77. Two
independent studies created CRISPR–Cas9-mediated knockout models of DNA repair genes. Mutational signature analysis
of human colonic organoids with loss-of-function mutations in MLH1 revealed the predominant occurrence of COSMIC
signature 20 associated with errors made during normal DNA replication65,78. Moreover, knockout of NTHL1 in colonic
organoids results in an increase in C>T transitions, which resembles COSMIC signature 30, whereas XPC knockout
generates organoids deficient in nucleotide excision repair, yielding COSMIC signature 8 (ref. 79). In a more sophisticated
approach, the common fusion genes DLG1–BRAF, PTPRK–RSPO3 and EIF3E–RSPO2 were modelled into human colon
organoids80. Co-transfecting two sgRNAs that target both loci of interest results in complex genomic rearrangements only
in organoids that lacked TP53 expression80. CRISPR–Cas9 strategies can be similarly used to create single- or double-
mutant isogenic knockout models of TP53 in human hepatocyte organoids81, ARID1A in human gastric
organoids82, RB1 in human intestinal organoids to model neuroendocrine neoplasms83 and RNF43 to model early-onset
colorectal cancers84.

Multiple genes can also be CRISPR-screened in a single experiment. For example, performing a small targeted
CRISPR screen in human intestinal organoids enables mapping of RASGAP dependencies in colorectal cancer progression.
Only loss of NF1 results in enhanced RAS-ERK signal amplification85. To increase the throughput of CRISPR, genome-
wide CRISPR screening platforms have been developed86,87,88, allowing for positive and negative survivability screens
while assessing loss-of-function mutations across all genes in the genome. Furthermore, protocols for genome-wide
CRISPR screening have been developed for use in ASC-derived organoids. For example, a positive selection genome-wide
CRISPR screen performed in WT, APC-KO and APC-KO; KRASG12D mutant intestinal organoids identified genes involved in
a previously undescribed link between TGFβ and WNT signalling, revealing PBRM1 (ref. 89) and ARID1A and SMARCA4 (ref. 90) as
novel hits driving TGFβ resistance. These studies emphasize the possibility of genome-wide CRISPR screening in 3D
models.

To model mutations observed in cancer patients accurately, simple CRISPR–Cas9-mediated knockouts by indel
formation are not sufficient. Different mutations in the same cancer gene can have drastically different effects, as shown
for TP53 (ref. 91) and KRAS (ref. 92), highlighting the need for specific mutations instead of ‘blunt’ CRISPR–Cas9-mediated knockouts.
For example, CRISPR-based prime editing for cancer modelling can be performed by introducing common TP53 mutations
in human colon and hepatocyte organoids. Besides prime-editing-mediated induction of TP53R175H and TP53R249S, it is
possible to directly compare ABE to prime editing by introducing the same mutation using both techniques. Targeting
of TP53Y220C results in organoids that are able to grow on medium containing the mouse double minute 2 homologue
(MDM2) inhibitor nutlin-3, which kills wild-type organoids by stabilizing TP53 (ref. 93). ABE substantially outperforms
prime editing with 1.5–2-fold increased editing efficiencies but induces undesired additional base changes93. Similarly,
introducing common in-frame deletions in CTNNB1 exon 3 in human cholangiocyte organoids generates mutant
organoids, which can grow without exogenous Wnt94. These results highlight the efficacy of DSB-free genome
engineering in organoids for cancer modelling.

Isogenic disease models beyond cancer

An example of combining CRISPR with ASC-derived organoids beyond cancer applications is the creation of
isogenic models of DGAT1 loss in intestinal organoids as a model for congenital diarrhoeal disorder95, resulting in mutant
organoids being more susceptible to lipid-induced cell death compared to their controls95. Another example comprises a
genome-wide positive CRISPR screen to study confounding factors contributing to ulcerative colitis in mice. Whereas
wild-type organoids die when treated with interleukin-17A (IL17A), organoids that harbour mutations
in IL17RA and NFKBIZ are enriched upon treatment96. Furthermore, organoids are a great model to study severe acute
respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that caused the 2019 pandemic97,98,99. To elucidate essential host
factors for SARS-CoV-2 entry, a targeted CRISPR screen generated knockouts of 19 previously implicated genes100.
Interestingly, SARS-CoV-2 is not able to infect intestinal organoids harbouring mutations in the host
genes ACE2 and TMPRSS2, whereas knockout of the other 17 target genes does not substantially decrease the infection
potential of the virus100.

Assessing CRISPR-mediated gene repair

Because ASC-derived organoids more closely resemble their tissue of origin than do 2D cell lines, they hold the
promise of mapping the efficacy and safety of therapeutic genome engineering in vitro prior to in vivo application.
However, genome-engineering tools are benchmarked for efficiency and safety in conventional 2D cell lines such as
HEK293T and U2OS (refs. 4,5,44). Despite the ease of handling of these in vitro cultures, on- and off-target efficiencies can differ
vastly from the cell types that are targeted in patients. For example, Cas9 binding can be influenced by methylation of
CpG islands and chromatin accessibility, which differs greatly between cell types101,102. Therefore, testing CRISPR
efficiency and safety in the target cell type in ASC-derived organoids could provide more accurate prediction of CRISPR-
based genome engineering in patients. Furthermore, transplantation of ASC-derived organoids might complement whole-
organ transplantation17,103. In vitro CRISPR-repaired autologous organoids could similarly be transplanted back to
patients after rigorous off-target determination by whole-genome sequencing for safety purposes (Box 2).

Cystic fibrosis

The first hereditary disease to be repaired in human stem cells with the use of CRISPR–Cas9-mediated genome
engineering was cystic fibrosis104. Cystic fibrosis is caused by various mutations in the cystic fibrosis transmembrane
conductance regulator gene (CFTR), with the deletion of phenylalanine-508 being the most common (F508del)105. ASC-
derived intestinal organoids model the function of the CFTR channel, in this case, through the forskolin-induced swelling
assay, which correlates with clinical disease severity106,107,108. Sequencing-based screening of co-transfected Cas9 with
a targeting sgRNA towards exon 11 and a donor plasmid containing a repaired F508 CFTR sequence and an intronic
insertion of puromycin reveals correction of the F508del mutation in 17 out of 89 sequenced organoid clones (19.1%).
These repaired organoids restore the forskolin-induced swelling response to wild-type levels104. Another strategy relies
on repairing deleterious splice site mutations that disrupt CFTR function: 3272–26A>G and 3849+10kbC>T. Instead
of directly repairing the point mutations, an allele-specific disruption of the mutation can be chosen by using Cas12a, a
type V Cas protein109,110. Lentiviral transduction of the Cas nuclease and its guide RNA into intestinal organoids derived from patients
with cystic fibrosis results in 40% allele-specific indel induction depending on the corrected splicing as measured by
forskolin-induced swelling110.

An intestinal organoid biobank was subsequently established containing 664 organoid lines that represent 154
distinct CFTR mutations111. From this biobank, organoids that could be repaired by SpCas9-ABE were selected. Intestinal
organoids harbouring the R785* mutation were transfected with base editing reagents, after which forskolin-induced
swelling revealed functional repair in about 9% of the transfected organoids. The target space of base-editor-
mediated CFTR repair was increased by using xCas9-ABEs to repair R553*, R1162* and W1282* mutations in intestinal
and upper airway organoids111. Similarly, prime editing in intestinal organoids allows DSB-free repair of the F508del
mutation93. The lack of genome-wide off targets as measured by whole-genome sequencing of intestinal organoids
after CFTR repair with base editing and prime editing underlies the safety of DSB-free genome engineering for
therapeutic purposes93,111. To increase the in vitro editing efficiency of prime editing, a fluorescent prime editing and
enrichment reporter called fluoPEER was developed112. In fluoPEER, mCherry is expressed if active prime editing occurs
within the cell. Fluorescence-activated cell sorting (FACS)-based selection of mCherry-positive cells facilitates the
generation of isogenic prime-edited organoids. Using fluoPEER, the repair efficiency of CFTRF508del is increased to 80%,
enabling repair of the elusive CFTRG542* mutation112.

Diseases beyond cystic fibrosis

Applying prime editing in patient-derived isogenic intestinal organoids allows restoration of the most common
mutation in DGAT1, c.629_631delCTT, p.S210del, which causes a defect in fatty acid storage in lipid droplets, as measured
by survivability after fatty acid addition to the culture medium94. Similar results were observed in an experiment
repairing ATP7B mutations in liver organoids of patients with Wilson disease94. Subsequently, the fluoPEER system
effectively corrected mutations in ABCB4 and ATP8B1 responsible for intrahepatic cholestasis112. CRISPR-engineered
isogenic organoids can also be used to study primary ciliary dyskinesia. Using an organoid differentiation protocol
allows visualization of cilia defects in airway organoids from patients with primary ciliary dyskinesia113. From a
mini-biobank of patient-derived organoids, organoids harbouring a splicing mutation in the cilia gene DNAH11 can be repaired
using prime editing with efficiencies of up to 85%. Owing to limitations in clonal outgrowth of human airway organoids,
no morphological analysis of repaired organoids could be performed113.

Technical considerations

Because conventional and next-generation CRISPR tools have primarily been developed for 2D cell lines,
translation into 3D cell cultures is not straightforward. Therefore, key considerations need to be addressed before using
ASC-derived organoids to create isogenic disease models.

The right genome-engineering tool

To circumvent undesired on-target and off-target effects of conventional CRISPR–Cas9, the use of next-
generation CRISPR tools, in this case, base editing and prime editing, is advisable. CBE can be effectively used for the
introduction of stop codons in the genes114, mediating C•G to T•A base changes turning arginine (CGA to TGA),
glutamine (CAA/CAG to TAA/TAG) and tryptophan (TGG to TAG/TGA/TAA) into stop codons. According to the CRISPR-
STOP method, because CBE does not require DSBs, lower levels of apoptosis are observed, resulting in cells that are less
stressed upon transfection of genome-editing components. If no suitable sgRNAs are available for the CBE-mediated
introduction of stop codons, ABE can be used to disrupt either the start codon115 or splice sites116 to effectively create a
gene knockout. Importantly, using CRISPR-STOP eliminates the need for Sanger deconvolution or sub-cloning to
determine individual indel outcomes of both alleles, as is required for conventional CRISPR–Cas9-NHEJ, resulting in a
more efficient genotyping process117.
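
As a minimal illustration of the codon logic described above, the following Python sketch scans an in-frame coding sequence for codons that a CBE could turn into a stop codon. The toy sequence, the function name and the table layout are hypothetical and chosen only for illustration. The sketch does not check PAM availability, guide design or the position of the cytosine within the editing window.

```python
# Codons that a cytosine base editor (C•G to T•A) can convert into stop codons,
# as listed in the text. "strand" records where the edited cytosine sits:
# on the coding strand (read as C>T) or on the opposite strand (read as G>A).
STOP_CONVERTIBLE = {
    "CGA": (["TGA"], "coding"),                   # arginine
    "CAA": (["TAA"], "coding"),                   # glutamine
    "CAG": (["TAG"], "coding"),                   # glutamine
    "TGG": (["TAG", "TGA", "TAA"], "opposite"),   # tryptophan
}


def find_stop_convertible_codons(cds):
    """Return (codon_number, codon, possible_stops, strand) for each in-frame hit."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    hits = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        if codon in STOP_CONVERTIBLE:
            stops, strand = STOP_CONVERTIBLE[codon]
            hits.append((i // 3 + 1, codon, stops, strand))
    return hits


# Hypothetical toy coding sequence (not a real gene): Met-Gln-Arg-Trp-Leu
toy_cds = "ATGCAGCGATGGCTG"
for hit in find_stop_convertible_codons(toy_cds):
    print(hit)
```

In practice, each candidate codon would still need a suitable sgRNA that places the target cytosine inside the editing window.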

We recommend first designing the CBE-mediated stop-codon insertion before designing ABE-mediated methods
because the cell could still express the protein of interest owing to alternative splicing or use of an alternative start site.
Moreover, it is important to realize that the DNA-altering fusion proteins in base editors function in a specific sequence
context. For example, a machine-learning protocol called BE-Hive reports that guanine residues in front of the cytosine
substantially decrease the levels of base editing when using BE4118. Alternatively, modifications of CBE and ABE, such as
evoA-BE4 and ABE-CP, can be used to perform base edits in alternative sequence contexts119. Moreover, base editors
function most effectively within the editing window that roughly spans from the 4th to the 8th nucleotide from the start of
the sgRNA (Fig. 3c). If a specific edit is required, but editing of additional nucleotides within the editing window would
result in unwanted amino acid changes, prime editing is the preferred strategy because the RT-template can be designed
to exclusively incorporate the edit of interest.
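
The editing-window reasoning above can be sketched in a few lines of Python. The protospacer below is a hypothetical 20-nucleotide sequence, the window of positions 4 to 8 is taken from the text, and the function names are illustrative. Real base editors differ in their exact windows and sequence-context preferences, so this is only a rough first-pass check.

```python
def bases_in_window(protospacer, target_base="C", window=(4, 8)):
    """Return 1-based positions of target_base that fall inside the editing window.

    Positions are counted from the start (5' end) of the sgRNA protospacer.
    """
    start, end = window
    return [
        pos
        for pos, base in enumerate(protospacer.upper(), start=1)
        if base == target_base and start <= pos <= end
    ]


def bystander_positions(protospacer, intended_pos, target_base="C", window=(4, 8)):
    """Return editable positions in the window other than the intended one."""
    in_window = bases_in_window(protospacer, target_base, window)
    if intended_pos not in in_window:
        raise ValueError("intended position is not an editable base in the window")
    return [pos for pos in in_window if pos != intended_pos]


# Hypothetical protospacer, for illustration only.
protospacer = "GACCTCAGTTGCAAGGCTGA"
print(bases_in_window(protospacer, "C"))      # cytosines a CBE could edit
print(bystander_positions(protospacer, 4))    # other cytosines in the window
```

A non-empty bystander list does not rule out base editing, because a bystander edit may be silent, but it flags the cases in which prime editing may be the safer choice.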

Delivery and selection

The delivery of genome-editing agents into organoids is considerably more difficult than in popular 2D cell lines.
Simple lipofection of plasmid DNA can reach up to 95% transfection efficiency in HEK293T, whereas efficiencies in 3D
organoids differ greatly per line and can be as low as 1.5% (ref. 111). Lentiviral transduction can be a more efficient strategy.
However, even if titrated properly, the viral genome is prone to integrate multiple times, which might influence organoid
fitness90. Electroporation of ribonucleoprotein complexes into organoids substantially increases transfection
efficiencies120, but it decreases flexibility, because for each subsequent next-generation CRISPR tool, a new protein has
to be produced. The low transfection efficiencies in organoids do not limit successful isogenic model generation as long as
strategies to select for either the transfected or functionally edited organoids are taken into account during the
experimental setup. For functional selection, CRISPR-engineered organoids are selected for based on the introduced
genetic variant, which can be based either on survivability or phenotypic changes upon genome engineering. Examples of
functional selection based on survivability are the removal of Wnt and Rspondin-1 for selecting WNT pathway mutants
such as APC and removal of EGF in combination with the addition of the EGFR inhibitor gefitinib or MEK inhibitors for the
selection of oncogenic KRAS or PIK3CA mutations in intestinal organoids, respectively72,73. TP53 mutations can be
selected by the addition of MDM2 inhibitor Nutlin-3, which enables straightforward mutagenesis of multiple cancer genes
by CRISPR multiplexing with TP53 (ref. 75). An example of morphology-based functional selection is the swelling
response of intestinal organoids that carry either a wild-type (swelling) or mutant (no swelling) CFTR gene93,104,111. If
no functional selection is available for the desired edit, transfection selection is the preferred strategy. The choice can be
made either by FACS sorting based on fluorescence101 or by antibiotic resistance that is acquired upon integration into
the coding sequence of a gene to create a knockout77 or by co-transfection of hygromycin resistance
piggyBac119 (Fig. 5).
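
The selection logic in this section can be summarized as a simple lookup, sketched here in Python. The dictionary entries restate the examples given in the text; the function name and the fallback wording are illustrative assumptions, and the table is not an exhaustive or validated protocol.

```python
# Functional selection strategies named in the text, keyed by the edited gene.
FUNCTIONAL_SELECTION = {
    "APC": "remove Wnt and R-spondin-1",
    "KRAS": "remove EGF and add the EGFR inhibitor gefitinib",
    "PIK3CA": "remove EGF and add a MEK inhibitor",
    "TP53": "add the MDM2 inhibitor Nutlin-3",
    "CFTR": "score forskolin-induced swelling (morphology-based)",
}

# Fallback described in the text when no functional selection exists.
TRANSFECTION_SELECTION = "transfection selection (FACS or antibiotic resistance)"


def choose_selection(gene):
    """Prefer functional selection; fall back to transfection selection."""
    return FUNCTIONAL_SELECTION.get(gene.upper(), TRANSFECTION_SELECTION)


for gene in ("APC", "TP53", "DGAT1"):
    print(gene, "->", choose_selection(gene))
```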

Fig. 5: The process of creating isogenic disease models in adult-stem-cell-derived organoids.

Patient-derived organoids are dissociated into single cells, upon which genome-engineering agents are delivered
by electroporation, lipofection or lentiviral transduction. Selection for edited cells can be performed based on transfection
selection or functional selection. After Sanger validation, clonal isogenic organoid pairs can be used for variant impact
qualification, drug discovery and screening and assessment of the safety of genome-engineering approaches.

Outlook

Conventional CRISPR engineering through active Cas9 nucleases that create DSBs in the genome is highly
efficient; however, it can induce DNA damage and be detrimental to the cell. To overcome these issues, next-generation
CRISPR tools have been developed that no longer require the induction of a DSB to induce genetic alterations. The first
class, base editors, allow for the introduction of either C>T or A>G base changes and have already proved their potential
in vitro and in vivo. The need for more versatile genome-editing tools has led to the development of prime editors that
can induce all transition and transversion point mutations, as well as introduce DNA insertions and deletions. These
developments have allowed modelling or repair of over 90% of all genetic variants described in human disease, simply by
selecting the most optimal genome-engineering tool for the desired genetic alteration. The application of CRISPR tools
can be extended to organoids that can be derived from ASCs from both healthy and diseased donors. Complex ASC-
derived isogenic disease models have been developed to reveal the mechanisms of disease progression.

Although creating knockouts of interest in ASC-derived organoids is not difficult, modelling single-nucleotide
variants and larger genomic alterations remains a challenge. CBE and ABE have proved to be efficient for genome
engineering and disease modelling of organoids; however, simple and robust application of prime editing could
substantially increase the scope of single-nucleotide variants modelling because it can mediate all base changes. The
suggested improvements to prime-editing strategies enhance the editing efficiency in HEK293T cells, but it remains to be
seen whether they also prove to be effective in ASC-derived organoids, where editing efficiencies appear to be more
difficult to predict.

Even if further development of the current genome-engineering toolbox results in more robust genome
engineering, we can still engineer ‘only’ 90% of the genetic variants observed in patients using the current toolset. The
remaining 10% of disease-causing mutations involve larger chromosomal alterations such as large inversions, deletions
and insertions, up to the loss or duplication of a complete chromosome. Efforts are ongoing to increase the target scope
of genome engineering. Two new iterations of prime editing could be part of the solution. Insertions of up to 5 kb and
inversion of DNA pieces of up to 40 kb can be achieved by pairing prime editing with site-specific recombinases 66.
Combining prime editing with integrases increases the potential of genomic integration up to 36 kb without the need for
DSBs121.

Genome-wide CRISPR screens have already been used in organoids to obtain biological insight into tumour
development and colitis. However, to perform a high-quality CRISPR screen, a library saturation of up to 500-fold is
needed, requiring tens of millions of transfected or transduced adult stem cells90. Scalability in ASC-derived organoids is
expensive owing to the cost of 3D matrices and growth factors. To resolve this issue, a 3D matrix consisting of only 5%
basement membrane extract (compared to the conventional 50–100%) has been benchmarked for CRISPR screens in
organoids, resulting in easy and, most importantly, cheap expansion of cancer organoid cells122. This protocol adaptation
could further simplify genome-wide CRISPR screens in organoids derived from different tissues. Moreover, most genome-
wide CRISPR libraries still use NHEJ to create genetic knockouts. Genome-wide base editor screens for DSB-free
screening of disease variants have been developed123, which could be expanded to ASC-derived organoids to enable
high-throughput and accurate qualification of genetic variants in the future.
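
To make the scale of the coverage requirement concrete, the following Python sketch estimates the number of cells needed for a pooled screen. The library size of 80,000 sgRNAs and the delivery efficiency of 25% are hypothetical values chosen for illustration; only the 500-fold coverage figure comes from the text.

```python
import math


def cells_required(n_guides, coverage, delivery_efficiency=1.0):
    """Estimate the starting cell number for a pooled CRISPR screen.

    n_guides: number of sgRNAs in the library
    coverage: desired number of cells per sgRNA (for example, 500)
    delivery_efficiency: fraction of cells that receive a guide (0 < value <= 1)
    """
    if not 0 < delivery_efficiency <= 1:
        raise ValueError("delivery_efficiency must be in the interval (0, 1]")
    return math.ceil(n_guides * coverage / delivery_efficiency)


# Hypothetical library of 80,000 sgRNAs at 500-fold coverage.
print(cells_required(80_000, 500))         # edited cells needed
print(cells_required(80_000, 500, 0.25))   # starting cells at 25% delivery
```

Under these assumptions, 40 million edited cells are needed, which matches the 'tens of millions' order of magnitude mentioned above, and lower delivery efficiencies raise the starting number further.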

The combination of CRISPR and ASC-derived organoids could also benefit the clinical translation of genome
engineering. Despite its high efficiency, safety remains the biggest concern for CRISPR–Cas9-mediated in vivo genome
engineering. ASC-derived organoids can address safety concerns because gene repair can be performed ex vivo followed
by rigorous off-target analysis. Moreover, organoids can be rapidly expanded, and the safely corrected clone can then be
expanded and transplanted back into the patient to repopulate the affected organ. However, tissue-specific
transplantation protocols do not yet exist for most tissues. Transplantation of ASC-derived organoids originally
focused on the first established organoid system, the mouse intestinal organoids4,124,125, whose success laid the
foundations for cholangiocyte, thyroid and salivary gland organoid transplantation25,126,127.

Despite transplantation into humans not being common practice at present, ASC-derived organoids could already
improve the safety of genome-engineering technologies. Clinical trials applying CRISPR as a therapeutic strategy have
already started, with one standing out in particular. Systemic injection of lipid nanoparticles containing SpCas9 mRNA and
a sgRNA targeting TTR, whose mutation is associated with transthyretin amyloidosis, substantially decreased the baseline
serum TTR in all subjects128. Prior to injecting human subjects with CRISPR reagents, the efficacy of the strategy was
assessed in cynomolgus monkeys after careful selection of the sgRNA based on on-target efficiency and specificity.
However, because this treatment aims to target the entire human liver, billions of cells must undergo CRISPR engineering
with nuclease-active Cas9s. It is almost impossible to control off- and on-target adverse effects in such a vast number of
cells, because uncontrolled cell growth can be induced by a single chromosomal rearrangement, thereby compromising
safety. Moreover, despite validating specificity in cynomolgus monkeys, no safety experiment has been performed in the
cells of patients. ASC-derived organoids could fill this gap.

With the development of DSB-free genome-engineering strategies, the majority of safety concerns can be
addressed. Although the first generations of CBEs exhibited extensive off-target effects owing to overexpression of
APOBEC, ABEs have not shown any genome-wide off-target effects59. Upgraded iterations of CBEs have also reported
reduced sgRNA-independent off-target effects129. Furthermore, despite the undesirable on-target edits observed in prime
editing, edits at off-target sites are mostly absent130. One likely reason is the Cas9-nickase architecture, which requires
two nicks close to each other to induce a DSB131. Therefore, we envision that these DSB-free ‘next-generation CRISPR
tools’ will take over the therapeutic space.

CRISPR-based genome engineering in organoids holds great promise for disease modelling and for patient care in
the future. The rapid development of new genome-engineering technologies that allow the scalable induction of
increasingly complex DNA mutations further highlights the applicability of CRISPR in ASC-derived organoids.

Predicting prime editing efficiency and product purity by deep learning


Prime editing is a versatile genome editing tool but requires experimental optimization of the prime editing guide
RNA (pegRNA) to achieve high editing efficiency. Here we conducted a high-throughput screen to analyze prime editing
outcomes of 92,423 pegRNAs on a highly diverse set of 13,349 human pathogenic mutations that include base
substitutions, insertions and deletions. Based on this dataset, we identified sequence context features that influence
prime editing and trained PRIDICT (prime editing guide prediction), an attention-based bidirectional recurrent neural
network. PRIDICT reliably predicts editing rates for all small-sized genetic changes with a Spearman’s R of 0.85 and 0.78
for intended and unintended edits, respectively. We validated PRIDICT on endogenous editing sites as well as an external
dataset and showed that pegRNAs with high (>70) versus low (<70) PRIDICT scores showed substantially increased
prime editing efficiencies in different cell types in vitro (12-fold) and in hepatocytes in vivo (tenfold), highlighting the
value of PRIDICT for basic and for translational research applications.
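
For readers unfamiliar with the metric quoted above, Spearman's R is the Pearson correlation computed on ranks rather than on raw values. The Python sketch below shows the calculation on a handful of hypothetical predicted and measured editing efficiencies. These numbers are invented for illustration and are not taken from the PRIDICT dataset, and the function names are not part of any PRIDICT code.

```python
def average_ranks(values):
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical predicted scores and measured editing efficiencies (%).
predicted = [80, 65, 40, 20, 10]
measured = [55, 60, 30, 12, 15]
print(round(spearman(predicted, measured), 3))
```

Because only the ordering matters, a predictor can reach a high Spearman's R even when its absolute scores are not calibrated to measured efficiencies.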

CRISPR-mediated generation and characterization of a Gaa homozygous c.1935C>A (p.D645E) Pompe
disease knock-in mouse model recapitulating human infantile onset-Pompe disease

Pompe disease, an autosomal recessive disorder caused by deficient lysosomal acid α-glucosidase (GAA), is
characterized by accumulation of intra-lysosomal glycogen in skeletal and oftentimes cardiac muscle. The c.1935C>A
(p.Asp645Glu) variant, the most frequent GAA pathogenic mutation in people of Southern Han Chinese ancestry, causes
infantile-onset Pompe disease (IOPD), presenting neonatally with severe hypertrophic cardiomyopathy, profound muscle
hypotonia, respiratory failure, and infantile mortality. We applied CRISPR-Cas9 homology-directed repair (HDR) using a
novel dual sgRNA approach flanking the target site to generate a Gaaem1935C>A knock-in mouse model and a myoblast cell
line carrying the Gaa c.1935C>A mutation. Herein we describe the molecular, biochemical, histological, physiological, and
behavioral characterization of 3-month-old homozygous Gaaem1935C>A mice. Homozygous Gaaem1935C>A knock-in mice
exhibited normal Gaa mRNA expression levels relative to wild-type mice, had near-abolished GAA enzymatic activity,
markedly increased tissue glycogen storage, and concomitantly impaired autophagy. Three-month-old mice demonstrated
skeletal muscle weakness and hypertrophic cardiomyopathy but no premature mortality. The Gaaem1935C>A knock-in mouse
model recapitulates multiple salient aspects of human IOPD caused by the GAA c.1935C>A pathogenic variant. It is an
ideal model to assess innovative therapies to treat IOPD, including personalized therapeutic strategies that correct
pathogenic variants, restore GAA activity and produce functional phenotypes.

Introduction

Glycogen storage disease type II, also called Pompe disease (PD; OMIM#232300), is an autosomal recessive
disorder resulting from malfunction of lysosomal acid α-glucosidase (GAA; EC 3.2.1.20) caused by mutations in
the GAA gene (OMIM#606800). GAA deficiency leads to reduced glycogen degradation and accumulation of intra-
lysosomal glycogen with pronounced glycogen storage in cardiac and skeletal muscle. Increased glycogen storage in
myocytes, brain, and spinal cord anterior horn neurons results in muscle weakness, which varies in age of onset and
severity according to the level of residual GAA enzymatic activity1. PD presents as a spectrum of phenotypes, typically
classified into infantile-onset form (IOPD) and late-onset form (LOPD) based on the time of disease onset 2,3,4. Patients
with severe IOPD have neonatal onset and a rapidly progressive disease with prominent cardiomyopathy, general muscle
weakness and hypotonia, respiratory problems and drastically reduced life expectancy. Patients with LOPD have a more
slowly progressive proximal skeletal myopathy eventually resulting in mobility problems and respiratory difficulties, but
generally do not present with hypertrophic cardiomyopathy3.

Recombinant GAA (rhGAA) enzyme replacement therapy (ERT) was developed to treat PD and approved by the
FDA in 2006. ERT improves the survival of patients and is very effective at reducing glycogen levels in heart muscle and
reversing cardiac symptoms. However, only partial recovery of muscle strength can be achieved with ERT. Surviving
children still have glycogen buildup in other muscles and experience challenges performing basic activities such as
walking, speech enunciation, eating or even breathing5,6.

The GAA gene has a very heterogeneous mutational spectrum, with more than 900 GAA variants documented in
the Pompe disease GAA variant database7,8,9. Among these variants, the GAA c.1935C>A transversion in exon 14, which
results in the p.Asp645Glu (p.D645E) missense mutation, is the most frequent pathogenic variant associated with IOPD in
the Southern Chinese, Taiwanese, and Southeast Asian populations of Han ancestry, but is not frequently reported in any
other region8,10. In Taiwanese populations, this c.1935C>A variant represents 36–80% of mutations11,12 and occurs in the
context of a specific haplotype with conserved polymorphic markers linked to Taiwanese patients with IOPD compared
with normal individuals. This may suggest the existence of a founder effect stemming from a diaspora of Southern Han
Chinese to Taiwan and other locations12.

Here, we report the generation of a Gaaem1935C>A (p.D645E) knock-in (KI) mouse model of PD by CRISPR-Cas9
homology-directed repair (HDR) using a dual sgRNA approach. The primary objective of this study is to characterize the
molecular, biochemical, physiological, histological, and behavioral phenotypes of this KI mouse model. We anticipate that
this novel Gaaem1935C>A mouse model will be a valuable research tool, especially when compared to other Gaa knock-out
(KO) and KI models. Altogether, preclinical KI models of PD will further accelerate our understanding of how
pathogenic GAA mutations result in variable disease onset, progression, and response to current and future therapeutic
strategies.

Results

Gaa c.1935 target locus guide RNA and donor ssODN design


In silico design of CRISPR-Cas9 guide RNAs (gRNAs) specific for the Gaac.1935 target locus was initially performed
using CRISPick, the Genetic Perturbation Platform (GPP) sgRNA Designer13. Candidate gRNAs were selected using the
following criteria: 1) top combined rank score (based on on-target efficacy and off-target specificity scores) and 2)
proximity of predicted Cas9 nuclease cut site to the Gaac.1935 target locus. Further potential gRNA off-target analysis was
performed using Genome Target Scan (GT-Scan)14. Two gRNAs were first selected to be used in
generating Gaac.1935C>A KI C2C12 cells: gRNA-1 (5′-CGCAGATGTCCGCCCCGACC-3′) and gRNA-2
(5′-GCAGATGTCCGCCCCGACCA-3′).
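
As a quick sanity check on the two guide sequences listed above, the following Python sketch computes their length, GC content, and the offset between them. The helper names are illustrative and not part of CRISPick or GT-Scan, and the sketch says nothing about on-target or off-target scores.

```python
# Guide sequences as given in the text (5' to 3').
GRNA_1 = "CGCAGATGTCCGCCCCGACC"
GRNA_2 = "GCAGATGTCCGCCCCGACCA"


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def offset_between(a, b):
    """Smallest shift k >= 0 such that a[k:] is a prefix of b, or None."""
    for k in range(len(a)):
        if b.startswith(a[k:]):
            return k
    return None


for name, seq in (("gRNA-1", GRNA_1), ("gRNA-2", GRNA_2)):
    print(name, len(seq), "nt, GC fraction", gc_fraction(seq))
print("offset of gRNA-2 relative to gRNA-1:", offset_between(GRNA_1, GRNA_2))
```

The two 20-nt guides target the same strand and are shifted by a single nucleotide, so they share 19 bases.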

Generation and characterization of Gaa c.1935C>A KI C2C12 cell line

Gaac.1935 gRNA-1 and gRNA-2 expression vectors and their respective single-stranded donor oligonucleotides
(ssODN) were electroporated into C2C12 mouse myoblasts to assess in vitro on-target editing and HDR
efficiency. Gaac.1935 gRNA-2 demonstrated higher on-target editing (26.7 ± 10.7%) and HDR efficiency (5.4 ± 3.4%) than
gRNA-1 (on-target editing: 13.2 ± 3.7%; HDR efficiency: 3.8 ± 0.6%) (Table 1). Following puromycin-resistant selection,
we were able to successfully isolate and expand Gaac.1935C>A KI C2C12 clonal cells electroporated with Gaac.1935 gRNA-1
and/or gRNA-2 and their respective donor ssODN (Table 1; Fig. 1A). Sanger sequencing results confirmed that
the Gaac.1935C>A KI mutation along with a Gaac.1920C>T silent protospacer adjacent motif (PAM) mutation were successfully
introduced into the clonal line (Fig. 1B).

Figure 1

Generation and characterization of a Gaac.1935C>A C2C12 myoblast clonal cell line. (A) Sequences of guide RNAs
targeting the Gaac.1935 target locus. Horizontal arrow indicates antisense guide RNAs used in this study. Protospacer
adjacent motifs (PAM; NGG) are highlighted in color corresponding to the respective guide RNA. The Gaac.1935 locus for
targeted cytosine to adenine transversion is highlighted in red. (B) Sanger sequencing chromatograms of controls (Gaawt)
and clonal KI (Gaac.1935C>A) C2C12 myoblast genomic DNA at the Gaac.1935 locus. Black arrow indicates a synonymous
mutation at the PAM site (Gaac.1920C>T). Red arrow indicates the desired KI mutation (Gaac.1935C>A). Gray shaded region
indicates amino acid change from aspartic acid (Asp; GAC) to glutamic acid (Glu; GAA) at position 645. ( C) Periodic-acid
Schiff (PAS) staining of control (Gaawt) and clonal KI (Gaac.1935C>A) C2C12 myoblasts. Fixed cells were stained by PAS
staining (purple-magenta) and counterstained by hematoxylin (blue). Only Gaac.1935C>A KI myoblasts display significant
accumulated PAS staining (see arrows). Representative images were captured on a bright-field microscope at 20 × 
objective magnification. Scale bar represents 50 µm. (D) GAA enzymatic activity in Gaawt and Gaac.1935C>A C2C12
myoblasts. Very low GAA activity (~ 2.3% of wt) was measured in Gaac.1935C>A C2C12 myoblasts compared to Gaawt C2C12
myoblasts. GAA enzymatic activity was measured using a fluorometric 4-MU α-D-glycoside assay and normalized to total
amount of sample protein. Data generated from three independent experiments are shown as mean ± SD. Comparisons
were analyzed with unpaired one-tailed t-tests. ****p < 0.0001.
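The coordinates quoted in the caption for panel (B) can be checked with a short calculation. The sketch below is illustrative only: it assumes standard coding-DNA numbering (position 1 is the first base of the start codon), and the helper names are hypothetical rather than part of the study's analysis.

```python
# Standard genetic code entries for the two codons named in the caption.
CODON_TO_AA = {"GAC": "Asp", "GAA": "Glu"}


def codon_position(cdna_pos: int) -> tuple:
    """Return (codon number, 1-based position within the codon) for a coding-DNA position."""
    codon_number = (cdna_pos - 1) // 3 + 1
    position_in_codon = (cdna_pos - 1) % 3 + 1
    return codon_number, position_in_codon


def apply_substitution(codon: str, position_in_codon: int, ref: str, alt: str) -> str:
    """Replace one base of a codon after checking that the reference base matches."""
    if codon[position_in_codon - 1] != ref:
        raise ValueError("reference base does not match the codon")
    return codon[: position_in_codon - 1] + alt + codon[position_in_codon:]


# c.1935C>A falls on the third base of codon 645 (GAC), giving GAA.
codon_number, position = codon_position(1935)
mutant_codon = apply_substitution("GAC", position, "C", "A")
print(codon_number, position, CODON_TO_AA["GAC"], "->", CODON_TO_AA[mutant_codon])
```

Because the change affects the third base of the codon yet alters the encoded residue, it is a missense rather than a synonymous substitution, consistent with p.Asp645Glu.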

In comparison to Gaawt cells, Gaac.1935C>A KI cells displayed increased PAS staining, indicating the accumulation of glycogen
(Fig. 1C). Furthermore, GAA enzymatic activity was almost abolished in Gaac.1935C>A KI cells relative to Gaawt cells; less than
2.3% of WT GAA activity was detected in the KI cell line (Fig. 1D). Taken together, these results demonstrate that
our Gaac.1935C>A KI C2C12 cell line exhibits a molecular and biochemical phenotype observed in human PD and can be
utilized as an in vitro model for further study.

Generation and characterization of Gaa em1935C>A transgenic mice

Given our prior success in generating Gaaem1826dupA KI cell and transgenic mouse lines using a bi-directional, dual
overlapping gRNA strategy15, an additional gRNA-3 (5′-GGGCGTGCCCCTGGTCGGGG-3′) was introduced (Fig. 2A).
In silico analysis showed that gRNA-3 had a higher predicted on-target efficiency than gRNA-1 (0.5487 [gRNA-3]
vs 0.3905 [gRNA-1]) by CRISPick13, as well as fewer predicted off-targets by GT-Scan14. We then applied the dual
overlapping gRNA strategy in vivo using Gaac.1935 gRNA-2 and gRNA-3 with the ssODN (Fig. 2B) via pronuclear injection of
C57BL/6NJ single-cell embryos by standard methods16. 566 oocytes were injected, 531 oocytes (94.8%) were implanted,
and a total of 39 founder mice were generated. The dual overlapping gRNA method achieved a high percentage of on-
target editing activity in genome-edited founder mice (89.7%; 35 out of the 39 mutant mice) showing significant Cas9
activity/insertion/deletion (indel) mutations within the target region in Gaa. Among the founders, 15 mutants (38.5%)
exhibited on Sanger sequencing the desired c.1935C>A KI mutation (see Table 2) along with the silent PAM and seed
region mutations (Fig. 2C). Of these 15, 13 had significant (> 40% estimated by Sanger sequencing) indel mutations in
the Gaa target region. Founder #1 (> 50% for Gaac.1935C>A) and founder #2 (> 25% for Gaac.1935C>A) had the highest
percentage of the c.1935C>A mutation and lowest percentage of indels. The two founder mice and Gaawt mice underwent
whole genome sequencing (WGS) at > 50 × coverage and on-target locus alignment analysis to better quantitate the
extent of genomic mosaicism. For on-target analysis, Gaac.1935 target loci from aligned FASTQ reads were designated to
four categories: Gaac.1935 KI mutation; indel mutation; no mutation; and nonspecific mutation. WGS analysis demonstrated
highly efficient integration of the desired Gaac.1935C>A KI mutation with indel and nonspecific mutations comprising a
minority of genomic editing events in Gaac.1935C>A founder mice (Fig. 2D). For off-target analysis of the WGS data, first we
examined the seven genomic regions (Supplementary Table 1) predicted by GT-Scan as potential off-target sites of gRNA-
2 and gRNA-3, and the only result was the intended Gaac.1935C>A mutation. No single nucleotide variations (SNVs) were
detected within 500 bp of these sites. Next, we examined the founders’ WGS data for any de novo (compared to WT
WGS) C>A transversions with a de novo N > A mutation 3 or 6 bases upstream, an N>C mutation 12 bases upstream, or
an N>T mutation 15 bases upstream of the C>A suggesting ectopic SpCas9/HDR activity. No ectopic HDR signatures
were identified in the genomes of either founder #1 or founder #2.

Figure 2
Generation of a Gaaem1935C>A transgenic mouse line. (A) Dual overlapping guide RNA approach targeting
the Gaac.1935 target locus. Arrowhead direction indicates whether guide RNA is sense (right) or antisense (left). PAM
sequences (NGG) are highlighted in color corresponding to the guide RNA arrows. The Gaac.1935 locus for the targeted cytosine-to-
adenine nucleotide transversion is highlighted in red. Expected Cas9 nuclease cut sites are shown as vertical arrows in
color corresponding to the guide RNA arrows. (B) Sequence of the target locus for integration (top) aligned with the
ssODN (bottom) to introduce the Gaac.1935C>A mutation (green box). PAM motifs are indicated in gold (gRNA-2) or blue
(gRNA-3). Installed synonymous variants at PAM sites (Gaac.1920C>T, Gaac.1932G>A) and the desired KI mutation are
highlighted in red. Installed gRNA seed region variants (Gaac.1923G>C, Gaac.1929G>A) are highlighted in green. (C) Sequencing
chromatograms of control (Gaawt), founder #1 (Gaac.1935 Founder #1), and founder #2 (Gaac.1935 Founder #2). Black arrows indicate
synonymous variant edits at PAM sites (Gaac.1920, Gaac.1932) or gRNA seed regions (Gaac.1923, Gaac.1929). Red arrows indicate
the desired KI mutation (Gaac.1935C>A). Gray shaded region indicates amino acids at position 645 for each mouse. (D) WGS
analysis (> 50 × read depth) of the Gaac.1935 locus in G0 founder #1 and G0 founder #2. WGS analysis demonstrates highly
efficient on-target genome-editing in these founder mice. Data are presented as stacked bar graphs indicating the
percentage of WGS reads for each event category. Gray: nonspecific Gaa mutations; black: no Gaa mutation; red:
intended Gaac.1935C>A mutation and associated synonymous variants; and blue: Gaa insertion/deletions. (E) Pedigree
diagram of mating scheme to segregate the intended Gaaem1935C>A KI allele from mosaic CRISPR-generated founder mice
for generation of homozygous Gaaem1935C>A KI mice. Males are represented as squares and females are represented as
circles.

Founder #1 and founder #2 were mated with WT animals, and their offspring G1 HETs (male from founder #1
and female from founder #2) were further crossed to obtain the first homozygous c.1935C>A KI mice in the
G2 generation (Fig. 2E). Subsequently, mice harboring the c.1935C>A Gaa variant were backcrossed 10 generations to
the C57BL/6NJ background before KI mice were characterized. As the generation of our KI mice involved CRISPR
endonuclease-mediated mutation introduction, we followed the International Committee on Standardized Genetic
Nomenclature for Mice17 and named the KI transgenic mice as Gaaem1935C>A.

Gaa em1935C>A KI mice have severe GAA enzymatic deficiency and glycogen storage in cardiac, skeletal muscle, and
brain tissue

The missense Gaac.1935C>A mutation in exon 14 of the Gaa gene leads to an amino acid substitution; therefore, we
did not expect any nonsense-mediated decay in Gaac.1935C>A mRNA transcripts. The comparative ΔCt values between
mouse Gaa and the housekeeping gene Gapdh acquired by RT-PCR were almost identical among the WT, HET, and KI groups,
indicating that the Gaac.1935C>A mutation does not affect Gaa mRNA levels (Fig. 3A).

Figure 3
Molecular and biochemical characterization of Gaaem1935C>A KI mice. (A) Gaa mRNA expression in tail or liver biopsy
samples from 3-month-old WT (n = 4; black bar), HET (n = 8; striped bar), and KI (Gaaem1935C>A; n = 6; white bar)
mice. Gaa expression levels were measured by TaqMan probe-based quantitative real-time PCR using the ΔCt method for
comparison of the target gene (Gaa) to the reference gene (Gapdh). The average Ct value from WT samples was further
used to normalize the other groups. No significant difference in Gaa mRNA transcript expression was detected among
WT, HET, and KI samples. (B) GAA enzyme activity in heart, diaphragm, and gastrocnemius muscle tissues and brain
homogenate from WT (n = 5; black bars), HET (n = 5; striped bars), KI (Gaaem1935C>A; n = 4; white bars), and KO
(Gaatm1Rabn; n = 3; grey bars) mice was measured using a fluorometric 4-MU α-D-glucopyranoside assay and normalized to
the amount of sample protein. (C) Glycogen level was measured in the same tissues used for analysis in (B) using a
colorimetric assay. KO mice displayed significantly elevated glycogen levels relative to WT and HET mice in all tissues
assayed. However, KI mice showed a significant elevation of glycogen levels in muscle tissues, but no significant elevation
in brain. The amount of glycogen was normalized to the amount of sample protein. Data were generated from at least
three independent experiments and shown as mean ± SD. All comparisons were analyzed using one-way ANOVA with the
Tukey post-hoc test. *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001. ns: not significant.

GAA enzymatic activity was measured with artificial fluorometric 4-MU substrate as described previously15. The
results were consistent with the other findings from this study, showing that the HET group had close to 50% of the level
of enzymatic activity observed in the WT group in each muscle tissue and brain tissue sample tested, indicating that the
one WT allele produced functional enzyme, but not the c.1935C>A allele. For comparative purposes, we
acquired Gaa homozygous knock-out (KO) (B6;129-Gaatm1Rabn/J; exon 6 knock-out)18 mouse tissues from Jackson
Laboratory (Bar Harbor, ME). Compared to tissue from WT or HET animals, tissue from KI (Gaaem1935C>A) and KO
(Gaatm1Rabn/J) animals had significantly decreased GAA enzymatic activity (about 1% of WT levels) (Fig. 3B).

Compared to the unaffected WT or HET groups, KI and KO mice had abnormally elevated lysosomal glycogen
storage in heart, diaphragm, and gastrocnemius muscle tissue. Interestingly, increased glycogen storage in whole-brain
homogenates was observed in KO mice, but not in KI mice, which had a slight, but not statistically significant, increase in
glycogen load (Fig. 3C).

Gaa em1935C>A KI mice show increased muscle glycogen content and elevated LAMP1 marker in brain regions

PAS staining is routinely used to demonstrate abnormal carbohydrate accumulation in muscle tissue19. PAS
staining was performed in different muscle tissues (heart, diaphragm, and gastrocnemius) from 3-month-old KI mice.
Scattered red to magenta PAS staining particles representing the accumulation of glycogen were observed in all three
muscle tissue types in the KI mice, but not in WT animals (Fig. 4A). PAS staining with diastase (PAS-D), an enzyme that
digests only glycogen, was also applied to consecutive slides to confirm that the particles consisted of glycogen. A
decrease in red/magenta signal confirms that excessive accumulation products in tissues comprised only glycogen
(Supplementary Fig. 1).

Figure 4

Tissue pathology in Gaaem1935C>A KI mice showing glycogen storage in muscles and lysosomal abnormality in
brains. (A) Representative bright-field images of heart, diaphragm, and gastrocnemius sections from 3-month-old WT and
KI mice, stained with hematoxylin/PAS. Areas of abnormal glycogen accumulation (arrowheads) in cardiac and skeletal
muscle tissues were observed in KI mice compared to WT mice (top). (B) Immunohistostaining with mouse anti-LAMP1
antibody showing increased cell body staining (arrowheads) in frontal and hippocampal neurons and Purkinje cells of KI
mice from all three representative brain areas (frontal cortex, hippocampus and cerebellum). Scale bar represents
100 µm.

The lysosomal associated membrane protein-1 (LAMP1) is commonly used as a biomarker for lysosomal storage.
LAMP1 staining in the brain sections from 3-month-old WT and KI mice was examined in three representative areas of
the brain (frontal cortex, hippocampus, and cerebellum), demonstrating markedly increased LAMP1 immunoreactivity in
KI neuronal cell bodies compared to WT controls (Fig. 4B).

In summary, histopathology showed that the KI mice display early pathological glycogen accumulation in muscle
tissues, which is analogous to muscle pathology in IOPD patients. In addition, the KI mice display a more pronounced
lysosomal burden in the brain areas as early as 3 months of age compared to WT animals.

Gaa em1935C>A KI mice have impaired skeletal muscle autophagy


Excessive autophagic buildup is well-documented in PD patients and in PD mice20,21 and may be a potential
mechanism of PD pathogenesis. Microtubule-associated protein light chain 3 (LC3B) is a protein component of
autophagosomes, which are quickly degraded under normal physiological conditions and are hardly detectable. Cleavage
of LC3B at the carboxy terminus immediately following synthesis yields the cytosolic, non-autophagosome bound LC3B-I
form. LC3B-I is converted to autophagosome-bound LC3B-II via conjugation to phosphatidylethanolamine when
autophagic processes are activated. Following autophagosome-lysosome fusion, LC3B-II is then hydrolyzed back to LC3B-
I via ATG522.

To examine autophagic status of the Gaaem1935C>A KI mice, western blotting for LC3B was performed using tissue
homogenate (Fig. 5A and Supplementary Fig. 2). Both KI and KO models demonstrate elevated synthesis of LC3B-I in
gastrocnemius, evidence of upregulated autophagy (Fig. 5B); further, autophagosomal LC3B-II is increased in KI heart,
diaphragm, and gastrocnemius but not in brain (Fig. 5C). The ratio of LC3B-II:LC3B-I is increased (Fig. 5D),
demonstrating impaired autophagosome-lysosome fusion, in skeletal muscles (diaphragm and gastrocnemius) but not
cardiac muscle of the KI model. This is similar to what has been observed in both Gaaem1826dupA KI and KO
mouse models15,21.

Figure 5
Autophagy impairment in the Gaac.1935C>A KI mouse model. (A) Representative western blot images of autophagy-
associated proteins (LC3B-I and LC3B-II) from tissue homogenate from heart, diaphragm, gastrocnemius, and brain of
WT (n = 3; black bars), HET (n = 3; striped bars), KI (n = 4; white bars), and Gaatm1Rabn (KO, n = 3; grey bars) mice.
Prominent LC3B-II bands can be seen in KI and KO tissues. (B) LC3B-I and (C) LC3B-II protein levels normalized to the
amount of total protein. Cytosolic LC3B-I is markedly elevated in KI and KO gastrocnemius muscle; autophagosomal
LC3B-II is markedly elevated in KI and KO gastrocnemius, and moderately elevated in KI heart and diaphragm. (D)
LC3B-II/LC3B-I ratio normalized to WT. Impaired autolysosomal formation (increased LC3B-II/LC3B-I ratio) is observed in KI
skeletal muscle tissues. The ratio of LC3B-II and LC3B-I protein intensity was quantified by densitometric analysis of the
western blots, and the ratio was further normalized with WT in each tissue assayed. Data were generated from at least
three independent western blots and values are shown as mean ± SD. All comparisons were analyzed using one-way
ANOVA with the Tukey post-hoc test. *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001.

Gaa em1935C>A KI mice display left ventricular cardiac hypertrophy at 3 months of age


Neonatal-onset hypertrophic cardiomyopathy is a common clinical presentation in patients with IOPD. To explore
the anatomical features and physiological function of hearts in the KI mice, echocardiography was performed on 3-
month-old mice. M-mode images obtained by echocardiography were used to measure multiple parameters including wall
thickness, internal diameter, and heart rate. Many additional functional parameters can be derived from these
measurements to determine temporal left ventricular (LV) wall motion as an index for LV contractile patterns and
chamber size (Fig. 6A).

Figure 6
Three-month-old Gaaem1935C>A KI mice display anatomical features of left ventricular cardiac hypertrophy. (A)
Representative M-mode echocardiographic images showing cardiac dimensions: IVSd: yellow arrows; LVPWd: red arrows,
and LVIDd / LVIDs: white arrows. (B) KI IVSd and LVPWd are significantly increased versus WT and HET animals,
indicative of concentric hypertrophic cardiomyopathy. (C) LVIDd and LVIDs do not significantly differ between WT, HET,
and KI animals. (D) KI fractional shortening is significantly increased versus WT indicative of hyperdynamic contractility;
KI LVMI is significantly increased versus WT indicative again of hypertrophic cardiomyopathy. All measurements from WT
(n = 12; black bars), HET (n = 10; striped bars), and KI (Gaaem1935C>A; n = 10; white bars) mice were obtained from 3-
month-old mice. Data are shown as mean ± SD. Heart rate (HR) was maintained greater than 500 bpm throughout
measurements. All comparisons were analyzed using one-way ANOVA with the Tukey post-hoc test. *p < 0.05,
**p < 0.01, ***p < 0.001, ****p < 0.0001. ns: not significant.

Increases in interventricular septal diameter (IVSd), LV posterior wall diameter (LVPWd), and LV mass index
(LVMI) were observed in KI mice, compared to WT and/or HET mice (Fig. 6B), indicating pronounced hypertrophic
cardiomyopathy. Measurements of myocardial contraction showed a slight decrease in LV systolic internal diameter
(LVIDs) in the KI mouse, but no significant difference in LV diastolic internal diameter (LVIDd) was observed among WT,
HET, and KI mice (Fig. 6C). Increased fractional shortening indicative of cardiac contractile dysfunction was observed in
KI mice (Fig. 6D). Echocardiographic data therefore indicate early hypertrophic cardiomyopathy phenotypes in 3-month-
old Gaaem1935C>A KI mice. The data presented in Fig. 6 show no gender differences in these parameters (Supplementary
Fig. 3).
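The relationship between the internal diameters and fractional shortening can be made concrete with the conventional M-mode formula, FS (%) = (LVIDd − LVIDs) / LVIDd × 100. The text does not state the formula used, so this is an assumption, and the diameters below are hypothetical values chosen only for illustration.

```python
def fractional_shortening(lvidd_mm: float, lvids_mm: float) -> float:
    """Conventional fractional shortening (%) from diastolic and systolic LV internal diameters."""
    if lvidd_mm <= 0:
        raise ValueError("LVIDd must be positive")
    return (lvidd_mm - lvids_mm) / lvidd_mm * 100.0


# Hypothetical diameters (mm), not measurements from this study.
baseline = fractional_shortening(lvidd_mm=4.0, lvids_mm=2.0)
smaller_systolic = fractional_shortening(lvidd_mm=4.0, lvids_mm=1.6)
print(baseline, smaller_systolic)
```

With LVIDd held constant, a smaller LVIDs gives a larger fractional shortening, which is the pattern described for the KI mice.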

Reduced forelimb grip strength in Gaa em1935C>A KI mice

The forelimb grip strength test is commonly used to evaluate neuromuscular dysfunction in mice by measuring
the deterioration of skeletal muscle. Peak tension force was recorded as the mice lost their grip on the force transducer
bar and normalized to bodyweight for analysis by gender group.

First, mouse body weight is known to differ between genders at 3 months of age23. The mean ± SD body
weights of male and female mice in our study cohort were 28.73 ± 3.13 g and 21.76 ± 2.36 g, respectively. In each
gender cohort, there was no significant difference in body weight across WT, HET, and KI mice (Fig.  7). In addition, at
3 months of age, the male Gaaem1935C>A KI mouse showed a significant reduction (~ 19%) in normalized peak tension force
compared to WT mice, indicating decreased forelimb muscle strength in KI mice (Fig. 7). This reduction was observed
only in male KI mice, but not in female KI mice.
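The normalization described above can be sketched as follows. This is a minimal illustration that assumes the per-animal value is the mean of the recorded trials divided by body weight; the function names, force values, and units are hypothetical and are not data from the study.

```python
from statistics import mean


def normalized_peak_force(trial_forces, body_weight_g):
    """Mean peak tension force across trials, divided by body weight."""
    if body_weight_g <= 0:
        raise ValueError("body weight must be positive")
    return mean(trial_forces) / body_weight_g


def percent_reduction(reference, value):
    """Reduction of value relative to reference, in percent."""
    return (reference - value) / reference * 100.0


# Hypothetical animals: 9 trials each (3 per day over 3 days), same body weight.
wt_force = normalized_peak_force([100.0] * 9, body_weight_g=25.0)
ki_force = normalized_peak_force([80.0] * 9, body_weight_g=25.0)
print(round(percent_reduction(wt_force, ki_force), 1))
```

Normalizing to body weight before comparing genotypes keeps the known weight difference between males and females from being mistaken for a difference in muscle strength.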

Figure 7
Reduced forelimb grip strength in male Gaaem1935C>A transgenic mice. Forelimb peak tension force and body mass
measurements in 3-month-old male WT (n = 12; black bars), HET (n = 12; striped bars), and KI (n = 14; white bars) mice
(top panel) and female WT (n = 12; black bars), HET (n = 12; striped bars), and KI (n = 11; white bars) mice (bottom
panel). No significant difference in body weight was observed among WT, HET, and KI groups in either gender cohort. Male KI mice
demonstrate decreased normalized peak tension force consistent with skeletal muscle weakness. Forelimb peak tension
force was measured using a grip strength meter and taken as the average of 9 trials over 3 days. Data are shown as
mean ± SD. All comparisons were analyzed using one-way ANOVA with the Tukey post-hoc test. *p < 0.05, **p < 0.01,
***p < 0.001. ns: not significant.

Discussion

In populations of Southern Han ancestry, the GAA c.1935C>A (p.Asp645Glu) mutation represents 36%-80% of
mutations11,12,24 in IOPD patients. We have successfully applied CRISPR/Cas9 genome editing to install
the Gaa c.1935C>A mutation in a mouse myoblast C2C12 cell line and create a novel Gaaem1935C>A KI mouse model, each
of which represents a valuable resource for studying IOPD. The KI C2C12 line demonstrates severe GAA enzyme
deficiency and glycogen accumulation; the KI mouse model successfully recapitulates molecular, biochemical, histologic,
and phenotypic aspects of human IOPD.

While no phenotypic differences were noted between GAA c.1935C>A HET and WT mice aside from the expected
50% reduction in HET GAA enzymatic activity, the homozygous KI mice demonstrated a significant, PD-like phenotype. KI
mice had normal Gaa mRNA levels with significantly reduced level of GAA hydrolysis activity (about 1% of WT) in heart
and skeletal muscle, as well as brain tissue. This aligns with observed levels of low GAA enzyme activity (0.08–0.82% of
normal range for control) previously measured in homozygous GAA c.1935C>A patient fibroblasts25. Significant increases
in glycogen storage were observed in KI mouse muscle tissues, consistent with the human GAA c.1935C>A IOPD
phenotype. In addition, increased lysosomal burden, as indicated by LAMP1 immunostaining, was demonstrated in brain
tissue from Gaaem1935C>A KI mice. Autophagic impairment was noted in skeletal muscle tissues, consistent with what is
observed in human PD and other murine PD models18. Gaaem1935C>A mice developed hypertrophic cardiomyopathy at
approximately two months of age, which became quite marked at three months of age. The muscle weakness
phenotype observed in male KI mice may be due to a combination of sequelae from cardiomyopathy, impairment of lysosomal-autophagosomal
fusion into autolysosomes, and catabolism of myofibril contractile proteins22. Studies are ongoing to assess the life span,
natural history, and phenotypic progression of the model.

A significant divergence of the model from human GAA c.1935C>A IOPD is the lack of infantile mortality in KI
mice. This KI mouse, along with the Gaaem1826dupA KI mouse strain previously generated in our laboratory15 and other
previously published Gaa KO models18,26,27, all demonstrate null or nearly-zero GAA enzyme activity. Nevertheless, no
neonatal mortality has been observed in any model, while neonatal death is the inevitable clinical outcome in untreated
IOPD patients28,29. Only one Gaa KO model on a DBA/2J background (homozygous Ltbp4Δ36 alleles) is reported to have a
shorter lifespan (but still not neonatal lethality) in male mice, compared to male Gaa KO mice on the C57BL/6;129
background27. The DBA/2J genetic background may exacerbate the severity of respiratory muscle weakness caused
by Gaa KO deletion, leading to earlier death than is observed in other KO models27.

Genome editing represents a new approach to the treatment of PD, compared to traditional treatments like ERT
or gene therapy. A mouse that both recapitulates clinical features of human disease and harbors orthologous pathologic
gene variants serves as a valuable system for the development of innovative therapies and, most importantly, studies
enabling eventual clinical trials in humans. As this model undergoes in-depth validation and studies of its clinical and
immune response to standard intravenous rhGAA enzyme infusions, subsequent avenues for exploration include variant
rhGAA enzyme infusions, gene therapy, and CRISPR-based genomic editing. The latter approach can be performed using
CRISPR “prime editing”, which is capable of targeting more than 90% of known pathogenic mutations, including the
c.1935C>A transversion30. In addition, multiple tissues can be obtained or derived from our Gaaem1935C>A KI mouse to
investigate the potential tissue-specific efficacy of genome correction-based therapeutics in vitro, before in vivo studies
are attempted. With these advances, and high sequence conservation surrounding the mutation, the  Gaaem1935C>A KI
mouse represents an ideal candidate for the development of personalized therapeutics like prime editing that correct
pathogenic variants, restore GAA enzyme activity and further improve functional phenotypes before translational
application in the clinic.

Materials and methods

Gaa c.1935 guide RNA SpCas9 expression vector cloning


All oligonucleotides applied in this project were manufactured by Integrated DNA Technologies (Coralville, IA).
Guide RNA (gRNA) oligonucleotides with BbsI (New England Biolabs) restriction enzyme overhangs were designed with
forward oligo (5′-CACCG(gRNA)-3′) and reverse oligo (5′-AAAC(reverse complement gRNA)C-3′). Complementary gRNA
oligonucleotides were cloned into pSpCas9(BB)-2A-Puro plasmid (pX459; Addgene plasmid ID# 48139) using the  BbsI
site. Positive pX459-gRNA clones were confirmed by Sanger sequencing and further expanded using the PureLink HiPure
Plasmid Midiprep Kit (Invitrogen). All donor ssODNs were designed with 50-bp homology arms flanking the target locus
and synonymous mutations in the PAM and seed region (5 nt upstream of PAM) to prevent further Cas9 activity after
successful HDR.
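The oligonucleotide layout described above can be sketched in a few lines. The gRNA sequences are those reported in the Results; the function names are hypothetical, and the sketch only assembles strings following the stated 5′-CACCG(gRNA)-3′ and 5′-AAAC(reverse complement gRNA)C-3′ pattern. It does not check for internal BbsI sites or other design constraints.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Protospacer sequences as reported in the text (5' to 3').
GRNAS = {
    "gRNA-1": "CGCAGATGTCCGCCCCGACC",
    "gRNA-2": "GCAGATGTCCGCCCCGACCA",
    "gRNA-3": "GGGCGTGCCCCTGGTCGGGG",
}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def bbsi_oligos(grna: str) -> tuple:
    """Return (forward, reverse) oligos with the overhang pattern described in the text."""
    forward = "CACCG" + grna
    reverse = "AAAC" + reverse_complement(grna) + "C"
    return forward, reverse


for name, grna in GRNAS.items():
    forward, reverse = bbsi_oligos(grna)
    print(name, forward, reverse)
```

When the two oligos are annealed, the CACC and AAAC ends remain single-stranded, and the extra G/C pair places a guanine at the start of the transcribed guide.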

In vitro testing of Gaa c.1935 guide RNAs

pX459-Gaac.1935 gRNA expression vectors and donor ssODNs were transfected into murine C2C12 myoblast cells
(ATCC CRL-1772) using the Neon™ Transfection System (ThermoFisher Scientific) as previously described15. In short,
3 × 10⁵ cells were mixed with 4.5 μg gRNA expression vector(s) and 450 nM ssODN (Table 1), then electroporated
using the following parameters: pulse voltage, 1650 V; pulse width, 10 ms; pulse number, 3. Forty-eight hours
post-transfection, cellular genomic DNA was obtained for Sanger sequencing around the Gaac.1935 target locus. SpCas9 nuclease
activity and HDR KI efficiency were determined by Tracking of Indels by Decomposition (TIDE)31 or Tracking
of Insertion, Deletions, and Recombination events (TIDER)32 analysis of DNA sequence electropherogram files.

Generation of a Gaa c.1935C>A KI C2C12 cell line

Similar parameters to those described above were applied to transfect 4.5 μg of gRNA-1, gRNA-2 under the U6
promoter in the pX459 expression vector and 450 nM ssODN into 3 × 10⁵ C2C12 myoblast cells. pCMV6-AC-GFP (OriGene)
was used as a marker for transfection; electroporated cells were selected for successful pX459 transfection by adding
2.5 μg/mL puromycin dihydrochloride (Sigma-Aldrich) to the culture medium beginning 24 h after electroporation.
Puromycin was supplied every 48 h until all pCMV6-AC-GFP-transfected cells were no longer viable. After puromycin-
resistance screening, single cell clones were selected by standard serial dilution methods in 96-well plates in the presence
of 2.5 μg/mL puromycin dihydrochloride. Sanger sequencing was used to confirm the genotype of each single cell clone.

Generation of Gaa em1935C>A KI mice

The generation of Gaaem1935C>A KI mice was performed at the University of California-Irvine Transgenic Mouse
Core, and all study procedures were reviewed and approved under IACUC protocol #AUP16–63. Standard methods were
applied to produce pronuclear stage C57BL/6NJ embryos16. In brief, 3 μM crRNA/tracrRNA/3xNLS-Cas9 protein and
10 ng/μL ssODN were injected into pronuclear stage C57BL/6NJ embryos (Table 2). Surviving embryos were implanted
into oviducts of 0.5dpc ICR pseudo-pregnant females.

Whole-genome sequencing and analysis

Whole genome sequencing (WGS) and analyses were performed on tail samples from G0 wild type (Gaawt),
G0 founder #1 (Gaac.1935Founder#1), and G0 founder #2 (Gaac.1935Founder#2) mice. In brief, WGS was performed and analyzed on
an Illumina HiSeq X Ten Sequencer at 40–50× read depth (Fulgent Genetics) using TruSeq DNA libraries created from
1 μg fragmented genomic DNA. WGS on-target and off-target analyses were performed on the OnRamp BioInformatics
platform. Data were aligned to the Mouse genome (mm10) using BWA33. PCR artifacts were identified with the memtest
utility from Sentieon34, and filtered out using samtools35. Alignments were de-duplicated and realigned around insertions
and deletions using LocusCollector, Dedup, and Realigner from Sentieon. SNV calling was performed with GVCFtyper from
Sentieon, using the mouse dbSNP 142 data (http://hgdownload.cse.ucsc.edu/goldenpath/mm10/database/snp142.txt.gz)
as the known SNPs. Known SNPs and variants falling in un-located chromosomes were removed from analysis.

For off-target analysis, we used SNVs that had a C>A transversion and any one of the four following criteria
indicating an ectopic HDR event: a de novo N → A mutation 3 bases upstream, N →A mutation 6 bases upstream, N →C
mutation 12 bases upstream, or N → T mutation 15 bases upstream. This search step was repeated for the reverse
complement sequences. The fully processed BAM files (after Realigner) were used as input to the Manta structural variant
caller36. For each of the non-wild-type (WT) samples, Manta somatic caller was applied with the C57BL6-WT sample as
“normal” and the sample of interest as “tumor,” thereby subtracting the background structural variants in C57BL6-WT
compared to mm10. VCFtools (https://vcftools.github.io) was used to annotate the output VCF files from Manta.
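The ectopic HDR signature search described above can be sketched as a filter over a list of de novo SNVs. This is a simplified illustration and not the pipeline used in the study: the variant tuples and coordinates are hypothetical, "upstream" is taken to mean a lower coordinate on the forward strand, and the reverse-complement case is modeled as a G>T change with complemented companion bases at higher coordinates.

```python
# Offset upstream of the C>A change -> alternate base expected at that position (forward strand).
FORWARD_COMPANIONS = {3: "A", 6: "A", 12: "C", 15: "T"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def find_ectopic_hdr(de_novo_snvs):
    """Return C>A (or reverse-strand G>T) SNVs that have at least one signature companion SNV.

    de_novo_snvs: list of (chrom, 1-based position, ref, alt) tuples.
    """
    alt_at = {(chrom, pos): alt for chrom, pos, _ref, alt in de_novo_snvs}
    hits = []
    for chrom, pos, ref, alt in de_novo_snvs:
        if (ref, alt) == ("C", "A"):
            # Forward strand: companions lie at lower coordinates.
            matched = any(
                alt_at.get((chrom, pos - offset)) == base
                for offset, base in FORWARD_COMPANIONS.items()
            )
        elif (ref, alt) == ("G", "T"):
            # Reverse strand: complemented companions lie at higher coordinates.
            matched = any(
                alt_at.get((chrom, pos + offset)) == COMPLEMENT[base]
                for offset, base in FORWARD_COMPANIONS.items()
            )
        else:
            matched = False
        if matched:
            hits.append((chrom, pos, ref, alt))
    return hits


# Hypothetical de novo SNVs used only to exercise the filter.
example_snvs = [
    ("chr11", 1000, "C", "A"),  # candidate with a companion 12 bases upstream
    ("chr11", 988, "G", "C"),
    ("chr2", 500, "C", "A"),    # isolated C>A, no companion
    ("chr3", 200, "G", "T"),    # reverse-strand candidate
    ("chr3", 215, "T", "A"),    # complemented companion 15 bases away
]
print(find_ectopic_hdr(example_snvs))
```

The offsets 3, 6, 12, and 15 correspond to the synonymous PAM and seed-region variants carried on the ssODN (c.1932, c.1929, c.1923, and c.1920 relative to c.1935), which is why their co-occurrence with a C>A change would suggest donor-templated repair at an unintended site.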

Experimental animals

The G2 mice were backcrossed 10 generations onto a C57BL/6NJ background before any characterization was
performed. Mice received ad libitum Teklad Global 16% Protein Rodent Diet (Envigo, Indianapolis, IN) and water in
a temperature-controlled environment. Animals were housed in groups of 4 mice/cage, separated by gender except for
mating trios, and provided with a 14-h light and 10-h dark cycle. The use and care of animals in this study adhered to the
guidelines of the NIH Guide for the Care and Use of Laboratory Animals, as utilized by the CHOC Children’s Institutional
Animal Care and Use Committee under CHOC IACUC protocol #160,902. In addition, all experiments in this study were
carried out in compliance with ARRIVE guidelines (https://arriveguidelines.org), and all methods were performed in
accordance with relevant guidelines and regulations.

Genotyping was performed by Sanger sequencing to confirm the Gaac.1935 target locus with the following
primers: GAA_c1935(F), 5′-CAGGCGTTAGGACAAATGGA-3′; GAA_c1935(R), 5′-TTCCAGCAGGTATGGGATTAAC-3′.
Heterozygous (Gaawt/em1935C>A) males and females were crossed to obtain homozygous KI, heterozygous (HET), and WT
mice for this study. Experiments were performed on age-matched mice of either gender (usually littermates).
Homozygous knock-out (KO) (B6;129-Gaatm1Rabn/J)18 mouse tissues for comparative molecular and biochemical analyses
were acquired from Jackson Laboratory (Bar Harbor, ME).

Quantitative real-time PCR

Total RNA was extracted from tail tip or liver homogenate using a Direct-zol RNA miniprep kit (Zymo Research)
and reverse-transcribed using an iScript™ cDNA Synthesis Kit (Bio-Rad). As per the manufacturer’s instructions, both
oligo(dT) and random hexamer primers were used to synthesize cDNA. The resulting cDNA was diluted tenfold, and a 2-μl
aliquot was used in a 12-μl PCR reaction with SsoAdvanced Universal Probes Supermix (Bio-Rad) and specific TaqMan
primer/probe assays for Gaa (Taqman assay #Mm00484581_m1) and Gapdh (TaqMan assay #Mm99999915_g1). PCR
reactions were run in triplicate and quantified with Bio-Rad CFX96 Touch Real-Time PCR Detection. Gapdh was used as
an internal reference gene, and relative quantification of Gaa gene expression was calculated using the comparative
ΔCt method for the difference in Ct values of Gaa and Gapdh in the given sample. ΔCt values were further normalized with
the average of the Ct value of wildtype samples.
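The comparative calculation can be sketched as below. The text specifies ΔCt = Ct(Gaa) − Ct(Gapdh) with normalization to the wild-type average; expressing the result as 2^(−ΔΔCt) is the common convention and is an assumption here, as are the function names and the Ct values, which are hypothetical.

```python
from statistics import mean


def delta_ct(ct_target: float, ct_reference: float) -> float:
    """ΔCt: target gene Ct minus reference gene Ct for one sample."""
    return ct_target - ct_reference


def relative_expression(sample_dct: float, wt_dcts) -> float:
    """Fold expression relative to the mean wild-type ΔCt, assuming 2^(-ΔΔCt)."""
    ddct = sample_dct - mean(wt_dcts)
    return 2 ** (-ddct)


# Hypothetical Ct values (Gaa, Gapdh), illustration only.
wt_dcts = [delta_ct(25.0, 20.0), delta_ct(25.2, 20.2)]
ki_dct = delta_ct(25.1, 20.1)
print(relative_expression(ki_dct, wt_dcts))
```

A sample whose ΔCt equals the wild-type mean returns a relative expression of 1, and each additional cycle of ΔCt halves the estimate.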

Biochemical analyses

For the GAA activity assay, phosphate-buffered saline (PBS)-flushed mouse tissues or C2C12 myoblast cell pellets
were homogenized in CelLytic M cell lysis reagent (MilliporeSigma). Acidic α-glucosidase enzyme activity was assessed as
previously described with minor modifications15,37. In brief, 10 µL tissue homogenate was mixed with 10 µL of 6 mM 4-
methylumbelliferyl-α-D-glucopyranoside substrate (MilliporeSigma) in McIlvaine citrate/phosphate buffer (pH 4.3) and
quenched with 180 µL glycine carbonate buffer (pH 10.5) after 1-h incubation at 37 °C in a 96-well plate. GAA activity
reactions were run in triplicate, and fluorescence measurements were obtained using an Infinite M Plex
spectrofluorophotometer (Tecan) at excitation and emission wavelengths of 360 nm and 450 nm, respectively. One GAA
enzymatic activity unit was defined as 1 nmol converted substrate per hour. Protein concentration was estimated using a
Pierce BCA assay kit (ThermoFisher), using bovine serum albumin as a standard. Specific activity was calculated as units
of GAA enzymatic activity per mg of protein.
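
The unit definition and the specific-activity calculation above can be sketched as follows. This is a hedged illustration only: it assumes that the amount of released 4-methylumbelliferone (in nmol) has already been read off a fluorescence standard curve, which the text does not describe, and the function name and example values are hypothetical.

```python
def gaa_specific_activity(nmol_4mu, protein_mg_per_ml,
                          homogenate_ul=10.0, incubation_h=1.0):
    """Specific activity in units per mg protein.

    One unit = 1 nmol substrate converted per hour, as defined in the text.
    """
    if protein_mg_per_ml <= 0 or homogenate_ul <= 0 or incubation_h <= 0:
        raise ValueError("protein, volume, and time must be positive")
    units = nmol_4mu / incubation_h                           # nmol per hour
    protein_mg = protein_mg_per_ml * homogenate_ul / 1000.0   # mg in the assayed aliquot
    return units / protein_mg

# Hypothetical well: 5 nmol 4-MU released by 10 µL of a 2 mg/mL homogenate in 1 h
print(round(gaa_specific_activity(5.0, 2.0), 1))
```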

Tissue glycogen levels were measured using a glycogen assay kit (Sigma-Aldrich) according to the manufacturer’s
instructions. In brief, 10 µL tissue homogenate was incubated with hydrolysis enzyme reaction mixture in a final volume
of 50 µL at room temperature for 30 min before adding 50 µL development enzyme reaction mixture for 30 min incubation
at room temperature. Absorbance at 570 nm was measured using an Infinite M Plex spectrofluorophotometer (Tecan). A
standard curve was generated using standard glycogen solution provided in the assay kit. Glycogen quantification assays
were performed in duplicate, and an extra reaction without hydrolytic enzyme treatment was used for background
correction of endogenous glucose levels in each sample. Tissue glycogen level is expressed as µg of glycogen per mg of
protein.
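
The background correction and normalization described above can be sketched as follows. This is a minimal sketch under stated assumptions: a linear standard curve fitted by ordinary least squares (the kit's actual curve-fitting procedure is not given in the text), with hypothetical function names and absorbance values.

```python
def fit_standard_curve(ug_standards, absorbances):
    """Least-squares line A570 = slope * µg glycogen + intercept."""
    n = len(ug_standards)
    if n < 2 or n != len(absorbances):
        raise ValueError("need at least two matched standard points")
    mx = sum(ug_standards) / n
    my = sum(absorbances) / n
    sxx = sum((x - mx) ** 2 for x in ug_standards)
    if sxx == 0:
        raise ValueError("standards must span more than one concentration")
    sxy = sum((x - mx) * (y - my) for x, y in zip(ug_standards, absorbances))
    slope = sxy / sxx
    return slope, my - slope * mx

def glycogen_ug_per_mg(a_hydrolyzed, a_background, slope, intercept, protein_mg):
    """Background-corrected glycogen (µg) per mg protein."""
    ug_total = (a_hydrolyzed - intercept) / slope       # glycogen-derived plus endogenous glucose
    ug_background = (a_background - intercept) / slope  # reaction without hydrolysis enzyme
    return (ug_total - ug_background) / protein_mg

# Hypothetical standards and sample readings
slope, intercept = fit_standard_curve([0.0, 0.5, 1.0, 2.0], [0.05, 0.30, 0.55, 1.05])
print(round(glycogen_ug_per_mg(0.85, 0.15, slope, intercept, protein_mg=0.02), 1))
```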

Tissue harvesting, processing, and histological staining

Three-month-old mice were euthanized using CO2 asphyxiation and transcardially perfused with PBS. Brains were
dissected sagittally along the midline; left hemispheres were rapidly frozen and stored at −80 °C for biochemical analysis,
and right hemispheres were post-fixed at 4 °C in zinc formalin. Heart, diaphragm, and gastrocnemius muscle were also
harvested. Half of the tissue samples for biochemical studies were rapidly frozen and the other half of tissues were post-
fixed at 4 °C in zinc formalin.

Samples for histological staining were processed and embedded in paraffin blocks for sectioning at 4-μm
thickness, and periodic acid-Schiff (PAS) staining (Sigma-Aldrich) was performed according to the manufacturer's
instructions. An EVOS M5000 imaging system (Invitrogen) was used to capture representative images at 20× objective
magnification under RGB-mode illumination.

LAMP1 immunohistochemistry staining of paraffin-embedded brain sections from study animals was performed
with an anti-LAMP1 polyclonal antibody (Cat#24170, Abcam, Waltham, MA) using an ImmPACT DAB Substrate Kit with
Peroxidase (Vector Laboratories, Burlingame, CA) following the manufacturer’s instructions. Paraffin sections were
deparaffinized, and endogenous peroxidase activity was quenched by immersion in 1.5% hydrogen peroxide, followed by
heat-induced epitope retrieval in sodium citrate buffer. The sections were subsequently incubated overnight at 4 °C with
mouse anti-LAMP1 antibody (1:50), followed by secondary antibody amplification, before visualization with diaminobenzidine
(DAB) as the chromogen. Representative images were captured with a Keyence BZ-X800 microscope (Keyence American, Itasca,
IL) at 20× objective magnification with identical exposure parameters.

LC3B western blot analysis

Frozen mouse tissues were homogenized in CelLytic M cell lysis reagent (MilliporeSigma) supplemented with cOmplete protease
inhibitors (Roche) to prevent protein degradation. Total protein concentration of the supernatants from
centrifuged tissue lysates was determined by BCA protein assay (Pierce). Eight micrograms of total protein lysate were
resolved on 4–15% Mini-PROTEAN TGX Stain-free gels (Bio-Rad) and transferred onto Immuno-Blot PVDF membranes
(Bio-Rad). Membrane blots were blocked with EveryBlot blocking buffer (Bio-Rad) and probed with an anti-LC3B primary
antibody (cat# L7543, Sigma) followed by an HRP-conjugated secondary antibody before applying ECL HRP substrate
(Bio-Rad) for chemiluminescence. Stain-free gels and blots were imaged using the stain-free and chemiluminescence
settings on the ChemiDoc™ MP imaging system (Bio-Rad). LC3B-I and LC3B-II protein levels were measured by
densitometric analysis of western blots using Fiji software (ImageJ version 2.0)38. Signals were normalized to the
amount of total protein as determined by densitometric analysis of stain-free gels. The LC3B-II/LC3B-I ratio was normalized to
WT for each organ.

Murine echocardiography

Transthoracic echocardiography (M-mode and 2-dimensional echocardiography) was performed using a Vevo
2100 high-resolution ultrasound system, with a linear transducer of 32–55 MHz (VisualSonics Inc.). Chest fur was
removed by using depilatory cream one day prior to the procedure. Mice were kept warm on a heated platform (37 °C)
and anesthetized with 5% isoflurane delivered via nose cone for 15 s, then maintained at 0.5% throughout the
echocardiography examination. Small needle electrodes for simultaneous electrocardiography were inserted into one
upper and one lower limb. Measurements of chamber dimensions and wall thickness were performed while the heart rate of
the mice was greater than 500 beats per minute (bpm). Percentage fractional shortening (%FS) was used as an
indicator of left ventricular systolic cardiac function and calculated as follows: %FS = (LVIDd − LVIDs)/LVIDd × 100, where
LVIDd and LVIDs are the left ventricular internal diameters at end-diastole and end-systole, respectively.
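
The fractional shortening formula above can be written as a small helper. This is an illustrative sketch; the function name and the example dimensions are hypothetical and are not measurements from the study.

```python
def fractional_shortening(lvid_d, lvid_s):
    """%FS = (LVIDd - LVIDs) / LVIDd * 100, with both diameters in the same unit."""
    if lvid_d <= 0:
        raise ValueError("LVIDd must be positive")
    if lvid_s < 0 or lvid_s > lvid_d:
        raise ValueError("LVIDs must lie between 0 and LVIDd")
    return (lvid_d - lvid_s) / lvid_d * 100.0

# Hypothetical M-mode measurements in mm
print(fractional_shortening(4.0, 2.5))  # 37.5
```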

Forelimb grip strength assay

Forelimb grip strength was measured as previously described39. Following acclimatization (at least one hour prior
to grip strength measurement), each mouse was weighed and placed on a forelimb pull bar attached to an isometric force
transducer (Columbus Instruments, Columbus, OH, USA). The mouse was pulled away from the bar by its tail, and the
force required was recorded by the force transducer. Over 3 consecutive days, each mouse performed 3 pulls per day for
a total of 9 pulls per test session. Peak tension force (N) was calculated as the average of each subject’s 9 pulls over the
test session and normalized by body weight.
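
The averaging and normalization described above can be sketched as follows. This is a minimal sketch: it assumes body weight in grams, so the result is in N/g (the normalization unit is not stated in the text), and the function name and pull values are hypothetical.

```python
from statistics import mean

def normalized_grip_strength(pulls_by_day, body_weight_g):
    """Mean peak force over 3 days x 3 pulls, divided by body weight."""
    if len(pulls_by_day) != 3 or any(len(day) != 3 for day in pulls_by_day):
        raise ValueError("expected 3 days with 3 pulls each")
    if body_weight_g <= 0:
        raise ValueError("body weight must be positive")
    all_pulls = [force for day in pulls_by_day for force in day]  # 9 pulls in total
    return mean(all_pulls) / body_weight_g

# Hypothetical peak forces (N) for one mouse weighing 25 g
session = [[1.0, 1.2, 1.1], [1.1, 1.3, 1.2], [1.2, 1.0, 1.1]]
print(round(normalized_grip_strength(session, 25.0), 4))
```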

Statistical analysis

All graphs and statistical comparisons were generated using GraphPad Prism 9. Statistical analyses were
performed using the two-tailed unpaired t-test or one-way ANOVA followed by Tukey’s HSD test. All data are presented in
this study as mean ± standard deviation (SD).

Efficient simultaneous double DNA knock-in in murine embryonic stem cells by CRISPR/Cas9
ribonucleoprotein-mediated circular plasmid targeting for generating gene-manipulated mice

Gene targeting of embryonic stem (ES) cells followed by chimera production has been conventionally used for
developing gene-manipulated mice. Although direct knock-in (KI) in murine zygotes via CRISPR/Cas9-mediated
genome editing has been reported, ES cell targeting still has merits, e.g., high-throughput work can be performed in vitro.
In this study, we first compared the KI efficiency of mouse ES cells with CRISPR/Cas9 expression vector and
ribonucleoprotein (RNP), and confirmed that KI efficiency was significantly increased by using RNP. Using CRISPR/Cas9
RNP and circular plasmid with homologous arms as a targeting vector, knock-in within ES cell clones could be obtained
efficiently without drug selection, thus potentially shortening the vector construction or cell culture period. Moreover, by
incorporating a drug-resistant cassette into the targeting vectors, double DNA KI can be simultaneously achieved at high
efficiency by a single electroporation. This technique will help to facilitate the production of genetically modified mouse
models that are fundamental for exploring topics related to human and mammalian biology.

Introduction

The development of gene-manipulated mouse models and the determination of their phenotypic features is a
robust approach for precisely analyzing gene functions or the physiological behaviors of specific cells in the animal
body. Several important biological phenomena have been uncovered by using such gene-manipulated animal
models. Locus-specific integration of exogenous genes, termed gene knock-in (KI), has several merits, e.g., an
accurate integration site, controlled copy number, and regulated KI gene expression that closely resembles the
locus-specific gene expression patterns of particular cell types1,2. The conventional way to develop gene KI mouse
models is embryonic stem (ES) cell targeting followed by injection of the targeted ES cells into
preimplantation embryos to produce chimeras3,4,5.

The induction of double-strand breaks in the genome increases the integration efficiency of the transgene by
homologous recombination in mammalian cells6. More recently, the introduction of site-specific endonucleases such as
ZFN7,8, TALEN9,10, or CRISPR/Cas911, which induce double-strand breaks at specified sequences, has made it possible
to modify the zygote genome directly. Among these genome editing techniques, CRISPR/Cas9 is now used in many
laboratories as a powerful tool for embryo genome editing due to its simplicity and accuracy. CRISPR/Cas9-mediated
locus-specific gene manipulation in mouse embryos was first reported by introducing single-guide RNA and Cas9
mRNA into the pronucleus of fertilized zygotes using micromanipulation11, and several papers have reported that analysis
of F0 mice generated by CRISPR/Cas9-mediated zygote genome editing revealed important gene functions in
fields such as sleep physiology12,13 or spermatogenesis14. In addition to the induction of indel mutations11,
site-specific KI of short sequences, such as single-nucleotide replacements11 or peptide tag insertions15, can also be
accomplished in murine zygotes by using a single-strand oligonucleotide (ssODN) as the KI donor via homology-directed repair.
Beyond short-fragment KI, DNA sequences longer than several kilo-base pairs (kbp) can be integrated into the zygote
genome by CRISPR/Cas9-mediated genome editing using double-strand plasmid15,16, long single-strand DNA
(lssDNA)17,18, or adeno-associated virus (AAV) as KI donors19,20,21.

Although zygote genome editing using several mouse strains such as C3H/HeJ22 or B6D2F111,15 has been
reported, the C57BL/6 strain is commonly used because it is amenable to in vitro handling, including hyperovulation, in
vitro fertilization, and in vitro embryo culture, and has a pure genetic background. On the other hand, zygote genome
editing is still challenging in particular strains, such as BALB/c, which is an important strain for specific research areas
such as immunology, because of their lower response to hyperovulation and greater susceptibility to in vitro embryo
manipulations23.

Compared with direct zygote genome editing, ES cell targeting is still believed to have several advantages for
making gene-manipulated mouse models, e.g., high-throughput genotyping screening can be performed in vitro, and ES
cell lines established from a variety of inbred strains, including BALB/c24, DBA225 or C57BL/6, can be used for gene
manipulation. Another considerable merit is that establishing KI ES cell clones having multiple exogenous genes, e.g.,
multi-fluorescent reporters, or CreERT2 and loxP-STOP-loxP, at different genomic loci is possible by performing sequential
KI26. However, repetitive gene targeting takes time, and the pluripotency of ES cells is known to decrease with longer
culture periods27. Therefore, there is a need to develop a highly efficient technology that can KI long DNA fragments at
multiple loci in a single gene transfer.

In this study, we describe a simple and highly efficient CRISPR/Cas9-mediated KI method in mouse ES cells for
developing gene-manipulated mouse models. A single gene KI can be achieved efficiently without drug selection by using
a circular plasmid as a targeting vector accompanied by KI site-specific induction of double-strand break via
CRISPR/Cas9. We confirmed that this KI method was applicable in various chromosomal loci in various ES cell lines
including inbred C57BL/6, BALB/c, or hybrid B6-129 F1. Furthermore, by incorporating a drug-resistant cassette in the
targeting vectors, double KI at different genomic loci can be achieved at high efficiency (> 60%) by a single
electroporation.

Results

Integration of linearized DNA fragments occurs mainly randomly into the ES cell genome

The Rosa26 locus is known as a safe harbor locus for stable gene expression28 and is frequently used for the
development of KI mice with a fluorescent reporter29,30 or Cre/loxP conditional recombination26,31. Thus, we used this
locus for determining homologous recombination-based KI using large-sized donor DNA. As linearized plasmids are
routinely utilized as DNA donors for conventional gene targeting, we first compared the frequency of DNA integration into
the genome between linearized and circular vectors. Linearized or circular plasmid carrying a 1850 bp CAG-EGFP cassette
flanked by both the homology arms (5′ and 3′ of 962 and 1006 bp, respectively) of the Rosa26 locus (pR26-CE) were
transduced into ES cells via electroporation (Supplementary Fig. S1A,B). Seven to 10 days later, the expression of EGFP
was determined via microscopy or flow cytometry. Although the number of EGFP-positive cells was quite low for both
transductions (0.7 ± 0.1% and 0.4 ± 0.1% in the linearized and circular vectors, respectively), the percentage of EGFP-
positive cells significantly increased when the linearized vector was used (Supplementary Fig. S1C,D).

Next, we selected ES cells in which the linearized vector was integrated into the genome via drug resistance and
examined whether the genomic integration of the vector was homology arm-dependent, site-specific KI. We transduced ES
cells with linearized pR26-CE-PN, which contained a PGK-NeoR cassette subcloned at the 3′ end of CAG-EGFP in pR26-CE
(Supplementary Fig. S2A), added G418 to the culture medium 24 h post-electroporation, and cultured the cells for an
additional 7–10 days to select stable drug-resistant clones. Following G418 selection, almost all ES cell colonies were
EGFP-positive (Supplementary Fig. S2B). Analysis of each colony for Rosa26 locus-specific KI via genomic PCR showed
that none of the 47 independent clones contained any KI specific bands either on the 5′ or 3′ side (Supplementary
Fig. S2C). These results indicated that integration of the linearized targeting vector into the genome is mainly random and
not site-specific, even though the targeting vector carried homology arms. Therefore, circular plasmids, which are thought to have
a lower frequency of random genomic integration than linearized plasmids, were used as targeting vectors hereafter.

CRISPR/Cas9 ribonucleoprotein (RNP)-mediated circular plasmid integration into the ES cell genome was highly
locus-specific

Recent studies have indicated that Cas9-RNP-mediated genome editing exhibits lower cytotoxicity and higher
efficiency of genomic rearrangement induction than that mediated by CRISPR/Cas9 expressing plasmids32,33,34. Thus,
we next compared the efficiency of gene KI in ES cells with the all-in-one plasmid, which expressed both gRNA and Cas9
mRNA, or Cas9-RNP, which consisted of crRNA/tracrRNA/Cas9 protein. We transduced ES cells with the circular pR26-CE
via electroporation with the CRISPR/Cas9 all-in-one plasmid or Cas9-RNP (Fig. 1A,B) and determined EGFP expression 7–
10 days later. In this experiment, we did not use any drug-resistant cassettes for either positive or negative selection.
Few EGFP-positive ES cells appeared when pR26-CE alone was transduced into ES cells (Fig. 1C,D). The introduction of
pR26-CE and the all-in-one vector significantly increased the frequency of EGFP-positive ES cells, although the ratio
remained low (2.7 ± 0.1%, Fig. 1D,E). By contrast, introducing pR26-CE and Cas9-RNP into ES cells increased the EGFP-
positive cell ratio approximately tenfold compared with that found with the all-in-one plasmid (27.7 ± 0.7%, Fig. 1D,E).
Next, ES cells that became EGFP-positive were sorted by FACS and passaged into fresh medium; cloned colonies were
selected and used for KI genotyping by PCR. Of the selected clones, 41 out of 47 were positive for bands from both 5′
and 3′ KI (Fig. 1F), and the KI ratio was 87.2% (Fig. 1G). These results indicated that double-strand
DNA KI at the Rosa26 locus of ES cells was more efficient with Cas9-RNP than with the all-in-one plasmid. To determine
whether the plasmid KI was a homozygous or heterozygous integration event, we PCR-amplified the Rosa26 locus in 11
clones. One of these clones lacked the PCR amplicon (Supplementary Fig. S3A,B), whereas the remaining 10 clones had
indel mutations (Supplementary Fig. S3C). This suggested that most Rosa26 KI events were heterozygous integrations
and that Cas9-RNP frequently induced genetic rearrangement on both chromosomes.
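
As a quick arithmetic check of the figures quoted in this paragraph, the KI ratio and the fold difference between Cas9-RNP and the all-in-one plasmid can be recomputed from the reported counts and means. The helper name is hypothetical; the numbers are those given in the text.

```python
def percent(part, whole):
    """Percentage of `part` in `whole`."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole

ki_ratio = percent(41, 47)        # clones positive for both 5′ and 3′ KI bands
fold_rnp_vs_plasmid = 27.7 / 2.7  # mean EGFP-positive ratios: Cas9-RNP vs all-in-one plasmid
print(f"{ki_ratio:.1f}% KI, {fold_rnp_vs_plasmid:.1f}-fold")  # 87.2% KI, 10.3-fold
```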

Figure 1
CRISPR/Cas9 ribonucleoprotein (RNP)-mediated circular plasmid integration into ES cell genome is highly locus-
specific. (A,B) Schematic representations of circular plasmid introduction strategies. Linearized vector containing CAG-
EGFP cassette flanked by approximately 1 kbp next to the gRNA target was transduced into ES cells via electroporation
(A). Circular plasmid as a targeting vector was introduced into ES cells with CRISPR/Cas9 expression vector (left) or
CRISPR/Cas9 ribonucleoprotein (RNP, right) by single electroporation (B). (C) Representative ES cell colonies under a
fluorescent microscope. Top panels show EGFP images and bottom panels show merged images of EGFP and bright field.
Left: no electroporation (No-EP); middle-left: transduction of targeting vector-only; middle-right: transduction of targeting
vector with CRISPR/Cas9 all-in-one plasmid; right: transduction of targeting vector with CRISPR/Cas9 RNP. Scale bar,
50 µm. (D) Flow cytometry of ES cells. Gate represents EGFP-positive fraction. (E) EGFP-positive cell ratios are shown as
mean ± SEM. The asterisk depicts a significant difference (n = 3, P < 0.01). (F,G) Genomic PCR analyses of EGFP-positive
ES cell clones collected via flow cytometry. Gel images or KI ratios are shown in F or G, respectively. Red arrows in the
upper or bottom panels indicate the 5′ or 3′ Rosa26 KIs, respectively, and black arrows indicate genomic DNA (gDNA)
PCR control. Both 5′ and 3′ PCR positive clones are indicated with red numbers. NC: negative control using genomic DNA
from wild-type B6 mouse tail.

Cas9 RNP-mediated plasmid DNA KI is applicable to various loci in ES cells

Next, we determined whether the Cas9-RNP-mediated circular plasmid KI is attributed to homology arm-mediated
homologous recombination or Cas9-RNP-induced double-strand break-site-specific integration without homologous
recombination. First, Cas9-RNP that recognized either the Rosa26 locus (gRosa26-RNP) or tyrosinase locus (gTyr-RNP)
was transduced into ES cells with pR26-CE as the KI donor using electroporation (Fig. 2A). The indel mutation efficiency
with gTyr-RNP, as measured by the T7 endonuclease 1 (T7E1) assay, was 90.5% on average (n = 3; 88.6%, 90.4%, and 92.5% in the
individual replicates) (Fig. 2B). Seven to 10 days following electroporation, EGFP-positive ES cells were confirmed using microscopy
(Fig. 2C) or flow cytometry (Fig. 2D). The ratio of EGFP-positive ES cells in the pR26-CE and gRosa26-RNP-transduced
group was 23.8 ± 3.2% (Fig. 2E). By contrast, EGFP-positive cells rarely appeared when pR26-CE was transduced with
gTyr-RNP (Fig. 2E). Next, we transduced ES cells with gRosa26-RNP and a CAG-EGFP plasmid with (pR26-CE) or without (pCE)
Rosa26 homology arms, and EGFP expression was determined by flow cytometry 7 to 10 days following
electroporation (Fig. 2F). The ratio of EGFP-positive ES cells in the pR26-CE and the gRosa26-RNP-transduced group was
19.2 ± 1.2%, whereas EGFP-positive cells rarely appeared (0.68 ± 0.1%) when homologous arm-less pCE was transduced
with gRosa26-RNP (Fig. 2G,H). These results suggested that circular plasmid DNA is integrated into the genome by
homology arm-mediated homologous recombination. Rosa26 is known as stable open chromatin in virtually all tissues,
including ES cells; thus, integration efficiency is thought to be relatively higher than that of other genomic loci in mice28.
We next investigated whether this CRISPR/Cas9-mediated plasmid KI without drug selection worked efficiently in various
loci in mouse ES cells. First, we targeted the promoter-less T2A-mCherry cassette to the C-terminus of the Nanog gene,
which is highly expressed in ES cells. This targeting vector has no exogenous promoter, so the fluorescent signal of
mCherry can be observed only when the T2A-mCherry cassette is correctly knocked in at the Nanog locus and expressed
under the endogenous promoter (Supplementary Fig. S4A). Transduction of pNanog-T2A-mCherry (pNmC) and gNanog-RNP into ES cells
resulted in 37.5% (18/48) of ES cell clones showing mCherry signal (Supplementary Fig. S4B). We randomly selected
eight mCherry-positive clones and analyzed KI by PCR. All analyzed clones were confirmed to carry the mCherry KI
at the Nanog locus (Supplementary Fig. S4C). We next targeted nine independent loci on six different
chromosomes using various ES cell lines from C57BL/6J (B6J), C57BL/6N (B6N), BALB/c, or B6-129 F1 backgrounds and
successfully developed precise KI in all nine attempted loci. The KI ratio ranged from 6.8 to 59.1% (Table 1). The
expression levels of each endogenous gene at the KI loci in ES cells were analyzed using a public database
(https://www.ebi.ac.uk/gxa/experiments/E-GEOD-27843/Results) and shown in Table 1. Every ES cell line used in this
study could contribute to chimeric offspring following blastocyst injection and embryo transfer to a surrogate mother
(Supplementary Fig. S5). These results suggest that the drug selection-free plasmid KI method is applicable to a range of
loci in ES cells.
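
Replicate values in this section are summarized as a mean, and the figure panels as mean ± SEM. A minimal sketch of that summary, applied to the three T7E1 replicates quoted above, is shown below. The function name is hypothetical, and the SEM it prints for these replicates is computed here for illustration and is not a value reported in the text.

```python
from math import sqrt
from statistics import mean, stdev

def mean_sem(values):
    """Return (mean, SEM), where SEM = sample SD / sqrt(n)."""
    if len(values) < 2:
        raise ValueError("need at least two replicates")
    return mean(values), stdev(values) / sqrt(len(values))

replicates = [88.6, 90.4, 92.5]  # T7E1 indel efficiencies (%) quoted in the text
m, sem = mean_sem(replicates)
print(f"{m:.1f} ± {sem:.1f}")
```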

Figure 2
Cas9 RNP-mediated plasmid DNA KI is attributed to homologous recombination. (A) Schematic representations of
Rosa26 locus or Tyrosinase locus genome editing by site-specific Cas9-RNP. Circular vector containing CAG-EGFP cassette
flanked by 5′ and 3′ homology arms of Rosa26 was transduced with either gRosa26-RNP or gTyr-RNP into ES cells via
electroporation. (B) T7 endonuclease I mismatch cleavage (T7E1) analysis for validating the efficiency of double-strand
break induction by the tyrosinase locus-targeting CRISPR/Cas9 RNP. Upper, middle, or lower arrowheads indicate uncut
wild-type genome, longer half of mutated genome, or short half of mutated genome, respectively. The numerical values
on the bottom of the image represent induction ratios of indel mutation in the tyrosinase locus of targeted ES cells. Indel
mutation ratios were calculated using the formula mentioned elsewhere
(http://crispr.technology/resources/quantification.html). (C) Representative ES cell colonies under a fluorescent
microscope. Top panels show EGFP images and bottom panels show merged images of EGFP and bright field. Left: no
electroporation (No-EP); middle-left: transduction of pR26-CE plasmid-only; middle-right: transduction of pR26-CE
targeting vector with tyrosinase locus-specific CRISPR/Cas9 RNP; right: transduction of pR26-CE targeting vector
with Rosa26 locus-specific CRISPR/Cas9 RNP. Scale bar, 50 µm. (D) Flow cytometry of ES cells. Gate represents EGFP-positive
fraction. (E) EGFP-positive cell ratios are shown as mean ± SEM. The asterisk depicts a significant difference (n = 3,
P < 0.01). (F) Schematic representations of Rosa26 locus genome editing by site-specific Cas9-RNP. The targeting
vector with or without 5′ and 3′ homology arms of Rosa26 was transduced with gRosa26-RNP into ES cells via
electroporation. (G) Flow cytometry of ES cells. Gate represents EGFP-positive fraction. (H) EGFP-positive cell ratios are
shown as mean ± SEM. The asterisk depicts a significant difference (n = 3, P < 0.001).
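
The caption refers to an external page for the indel quantification formula without reproducing it. The sketch below assumes the widely used estimate for mismatch-cleavage assays, indel % = 100 × (1 − √(1 − fraction cleaved)), where the fraction cleaved is the summed intensity of the two cleavage bands divided by the total intensity of all three bands. Whether this is exactly the formula on the cited page is an assumption, and the function name and band intensities are hypothetical.

```python
from math import sqrt

def t7e1_indel_percent(uncut, cut_long, cut_short):
    """Estimate indel % from gel band intensities (assumed formula, see above)."""
    if min(uncut, cut_long, cut_short) < 0:
        raise ValueError("band intensities must be non-negative")
    total = uncut + cut_long + cut_short
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    fraction_cleaved = (cut_long + cut_short) / total
    return 100.0 * (1.0 - sqrt(1.0 - fraction_cleaved))

# Hypothetical densitometry values for the upper, middle, and lower bands
print(t7e1_indel_percent(1.0, 2.0, 1.0))  # 50.0
```

The square root reflects the assumption that cleavable heteroduplexes form by random reannealing of mutant and wild-type strands, so the cleaved fraction overstates neither allele class when converted this way.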

KI efficiency increased significantly with a targeting vector carrying drug-resistant gene cassettes

Although the ratio of precise KI clones using the drug selection-free plasmid KI method from the total was
between 20 and 30% in the case of Rosa26 targeting (Figs. 1E, 2E), the KI ratio among the EGFP-positive ES cell clones
was 87.2% (41/47) (Fig. 1F,G), suggesting that the frequency of random integration of the targeting vector into the
genome was relatively low. Thus, we hypothesized that KI efficiency could be drastically increased using a targeting
vector that incorporated a drug resistance cassette (Fig. 3A). We targeted pR26-CE-PN, in which the PGK promoter-driven
neomycin resistance cassette is subcloned downstream of the EGFP sequence in pR26-CE, into the Rosa26 locus
together with gRosa26-RNP (Fig. 3B). Several ES cell clones became EGFP-positive without G418 selection, whereas,
as we hypothesized, almost all clones became EGFP-positive when ES cells were treated with G418 (Fig. 3C). By PCR
genotyping, all 47 selected clones were positive for both 5′ and 3′ KI PCR bands (Fig. 3D,E).

Figure 3
KI efficiency increased drastically with targeting vector carrying a drug-resistant gene cassette. (A,B) Schematic
representation of Rosa26 locus KI and drug selection strategy. (C) Representative ES cell colonies under a fluorescent
microscope. Top panels show EGFP images and bottom panels show merged images of EGFP and bright field. Left: no
electroporation (No-EP); middle: transduction of targeting vector with gRosa26-RNP without drug selection; right:
transduction of targeting vector with gRosa26-RNP followed by G418 selection. Scale bar, 50 µm. (D,E) Genomic PCR
analyses of ES cell clones. Gel images or KI ratios are shown in D or E, respectively. Red arrows in the upper or bottom
panels indicate the 5′ or 3′ Rosa26 KI, respectively. Black arrows indicate genomic DNA (gDNA) PCR control. Both 5′ and
3′ PCR positive clones are indicated with red numbers. NC: negative control using genomic DNA from wild-type B6 mouse
tail. (F) Schematic representation of Stra8 locus KI strategy. (G) Representative ES cell colonies under a fluorescent
microscope. The top panels show GFP images, and the bottom panels show bright field. Left: transduction of targeting
vector without gStra8-RNP followed by puromycin selection; right: transduction of targeting vector with gStra8-RNP
followed by puromycin selection. Scale bar, 50 µm. (H,I) Genomic PCR analyses of ES cell clones. Gel images or KI ratios
are shown in H or I, respectively. Red arrows in the upper or bottom panels indicate the 5′ or 3′ Stra8 KI, respectively.
Black arrows indicate genomic DNA (gDNA) PCR control. Both 5′ and 3′ PCR positive clones are indicated with red
numbers. NC: negative control using genomic DNA from wild-type B6 mouse tail.

We next investigated whether drug selection could also enhance KI efficiency at another genomic locus.
We made another targeting vector carrying a promoter-less CreERT2 followed by the EF1 promoter-driven copGFP2aPuro,
designed for KI at the Stra8 locus (pStra8-CE-E1GP) (Fig. 3F). Stra8 is a germ cell-specific gene, and its
expression in ES cells is considerably low (TPM = 6, compared with 389 for Nanog and 1154 for Pou5f1, both well-known
pluripotency genes; Table 1) (https://www.ebi.ac.uk/gxa/experiments/E-GEOD-27843/Results). Introduction of pStra8-CE-E1GP
into ES cells without the Cas9-RNP that cuts the Stra8 locus (gStra8-RNP) did not produce stable ES cell clones following
puromycin selection (Fig. 3G). By contrast, 17.5 ± 3.0% of ES cells became GFP-positive when pStra8-CE-E1GP was introduced
together with gStra8-RNP without puromycin selection (Supplementary Fig. S6A,B), and almost all colonies showed GFP
signals after puromycin selection (Fig. 3G). PCR confirmed site-specific KI: all 46 independently selected clones showed
KI PCR bands for both the 5′ and 3′ sides (Fig. 3H,I). We randomly selected 11 clones and
further analyzed the sequences of their KI sites. The transgenes in these clones were knocked in precisely in-frame at the Stra8 locus
(Supplementary Fig. S6C,D). These results indicated that CRISPR/Cas9-mediated plasmid KI could
reach a KI ratio of up to 100% in ES cells with drug selection.

Single electroporation could achieve dual allele KI in ES cells

Since the KI efficiency could be increased using plasmid KI with drug selection, we hypothesized that it would be
possible to KI multiple alleles simultaneously by using targeting vectors with different drug selection cassettes. To
investigate this hypothesis, we first targeted two different gene cassettes into the same locus, Rosa26, by single
electroporation (Fig. 4A). PCR genotyping following electroporation and subsequent G418 and blasticidin selection
revealed that all clones selected (22 clones) contained KI for both cassettes (Fig. 4B). We then determined whether single
electroporation could achieve double KI into different genomic loci. For this purpose, we targeted the CAG-CreERT2-EF1-
copGFP2aPuro cassette into the Cd6 locus (pCd6-CE-E1GP) and the CAG-loxP-Neo-loxP-Red fluorescent protein (RFP)
cassette into the Rosa26 locus (pR26-lsl-RFP) with gCd6-RNP and gRosa26-RNP into ES cells by single electroporation
(Fig. 4C). The Cd6 locus is another safe harbor locus that is expected to support stable, ubiquitous gene expression in various tissues
in mice35. After G418 and puromycin selection for 7 days, we selected 17 ES cell clones for PCR analysis to determine the
KI of Cd6 and Rosa26 loci. Strikingly, 11 out of 17 ES cell clones showed both the Cd6 and Rosa26 KI bands (64.7%,
Fig. 4D,E). Using the double KI ES clones, we demonstrated that both KI alleles worked in vitro as RFP signals were only
observed after 4OHT was administered in the culture medium (Fig. 4F,G). We further determined whether these KI alleles
also worked correctly in vivo. Chimeric mice were produced by injecting the ES cell clones into preimplantation blastocysts
followed by chimeric embryo transfer to pseudopregnant surrogate mothers. Chimeric mice at 28 days postpartum
(Supplementary Fig. S7) were intraperitoneally administered tamoxifen once daily for 5 days, and then tissues were
collected and used for microscopic analysis to evaluate RFP fluorescence (Fig. 4H). RFP signals were observed in
various tissues from chimeric mice administered with tamoxifen (Fig. 4I), indicating that integrated transgenes worked as
expected in vivo in mice.

Figure 4
Dual allele KI in ES cells could be achieved via single electroporation. (A) Schematic representation
of Rosa26 locus KI strategy using two independent targeting vectors. (B) Genomic PCR analyses of ES cell clones. Red
arrows in the upper or bottom panels indicate Bsd-GOI-A KI or NeoR-GOI-B, respectively. Black arrows indicate genomic
DNA (gDNA) PCR control. Both 5′ and 3′ PCR positive clones are indicated with red numbers. NC1: negative control using
genomic DNA from wild-type B6 mouse tail. NC2: negative control using the targeting vector. (C) Schematic
representation for the combined Rosa26 and Cd6 locus KI strategy using two independent targeting vectors and gRNA-
RNPs. (D,E) Genomic PCR analyses of ES cell clones. Gel images or KI ratios are shown in D or E, respectively. Red
arrows indicate the 5′ or 3′ KI specific bands in Rosa26 or Cd6 loci, respectively. Both Rosa26 KI and Cd6 KI clones are
indicated with red numbers. NC: negative control using genomic DNA from wild-type B6 mouse tail. (F) A schematic
illustration of the genetic construct for tamoxifen-inducible Cre/loxP recombination in Rosa26 and Cd6 loci. (G)
Representative double KI ES cell colonies under a fluorescent microscope without (top panels) or with (bottom panels)
4-hydroxytamoxifen (4OHT). Left panels show GFP, middle panels show RFP, and right panels show bright field. Scale bar,
50 µm. (H) A schematic illustration of tamoxifen administration followed by tissue sampling in the chimeric mice having
both Rosa26 KI and Cd6 KI alleles. (I) Expression of RFP in tissues of chimeric mice. Mice were administered with
tamoxifen intraperitoneally 5 times and were then used for fluorescent microscope observation.
Discussion

We aimed to develop a simple and highly efficient KI method in mouse ES cells to streamline the generation of genetically
manipulated mouse models. We found that introducing circular plasmid DNA as a targeting vector, together with KI
site-specific Cas9-RNP, enabled efficient DNA KI in the absence of drug selection. Furthermore, incorporating a
drug-resistance gene cassette into the circular targeting vector produced multiple KIs in a single electroporation with
extremely high efficiency. The largest gene cassettes used for KI in this study were 11.7 kbp without drug selection and
6.2 kbp with drug selection. The DNA fragments frequently used for making gene-manipulated mice, such as CreERT2,
EGFP, or Cas9, are approximately 2.5, 1, or 4 kbp long, respectively. Thus, we believe that CRISPR/Cas9-mediated plasmid KI
can be universally applied in ES cells and is highly beneficial for the efficient generation of genetically modified mice.

Previous studies using mouse ES cells36 and fibroblasts37 have shown that introduction of plasmid DNA into cells
resulted in very limited genome integration. These studies are consistent with our data here where a circular plasmid with
homology arms was rarely incorporated into the genome of ES cells (Fig. 1C,D; Supplementary Fig. S1C,D). Hence, a
linearized DNA fragment with homology arms, which is integrated into the genomic loci by homology-dependent repair
mechanism, has been conventionally used for site-specific KI3,4,5. However, although linearization of plasmid DNA
significantly increases the genomic integration rate, much of the integration is nonspecific even if the DNA has
homologous arms4. Similar findings were observed in this study, i.e., none of the ES cell clones that became drug-resistant
could be confirmed as accurate KI when a linearized DNA fragment with a drug-resistance gene was introduced into ES
cells without CRISPR/Cas9 (Supplementary Fig. S2). On the other hand, using CRISPR/Cas9 genome editing, it has been
reported that the long DNA KI efficiency is increased in mouse ES cells38. This method introduced the circular plasmid
into ES cells, followed by linearization with CRISPR/Cas9. Nevertheless, the accurate KI ratio was about half of the gene-
integrated ES cells, and the rest was incomplete partial KI or random integration38. Also, the efficiency of CRISPR/Cas9-
mediated long DNA KI in zygote was improved when the double-strand targeting plasmid was linearized intracellularly by
CRISPR/Cas9 compared to the circular one16,39,40, whereas no significant difference was observed between circular
plasmid or linearized plasmid for KI efficiency in the case of mouse ES cells16. Our current results showed that a circular
plasmid could be knocked in on various chromosomes using CRISPR/Cas9-RNP even without drug selection, and the KI ratio
reached up to around 60%. Therefore, at least for KI in ES cells, sufficiently high KI efficiency
can be achieved by using Cas9-RNP and a circular plasmid as the targeting vector.

KI efficiency in ES cells was significantly higher with Cas9-RNP than with Cas9 plasmid in this study. Unlike Cas9-plasmid,
which requires intracellular transcription and translation, Cas9-RNPs rapidly enter the nucleus and cleave the genome
after transfection. Previous reports have shown that Cas9-RNPs significantly increase genome editing efficiency compared
to Cas9-plasmids41,42. Therefore, it is suggested that the increased rate of genomic double-strand breaks by Cas9-RNPs
led to the increased KI efficiency in ES cells observed in this study.

It is also notable that Cas9-RNPs are degraded more quickly in the cell than Cas9-plasmids, and thus off-target
cleavage occurs at a lower rate41. Furthermore, high-fidelity (HiFi) type Cas9 protein was used as an RNP component in
this study. Several papers have reported that off-target cleavage by HiFi type Cas9 is remarkably reduced, to levels
undetectable by next-generation sequencing43,44. Thus, the present KI method using HiFi Cas9 protein
would also be considered site-specific, although further analysis is needed to clarify the presence or absence of
off-target events. On the other hand, in some cases the PCR KI band was observed only on either the 5′ side or the 3′ side in
this experiment, suggesting that incomplete KI occurred to some extent with this method. Therefore, it is essential for KI
screening to perform both 5′-side and 3′-side PCRs. It should also be noted that the 5′ and 3′ sides may have been knocked
into different chromosomes; thus, confirming the KI by PCR using primers designed outside each HA to amplify the entire
targeted allele would also be helpful.

Additionally, simultaneous gene KI in multiple loci was achieved with high efficiency by using plasmids with
different drug resistance gene cassettes as targeting vectors (Fig. 4). This would theoretically halve the duration for
developing dual gene-manipulated ES cell clones compared with the sequential manipulations of two genes. Extended
culture of ES cells is known to lead to a more aberrant methylation status of DNA and an increase in chromosomal
instability25,27, resulting in reduced pluripotency25. Thus, modifying multiple genes in single electroporation with a
shorter culture period would be advantageous for developing chimeric mice while maintaining the pluripotent quality of ES
cells.

Generally, mice with double mutations have been developed by mating mice carrying each mutation, and consequently
obtaining age-suitable individuals for phenotypic analysis can take several months. The present plasmid KI method with drug
selection would enable the manipulation of multiple genes in chimeric mice in a short period and thus would be
remarkably useful, especially in fields where phenotypic analysis can be performed in chimeric animals45.

The contribution of blastocyst-injected ES cells to the germ cells, i.e., sperm or oocytes, in a chimera (germline
transmission, GLT) is a significant issue in producing stable genetically manipulated mice via chimeric mice46,47.
Particularly, chimeric mice produced using ES cells derived from inbred lines such as C57BL/6 and BALB/c frequently have
poor GLT46. GLT can be variable and inefficient, partly because there is competition between the host and donor cells in
chimeras that is affected by the quality of ES cells46. Several recent studies suggested that the blastocyst
complementation method, whereby ES cells were injected into genetically germ cell-less blastocysts, can significantly
improve the efficiency of GLT46,47. Although six mouse lines out of nine attempts achieved GLT in our present
experiment (Table 1), it would be worthwhile, in future investigations, to combine our gene targeting technology with
blastocyst complementation methods to establish more efficient technologies for producing genetically modified animals
that can consistently achieve GLT.

An additional point of concern is backcrossing. In certain research areas, analyses using a specific pure-strain
mouse are required. Thus, if ES cell-based chimeric mice or zygote genome-edited mice are generated using an undesired
strain, it is necessary to backcross 6 to 10 times to obtain a congenic strain before phenotypic analysis is carried out. This
process takes more than a year. In this experiment, we demonstrated that accurate and highly efficient
gene targeting is possible using ES cells from various genetic backgrounds, such as the B6-129 F1 hybrid line or the pure
B6N and BALB/c lines. Injecting such ES cells into tetraploid (4n) blastocysts would enable in vivo gene function
analysis using pure inbred mouse lines in the F0 generation. Future work will require the establishment of high-quality
ES cells from additional pure mouse lines to avoid time-consuming backcrossing.
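
The cost of backcrossing can be illustrated with a short calculation. The sketch below is not from the study; it assumes the starting animal is entirely of the undesired (donor) strain and that unlinked loci segregate independently, so each cross to the desired strain halves the expected donor contribution. The function name is hypothetical.

```python
def donor_fraction(n_crosses: int) -> float:
    """Expected fraction of donor-strain genome after n crosses to the desired strain.

    Assumption: the starting animal is 100% donor strain and loci are unlinked,
    so each cross halves the remaining donor contribution on average.
    """
    if n_crosses < 0:
        raise ValueError("n_crosses must be non-negative")
    return 0.5 ** n_crosses


# The 6 to 10 backcrosses mentioned in the text.
for n in (6, 10):
    print(f"{n} crosses: {donor_fraction(n) * 100:.4f}% donor genome expected")
```

Regions linked to a selected locus are lost more slowly than this average, so the figures are best read as a lower bound on residual donor sequence.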

In conclusion, our study demonstrates that multiple KI edits can be made with high efficiency in a single step
using a circular plasmid with Cas9-RNP genome editing in murine ES cells. We believe that this technique could help to
substantially reduce both the amount of time taken in the conventional production of genetically modified chimeric mouse
models and the number of animals currently used in this process.

Materials and methods

Animals and ethics

C57BL/6J or BALB/c mice were purchased from Clea-Japan (Tokyo, Japan), and ICR mice were purchased from
Japan SLC (Shizuoka, Japan) or Clea-Japan. Mice were housed in pathogen-free conditions under a 12 h light/12 h dark
photoperiodic cycle with food and water ad libitum in the experimental animal facility at the Institute of Medical Science,
University of Tokyo. All mouse experiments were approved by the Institutional Animal Care and Use Committee of the
University of Tokyo (approval number PA17-63) and performed according to their guidelines as well as the ARRIVE
guidelines (https://arriveguidelines.org).

Mouse embryonic fibroblast propagation and ES cell culture

To obtain mouse embryonic fibroblasts (MEFs), BALB/c and C57BL/6J mice were mated, and pregnant females
were euthanized by cervical dislocation on E14.5. MEFs were isolated by homogenizing collected fetuses and then by
culturing these cells in Dulbecco’s modified Eagle medium (DMEM) (Nacalai, Kyoto, Japan) containing 10% (v/v) fetal
bovine serum (FBS) (Sigma-Aldrich, St. Louis, MO, USA), 100 U/mL penicillin and 100 µg/mL streptomycin (FUJIFILM
Wako Pure Chemical, Osaka, Japan) at 37 °C in humidified 5% CO2. After 12–14 days of culture, proliferating MEFs were
irradiated with X-rays (50 Gy) to halt the cell cycle, and the mitotically inactivated MEFs were then used as feeders for
ES cell culture. ES cells with a C57BL/6J or BALB/c background were developed in our laboratory as previously described25, with
slight modifications. In brief, mated females were euthanized by cervical dislocation, and embryos were collected by
flushing the oviduct a day after observation of the vaginal plug. Collected embryos were cultured for 2 days in potassium
simplex optimized medium (KSOM) (Merck-Millipore, Darmstadt, Germany) to produce blastocysts. Each blastocyst was
transferred to a well in a 24-well culture plate (BD Biosciences, Bedford, MA, USA) containing irradiated MEFs
(1.2 × 10^5/cm^2) and cultured in ES culture medium (ESCM; Knockout-DMEM; Thermo Fisher Scientific, Waltham, MA, USA) with
15% (v/v) FBS (Thermo Fisher Scientific), 2 mM GlutaMax (Thermo Fisher), 100 U/mL penicillin (FUJIFILM Wako Pure
Chemical), 100 µg/mL streptomycin (FUJIFILM Wako Pure Chemical), 0.1 mM 2-mercaptoethanol (Thermo Fisher
Scientific), leukemia inhibitory factor and t2i (0.2 µM PD0325901 (Sigma-Aldrich), and 3 µM CHIR99021 (Axon Medchem,
Groningen, Netherlands)). Five to 7 days after the blastocyst seeding, inner cell mass outgrowth was digested into single
cells by treatment with 0.25% (w/v) trypsin-ethylenediaminetetraacetic acid (EDTA) (Nacalai) and then passaged onto
fresh MEFs and cultured in ESCM for developing self-renewing ES cells. Besides these in-house developed ES cell lines,
JM8.A3 (C57BL/6N)48 or V6.5 (B6-129 F1) ES cell lines were also used in experiments. Each ES cell line was passaged
every 2–3 days at a subculture dilution of 1:10. For feeder-free ES cell culture, dishes precoated with 0.1% (w/v) gelatin
solution (Sigma-Aldrich) were used, and ES cells were cultured in ESCM.

Plasmids
A pAAV-mRosa26-CAG-EGFP (pR26-CE) targeting vector with 5′ (962 bp) and 3′ (1,006 bp) homology arms
to Rosa26 and pAAV-CAG-EGFP (pCE) have been previously reported20 and were kindly gifted by Dr. Mizuno. To make
pRosa26-CAG-EGFP-PGK-NeoR plasmid (pR26-CE-PN), the DNA fragment coding PGK-NeoR-bGHpA was PCR amplified
using pL452 as a template and cloned between EGFP and the 3′ homology arm sequences of pR26-CE using an In-Fusion
HD Cloning Kit (Takara, Shiga, Japan). pNanog-T2A-mCherry (pNmC) or pStra8-CreERT2-EF1-copGFP2aPuro
(pStra8-CE-E1GP) was constructed using pUC19 as a backbone. For making pNmC, the 5′ and 3′ homology arms of the Nanog locus
were PCR amplified using C57BL/6J genomic DNA as a template. Each homology arm and T2A-mCherry-bGHpA was cloned into
pUC19 using the In-Fusion HD Cloning Kit (Takara). For making pStra8-CE-E1GP, DNA coding CreERT2-Rabbit globin
poly-A was purchased from Genewiz (Genewiz Japan, Saitama, Japan). The 5′ and 3′ homology arms of the Stra8 locus
and EF1-copGFP2aPuro DNA fragments were PCR amplified using C57BL/6J genomic DNA or PB513 (System Biosciences, Palo
Alto, CA, USA) as templates, respectively. Each fragment was cloned into pUC19 using the In-Fusion HD Cloning Kit
(Takara). To make pAAV-mRosa26-CAG-loxP-Neo-loxP-RFP (pR26-lsl-RFP), the loxP-Neo-loxP fragment or RFP fragment
was PCR amplified using pROSA26-DEST (#21189, Addgene) or PB514 (System Biosciences) as templates, respectively,
and cloned in between the CAG promoter and 3′ homology arm of pR26-CE using the In-Fusion HD Cloning Kit (Takara).
To make pCd6-CAG-CreERT2-EF1-copGFP2aPuro (pCd6-CE-E1GP), homology arms for the Cd6 locus or EF1-
copGFP2aPuro fragments were PCR amplified using genomic DNA or PB514 (System Biosciences) as templates,
respectively, whereas the CreERT2 fragment was purchased from Genewiz (Genewiz Japan). All the DNA fragments were
cloned into pAAV-MCS2 (#46954, Addgene) using the In-Fusion HD Cloning Kit (Takara) to make pCd6-CE-E1GP.
All-in-one plasmids expressing Cas9 mRNA and a guide RNA (gRNA) targeting a specific gene of interest were prepared by
cloning double-stranded oligos into the BbsI site of pX459 (Addgene, #48139). The gRNA targets (each 20 nucleotides
long) were as follows: Rosa26, 5′-AAGGGATTCTCCCAGGCCCA-3′; tyrosinase, 5′-GGTCATCCACCCCTTTGAAG-3′; Nanog,
5′-TATGAGACTTACGCAACATC-3′; Stra8, 5′-TAGATTATAATGGCCACCCC-3′; Cd6, 5′-ACAAGTTGGGAAAGGTTTAT-3′. A
summary of the targeting vectors used in this study is shown in Supplementary Table S1. Detailed targeting vector
information or gRNA sequences for other target loci, assigned with unique project accession numbers (R1-A, R2-10, R2-
11, R2-16, R2-18, R2-19, R2-A, R2-B, and R2-C in Table 1 or Bsd-GOI-A and NeoR-GOI-B in Fig. 4) will be reported in a
separate publication along with their corresponding mouse phenotypes.
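
As a quick sanity check on the gRNA targets listed above, the following sketch verifies that each sequence is 20 nucleotides of A/C/G/T and reports its GC fraction and reverse complement. The sequences are copied from the text; the helper names are hypothetical and the check is an illustration rather than part of the published workflow.

```python
# gRNA target sequences as listed in the Plasmids section (5' to 3').
GRNA_TARGETS = {
    "Rosa26": "AAGGGATTCTCCCAGGCCCA",
    "tyrosinase": "GGTCATCCACCCCTTTGAAG",
    "Nanog": "TATGAGACTTACGCAACATC",
    "Stra8": "TAGATTATAATGGCCACCCC",
    "Cd6": "ACAAGTTGGGAAAGGTTTAT",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def check_target(seq: str, expected_len: int = 20) -> None:
    """Raise ValueError if seq is not expected_len bases of A/C/G/T."""
    if len(seq) != expected_len:
        raise ValueError(f"expected {expected_len} nt, got {len(seq)}")
    if set(seq) - set("ACGT"):
        raise ValueError("sequence contains non-ACGT characters")


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (needed for the bottom-strand oligo)."""
    return seq.translate(_COMPLEMENT)[::-1]


for name, seq in GRNA_TARGETS.items():
    check_target(seq)
    print(name, f"GC={gc_fraction(seq):.2f}", reverse_complement(seq))
```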

Preparation of CRISPR/Cas9 ribonucleoprotein complex (RNP) and electroporation of ES cells

TracrRNA, crRNA, and Cas9 protein were purchased from IDT (Coralville, IA, USA). TracrRNA and crRNA were
dissolved in Duplex Buffer (IDT, 200 µM each) and annealed in a thermal cycler at 95 °C for 10 min followed by
stepdown cycles of −1 °C/min until 25 °C. Annealed RNA (100 µM) was then incubated with 3 µg/µL Cas9 protein at 37 °C
for 20 min to form Cas9-RNP. A Neon Transfection System (MPK5000, ThermoFisher) was used for electroporation of
plasmid and Cas9-RNP into ES cells. In brief, 1 × 10^5 ES cells with 1 µg of targeting vector were electroporated with or
without all-in-one CRISPR/Cas9 vector (0.5 µg) or Cas9-RNP (at a final concentration of 10 µM annealed RNA and
0.3 µg/µL Cas9 protein) in a 10 µL tip (MPK1096, Thermo). The Neon system used two pulses at 1200 V and 20 ms or a
single pulse at 1400 V and 30 ms for all-in-one vector transfection or Cas9-RNP transfection, respectively. Electroporated
ES cells were cultured in ESCM with or without MEFs.
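
The annealing and dilution figures above can be checked with simple arithmetic. The sketch below assumes that the stated 100 µM annealed RNA and 3 µg/µL Cas9 are the concentrations in the RNP mix itself, which is an interpretation of the text; the function names are hypothetical.

```python
def anneal_minutes(start_c: float = 95.0, end_c: float = 25.0,
                   hold_min: float = 10.0, ramp_c_per_min: float = 1.0) -> float:
    """Total annealing time: initial hold plus a linear stepdown ramp."""
    return hold_min + (start_c - end_c) / ramp_c_per_min


def stock_volume_ul(stock_conc: float, final_conc: float, final_volume_ul: float) -> float:
    """Volume of stock needed, from C1 * V1 = C2 * V2 (same units for both concentrations)."""
    return final_conc * final_volume_ul / stock_conc


# 95 °C for 10 min, then -1 °C/min down to 25 °C.
print("annealing time (min):", anneal_minutes())

# 10 µL Neon tip; final 10 µM RNA and 0.3 µg/µL Cas9.
print("RNP mix volume from RNA (µL):", stock_volume_ul(100.0, 10.0, 10.0))
print("RNP mix volume from Cas9 (µL):", stock_volume_ul(3.0, 0.3, 10.0))
```

Both components come out at the same tenfold dilution, which is consistent with adding a single volume of pre-formed RNP to the 10 µL reaction.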

Drug selection and in vitro induction of tamoxifen-mediated Cre-loxP recombination

For drug selection using neomycin-, puromycin-, or blasticidin-resistance genes, electroporated ES cells were treated with
G418 (400 ng/mL), puromycin (0.5 µg/mL), and/or blasticidin (20 µg/mL) to develop stable transfectants. For
tamoxifen-mediated in vitro Cre/loxP recombination, ES cell clones were treated with 4-hydroxytamoxifen (4OHT, 1 µM) for 2 days;
fluorescent reporter expression was observed using a fluorescence microscope (BZ-X710, Keyence).

Flow cytometry

For cytometric analysis to detect fluorescence reporter expression, ES cell colonies were digested into single cells
with 0.25% (w/v) trypsin–EDTA, and the single cells were resuspended in PBS, 2 mM EDTA, and 1% (w/v) BSA.
Fluorescence expression was observed using the FACSCalibur system (BD Biosciences, San Jose, CA, USA); collected data
were analyzed with FlowJo software (BD Biosciences).

Genotyping and indel mutation rate determination

Genotypes of KI in each ES cell clone were determined through PCR using genomic DNA as a template. Single ES
cell colonies were manually picked as clones and passaged onto a new culture dish. Each expanded ES cell clone was then
harvested and lysed using Tail Lysis Buffer (Nacalai) containing 7 U/mL proteinase K (#9034, Takara) at 65 °C for at least
2 h. Genomic DNA from the crude lysate was extracted and purified by conventional phenol/chloroform/isoamyl alcohol
(all from Nacalai) treatment followed by ethanol precipitation for PCR analysis. KODOne polymerase (Toyobo, Osaka,
Japan) was used for DNA fragment amplification. Indel mutation ratio was determined via T7 endonuclease I mismatch
cleavage assay using the Alt-R Genome Editing Detection Kit (IDT). The ratio of indel mutation was calculated as
described (http://crispr.technology/resources/quantification.html). Supplementary Table S2 shows the primer sequences
used for KI genotyping. The details of other target loci with unique project accession numbers (R1-A, R2-10, R2-11, R2-
16, R2-18, R2-19, R2-A, R2-B, and R2-C in Table 1 or Bsd-GOI-A and NeoR-GOI-B in Fig. 4) will be reported in a separate
publication as mentioned above. All the raw agarose gel electrophoresis data are shown in Supplementary
Figs. S8 and S9.
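
The text cites a web page for the indel calculation rather than stating the formula. A commonly used estimate for mismatch cleavage assays is indel % = 100 × (1 − sqrt(1 − f_cut)), where f_cut is the fraction of total band intensity found in the cleaved products. The sketch below implements that estimate; whether it matches the cited page exactly is an assumption, and the band intensities in the example are hypothetical.

```python
import math


def indel_percent(uncut: float, cut_1: float, cut_2: float) -> float:
    """Estimate indel frequency (%) from T7 endonuclease I band intensities.

    uncut: intensity of the full-length PCR band.
    cut_1, cut_2: intensities of the two cleavage products.
    Assumed formula: 100 * (1 - sqrt(1 - fraction_cut)).
    """
    total = uncut + cut_1 + cut_2
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    fraction_cut = (cut_1 + cut_2) / total
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cut))


# Hypothetical densitometry values, for illustration only.
print(f"{indel_percent(64.0, 18.0, 18.0):.1f}% indels")
```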

Chimeric mouse production and tamoxifen administration

Chimeric mice were developed by blastocyst injection of gene-targeted ES cells. In brief, 8-week-old female mice
were superovulated via sequential intraperitoneal administration of 7.5 U equine chorionic gonadotropin (Serotropin,
ASKA Animal Health, Tokyo, Japan) and 7.5 U human chorionic gonadotropin (Gonatropin, ASKA Animal Health) 48 h
apart and then mated with a mature male of the same strain to obtain embryos. Mated females were euthanized on the
day after vaginal plug observation, and two-cell embryos were collected by perfusion of the oviduct with modified Whitten
medium (mWM). Collected embryos were cultured in KSOM (Merck-Millipore) for 2 days to develop blastocysts. Six to ten
ES cells were injected into a single blastocyst, and the injected blastocysts were transferred into the uterus of
pseudopregnant ICR female surrogates (20 to 25 blastocysts per individual female) at 2.5 days postcoitum to obtain
chimeric offspring. For developing chimeras using B6J (black hair), B6N (JM8.A3, Agouti hair), or B6-129 F1 (V6.5, Agouti
hair), blastocysts having an ICR background (albino) were used. In the case of BALB/c (albino) ES cell injection,
blastocysts having B6J background (black hair) were used. For tamoxifen-induced Cre/loxP recombination in vivo,
5-week-old chimeric mice were intraperitoneally injected with tamoxifen (Sigma-Aldrich, 2 mg/body diluted in corn oil) or
corn oil alone (control) once daily for 5 days, and tissues were harvested 3 days after the final
administration. Collected tissues were analyzed for fluorescence expression using a fluorescent stereomicroscope
(Leica Microsystems, Tokyo, Japan).

Statistical analysis

All numerical data are shown as the mean ± SEM of three independent replications. Differences between
treatments or genotypes were tested using the two-sided Student’s t test. P-values of less than 0.05 were considered
significant.
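
The summary statistics described above can be sketched with the Python standard library. This is an illustration, not the authors' analysis script: it assumes the classical pooled-variance form of Student's t test, uses hypothetical triplicate values, and compares |t| with the tabulated two-sided critical value for 4 degrees of freedom instead of computing an exact P-value.

```python
import math
from statistics import mean, stdev


def mean_sem(values):
    """Return (mean, standard error of the mean) for a list of replicates."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / standard_error, na + nb - 2


# Two-sided critical value of t for alpha = 0.05 and df = 4 (n = 3 per group).
T_CRIT_DF4 = 2.776

# Hypothetical KI efficiencies (%) from three independent replicates per condition.
group_a = [10.0, 12.0, 14.0]
group_b = [20.0, 22.0, 24.0]

m, sem = mean_sem(group_a)
t_stat, df = student_t(group_a, group_b)
print(f"group A: {m:.1f} ± {sem:.2f} (mean ± SEM)")
print(f"t = {t_stat:.3f}, df = {df}, significant = {abs(t_stat) > T_CRIT_DF4}")
```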

PEAC-seq adopts Prime Editor to detect CRISPR off-target and DNA translocation

CRISPR technology holds significant promise for biological studies and gene therapies because of its high
flexibility and efficiency when applied in mammalian cells. However, endonucleases (e.g., Cas9) can generate undesired
edits; thus, there is an urgent need to comprehensively identify off-target sites so that genotoxicity can be
accurately assessed. To date, it is still challenging to streamline the entire process to specifically label and efficiently
enrich the cleavage sites from unknown genomic locations. Here we develop PEAC-seq, in which we adopt the Prime
Editor to insert a sequence-optimized tag to the editing sites and enrich the tagged regions with site-specific primers for
high throughput sequencing. Moreover, we demonstrate that PEAC-seq could identify DNA translocations, which are more
genotoxic but usually overlooked by other off-target detection methods. As PEAC-seq does not rely on exogenous
oligodeoxynucleotides to label the editing site, we also conduct in vivo off-target identification as proof of concept. In
summary, PEAC-seq provides a comprehensive and streamlined strategy to identify CRISPR off-targeting sites in vitro and
in vivo, as well as DNA translocation events. This technique further diversifies the toolkit to evaluate the genotoxicity of
CRISPR applications in research and clinics.

Introduction

CRISPR-based genome editing has exhibited enormous potential in both biological research and clinical applications.
Compared to small-molecule drugs and antibody drugs, CRISPR therapy has the unique advantage of directly targeting
the nucleic acid sequences of previously undruggable targets. However, non-specific targeting by gRNAs, which might
introduce undesired edits, can cause unexpected genotoxicity. It is therefore urgent to understand the outcomes of off-target
edits and the resulting DNA translocations, which challenge the great translational potential of CRISPR technology in
treating genetic disorders and other human diseases.

To date, versatile tools have been developed to identify CRISPR off-target sites. In vitro techniques capture
nuclease-induced cleavage events directly from purified genomic DNA or chromatin1,2; these approaches typically require
400–500 million reads per sample to identify off-targets. Some other methods incorporated enrichment of fragmentized
DNA by circularizing sequences3 or by introducing biotinylated oligos to overcome the high sequencing requirement4.
However, in vitro techniques typically report many sites that are not edited in a cellular context, and methods for in
cellula and in vivo detection are in high demand. GUIDE-seq labeled and enriched double-strand breaks (DSBs) in the
genome of living cells using exogenous double-stranded oligodeoxynucleotides (dsODNs) mediated by an end-joining
process5. However, the high molarity of exogenous dsODNs limited its application to detect off-targets for in vivo CRISPR
editing. BLISS is another type of in cellula technique, which utilizes in situ DSB ligation in fixed cells and characterizes the
off-target sites for both SpCas9 and As/LbCpf16. As CRISPR technology holds therapeutic potential for many unmet
medical needs, off-target identification for in vivo CRISPR editing and evaluation of the corresponding genotoxicity are
in high demand. To do so, one strategy is to use in vitro or computational approaches to prioritize a list of genomic
regions and validate them in in vivo samples one by one through targeted amplicon sequencing (Amplicon-seq)7,8,9,
which risks overlooking in vivo-specific off-targets and becomes laborious if the prior data yield a
long candidate list. DISCOVER-seq, in contrast, uses the chromatin immunoprecipitation signal of MRE11, which is
involved in the DNA repair pathway, to represent and enrich genomic sites undergoing DSB-induced repair10.
However, the dynamic nuclease activity of Cas9 might not be fully captured by the “snapshot” signal from MRE11
immunoprecipitation.

Further, DNA translocation has been a significant concern for CRISPR editing, as it typically causes higher
genotoxicity, although it occurs at a relatively lower frequency11. Concern about DNA translocation has often
focused on CRISPR editing in the production of CAR-T cells, since multiple gRNAs are introduced into T cells and
create a risk of translocation between double-strand break (DSB) ends12,13. Methods have been developed to identify DNA
translocations, but they usually require the sequence information of at least one end of the rearranged DNA, e.g.,
HTGTS14,15,16,17. A systematic identification of DNA translocations is still lacking.

Here, we introduce an off-target identification method, PEAC-seq (Prime Editor Assisted off-target
Characterization), in which we design a Cas9-MMLV fusion protein to take advantage of the sequence insertion ability
from the Prime Editor (PE)18. The native PE system (Cas9n-MMLV) utilizes a pegRNA (Prime Editor gRNA) containing
extra sequences at the 3’ of gRNA, which serve as a priming site and allow reverse transcription (RT) from the exposed
3’-hydroxyl group of the non-targeting strand to incorporate additional DNA sequences into the editing sites. In PEAC-seq,
an optimized RT template is used to incorporate PEAC-seq tag sequences, which are further used to represent and enrich
the local sequences of the edited sites from the genome, including both on-target and off-target sites. PEAC-seq
accompanies the process of CRISPR editing and tag insertion, which ensures the consistency between editing events and
PEAC-seq signals. We apply PEAC-seq on a few promiscuous sites in both in cellula and in vivo samples and demonstrate
that PEAC-seq could effectively identify off-targets by comparing to the results of GUIDE-seq, DISCOVER-seq, WGS, and
Amplicon-seq. Furthermore, benefiting from the directional inserted PEAC-seq tag, we successfully identify DNA
translocations, which could not be directly profiled by current methods and are typically more toxic to cells. Together,
PEAC-seq is an unbiased method of identifying CRISPR off-targets and off-target-related DNA translocations. As it
bypasses the addition of high-molarity exogenous dsODNs, PEAC-seq also holds immense potential to identify
off-targets and translocations for in vivo CRISPR editing, which would be particularly valuable for translational studies.

Results

Develop PEAC-seq for unbiased identification of CRISPR off-targets

To be compatible with off-target detection of in vivo CRISPR editing, we reasoned that the detection method
should streamline the editing and off-target enrichment processes without relying on an exogenous moiety. To do so, we
adopted the prime editor system using Cas9 instead of Cas9n and used the pegRNA as the template for inserting a tag
sequence for enrichment. The Cas9/pegRNA creates double-strand breaks (DSBs) in the genome at both on-target and
off-target sites, and the tag sequence is introduced at the DSB sites through reverse transcription from the pegRNA
and incorporated into the genome through DNA repair. We designed a 21-nt insertion tag with the following considerations:
(1) avoiding RNA secondary structure within the insertion tag and between the insertion tag and the gRNA scaffold; (2)
sequence uniqueness relative to the host genome; and (3) sufficient length for efficient annealing of PCR primers for
enrichment. We named this assay PEAC-seq, Prime Editor Assisted off-target Characterization, as it employs the insertion
ability of the Prime Editor to label and enrich the editing sites.

To enrich the genomic regions carrying the PEAC-seq tag sequences, we adopted a priming strategy similar to that of
GUIDE-seq5. We used Tn5 tagmentation instead of sonication to streamline the workflow and lower the starting DNA
requirement (Fig. 1a). UMI-containing adapters were loaded onto Tn5 to enable the removal of PCR duplicates
from the sequencing data. During library preparation, one of the biggest challenges is to effectively enrich the
inserted tag sequences, whose length might vary. Since the PEAC-seq tag is generated by reverse transcription extending
along the RT template, both partial and full-length products might exist. Hence, primers must be carefully designed to
enrich editing sites with insertions of different lengths. To optimize the enrichment, we designed forward and reverse
primers with different lengths of base pairs annealing to the inserted tag and evaluated their performance. We used
three forward primers and two reverse primers with different extension starting points on the PEAC-seq tag (Fig. 1b).
Amplicons were generated in five separate reactions, each using either a forward primer with a
downstream Tn5 primer or an upstream Tn5 primer with a reverse primer (Supplementary Fig. 1). The enrichment of the
PEAC-seq tag was evaluated to choose the best-performing primer set. It is worth pointing out that all the primers were
designed at least 2 bp away from the insertion boundary so that the extension sequence could be used to filter out
random priming reads (Fig. 1b).
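
The two read-level filters described in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the published pipeline: the tag sequence, the read records, and all names are hypothetical, a read is kept only if the bases following the primer match the remaining tag bases, and PCR duplicates are defined as reads sharing the same position, strand, and UMI.

```python
from typing import Iterable, List, NamedTuple

# Hypothetical 21-nt tag, for illustration only (not the published PEAC-seq tag).
HYPOTHETICAL_TAG = "ACGTTGCAAGCTTGACCATGC"


class Read(NamedTuple):
    chrom: str
    pos: int
    strand: str
    umi: str
    seq: str


def passes_tag_check(seq: str, primer: str, tag_remainder: str) -> bool:
    """Keep a read only if the primer is followed by the remaining tag bases.

    Because the primer stops short of the insertion boundary, a genuine
    tag-primed read must continue with the rest of the tag; a randomly
    primed read generally will not.
    """
    return seq.startswith(primer + tag_remainder)


def dedup_by_umi(reads: Iterable[Read]) -> List[Read]:
    """Collapse PCR duplicates: keep the first read per (chrom, pos, strand, UMI)."""
    seen = set()
    unique = []
    for read in reads:
        key = (read.chrom, read.pos, read.strand, read.umi)
        if key not in seen:
            seen.add(key)
            unique.append(read)
    return unique


# Primer ends 2 bp before the tag boundary; the last 2 tag bases serve as the check.
primer = HYPOTHETICAL_TAG[:-2]
tag_remainder = HYPOTHETICAL_TAG[-2:]

reads = [
    Read("chr1", 1000, "+", "AACCGGTT", primer + tag_remainder + "TTAGGC"),
    Read("chr1", 1000, "+", "AACCGGTT", primer + tag_remainder + "TTAGGC"),  # PCR duplicate
    Read("chr1", 1000, "+", "TTGGCCAA", primer + tag_remainder + "TTAGGC"),  # distinct molecule
    Read("chr2", 500, "+", "ACACACAC", primer + "AT" + "CCGGAA"),  # random priming
]

kept = dedup_by_umi(r for r in reads if passes_tag_check(r.seq, primer, tag_remainder))
print(len(kept), "reads kept")
```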

Fig. 1: Development of the PEAC-seq technique.

a Schematic representation of the PEAC-seq experimental procedure. The gDNA was extracted and subjected to
Tn5 tagmentation. The Tn5 was loaded with UMI adapters to eliminate PCR duplicates in silico. After tagmentation,
fragments were amplified by pairs of primers (one priming at the PEAC-seq insertion, another priming with the Tn5
adapter). b Schematic representation of the three forward primers and two reverse primers designed for tag enrichment
and library preparation of PEAC-seq. Each forward primer was paired with a downstream Tn5 primer to generate
amplicons including the PEAC-seq tag sequence and its downstream genomic sequences. Each reverse primer was paired
with an upstream Tn5 primer to generate amplicons including the PEAC-seq tag sequence and its upstream genomic
sequences. In total, five Amplicon-seq data from the three forward primers and two reverse primers were generated, and
six candidate lists of putative off-targets were inferred from the five Amplicon-seq data using a modified GUIDE-seq
analysis pipeline (“Methods”). c–e Venn diagram shows the shared and unique off-targets identified by PEAC-seq and
GUIDE-seq at the VEGFA TS1 (c), FANCF (d), and EMX1 (e) sites. Source data are provided as a Source data file.

Next, we examined the indel efficiency of PEAC-seq and the insertion efficiency of the PEAC-seq tag at ten on-target sites.
Across the ten examined sites, the indel frequencies of Cas9-MMLV and Cas9 were highly consistent (Supplementary
Fig. 2a). The insertion efficiencies of the full-length tag were 11–31% (Supplementary Fig. 2b–h), which is
comparable to GUIDE-seq5. Encouraged by these pilot data, we conducted PEAC-seq in HEK293T cells at six sites (VEGFA
TS1, VEGFA TS2, VEGFA TS3, EMX1, FANCF, and RNF2) that have been tested in multiple studies1,2,3,4,5. We used a
modified GUIDE-seq analysis pipeline to rank and filter the identified editing sites. We analyzed the off-target sites
generated from different primer sets for PEAC-seq tag enrichment and chose the F1 and R2 primers as the enrichment
primers in the following analysis (Tables S1–S6, Supplementary Data 1–6, Supplementary Figs. 3–8, and “Methods—
PEAC-seq in HEK293T cell”).

At the sites of VEGFA TS1, VEGFA TS2, and VEGFA TS3, a large proportion of PEAC-seq off-targets were also
reported by GUIDE-seq, but each method also reported a few unique off-targets (Fig. 1c and Supplementary Figs. 4 and 5). At
the sites of FANCF, EMX1, and RNF2, all PEAC-seq off-targets were reported by GUIDE-seq (Fig. 1d, e and Supplementary
Fig. 7). We then conducted Amplicon-seq to verify those off-targets that were only identified by GUIDE-seq or PEAC-seq
at VEGFA TS1, FANCF, and EMX1 sites5. At the VEGFA TS1 site, Amplicon-seq confirmed the two PEAC-seq-unique off-
targets, demonstrating good sensitivity of PEAC-seq. For the GUIDE-seq-unique off-targets, all six off-targets at
the FANCF site were confirmed not to occur in our sample, while two out of the twelve GUIDE-seq-unique off-targets at
the EMX1 site and two out of the eight GUIDE-seq-unique off-targets at the VEGFA TS1 site were detected by
Amplicon-seq. These data suggest that PEAC-seq can effectively and specifically identify off-targets with a streamlined
procedure, without incorporating other exogenous reagents to tag and enrich these sites.

Next, we looked up the PEAC score and local sequences at the shared and unique off-targets. The PEAC score,
calculated from the sequencing reads of PEAC-seq, quantitatively represents the enrichment of PEAC-seq tag at the
edited sites. At VEGFA TS1, the off-target sites identified by both PEAC-seq and GUIDE-seq show higher PEAC scores
compared to PEAC-seq-unique off-targets (Fig. 2a). Further, the numbers of sequencing reads surrounding the off-targets
were highly correlated at the fourteen shared sites (Fig. 2b), suggesting that the two methods are consistent in
detecting high-confidence off-targets. Notably, in the signal tracks of the PEAC-seq reads, the on-target site, shared
off-target sites, and PEAC-seq-unique off-target sites showed similar profiles (Fig. 2c). Consistent with previous
reports, the shared off-target sites contained fewer mismatches than off-target sites unique to one of the methods
(Fig. 2d), which is expected because the number of mismatches is closely related to the occurrence of off-target editing.
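
The mismatch comparison above can be illustrated with a minimal sketch. It assumes that an off-target sequence has
already been aligned to the on-target spacer at equal length; the function name and the example sequences are
hypothetical placeholders and are not taken from the PEAC-seq pipeline or data.

```python
def count_mismatches(on_target: str, candidate: str) -> int:
    """Count positions at which two equal-length, pre-aligned sequences differ."""
    if len(on_target) != len(candidate):
        raise ValueError("sequences must have the same length")
    # Compare case-insensitively, position by position.
    return sum(a != b for a, b in zip(on_target.upper(), candidate.upper()))


# Hypothetical 20-nt spacer and two made-up candidate sites (illustration only).
ON_TARGET = "ACGTACGTACGTACGTACGT"
FEW_MISMATCHES = "ACGTACGTACGTACGTACGA"
MORE_MISMATCHES = "ACCTACGAACGTAGGTACGT"

few = count_mismatches(ON_TARGET, FEW_MISMATCHES)
more = count_mismatches(ON_TARGET, MORE_MISMATCHES)
print(few, more)
```

A real comparison would also need to handle the PAM and any alignment gaps, which this sketch deliberately ignores.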

Fig. 2: Analysis of the PEAC-seq off-target sites.


a The visualization of PEAC-seq on-target and off-target sites. The ‘*’ marks a PEAC-seq site that was also
called by GUIDE-seq. The ‘**’ marks a PEAC-seq-unique off-target that was identified by Amplicon-seq but not called by
GUIDE-seq. PEAC score: quantitative enrichment of the PEAC-seq tag at the edited sites; PEAC-ID: each site identified by
PEAC-seq (on-target and off-target) was assigned a PEAC-ID, ordered by descending PEAC score. b The number of reads
from the shared PEAC-seq and GUIDE-seq sites is highly
correlated. c Screenshots of PEAC-seq signal tracks from the IGV Genome Browser. One on-target site, one shared off-
target site, and one PEAC-seq unique off-target site were presented. For each site, signals from both the PEAC-seq and
the wild-type (WT, no Cas9-MMLV treatment) samples were included. For each sample, the first track represented signals
from the amplicons of a forward primer and a downstream Tn5 primer; the second track represented signals from the
amplicons of a reverse primer and an upstream Tn5 primer. The model on the right side showed the direction of the
spacer and PAM of each case. d The shared off-targets (gray bars) tend to have fewer mismatches compared to the
on-target site, while the PEAC-seq unique sites (orange bars) and the GUIDE-seq unique sites (blue bars) tend to have
more mismatches. e Mutation frequencies were plotted at each position alongside the gRNA and PAM sequences (from 5’ to
3’). From top to bottom are profiles of VEGFA TS1, TS2, and TS3. Source data are provided as a Source data file.

We also examined whether the position of mismatches on the pegRNA sequence might affect off-target
identification5, especially in the primer binding site (PBS), which is crucial for initiating the primer extension of
reverse transcription18. To do so, we grouped the off-target sequences into “Shared,” “PEAC-seq-unique,” and
“GUIDE-seq-unique” sets and aligned them with the on-target spacer and PAM sequences. The mutation frequency at each
position was plotted for the three sites (Fig. 2e). The patterns among the shared and unique off-target groups were
quite consistent in VEGFA TS2 (81 sites) and VEGFA TS3 (35 sites), but fluctuated somewhat in VEGFA TS1 (24 sites).
Although the smaller number of off-targets of VEGFA TS1 might contribute to its fluctuating mutation frequency, this
result indicated that the sensitivity of PEAC-seq might be affected by mismatches located in the PBS region of the
pegRNA. Indeed, the two verified GUIDE-seq-unique off-targets of TS1 both show mismatches in the PBS region (at
positions 14 and 17 of the spacer, respectively) (Table S10). Nevertheless, off-target identification with the TS3 gRNA
seems more tolerant to PBS mutations, which implies that the extent of the influence might be site-specific.
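
The per-position mutation frequency described above can be sketched as follows. This is an illustrative
reconstruction, not the authors' script: it assumes ungapped, equal-length alignments to the on-target sequence, and
the function name and toy sequences are hypothetical.

```python
from typing import List, Sequence


def positional_mismatch_frequency(on_target: str, off_targets: Sequence[str]) -> List[float]:
    """Return, for each position, the fraction of off-target sequences that differ from the on-target base."""
    if not off_targets:
        raise ValueError("at least one off-target sequence is required")
    length = len(on_target)
    counts = [0] * length
    for seq in off_targets:
        if len(seq) != length:
            raise ValueError("all sequences must match the on-target length")
        for i, (a, b) in enumerate(zip(on_target.upper(), seq.upper())):
            if a != b:
                counts[i] += 1
    return [c / len(off_targets) for c in counts]


# Toy example (made-up sequences): position 0 differs in one of two sites,
# and the last position differs in both.
freqs = positional_mismatch_frequency("ACGTACGT", ["ACGTACGA", "TCGTACGA"])
print(freqs)
```

Computing one such profile per group (“Shared,” “PEAC-seq-unique,” “GUIDE-seq-unique”) would give the kind of
comparison plotted in Fig. 2e; degenerate PAM positions would need separate handling.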

PEAC-seq identified Cas9-dependent chromosome rearrangement

To enrich PEAC-seq tag, the forward primer (F1) and downstream Tn5 primer (R1) would amplify regions
downstream, but not upstream, of the PEAC-seq tag (Fig. 3a). Surprisingly, in some cases, we saw unexpected signals
located at the upstream genomic region of the F1-R1 amplicons (Fig. 3a and Supplementary Fig. 9). With further analysis
on these sites, we speculated that the signals might come from the joining of DSB ends from another genome breaking
site. As shown in the proposed models (Fig. 3b), PEAC-seq generates DSBs with three different ends, including one
upstream end appended with a complete or partial PEAC-seq tag (②), one upstream end without PEAC-seq tag (①), and
one downstream end (③). If multiple DSBs occurred simultaneously in the nucleus and were physically proximal to each
other, DSB ends from different breaking points might join together and cause DNA rearrangements. In our hypothesized
scenario, the upstream end with the PEAC-seq tag from a distal Donor Site may join with the upstream end of a Receiver
Site, with the PEAC-seq tag in reverse orientation relative to the Receiver Site (Fig. 3b, model (v)). This joining
generates signals upstream of the PEAC-seq tag of the Receiver Site, a region that would not otherwise be amplified by
the F1 and Tn5 (R1) primers (Fig. 3a).

Fig. 3: PEAC-seq identified DNA translocations relevant to CRISPR genome editing.


a Signal tracks of one PEAC-seq site with unexpected upstream signals from the F-primer amplicon. Dashed gray
bar: cutting site; Earthy yellow peak: expected signals from the F-primer; Pink peak: unexpected signals from the
F-primer. b Proposed models of the generation of unexpected upstream signals. Both the Receiver site and the Donor site
could generate DSBs and be proximal to each other within the nucleus. Models (i) and (ii) joined DSB ends from the same
Receiver site. Models (iii), (iv), and (v) joined one donor DSB and one Receiver DSB. If the donor DSB carried the PEAC-
seq insertion, the unexpected upstream signal would be observed at the Receiver Site. In the models, the gRNA location
was set on the top strand. c The design of validation PCR to identify the genomic sequence of the Donor Sites. Two
specific primers (Nest-F1 and Nest-F2) were designed upstream of the gRNA of the Receiver Site. The Nest-F1 and Nest-
F2 were sequentially used with the downstream Tn5 primer, and two amplicons were generated. The second amplicons
were sent for Amplicon-seq. d The translocation cases identified by PEAC-seq + Amplicon-seq. e Translocation scores of
all sites were plotted. The red arrow indicated the Receiver Site in Fig. 3d. A DNA translocation score was calculated as
“translocation reads number”/(“normal reads number” + “translocation reads number” + 10). Source data are provided as
a Source data file.

The DSB-induced DNA rearrangements, which have not been systematically evaluated by other CRISPR off-target
identification techniques, can cause severe chromosomal aberrations, including large fragment deletions, inversions, and
translocations. Because the PEAC-seq tag is inserted directionally, the resulting PCR amplicons can be used as
indicators of chromosome rearrangements, as they can distinguish whether an amplicon came from the joining of the
expected DSB ends. To test this, we designed primers (Nest-F) located upstream of the F1 primer, which were paired with
the downstream Tn5 primer to identify the sequences of the unknown Donor sites (Fig. 3c; see “Translocation
characterization” in Methods). Notably, a successful amplification bridging the Donor and the Receiver sites does not
require the presence of the PEAC-seq tag insertion (Fig. 3b, models (iii) and (iv)), which allowed us to comprehensively
estimate the various rearrangement patterns between the Donor and the Receiver sites.

We conducted the Unidirectional Targeted Sequencing (UDiTaS)17 at two susceptible sites and identified three
types of translocations (Fig. 3d). The results indicated that both the upstream end (Fig. 3d, model (iii)) and the
downstream end (Fig. 3d, model (iv)) of a distal Donor site could join with the upstream end of a Receiver site. This
joining could happen either with or without the PEAC-seq insertion. We also identified many other translocations,
some of which were between the selected sites and other on- and off-target sites, or other DSB sites (Supplementary
Fig. 9b, c).

Interestingly, the frequencies of DNA translocation varied across different sites (Fig. 3e), and translocation did
not necessarily occur between DSB ends with high indel frequencies. For example, among the PEAC-seq sites
of VEGFA TS3, the on-target site (chr6: 43769716–43769739) showed a 0.2% translocation rate in our data, while at
an off-target site (chr22: 37266776–37266799), 34.7% of edits involved DNA translocations (Supplementary Fig. 9a).
The translocation scores of the other VEGFA TS3 off-targets and the other seven sites are provided (Supplementary
Fig. 9a, Tables S1–8, Supplementary Data 1–8). These results suggested that PEAC-seq could successfully identify
chromosome translocations, further enabling the safety evaluation of CRISPR applications.

Apply PEAC-seq for in vivo off-target detection

PEAC-seq uses the template information on the pegRNA to insert tag sequences and does not rely on exogenous tags.
This straightforward procedure allowed us to investigate its application in vivo. We edited mouse embryos at the
pronuclear stage by injecting in vitro transcribed Cas9-MMLV mRNA and pegRNAs targeting PCSK9 and PNPLA3. We collected
embryos around E14.5 to E21 and generated the PEAC-seq off-target lists for these two sites (Fig. 4). We identified
one PCSK9 on-target and one off-target from the two embryos, which both have been previously reported by DISCOVER-
seq10 (Fig. 4b–d, Table S7, and Supplementary Data 7). Amplicon-seq verified the edits at the PEAC-seq off-targets and
confirmed non-edits at the other reported off-targets. The small number of PCSK9 off-targets in our study might be
relevant to the short editing time window by using mRNA injection in embryos, compared to the adenovirus delivery in
the liver10. Using the same strategy, we also conducted PEAC-seq at another in vivo CRISPR therapy target, PNPLA3.
Three editing sites, including the on-target site, were identified by PEAC-seq from two embryos (Supplementary Fig. 10,
Table S8, and Supplementary Data 8). These off-targets were also reported by previous in vivo studies10,19 and verified
by Amplicon-seq. These data demonstrated the potential of PEAC-seq to directly detect off-targets in vivo, although more
editing systems need to be investigated.

Fig. 4: PEAC-seq identified pcsk9 off-targets from an edited mouse embryo.


a Schematic representation of the in vivo PEAC-seq experiment. b The Venn diagram shows the overlap between
the PEAC-seq on-target and off-targets of PCSK9 and the top 18 editing sites (including the on-target) identified by
DISCOVER-seq. c The sequence visualization of the PCSK9 on-target and off-targets. One off-target was identified from
one of the two embryos. The site was also reported by DISCOVER-seq and validated by Amplicon-seq. The color scale
represented the indel frequency reported by CRISPResso. d The signal track of the on-target and off-target sites
identified from PEAC-seq in two different embryos and wild-type control. The signal of the WT control at chr4:106463845
was 1000-fold lower than the sample and was considered as background. Source data are provided as a Source data file.

ePEAC-seq, an improved version of PEAC-seq utilizing epegRNA

Since the original PEAC-seq protocol was developed, multiple strategies have been proposed to improve the
editing efficiency of the native PE system, including modifications of the pegRNA20, modifications of MMLV21, and
transient expression of a dominant negative MMR (DNA mismatch repair) protein22. By incorporating epegRNA (engineered
pegRNA, which incorporates the 3’ RNA structural motif evopreQ1), hMLH1, and epegRNA plus MLH1dn, we developed three
modified versions of PEAC-seq and benchmarked their performance in identifying off-targets at the EMX1 and VEGFA TS2
sites (Fig. 5a). We did not include the truncated MMLV, as it is reported to be effective in plants but not in
mammalian cells21. We specifically concentrated on the PEAC-seq tag insertion, whose efficiency is critical to the
overall performance of PEAC-seq. Among the three modifications, incorporating epegRNA appears to be the most effective
at increasing the number of PEAC-seq tag insertions at different cutoffs (Fig. 5b). We named the epegRNA version of
PEAC-seq ePEAC-seq. Importantly, ePEAC-seq successfully identified the two missed off-targets of EMX1 (Fig. 5c, d),
emphasizing its higher sensitivity than PEAC-seq. At the VEGFA TS2 site, ePEAC-seq also called more off-target sites
shared with GUIDE-seq than PEAC-seq did (Supplementary Figs. 4a and 11).

Fig. 5: ePEAC-seq is an enhanced version of PEAC-seq with higher sensitivity to identify off-targets.
a Schematic representation of the three modified versions of PEAC-seq. b The insertion frequencies of PEAC-seq
tag in PEAC-seq and its three modifications. c The Venn diagram of EMX1 off-targets identified by PEAC-seq and GUIDE-
seq. d ePEAC-seq identified two more verified off-targets that were missed by PEAC-seq. Source data are provided as a
Source data file.

Discussion

Off-target detection is crucial to the biotechnological and clinical applications of CRISPR technology. Over the
past years, many elegant designs have been applied to profile off-targets in vitro and in cellula. These
methods can label and enrich the cleavage sites without prior knowledge of the genomic locations of off-targets, but the
addition of exogenous dsODN or chemicals limits their applications in vivo. Besides these experimental approaches,
computational algorithms that consider diverse features of the gRNA have also contributed by generating candidate
off-target lists. However, how well these alternative approaches reflect the cellular context is always a concern. To
bypass the addition
of exogenous agents, we adopted the Prime Editor to insert a tag sequence along with the cleavages. We used Cas9
instead of Cas9n to fuse with MMLV and employed the template information from pegRNA to label and enrich the editing
sites. Utilizing the PEAC-seq, we successfully identified and validated off-targets both in HEK293T and in mouse embryos.
Recent studies have reported a variety of modifications to the native PE system to increase the editing
efficiency20,21,22. We demonstrated that incorporating epegRNA is the most effective method to improve the insertion
efficiency of PEAC-seq tags, which also rescued the two off-targets missed by PEAC-seq at EMX1 (Fig. 5c, d). It is not
surprising that the transient expression of MLH1dn did not improve the performance. MLH1dn is a dominant negative
MMR protein, and MMR acts on DNA heteroduplexes by selectively replacing nicked DNA strands22; the repair
pathway activated by PEAC-seq, however, is probably different, as we used the wild-type Cas9 to replace the Cas9 nickase
in the native PE system.

Besides off-target indels, DNA translocation can happen when multiple DSBs are introduced, and it is more
harmful to genome stability14,23. Multiple DSBs might be introduced when a single gRNA is used but off-target editing
happens, or when multiple gRNAs are used. For example, to engineer T cells to become allogeneic or autogeneic CAR-T
cells, more than one gRNA needs to be used12,13,24. This calls for a sensitive detection method to
systematically profile DNA translocations. Recently, several papers reported that DNA translocations happen more
frequently than previously thought during Cas9 editing in vivo25,26,27. To our knowledge, besides ultra-deep
whole-genome sequencing, none of the CRISPR off-target detection techniques is able to directly detect DNA
translocations without knowing the sequence of at least one DSB end. GUIDE-seq reported large-scale genomic alterations
via AMP (anchored multiplex PCR)-based sequencing, in which a candidate translocation could be detected in the following
validation step5. The directional insertion sequence in PEAC-seq allowed us to identify the aberrant joining of ends
from different DSB sites. We also noticed that the occurrence of DNA translocation is independent of the frequency of
DSBs at a particular site, which indicates that other factors, e.g., position or DSB context sequences, might
contribute to translocation11. Finally, due to the potential genotoxicity of DNA rearrangements, both translocation
profiling methods and genotoxicity assessments need to be developed for translational applications of CRISPR.

We also conducted proof-of-concept studies to demonstrate the application of PEAC-seq in vivo. This method and
DISCOVER-seq both rely on signals that accompany the cleavage events. DISCOVER-seq uses MRE11
ChIP-seq signals to represent the DSB events occurring in the edited cells, but ChIP-seq by nature captures only a
snapshot of MRE11 binding and might not reveal all off-target sites that arise over the course of editing. PEAC-seq,
which relies on the enrichment of an inserted PCR handle, might also overlook cleavages with incomplete insertions that
cannot be effectively enriched, although our random sequence screen demonstrated good efficiency of long insertions.
Increasing the size of the cell population might further increase the sensitivity of PEAC-seq, as indicated by
the two verified PEAC-seq-unique off-targets in cellula. Nevertheless, these methods, together with previous approaches,
provide versatile tools to enhance our understanding of the occurrence of off-targets in different contexts and are very
informative alternatives to costly WGS.

Finally, it is intrinsically interesting that not all potential off-target sequences are eventually edited as off-targets.
To look into this question, we analyzed the genomic co-localizations between the PEAC-seq off-targets and epigenetic
signals collected from public data28. We plotted the density of ATAC-seq peaks and ChIP-seq peaks of multiple histone
modifications and proteins surrounding (±5 kb) the PEAC-seq off-targets. Briefly, the results indicated that off-targets
tended to occur in open chromatin regions (ATAC-seq) and to be associated with histone modifications in active gene
regulation (H3K4me3, H3K9ac, and H3K27ac) and gene transcription (POLR2A, EP300, H2AFZ) (Fig. 6a). PEAC-seq
translocations were not associated with the above epigenetic marks or with cancer-related fusion genes, but co-occurred
with double-strand breaks (DSBs) in HEK293T cells. Compared to control regions, which were equally
sized regions re-sampled randomly across the genome, we observed enrichment of DSBs within ±5 kb of the PEAC-seq
translocation sites (Fig. 6b), indicating that CRISPR editing-induced translocation tends to occur in DSB-enriched
regions.

Fig. 6: The genomic context of PEAC-seq off-targets and translocations.

a Signals of the ATAC-seq peaks and ChIP-seq peaks of multiple histone modifications and proteins surrounding
the PEAC-seq off-targets. b Signals of the DSB surrounding the PEAC-seq translocation sites (left panel) and random
controls (right panel).

A limitation of this study is that the insertion efficiency of the PEAC-seq tag might vary across
different pegRNAs and at different off-targets. For each pegRNA, the RNA secondary structure of the insertion tag and
its sequence uniqueness relative to the host genome could vary. However, if the aforementioned guidelines are taken into
account, this sequence is interchangeable, and we have supplied a few additional tested sequences (Tables S11 and S12).
Regarding the PBS (primer binding site) length, we inherited a 13-nt design from the native PE system18, although both
the 13-nt and 17-nt designs worked equally well in our hands. The PBS sequences, which are derived from the on-target
sites, can differ from the sequences at off-target sites. Mismatches between the PBS and the spacer sequences at
off-target sites might affect primer extension in reverse transcription and result in low insertion efficiencies of the
PEAC-seq tag. Indeed, the two missing off-targets at the VEGFA TS1 site include PBS mismatches at positions 14 and 17
(5’ to 3’) (Fig. 1c and Table S10), which are proximal to the starting point of primer extension in reverse
transcription (Supplementary Fig. 12a). GUIDE-seq-unique off-targets at VEGFA TS2 and VEGFA TS3, although not verified
by Amplicon-seq, also contained relatively more PBS mismatches than the shared and the PEAC-seq-unique off-targets
(Supplementary Fig. 12b, c). However, many off-targets with PBS mismatches were successfully identified by PEAC-seq,
indicating that the effects of PBS mismatches on reverse transcription are complex. Nevertheless, we propose including
a few random nucleotides in the PBS region of the pegRNA (mut-pegRNA) (e.g., proximal to the primer extension site) to
improve the extension efficiency at off-targets with PBS mismatches (Supplementary Fig. 13). According to this study’s
PEAC-seq and ePEAC-seq data, a pegRNA designed from the on-target sequence could enable PEAC-seq tag insertion at
most off-target sites, and the incorporation of mut-pegRNA might improve the insertion efficiency of PEAC-seq tags at
some off-target sites with critical PBS mismatches. In addition, a reverse transcriptase evolved for error-correcting
activity (e.g., error-correcting reverse transcriptase29) may further improve primer extension efficiency. If a suitable
enzyme could be evolved and characterized, its 3’ to 5’ exonuclease activity could correct mismatches between the PBS
and off-targets.

In summary, we adopted the Prime Editor system to report CRISPR off-targets in cellula and in vivo, as well as
Cas9-dependent DNA rearrangements. Using pegRNA to provide a sequence-optimized template, PEAC-seq further diversifies
the CRISPR off-target identification toolbox and provides a reliable solution to directly identify off-targets for in
vivo editing and recognize DNA rearrangements, both of which would strengthen our ability to assess genotoxicity in
CRISPR therapies.

Methods

Ethical statement

The animal experiments of this study comply with animal protocols approved by the Laboratory Animal Resource
Center (LARC) at Westlake University.

PEAC-seq in HEK293T cell

We adapted the Prime Editor system by replacing the Cas9 nickase with wild-type Cas9 and modifying the RT
template of the pegRNA, and assembled them into a single vector as the PEAC-seq backbone. The spacer sequences
targeting VEGFA, EMX1, RNF2, and FANCF were cloned into the PEAC-seq backbone individually. To conduct PEAC-seq in
living cells, HEK293T cells were seeded in a 12-well plate and grown to ~80% confluency. Each well was transfected with
3 μg of plasmid using Lipofectamine 3000. The transfected cells were collected after 48 h. A cell sorter (SONY MA900)
was used to sort about 100,000 GFP-positive cells (Supplementary Fig. 14). About 500 ng extracted gDNA was digested
with NotI then cleaned up with 0.5× AMPure XP beads to remove the carryover plasmids. The gDNA fragments were
retained on the AMPure XP beads, and on-beads Tn5 digestion was performed at 55 °C for 1 h, and adapters were
inserted at the ends of the fragments. The Tn5 was expressed and embedded with the adapters in-house. At the end of
the Tn5 digestion, 6 μL 0.2% SDS was added to terminate the reaction. The products were purified and size-selected by
1.5× AMPure XP beads and eluted in 50 μL H2O. The 21 bp insertion sequence was used to enrich the editing sites (both
on-target and off-target) in the NGS library preparation. In the first round of the nested PCR, two separate reactions
were performed. Each reaction used 20 μL of template in a total volume of 50 μL for ~30 cycles. One used the PEAC-seq
insertion sequence as the forward primer binding site and the downstream Tn5 adapter as the reverse primer binding
site. The other used the upstream Tn5 adapter as the forward primer binding site and the PEAC-seq insertion sequence as
the reverse primer binding site. Then, 2.5 μL of the first-round product was used as the template for the second-round
amplification in a total volume of 50 μL for 17 cycles, during which Illumina adapters were added. The amplicons were
purified with AMPure XP beads using 0.6× + 0.25× double size selection. The library was sequenced on the Illumina
NovaSeq platform as paired-end 150 bp.

The oligos and vectors are summarized in Supplementary Data 9.

Translocation characterization

To identify the translocated sequences, we designed two nested PCR primers upstream of the gRNA. The
site-specific nested PCR primers served as forward primers, and the downstream Tn5 primer served as the reverse primer.
The nested primers were used sequentially to amplify the sequences adjacent to translocated DSBs. About 300 ng PEAC-seq
gDNA was fragmented by Tn5, purified with 1.5× AMPure XP beads, and eluted with 23 μL H2O. About 20 μL purified DNA was
used as the template for the first-round PCR for 20 cycles. Then, 2.5 μL of product from the first PCR was used as the
template for another 20 cycles in the second round of the nested PCR. Another 20-cycle PCR was conducted to add the
sequencing adapters. The amplicons were purified by 0.6× and then 0.25× double-size bead selection. The library was
sequenced on the Illumina NovaSeq platform as paired-end 150 bp.

In the DNA translocation analysis, we summarized the read counts and read orientations from the forward and
backward PCR libraries around the on-target and candidate off-target sites. A DNA translocation score was calculated as
“translocation reads number”/(“normal reads number” + “translocation reads number”).
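
As a worked illustration of this score, the following minimal sketch computes it from two read counts. The Methods
formula has no extra term in the denominator, whereas the Fig. 3 legend adds 10 to it; the sketch exposes that term as
an optional `pseudocount` argument (a name introduced here, not the authors'). Returning 0.0 when a site has no reads
is likewise an assumption.

```python
def translocation_score(translocation_reads: int, normal_reads: int, pseudocount: float = 0.0) -> float:
    """translocation / (normal + translocation + pseudocount), as stated in the text."""
    if translocation_reads < 0 or normal_reads < 0 or pseudocount < 0:
        raise ValueError("read counts and pseudocount must be non-negative")
    denominator = normal_reads + translocation_reads + pseudocount
    if denominator == 0:
        # Assumption: a site without any reads gets a score of 0.
        return 0.0
    return translocation_reads / denominator


# Hypothetical read counts, for illustration only.
methods_style = translocation_score(347, 653)               # Methods formula
legend_style = translocation_score(5, 5, pseudocount=10)    # Fig. 3 legend formula
print(methods_style, legend_style)
```

With a pseudocount, sites supported by only a handful of reads are pulled toward zero, which is one plausible reason
for the extra term in the figure legend.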

The oligos and vectors are summarized in Table S9.


In vivo off-target detection by PEAC-seq

Both the pegRNA and the mRNA of Cas9-MMLV were prepared by in vitro transcription. The DNA template of
the pegRNA was amplified from the plasmids “pcsk9-sgRNA” and “mPnpla-sgRNA” with primers T7F and T7R. The PCR products
were gel purified using the MinElute Gel Extraction Kit (QIAGEN #28606) and used as the template for in vitro
transcription with the HiScribe T7 Quick High Yield RNA Synthesis Kit (NEB #E2050S). The pCMV-Cas9-PE2 plasmid was
linearized by MssI (Thermo #FD1344). According to the manufacturer’s instructions, 1 μg linearized product was used as
a template to generate Cas9-PE mRNA by in vitro transcription with the HiScribe T7 ARCA mRNA Kit (NEB #E2060S).

C57BL/6 and ICR mice were purchased and housed in the Laboratory Animal Resource Center (LARC) at Westlake
University. The LARC is a certified pathogen-free and environmental-control facility (21 ± 2 °C, 55 ± 15% humidity, and
12:12-h light:dark cycle). The C57BL/6 mice were used for embryo collection, and ICR females were used as recipients.
All animal experiments were conducted under the protocol approved by the animal care and ethical committee of
Westlake University.

Six-week-old C57BL/6 female mice were superovulated by injecting 5 IU of PMSG (Pregnant Mare Serum
Gonadotropin; ProSpec #HOR-272), followed by 5 IU of hCG (human chorionic gonadotropin; ProSpec #HOR-250)
after 48 h. The C57BL/6 females were then mated to 8-week-old C57BL/6 males. After 16 h, fertilized embryos were
collected and placed in EmbryoMax M2 Medium with Hyaluronidase (Millipore #MR-051-F). After the cumulus cells fell off,
embryos were transferred into a dish containing 2 mL of fresh M2 medium (Millipore #MR-015-D). Embryos were then
flushed several times to rinse off the hyaluronidase and cumulus cells. Afterward, embryos were transferred into a dish
with prewarmed KSOM medium (Millipore #MR-106-D) covered by mineral oil followed by three additional washes.

The mixture of Cas9-PE2 mRNA (100 ng/μL) and pegRNA (50 ng/μL) was injected into the cytoplasm of the
zygote in M2 medium. The injection was conducted using a microinjector (NARISHIGE #IM-400B) with constant flow
settings. The injected embryos were cultured in KSOM medium with amino acids in a cell culture incubator at 37 °C and
with 5% CO2, then were transplanted into oviducts of pseudopregnant ICR females at 0.5 dpc. Pups were sacrificed at
E19.5–E21, and organs were collected, dissected, and snap-frozen in liquid nitrogen. Samples were stored at −80 °C until
further analysis.

The gDNA from organs was extracted using TIANamp Genomic DNA Kit (TIANGEN #DP304-03) according to the
manufacturer’s instructions. Nested PCR was applied to amplify the targeting regions and attach the Illumina adapters to
amplicons. The in vivo PEAC-seq library was constructed by Tn5 fragmentation, as described above for the cell line
samples.

The oligos and vectors are summarized in Supplementary Data 10.

Data analysis

The PEAC-seq data were analyzed using a pipeline modified from GUIDE-seq5. First, we trimmed adapters using
cutadapt30 and removed reads without the appropriate adapter. The reads were then mapped to the human or mouse
genome (hg38, mm10) using bwa. Reads that mapped to the same location and shared the same UMI were considered PCR
duplicates and merged in the following analysis. To fit the target identification pipeline from GUIDE-seq, the
read names in the bam files were modified, and the bam files from the forward and backward PCR were labeled and
merged. Modifications were made to the pipeline to remove reads originating from random priming. In summary, we
normalized the read counts from the GUIDE-seq output file to reads per million and calculated the number of reads
with correct primer extension. Candidate sites had to meet the following criteria: (1) no signal in the wild-type
control sample; (2) the number of reads with the correct primer extension sequence ≥1 in at least one direction, and the
geometric mean of the primer extension reads >0; (3) correct read strand information both upstream and downstream of
the putative gRNA cutting site. The Amplicon-seq data were analyzed using CRISPResso 2.13
(--max_paired_end_reads_overlap 140 --min_paired_end_reads_overlap 10 --exclude_bp_from_left 0
--exclude_bp_from_right 0 --plot_window_size 40 --min_frequency_alleles_around_cut_to_plot 0.1)31.
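
The normalization and the three candidate-site criteria above can be sketched as follows. This is not the modified
GUIDE-seq pipeline itself: the `CandidateSite` fields and function names are hypothetical, the strand check is reduced
to a precomputed boolean, and the geometric mean is assumed to be taken over the forward and reverse libraries. Under
that assumption, a geometric mean above zero requires primer extension reads in both directions.

```python
import math
from dataclasses import dataclass


@dataclass
class CandidateSite:
    name: str
    wt_control_reads: int          # reads at this site in the wild-type control
    forward_extension_reads: int   # correct primer-extension reads, forward library
    reverse_extension_reads: int   # correct primer-extension reads, reverse library
    strand_consistent: bool        # read strands agree with the putative cut site


def reads_per_million(read_count: int, total_reads: int) -> float:
    """Normalize a raw read count to reads per million sequenced reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return read_count * 1_000_000 / total_reads


def passes_filters(site: CandidateSite) -> bool:
    """Apply criteria (1)-(3) as stated in the text."""
    fwd, rev = site.forward_extension_reads, site.reverse_extension_reads
    no_wt_signal = site.wt_control_reads == 0            # criterion (1)
    one_direction_supported = max(fwd, rev) >= 1          # criterion (2), first part
    geometric_mean_positive = math.sqrt(fwd * rev) > 0    # criterion (2), second part
    return (no_wt_signal and one_direction_supported
            and geometric_mean_positive and site.strand_consistent)  # criterion (3)


# Hypothetical sites, for illustration only.
good = CandidateSite("site_A", 0, 3, 2, True)
one_sided = CandidateSite("site_B", 0, 3, 0, True)
in_control = CandidateSite("site_C", 5, 3, 2, True)
print([passes_filters(s) for s in (good, one_sided, in_control)])
```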

Visualization of the genomic co-localizations between the PEAC-seq off-targets and epigenetic signals

The bigWig files of ATAC-seq and ChIP-seq datasets from HEK293T cells (H3K4me3, H3K9ac, H3K27ac, POLR2A,
EP300, H2AFZ) were downloaded from epimap (https://epigenome.wustl.edu/epimap/data/imputed/)28. The deepTools
“computeMatrix” (command: --referencePoint center --afterRegionStartLength 5000 --beforeRegionStartLength 5000
-p 15 --binSize 500) and “plotHeatmap” functions32 were used to visualize the genomic co-localizations between all in
vitro PEAC-seq off-target sites and epigenetic signals. DSB hotspots were identified from the dsODN-only control (no
Cas9/gRNA) of the GUIDE-seq experiment performed in 293T cells. Control genomic regions, which were equally sized
regions sampled randomly across the genome, were generated with an in-house Perl script. The deepTools “computeMatrix”
and “plotHeatmap” functions were used to plot the heatmap of the genomic co-localizations between the translocation
sites and DSBs.
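
The control regions were generated with an in-house Perl script that is not shown here. As a rough Python sketch of
the same idea, one size-matched control region can be drawn uniformly at random for each site. The function name, the
seed argument, and the toy chromosome sizes below are all assumptions; a real analysis would use the hg38 chromosome
sizes and would probably exclude gaps and blacklisted regions.

```python
import random
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), 0-based half-open


def sample_control_regions(sites: List[Region], chrom_sizes: Dict[str, int], seed: int = 0) -> List[Region]:
    """Draw one random region per input site, with the same width as that site."""
    rng = random.Random(seed)  # fixed seed so the control set is reproducible
    controls: List[Region] = []
    for _chrom, start, end in sites:
        width = end - start
        if width <= 0:
            raise ValueError("each site must have end > start")
        eligible = [(c, size) for c, size in chrom_sizes.items() if size >= width]
        if not eligible:
            raise ValueError("no chromosome is long enough for this site width")
        names = [c for c, _ in eligible]
        # Weight each chromosome by its number of valid start positions,
        # so that every possible placement is equally likely.
        weights = [size - width + 1 for _, size in eligible]
        chrom = rng.choices(names, weights=weights, k=1)[0]
        new_start = rng.randrange(chrom_sizes[chrom] - width + 1)
        controls.append((chrom, new_start, new_start + width))
    return controls


# Toy inputs (not real genome coordinates).
toy_sites = [("chr1", 100, 200), ("chr2", 50, 80)]
toy_sizes = {"chrA": 1000, "chrB": 500}
toy_controls = sample_control_regions(toy_sites, toy_sizes, seed=1)
print(toy_controls)
```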

Statistics and reproducibility

No statistical method was used to predetermine sample size. No data were excluded from the analyses. The
experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome
assessment.

STAT3 suppression and β-cell ablation enhance α-to-β reprogramming mediated by Pdx1

As diabetes results from the absolute or relative deficiency of insulin secretion from pancreatic β cells,
methods to efficiently generate surrogate β cells have attracted considerable effort. To date, insulin-producing cells
have been generated from various differentiated cell types in the pancreas, such as acinar cells and α cells, by
inducing defined transcription factors, such as PDX1 and MAFA, yet efficiently generating surrogate β cells for future
regenerative therapies for diabetes remains challenging. In this study, we demonstrated that the exogenous
expression of PDX1 activated STAT3 in α cells in vitro, and STAT3-null PDX1-expressing α cells in vivo resulted in efficient
induction of α-to-β reprogramming, accompanied by the emergence of α-cell-derived insulin-producing cells with silenced
glucagon expression. Whereas β-cell ablation by alloxan administration significantly increased the number of α-cell-
derived insulin-producing cells by PDX1, STAT3 suppression resulted in no further increase in β-cell neogenesis after β-
cell ablation. Thus, STAT3 modulation and β-cell ablation nonadditively enhance α-to-β reprogramming induced by PDX1,
which may lead to the establishment of cell therapies for curing diabetes.

Introduction

As diabetes mellitus results from the absolute or relative deficiency of insulin secretion from pancreatic β cells1,
the generation of insulin-producing cells has been a target for the cure of diabetes. To reach this ultimate goal, many
attempts have been made to generate surrogate β cells from human embryonic stem cells2,3 or other differentiated cell
types, such as pancreatic acinar cells, ductal cells, and glucagon-expressing α cells4,5,6. We consider that α cells are a
prime candidate for reprogramming into β cells, because they are developmentally closely related to β cells. Notably,
mouse islet α cells transdifferentiate into β cells under conditions of extreme β-cell loss, which is a condition similar to
type 1 diabetes7,8. We and others have previously shown that the ectopic expression of β-cell specific transcription
factors, pancreatic and duodenal homeobox 1 (PDX1) and musculoaponeurotic fibrosarcoma oncogene family A (MAFA),
converts both embryonic and adult α cells into insulin-producing cells in mice9,10. Likewise, human islet α cells can be
reprogrammed into insulin-positive cells by adenovirus-mediated expression of PDX1 and MAFA11. Thus, while both PDX1
and MAFA have been demonstrated to play essential roles in α-to-β reprogramming, identifying efficient methods to
generate surrogate β cells, which will lead to the establishment of regenerative therapies for people with diabetes,
remains a challenge.

Modulating cellular plasticity may be the key to improving reprogramming efficiency into the β-cell lineage. STAT3
has been demonstrated to play a role in regulating cellular plasticity in various cell types, such as pluripotent stem
cells12,13 and hematopoietic cells14. In addition, we previously demonstrated that STAT3 regulates cellular identities in
pancreatic acinar cells6,15. Notably, activating mutations in human STAT3 have been reported to cause neonatal diabetes
accompanied by β-cell failure16,17. Thus, as proper STAT3 activity appears to be essential for determining the cellular
identities of pancreatic cells as well as other cell types, we hypothesized that modifying STAT3 signaling may contribute
to the reprogramming efficiency into insulin-producing cells, not only for acinar cells but also for α cells. To address this
issue, we developed experimental models to investigate the role of STAT3 in α-to-β reprogramming induced by Pdx1, and
demonstrated that the suppression of STAT3 signaling enhances the reprogramming efficiency into β cells.

Another key to enhancing cellular reprogramming into β cells is the extreme ablation of β cells, as previously
demonstrated7,8. To further investigate the effects of β-cell ablation on α-to-β reprogramming, we induced the ectopic
expression of PDX1 in α cells combined with β-cell ablation by injecting alloxan, demonstrating that extreme β-cell loss
robustly enhances α-to-β reprogramming induced by Pdx1, although there was no additive effect by Stat3 inhibition.

Results

Ectopic expression of Pdx1 induced STAT3 activation in α cells

STAT3 has been shown to be activated in pancreatic acinar cells that ectopically express Pdx16,15. To investigate
whether STAT3 is activated in pancreatic α cells as well as in acinar cells, αTC1 cells, a mouse glucagonoma cell line,
were infected with an adenoviral vector expressing Pdx1 (Ad-Pdx1). Immunoblotting for phosphorylated STAT3 (pSTAT3) at
Tyr705 demonstrated a significant increase in pSTAT3 levels in αTC1 cells infected with Ad-Pdx1, compared with control
αTC1 cells infected with a green-fluorescent protein (GFP)-expressing adenovirus (Ad-GFP) 72 h after adenoviral infection
(Fig. 1A,B). In addition, immunocytochemical staining clearly detected pSTAT3 protein in Ad-Pdx1-infected αTC1 cells,
with high expression of PDX1, whereas few nuclei were positive for pSTAT3 in cells with weak expression of PDX1, and in
control cells infected with Ad-GFP (Fig. 1C). When αTC1 cells were infected with an adenoviral vector expressing Mafa
(Ad-Mafa), another β-cell-specific transcription factor, the exogenous expression of Mafa did not activate STAT3 (Fig. S1).
This is in contrast to our previous findings in mPAC cells, which exhibit pancreatic progenitor-like characteristics6. Taken
together, these findings suggest that Pdx1 activates STAT3 in α cells as well as in acinar cells.

Figure 1

Ectopic expression of Pdx1 induces STAT3 activation in α cells. (A) αTC1 cells were infected with Pdx1-expressing
adenovirus (Ad-Pdx1) or a control adenovirus expressing GFP (Ad-GFP). Protein levels of phosphorylated STAT3 (pSTAT3),
total STAT3, and GAPDH were assessed by Western blotting. IL-6-treated cells were used as a positive control. (B)
Quantification of the Western blot shown in (A). The expression levels of pSTAT3 were normalized to those of total
STAT3. *p < 0.05 (n = 3 for each group). (C) Immunocytochemical staining for Pdx1 (white) and phospho-STAT3
(pSTAT3, red) was performed in αTC1 cells 72 h after infection with a control adenovirus (Ad-GFP) or an adenovirus
expressing Pdx1 (Ad-Pdx1). STAT3 was activated in αTC1 cells with high expression of PDX1 (arrows), whereas STAT3
was not activated in cells with weak expression of PDX1 (arrowhead). Scale bars, 20 μm. The original blots/gels are
presented in Fig. S10.

STAT3 deletion enhances α-to-β reprogramming induced by Pdx1


Pdx1 has been shown to change α-cell fate into the β-cell lineage both in vitro and in vivo18,19. On the other
hand, we previously demonstrated that Stat3 plays a role in modulating cell fates orchestrated by Pdx16,15. These
findings led us to hypothesize that Stat3 plays a role in modulating the α-to-β reprogramming induced by Pdx1. When
Stat3 signaling was suppressed in Ad-Pdx1-infected αTC1 cells using an adenovirus expressing a dominant-negative form
of STAT3 (Ad-Stat3-DN), there was no difference in the expression levels of Ins1 and Ins2 mRNAs between cells with and
without Ad-Stat3-DN (Fig. S2). Next, to investigate our hypothesis in vivo, Gcg-CreER mice, in which Cre-mediated
recombination is induced in α cells, were sequentially crossed with CAG-CAT-Pdx1FLAG mice and floxed Stat3 mice to
generate Gcg-CreER; CAG-CAT-Pdx1FLAG; Stat3flox/flox (αPdx1; Stat3KO) mice (Fig. 2A). Six-week-old αPdx1; Stat3KO mice were
subcutaneously injected with tamoxifen to induce the ectopic expression of Pdx1, together with STAT3 deficiency,
specifically in α cells. Double immunostaining against FLAG-tag and pSTAT3 demonstrated that 7.8% ± 2.3% of FLAG-tag-
positive cells expressed pSTAT3 in αPdx1; Stat3Hetero mice 7 days after tamoxifen administration, whereas few FLAG-tag-
positive cells (0.6% ± 0.3%) were positive for pSTAT3 in αPdx1; Stat3KO mice (Fig. S2). The substantial decrease in the
number of pSTAT3/FLAG-tag double-positive cells was also observed 1 and 3 days after tamoxifen administration
in αPdx1; Stat3KO mice, suggesting that Stat3 was inactivated after Cre-mediated recombination, as originally designed
(Fig. S3). As shown in Fig. 2B, a substantial number of α cells expressed exogenous Pdx1 detected by anti-FLAG
antibody. FLAG-tag-labeled insulin-producing cells were also detected, which are α-cell-derived insulin-producing cells. We
calculated the α-to-β reprogramming efficiency as the number of FLAG-tag/insulin double-positive cells divided by the
total number of FLAG-tagged cells (Fig. 2C), which demonstrated that Stat3 deletion resulted in a significantly larger
number of α-cell-derived insulin-producing cells in the islets of αPdx1; Stat3KO mice than in the islets of Gcg-CreER; CAG-
CAT-Pdx1FLAG; Stat3flox/+ (αPdx1; Stat3Hetero) mice (30.9% ± 1.9% vs. 16.0% ± 1.1%) 2 weeks after Cre-mediated
recombination. Evaluation at shorter time periods after Cre-mediated recombination resulted in no significant differences
in α-to-β reprogramming efficiency between the groups (Fig. S4). STAT3 deletion itself, without the ectopic expression of
Pdx1, did not increase α-to-β reprogramming efficiency (Fig. S5). These findings demonstrate that Stat3 deletion
enhances α-to-β reprogramming in vivo induced by Pdx1.
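
The efficiency calculation described above (FLAG-tag/insulin double-positive cells divided by all FLAG-tagged cells, summarized as mean ± SE) can be sketched as follows. This is an illustration only: the function names and the per-mouse cell counts are hypothetical and are not data from the study.

```python
from math import sqrt
from statistics import mean, stdev

def reprogramming_efficiency(double_positive, flag_positive):
    """Percent of FLAG-tagged cells that are also insulin-positive."""
    if flag_positive <= 0:
        raise ValueError("need at least one FLAG-tagged cell")
    return 100.0 * double_positive / flag_positive

def mean_se(values):
    """Mean and standard error of the mean across animals."""
    return mean(values), stdev(values) / sqrt(len(values))

# Hypothetical per-mouse counts: (FLAG+/insulin+ cells, total FLAG+ cells).
counts = [(10, 40), (9, 30), (20, 50)]
per_mouse = [reprogramming_efficiency(d, f) for d, f in counts]
m, se = mean_se(per_mouse)
print(f"{m:.1f}% ± {se:.1f}%")
```

Computing one percentage per animal and then averaging treats each mouse as the unit of replication, which is consistent with the n values per group given in the figure legends.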

Figure 2

STAT3 deletion enhances α-to-β reprogramming induced by Pdx1. (A) Schematic representation of the
transgenes and their Cre-mediated recombination in Gcg-CreER; CAG-CAT-Pdx1FLAG; Stat3flox/flox (αPdx1; Stat3KO) mice.
Before recombination, the transcription of Pdx1 is blocked by the floxed STOP cassette. When the mice are treated with
tamoxifen, the floxed sequence is removed by Cre recombinase, and the CAG promoter activates the expression of Pdx1.
Likewise, the loxP-flanked (floxed) Stat3 gene is removed. (B) Immunostaining for FLAG-tagged Pdx1 (blue), glucagon
(green) and insulin (red) in the pancreas of αPdx1; Stat3KO mice 2 weeks after tamoxifen administration. The arrowheads
show FLAG-tag positive cells which express insulin. Scale bars, 50 μm. Magnified images of the dotted square regions are
shown below each image. (C) The percentage of reprogrammed-β cells among FLAG-tag positive cells. *** p < 0.005 (n = 
5 in each group).

STAT3 deletion together with Pdx1 expression induces α-to-β reprogramming with silenced glucagon expression

We next performed triple immunostaining against FLAG-tag, insulin, and glucagon, and found that there were two
types of FLAG-tagged, i.e., α-cell-derived, insulin-expressing cells; some cells expressed both insulin and glucagon, which
denotes α-to-bihormonal conversion, whereas other cells expressed FLAG-tagged PDX1 and insulin without expressing
glucagon, which denotes glucagon-silenced α-to-β conversion (Fig. 3A). The number of α-cell-derived insulin-expressing
cells, which were negative for glucagon, was significantly larger in the pancreata of αPdx1; Stat3KO mice than in the
pancreata of αPdx1; Stat3Hetero mice (41.9% vs. 16.8%, Fig. 3B). As insulin/glucagon double-positive cells are thought to
be immature cells in the developing pancreas20, this finding suggests that Stat3 suppression enhanced more advanced α-
to-β conversion induced by Pdx1.

Figure 3

Characteristics of α-cell-derived insulin-producing cells. (A) Representative images of two types of insulin-
producing cells derived from α cells. Immunostaining for FLAG-tagged Pdx1 (blue), glucagon (green), and insulin (red)
was performed on pancreas sections of αPdx1; Stat3KO mice. A FLAG-tag/insulin double positive cell that is negative for
glucagon staining (arrow), which denotes Gcg-silenced α-to-β conversion, and a FLAG-tag positive cell expressing both
insulin and glucagon (arrowhead), which denotes α-to-bihormonal transition, are shown. Scale bars, 10 μm. (B) The
percentage of α-cell-derived insulin-expressing cells that do not express glucagon among total α-cell-derived β cells. *** p 
< 0.005 (n = 5 in each group). (C) Immunostaining for FLAG-tagged Pdx1 (green), Nkx6.1 (red) and insulin (white) in
pancreas sections of αPdx1; Stat3KO mice 2 weeks after tamoxifen administration. The arrowhead indicates a FLAG-tag-
positive cell that expresses both Nkx6.1 and insulin. The arrow indicates FLAG-tag/insulin double positive cells that lack
Nkx6.1. Scale bars, 50 μm. Magnified images of the dotted square regions are shown below each image. (D) Percentage
of Nkx6.1-positive cells among FLAG-tag/insulin double-positive cells (α-cell-derived β cells). Pancreata of Stat3-knockout
mice have a significantly higher number of α-cell-derived Nkx6.1-expressing β cells than pancreata of control mice. * p < 
0.05 (n = 3 in each group). (E) Immunostaining for FLAG-tagged Pdx1 (white), UCN3 (green) and insulin (red) in
pancreas sections of αPdx1; Stat3KO mice 2 weeks after tamoxifen administration. The arrowhead indicates a FLAG-tag
positive cell that expresses both urocortin3 and insulin. The arrow indicates a FLAG-tag/insulin double-positive cell that
lacks urocortin3. Magnified images of the dotted square regions are shown below each image. (F) The percentage of
urocortin3-positive cells among FLAG-tag/insulin double-positive cells (α-derived β cells, n = 3 in each group).

Stat3 deletion modifies the characteristics of α-cell-derived β cells

To further investigate the characteristics of α-cell-derived insulin-expressing cells, immunostaining against NKX6.1
and urocortin 3 (UCN3), which are highly expressed in endogenous β cells21,22, was performed. Although the number of
α-cell-derived Nkx6.1-expressing cells at 1, 3, and 7 days after tamoxifen administration was comparable between Stat3-
heterozygous and Stat3-deficient mice, it was significantly increased 2 weeks after tamoxifen administration in Stat3-
deficient mice compared with Stat3-heterozygous mice (57.4% ± 15.7% vs. 13.1% ± 2.3%, respectively, Fig. 3C,D, and
Fig. S6). In contrast, there was no significant difference in the number of UCN3-expressing cells between the groups (Fig. 3E,F,
and Fig. S7). These findings suggest that Stat3 deletion together with the ectopic expression of Pdx1 may endow α cells
with some β-cell characteristics, and that further steps are necessary to induce the cellular reprogramming of
α cells into more fully differentiated β cells that are indistinguishable from endogenous β cells.

Alloxan-induced β-cell ablation promotes α-to-β reprogramming induced by Pdx1

It has been reported that extreme β-cell ablation induces α-to-β conversion in mice7,8. To investigate whether β-
cell ablation affects the reprogramming efficiency and/or the characteristics of reprogrammed β cells in our experimental
model, we induced β-cell ablation by injecting alloxan (ALX) into αPdx1; Stat3KO mice and control αPdx1; Stat3Hetero mice
(Fig. 4A and Fig. S8). The reprogramming efficiency induced by Pdx1 was significantly increased after β-cell ablation in
mice with either heterozygous or homozygous mutations of the Stat3 gene, compared with normoglycemic αPdx1;
Stat3Hetero mice without ALX injection (Fig. 4B,C). In contrast, there was no difference in reprogramming efficiencies
between Stat3KO and Stat3Hetero mice after β-cell ablation (42.6% ± 6.0% vs. 42.1% ± 4.1%, Fig. 4C). There was no
significant difference in α-cell mass between the groups, with or without ALX injection (Fig. 4D), showing that neither the
ectopic expression of Pdx1 nor β-cell ablation by ALX substantially affected the homeostasis of α-cell volume. These
findings suggest that STAT3 deletion and β-cell ablation nonadditively enhance the α-to-β reprogramming induced by
Pdx1.

Figure 4
Alloxan-induced β-cell ablation promotes α-to-β reprogramming by Pdx1. (A) Experimental design of β-cell
ablation and induction of α-to-β reprogramming. At 6 weeks of age, alloxan was administered into the αPdx1;
Stat3KO mice. For the induction of Cre-mediated recombination, mice were subcutaneously injected with 4 mg of
tamoxifen 3 times over a 1-week period. (B) Immunostaining for FLAG-tagged Pdx1 (green), glucagon (white) and insulin
(red) in pancreas sections of αPdx1; Stat3KO mice 2 weeks after alloxan and tamoxifen administration. Scale bars, 50 μm.
Magnified images of the dotted square regions are shown below each image. (C) The percentage of reprogrammed-β
cells among FLAG-tag positive cells. ** p < 0.01, ***p < 0.005 (n = 3–5 in each group). (D) Percentage of α cell area
among whole pancreas area (n = 3 in each group).

Reduced α-to-β reprogramming efficiency in aged mice

To investigate whether aging affects α-to-β reprogramming efficiency in mice, the number of α-cell-derived
insulin-producing cells was quantified in Stat3KO and Stat3Hetero mice at the age of 28 weeks or older. As shown in Fig. S9,
there were fewer reprogrammed β cells in the aged mice than in 8-week-old mice (Fig. 2). In addition, STAT3 deletion did
not significantly enhance α-to-β reprogramming.

Discussion

Whereas the transcription factor Pdx1 has been demonstrated to endow pancreatic cells with some β-cell
characteristics both in vitro and in vivo18,19, the generation of fully functional β cells from α cells remains a challenge.
We previously reported that the suppression of STAT3 signaling enhanced the cellular reprogramming of acinar cells into
β cells, and the present study further demonstrated the significance of Stat3 signaling in α-to-β reprogramming in mice.

STAT3 has been shown to play various roles in cell differentiation and proliferation, and in maintaining
pluripotency in iPS/ES cells12,13, cancer stem cells23, and hematopoietic stem cells14. In addition, the crucial role of
STAT3 signaling in the regulation of pancreatic cellular plasticity has been demonstrated in previous in vivo studies,
including ours6,15,24. Furthermore, activating mutations in STAT3 have been reported to cause neonatal diabetes in
humans16,17. Based on these previous findings together with our present study, STAT3 signaling appears to play an
essential role in maintaining the cellular identity of pancreatic α cells as well as acinar cells, and the suppression of STAT3
can enhance cellular reprogramming into β cells, orchestrated by the ectopic expression of PDX1.

Previous studies have demonstrated that the combined ectopic expression of Pdx1 and Mafa efficiently induces
the cellular reprogramming of α cells into insulin-producing cells in mice and humans9,10,11. Although the transgenic
expression of Pdx1 alone in the α-cell lineage without β-cell ablation was shown to have no or little effect on β-cell
genesis9,25, the exogenous expression of Pdx1 alone successfully induced α-to-β reprogramming in our present study. As
the studies by Matsuoka et al. and Cigliola et al. both used the CAG-CAT-Pdx1FLAG mice that we previously generated26,
the expression levels of PDX1 in the mice are expected to be the same after Cre-mediated recombination in our study and
these previous studies. One of the obvious differences is that tamoxifen-inducible Gcg-CreER mice were used to induce
Cre-mediated recombination in the present study, whereas Gcg-Cre or Gcg-rtTA; TetO-Cre mice were used in other
studies. Tamoxifen treatment may have a beneficial effect in enhancing α-to-β reprogramming. In addition, even
heterozygous deletion of the Stat3 gene may affect α-cell plasticity in αPdx1; Stat3Hetero mice. Interestingly, another
previous study showed that the ectopic expression of Pdx1 alone in sorted human α cells using adenovirus induced
insulin gene expression in α cells11. Thus, the ectopic expression of Pdx1 alone is likely to endow α cells with β-cell
characteristics under some specific experimental conditions.

Not only STAT3 deletion but also β-cell ablation enhanced the α-to-β reprogramming induced by Pdx1. As there
was no additive effect between Stat3 inhibition and β-cell ablation, insulin insufficiency by β-cell ablation may stimulate
the same downstream pathways as Stat3 inhibition. Another possibility is that Stat3 inhibition may induce α-to-β
reprogramming in coordination with insulin signaling, and may have little effect under insulin insufficiency. Further studies
are needed to clarify the underlying molecular mechanisms involved, to maximize the reprogramming efficiency into β
cells, which is expected to lead to the establishment of future cell therapies for the cure of diabetes.

Methods

Cell culture

The mouse pancreatic α-cell line αTC1 (clone 6) was purchased from American Type Culture Collection
(Manassas, VA, USA). The cells were cultured in DMEM with 10% fetal bovine serum and incubated at 37 °C in an
atmosphere of 5% CO2 in air.

Preparation of adenoviruses

Recombinant adenoviruses expressing Pdx1 (Ad-Pdx1) were generated as described previously27. As each
adenovirus used in this study carries GFP, adenovirus-infected cells are labeled with green fluorescence. An adenovirus
expressing only GFP was used as a control (Ad-GFP).

Western blotting

Whole-cell protein extracts were isolated using RIPA lysis buffer (Thermo Scientific, Rockford, IL, USA) containing
protease inhibitor cocktail (Thermo Scientific). Ten micrograms of total proteins were loaded and fractionated by SDS-
PAGE, transferred to nitrocellulose membranes (Merck Millipore, Darmstadt, Germany), and probed with primary
antibodies against pSTAT3, total STAT3 (rabbit, 1:1000; Cell Signaling Technology, Danvers, MA, USA), and GAPDH
(rabbit, 1:1000; Cell Signaling Technology). Immunoreactivity was visualized using SuperSignal West Extended Duration
Substrate (Thermo Fisher Scientific, Waltham, MA, USA) according to the manufacturer’s instructions. The protein
extracts from αTC1 cells treated with IL-6 were used as a positive control. The expression levels of pSTAT3 were
normalized to those of total STAT3.

Mice

CAG-CAT-Pdx1FLAG, Gcg-CreER, ROSA26mTmG, and floxed-Stat3 mice were generated as previously
described6,26,28,29,30,31. Gcg-CreER mice, which express tamoxifen-activated Cre recombinase in α cells, were crossed
with Pdx1FLAG mice to induce α-to-β reprogramming. Floxed Stat3 mice were repeatedly crossed with Gcg-CreER;
Pdx1FLAG mice to generate Gcg-CreER; CAG-CAT-Pdx1FLAG; Stat3KO mice. Gcg-CreER; CAG-CAT-Pdx1FLAG; Stat3KO or
control Gcg-CreER; CAG-CAT-Pdx1FLAG; Stat3Hetero mice are viable, fertile, and indistinguishable from their wild-type (WT)
littermates with respect to weight, blood glucose, and glucose tolerance. To induce Cre-mediated recombination,
tamoxifen (Sigma Aldrich, St. Louis, MO, USA) was dissolved in corn oil at 20 mg/mL and injected subcutaneously at
2 mg/10 g body weight, 3 times over a 1-week period. The mice were euthanized at 1, 3, 7, and 14 days after tamoxifen
administration.
At 6 weeks of age, ALX (Sigma Aldrich) was administered to the mice as a single intravenous injection at a dose
of 100 mg/kg body weight through the tail vein. For induction of Cre-mediated recombination, tamoxifen was
subcutaneously injected 3 times over a 1-week period.
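
The dosing arithmetic stated above can be worked through as a short sketch. The function names and the 20 g body weight are assumptions for illustration; the dose rates and stock concentration are those given in the text.

```python
def tamoxifen_injection(body_weight_g, dose_mg_per_10g=2.0, stock_mg_per_ml=20.0):
    """Return (dose in mg, injection volume in mL) for one injection."""
    dose_mg = dose_mg_per_10g * body_weight_g / 10.0
    return dose_mg, dose_mg / stock_mg_per_ml

def alloxan_dose_mg(body_weight_g, dose_mg_per_kg=100.0):
    """Alloxan dose in mg for a single intravenous injection."""
    return dose_mg_per_kg * body_weight_g / 1000.0

# A 20 g body weight is an assumed example value.
dose, volume = tamoxifen_injection(20.0)
print(dose, volume)           # 4.0 mg in 0.2 mL
print(alloxan_dose_mg(20.0))  # 2.0 mg
```

For a 20 g mouse this gives 4 mg of tamoxifen per injection, which agrees with the 4 mg stated in the Figure 4 legend.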

Mice were maintained on a 12-h light/dark cycle in a controlled atmosphere and fed standard rodent food. The
study protocol was reviewed and approved by the Animal Care and Use Committee of Juntendo University. All methods
were carried out in accordance with relevant guidelines and regulations, and are reported in accordance with ARRIVE
guidelines.

Histological analysis

Tissues were harvested and fixed in 4% paraformaldehyde in PBS, and embedded in paraffin, for subsequent
sectioning (5-μm thickness). For immunofluorescence analysis of paraffin-embedded tissues, sections were deparaffinized
in xylene and dehydrated in graded ethanol before heat-induced epitope retrieval in a microwave oven (95 °C for 15 min
in 10 mM citrate buffer). Slides were then blocked in 1% horse serum (Vector Laboratories, Burlingame, CA, USA) and
incubated overnight in primary antibody. The primary antibodies used in this study were the following: rabbit anti-pSTAT3
(1:100; Cell Signaling Technology), guinea pig anti-insulin (1:5; Dako, Carpinteria, CA, USA), rat anti-insulin (1:200; R&D
Systems, Minneapolis, MN, USA), rabbit anti-glucagon (1:1000; Dako), guinea pig anti-glucagon (1:1000; Takara Bio,
Shiga, Japan), mouse anti-FLAG (1:200; TransGenic, Fukuoka, Japan). Then, slides were incubated with secondary
antibody for 30 min at room temperature. The secondary antibodies used were Alexa Fluor 633-conjugated anti-rat IgG,
Alexa Fluor 633-conjugated anti-guinea pig IgG, Alexa Fluor 568-conjugated anti-rat IgG, Alexa Fluor 555-conjugated
anti-rabbit IgG, Alexa Fluor 555-conjugated anti-guinea pig IgG, Alexa Fluor 488-conjugated anti-guinea pig IgG, Alexa
Fluor 488-conjugated anti-rabbit IgG, Alexa Fluor 488-conjugated anti-rat IgG (all at 1:200; Invitrogen, Carlsbad, CA,
USA). Cell nuclei were stained with 4′,6-diamidino-2-phenylindole (DAPI; Vector Laboratories). After washing in PBS, slides
were mounted in Vectashield mounting medium (Vector Laboratories). Images were captured with a laser scanning
confocal microscope (Zeiss LSM 780).

Statistical analyses

Statistical analyses were performed using GraphPad Prism software (GraphPad Software, La Jolla, CA, USA).
Comparisons of two samples were performed using unpaired two-tailed t-tests. Multiple groups were analyzed by
one-way ANOVA followed by a multiple comparison test. A p-value of less than 0.05 was considered to indicate a
statistically significant difference between groups. All data are presented as the mean ± SE.
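
As a worked illustration of the unpaired t-test, the t statistic can be recovered from reported mean ± SE values when the groups are of equal size. The sketch below uses the reprogramming efficiencies reported in the Results (30.9% ± 1.9% vs. 16.0% ± 1.1%, n = 5 in each group). It assumes equal variances (Student's t-test), which the source does not specify, and the function name `t_from_summary` is hypothetical. It is not a substitute for the GraphPad Prism analysis.

```python
from math import sqrt

def t_from_summary(mean1, se1, mean2, se2, n1, n2):
    """Unpaired t statistic and degrees of freedom from mean ± SE.

    Assumes equal variances (Student's t-test); with equal group sizes the
    pooled standard error of the difference reduces to sqrt(se1**2 + se2**2).
    """
    if n1 != n2:
        raise ValueError("this shortcut is only valid for equal group sizes")
    t = (mean1 - mean2) / sqrt(se1**2 + se2**2)
    return t, n1 + n2 - 2

# Values from the Results: Stat3KO vs. Stat3Hetero, n = 5 per group.
t, df = t_from_summary(30.9, 1.9, 16.0, 1.1, 5, 5)
print(f"t = {t:.2f}, df = {df}")
```

The two-tailed p-value would then be read from the t distribution with the returned degrees of freedom, for example with statistical software.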
References

CRISPR engineering in organoids for gene repair and disease modelling

https://www.nature.com/articles/s44222-022-00013-5

Predicting prime editing efficiency and product purity by deep learning

https://www.nature.com/articles/s41587-022-01613-7

CRISPR-mediated generation and characterization of a Gaa homozygous c.1935C>A (p.D645E)
Pompe disease knock-in mouse model recapitulating human infantile onset-Pompe disease

https://www.nature.com/articles/s41598-022-25914-8

Efficient simultaneous double DNA knock-in in murine embryonic stem cells by CRISPR/Cas9
ribonucleoprotein-mediated circular plasmid targeting for generating gene-manipulated mice

https://www.nature.com/articles/s41598-022-26107-z

PEAC-seq adopts Prime Editor to detect CRISPR off-target and DNA translocation

https://www.nature.com/articles/s41467-022-35086-8

STAT3 suppression and β-cell ablation enhance α-to-β reprogramming mediated by Pdx1

https://www.nature.com/articles/s41598-022-25941-5
