Identification and Characterization of Essential Genes in The Human Genome
Large-scale genetic analysis of lethal phenotypes has elucidated the molecular underpinnings of many biological processes. Using the bacterial clustered regularly interspaced short palindromic repeats (CRISPR) system, we constructed a genome-wide single-guide RNA library to screen for genes required for proliferation and survival in a human cancer cell line. Our screen revealed the set of cell-essential genes, which was validated with an orthogonal gene-trap–based screen and comparison with yeast gene knockouts. This set is enriched for genes that encode components of fundamental pathways, are expressed at high levels, and contain few inactivating polymorphisms in the human population. We also uncovered a large group of uncharacterized genes involved in RNA processing, a number of whose products localize to the nucleolus. Last, screens in additional cell lines showed a high degree of overlap in gene essentiality but also revealed differences specific to each cell line and cancer type that reflect the developmental origin, oncogenic drivers, paralogous gene expression pattern, and chromosomal structure of each line. These results demonstrate the power of CRISPR-based screens and suggest a general strategy for identifying liabilities in cancer cells.
The systematic identification of essential genes in microorganisms has provided critical insights into the molecular basis of many biological processes (1). Similar
a library, which was optimized for high cleavage activity, and performed a proliferation-based screen in the near-haploid human KBM7 chronic myelogenous leukemia (CML) cell line (Fig. 1, table S1, and
We defined the gene-trap score (GTS) as the fraction of insertions in a given gene occurring in the inactivating orientation. Because the accuracy of this score depends on the depth of insertional coverage, we set a requirement on the minimum number (n = 65) of antisense inserts in a gene needed for inclusion in our analysis by measuring the concordance between replicate experiments (Fig. 2B; fig. S1, C and D; table S4; and supplementary text S2). For the 7370 genes on the haploid chromosomes that exceeded this threshold, the GTS was well-correlated with the CS and with results from a copublished study that used a similar gene-trap approach (r = 0.68) (Fig. 2C; fig. S1, E and F; and supplementary text S3). The strong correspondence between the overlapping sets of cell-essential genes defined by the two methods provides support for the accuracy of the CRISPR scores for the full set of 18,166 genes.
The two methods differed with respect to the diploid chromosome 8. Whereas the gene-trap screen failed to detect any cell-essential genes on this chromosome, the CRISPR screen uncovered a similar proportion of cell-essential genes on all the autosomes (Fig. 2, A to C). These observations indicate that (i) the vast majority of cell-essential genes are haplosufficient and (ii) biallelic inactivation occurs at high frequency in our CRISPR screen (4).
To assess the accuracy of our scores against other measures of gene essentiality, we relied on functional profiling experiments conducted in the yeast Saccharomyces cerevisiae as a benchmark (1, 10). Specifically, we ranked genes common to all data sets by their scores in each data set: CRISPR; gene trap; similar loss-of-function RNA interference (RNAi) screens (11); and, as a naïve proxy for gene essentiality, gene expression levels determined by means of RNA sequencing (RNA-Seq). We then compared these rankings with the essentiality of yeast homologs. The CRISPR and gene-trap methods had significantly stronger correlations with the yeast results than did the RNAi screens or gene expression, which performed similarly to each other (both methods, P < 10⁻⁴, permutation test) (Fig. 2D). On the basis of additional comparisons with yeast gene essentiality, we also found that (i) our new optimized sgRNA library gave better results than those from screens using older unoptimized libraries (4, 5) and (ii) the coverage of this library (~10 constructs per gene) approaches saturation, as evidenced by down-sampling (decreasing the coverage by randomly eliminating subsets of data) (fig. S2, A and B). Together, our results suggest that scores from the CRISPR and gene-trap screens both provide accurate measures of the cell-essentiality of human genes.
Essential genes should be under strong purifying selection and should thus show greater evolutionary constraint than that of nonessential genes (12). Consistent with this expectation, the essential genes found in our screens were more broadly retained across species, showed higher levels of conservation between closely related species, and contain fewer inactivating polymorphisms within the human species, as compared with their dispensable counterparts (Fig. 2, E to G). Essential genes also tend to have higher expression and encode proteins that engage in more protein-protein interactions (13–15). These
Fig. 1. Two approaches for genetic screening in human cells. (Top) CRISPR/Cas9 method. Cells are transduced with a genome-wide sgRNA lentiviral library. Gene inactivation via Cas9-mediated genomic cleavage is directed by the 20–base pair (bp) sequence at the 5′ end of the sgRNA. Cells bearing sgRNAs targeting essential genes are depleted in the final population. (Bottom) Gene-trap method. KBM7 cells are transduced with a gene-trap retrovirus that integrates in an inactivating or “harmless” orientation at random genomic loci. Essential genes contain fewer insertions in the inactivating orientation. Sample data are displayed for two neighboring genes: RPL14, encoding an essential ribosomal protein, and ZNF619, encoding a dispensable zinc finger protein. For CRISPR/Cas9, sgRNAs are plotted according to their target position along each gene, with the height of each bar indicating the level of depletion. Boxes indicate individual exons. For gene trap, the intronic insertion sites in each gene are plotted according to their orientation and genomic position. The height of each point is randomized.
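The gene-trap scoring summarized in this caption can be illustrated with a short sketch. It assumes the convention shown in the figure, in which the score is the number of inactivating (sense) insertions divided by the total, and it applies the minimum of 65 antisense insertions that the text requires before a gene is scored. The function name, the gene labels, and the insertion counts are hypothetical and are not taken from the screen data.

```python
from typing import Optional

MIN_ANTISENSE_INSERTIONS = 65  # inclusion threshold stated in the text


def gene_trap_score(sense: int, antisense: int,
                    min_antisense: int = MIN_ANTISENSE_INSERTIONS) -> Optional[float]:
    """Return the fraction of insertions in the inactivating (sense) orientation.

    Returns None when the gene has too few antisense insertions to be scored.
    """
    if sense < 0 or antisense < 0:
        raise ValueError("insertion counts must be non-negative")
    if antisense < min_antisense:
        return None
    return sense / (sense + antisense)


# Hypothetical (sense, antisense) insertion counts for three made-up genes.
counts = {
    "GENE_A": (14, 86),  # few inactivating insertions: candidate essential gene
    "GENE_B": (53, 47),  # fewer than 65 antisense insertions: not scored
    "GENE_C": (80, 70),  # orientations roughly balanced: candidate dispensable gene
}

scores = {gene: gene_trap_score(s, a) for gene, (s, a) in counts.items()}
for gene, score in scores.items():
    print(gene, "not scored" if score is None else f"{score:.2f}")
```

A low score indicates that inactivating insertions were depleted from the surviving population, whereas a score near 0.5 indicates no orientation bias. Returning None for under-covered genes keeps them out of downstream comparisons rather than assigning them a noisy score.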
patterns were also observed in our CRISPR data set (Fig. 2, H and I).
In S. cerevisiae, genes with paralogous copies in the genome show a lower degree of essentiality, presumably because of at least partial functional overlap (16). Surprisingly, meta-analysis of knockout mouse collections has suggested that there is no such correlation in mammals (17, 18). However, others have challenged this interpretation because the genes analyzed were far from a random sample (19). Using the results from our genome-wide screens, we revisited this question and observed that genes with paralogs are indeed less likely to be essential, which is consistent with the idea that paralogs can provide functional redundancy at the cellular level (Fig. 2J).
To examine the functions of the cell-essential genes, we used gene set enrichment analysis (GSEA) and found strong enrichment for many fundamental biological processes, such as DNA replication, RNA transcription, and mRNA translation (fig. S3A) (20). Whereas most of the genes could be assigned to such well-defined pathways, no function has been ascribed to ~330 of the cell-essential genes (18%) (Fig. 3A). For this set of uncharacterized genes, an analysis of the domains within their encoded gene products and comparisons with proteomic data sets from organellar purifications revealed substantial enrichment in proteins found in the nucleolus and those containing domains associated with RNA processing (fig. S3, B and C) (21).
We characterized three such genes (C16orf80, C3orf17, and C9orf114) whose mRNA expression patterns across the Cancer Cell Line Encyclopedia (CCLE) were correlated with that of genes involved in RNA processing (Fig. 3B). We validated the essentiality of these genes in short-term proliferation assays and detected localization of their products to the nucleus (C16orf80) or nucleolus (C3orf17 and C9orf114) (Fig. 3, C and D) (22). Additionally, mass spectrometric analyses of anti-FLAG-immunoprecipitates prepared from KBM7 cells expressing FLAG-tagged C16orf80, C3orf17, and C9orf114 revealed interactions with multiple subunits of the spliceosome, ribonuclease (RNase) P/MRP, and H/ACA small nucleolar ribonucleoprotein (snoRNP) complexes, respectively (Fig. 3E). These results implicate C16orf80 in splicing, which is consistent with its association with mRNAs; C3orf17 in ribosomal RNA/tRNA processing; and C9orf114 in RNA modification (23). More broadly, our results indicate that the molecular components of many critical cellular processes, especially RNA processing, have yet to be fully defined in mammalian cells.
To determine how the set of essential genes differs among cell lines, we screened another CML cell line (K562) and two Burkitt’s lymphoma cell lines (Raji and Jiyoye) using the CRISPR system (tables S2 and S3). Overall, the sets of essential genes in the four cell lines showed a high degree of overlap (Fig. 4A). Out of these four cell lines, the KBM7 CRISPR results showed the highest correlation with the KBM7 gene-trap data set, suggesting that the few differences observed are likely to be biologically meaningful (fig. S4A).
We focused first on genes found to be essential in only one of the four cell lines. The Raji, Jiyoye, and KBM7 cell lines had 6, 7, and 19 such genes, respectively (fig. S4B and table S5). One example was DDX3Y, which resides in the non-pseudoautosomal region of the Y chromosome and was required only in Raji cells (Fig. 4B). Its X-linked paralog, DDX3X, was essential in KBM7 and K562 cells (Fig. 4E). Both genes encode DEAD-box helicases that likely have similar cellular functions (24). Thus, the dependence on one paralog might reflect functional absence of the other paralog. Indeed, DNA sequencing of DDX3X in Raji cells revealed hemizygous mutations in the 5′-splice site of intron 8 that resulted in the production of a truncated mRNA transcript (Fig. 4, C and D). Conversely, DDX3Y was not expressed in KBM7 cells and was not present in K562 cells, which are of female origin (Fig. 4E). Introduction of wild-type DDX3X cDNA into Raji cells fully rescued the proliferation defect resulting from DDX3Y loss, indicating that the paralogous genes are essential and functionally overlapping (Fig. 4F). Essential paralogous gene pairs, involved in glucose metabolism (HK1/2 and SLC2A1/3) and cell-cycle regulation (CDK4/6), were also observed in the Jiyoye line (fig. S4C and supplementary text S4). Vulnerabilities due to the loss of a paralogous partner may serve as targets for highly personalized antitumor therapies (25).
In some cases, cell line–specific essentiality of paralogous genes did not reflect differential expression. For example, the transcription factors GATA1 and GATA2 are expressed in both K562 and KBM7 cells, but the first is specifically essential in K562 cells and the second in KBM7 cells (fig. S4D). These master regulators are known to promote proliferation and survival during distinct developmental stages in the hematopoietic lineage; GATA1 is required for the survival of erythroid progenitors, and GATA2 is required for the maintenance and proliferation of immature hematopoietic progenitors (26). These two cell types likely correspond to the cells of origin of the two CML lines (27, 28). We also identified similar instances of genes re-
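The selection of genes essential in only one of the four cell lines, described above, amounts to a simple set operation once each line has been reduced to a set of essential gene symbols. The sketch below makes that assumption; the function name is hypothetical, and the toy gene sets are invented for illustration (they loosely echo examples from the text but are not the screen results).

```python
from collections import Counter
from typing import Dict, Set


def line_specific_genes(essential: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each cell line to the genes that are essential in that line only."""
    # Count in how many cell lines each gene is called essential.
    n_lines = Counter(gene for genes in essential.values() for gene in genes)
    return {
        line: {gene for gene in genes if n_lines[gene] == 1}
        for line, genes in essential.items()
    }


# Toy input, invented for illustration only.
essential_by_line = {
    "KBM7": {"RPL14", "DDX3X", "GATA2"},
    "K562": {"RPL14", "DDX3X", "GATA1"},
    "Raji": {"RPL14", "DDX3Y"},
    "Jiyoye": {"RPL14"},
}

specific = line_specific_genes(essential_by_line)
for line in sorted(specific):
    print(line, sorted(specific[line]))
```

Genes shared by every line drop out, leaving only the line-specific calls. In practice each such call would still need the kind of follow-up described in the text (paralog expression, sequencing, rescue) before it is interpreted biologically.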
region surrounding the JAK2 tyrosine kinase (fig. S5, D and E). Together, these findings indicate that lethality upon Cas9-mediated cutting may also reflect chromosome structure and therefore should be evaluated in light of copy-number information.
Last, we looked for consistent differences in essential genes between the two CML and two Burkitt’s lymphoma lines. Such genes might represent attractive targets for antineoplastic therapies because their inhibition is less likely to be broadly cytotoxic. Overall, we identified 33 genes that were specifically essential in the CML lines and 15 genes in the Burkitt’s lymphoma lines (Fig. 4K and table S6). As a control, permuted comparisons (that is, a set consisting of one CML and one Burkitt’s line versus the complementary set) showed roughly half as many “set-specific” essential genes (fig. S6, A to C).
In the CML lines, the top two genes were BCR and ABL1, which is consistent with the known essentiality of the BCR-ABL translocation product and the therapeutic effect of BCR-ABL inhibitors such as imatinib (31). Additional members of the BCR-ABL signaling pathway, namely SOS1, GRB2, and GAB2, scored strongly as well (ranked 3, 4, and 7, respectively). Network analysis of the other top hits also uncovered several genes encoding assembly factors for the electron transport chain, as well as enzymes involved in folate-mediated one-carbon metabolism. These results suggest additional potential targets for CML therapy (table S6).
In the B cell–derived Burkitt’s lymphoma cell lines, the top genes included three B cell–lineage transcription factors EBF1, POU2AF1, and PAX5 (ranked 3, 6, and 8, respectively). Each of these
genes is the target of recurrent translocations in lymphoma (32–34). Enhancers of the corresponding three gene loci all show a high level of bromodomain containing 4 (BRD4) occupancy in Ly1 cells, a related diffuse large B cell lymphoma cell line, suggesting bromodomain inhibitors such as JQ1 as potential treatments (35). Other selectively essential genes included MEF2B (a transcriptional activator of BCL6) and CCND3, both of which are frequently mutated and implicated in the pathogenesis of various lymphomas (36). Intriguingly, the top two hits, CHM and RPP25L, do not appear to have specific roles in B cells; rather, their differential essentiality is likely explained by the lack of expression of their paralogs, CHML and RPP25, in both of the Burkitt’s lymphoma cell lines studied (fig. S6D).
We used two complementary and concordant approaches, CRISPR and gene trap, to define the cell-essential genes in the human genome. Although the gene-trap method is suitable only for loss-of-function screening in rare haploid cell lines, the CRISPR method is broadly applicable. Extending our analysis across different cell lines and tumor types, we developed a framework to assess differential gene essentiality and identify potential drivers of the malignant state. The method can be readily applied to more cell lines per cancer type so as to eliminate idiosyncrasies particular to a given cell line and to more cancer types so as to systematically uncover tumor-specific liabilities that might be exploited
REFERENCES AND NOTES
7. J. E. Carette et al., Science 326, 1231–1235 (2009).
8. J. E. Carette et al., Nature 477, 340–343 (2011).
9. I. A. Tchasovnikarova et al., Science 348, 1481–1485 (2015).
10. J. P. Kastenmayer et al., Genome Res. 16, 365–373 (2006).
11. G. S. Cowley, B. A. Weir, W. C. Hahn, Sci. Data 10.1038/data.2014.35 (2014).
12. A. C. Wilson, S. S. Carlson, T. J. White, Annu. Rev. Biochem. 46, 573–639 (1977).
13. Y. Ishihama et al., BMC Genomics 9, 102 (2008).
14. H. Jeong, S. P. Mason, A. L. Barabási, Z. N. Oltvai, Nature 411, 41–42 (2001).
15. T. Hart, K. R. Brown, F. Sircoulomb, R. Rottapel, J. Moffat, Mol. Sys. Biol. 10.15252/msb.20145216 (2014).
16. Z. Gu et al., Nature 421, 63–66 (2003).
17. H. Liang, W.-H. Li, Trends Genet. 23, 375–378 (2007).
18. B.-Y. Liao, J. Zhang, Trends Genet. 23, 378–381 (2007).
19. T. Makino, K. Hokamp, A. McLysaght, Trends Genet. 25, 152–155 (2009).
20. A. Subramanian et al., Proc. Natl. Acad. Sci. U.S.A. 102, 15545–15550 (2005).
21. Y. Ahmad, F.-M. Boisvert, E. Lundberg, M. Uhlen, A. I. Lamond, Mol. Cell. Proteomics 11, 013680 (2012).
22. J. Barretina et al., Nature 483, 603–607 (2012).
23. A. G. Baltz et al., Mol. Cell 46, 674–690 (2012).
24. T. Sekiguchi, H. Iida, J. Fukumura, T. Nishimoto, Exp. Cell Res. 300, 213–222 (2004).
25. F. L. Muller et al., Nature 488, 337–342 (2012).
26. K. Ohneda, M. Yamamoto, Acta Haematol. 108, 237–245 (2002).
27. L. C. Andersson, K. Nilsson, C. G. Gahmberg, Int. J. Cancer 23, 143–147 (1979).
28. B. S. Andersson et al., Leukemia 9, 2100–2108 (1995).
29. S. Q. Wu et al., Leukemia 9, 858–862 (1995).
30. A. Constantinou, K. Kiguchi, E. Huberman, Cancer Res. 50, 2618–2624 (1990).
31. B. Scappini et al., Cancer 100, 1459–1471 (2004).
32. S. Iida et al., Blood 88, 4110–4117 (1996).
33. H. Bouamar et al., Blood 122, 726–733 (2013).
34. S. Galiègue-Zouitina et al., C. R. Acad. Sci. III 318, 1125–1131 (1995).
35. B. Chapuy et al., Cancer Cell 24, 777–790 (2013).
36. R. D. Morin et al., Nature 476, 298–303 (2011).
ACKNOWLEDGMENTS
We thank T. Mikkelsen for assistance with oligonucleotide synthesis; Z. Tsun for assistance with figures; C. Hartigan, G. Guzman, M. Schenone, and S. Carr for mass spectrometric analysis; and J. Down and J. Chen for reagents for hemoglobin staining. This work was supported by the National Institutes of Health (CA103866) (D.M.S.), the National Human Genome Research Institute (2U54HG003067-10) (E.S.L.), an award from the National Science Foundation (T.W.), and an award from the Massachusetts Institute of Technology Whitaker Health Sciences Fund (T.W.). D.M.S. is an investigator of the Howard Hughes Medical Institute. T.W., D.M.S., and E.S.L. are inventors on a U.S. patent application (PCT/US2014/062558) for functional genomics using the CRISPR-Cas system, and T.W. and D.M.S. are in the process of forming a company using this technology. The sgRNA plasmid library and other plasmids described here have been deposited in Addgene.
SUPPLEMENTARY MATERIALS
www.sciencemag.org/content/350/6264/1096/suppl/DC1
Materials and Methods
Supplementary Text S1 to S5
Figs. S1 to S6
Tables S1 to S6
References (37–50)
17 October 2014; accepted 1 October 2015
Published online 15 October 2015
10.1126/science.aac7041
GENOME EDITING
Genome-wide inactivation of porcine endogenous retroviruses (PERVs)
Luhan Yang,1,2,3*† Marc Güell,1,2,3† Dong Niu,1,4† Haydy George,1† Emal Lesha,1 Dennis Grishin,1 John Aach,1 Ellen Shrock,1 Weihong Xu,6 Jürgen Poci,1 Rebeca Cortazio,1 Robert A. Wilkinson,5 Jay A. Fishman,5 George Church1,2,3*
1Department of Genetics, Harvard Medical School, Boston, MA, USA. 2Wyss Institute for Biologically Inspired Engineering, Harvard University, Cambridge, MA, USA. 3eGenesis Biosciences, Boston, MA 02115, USA. 4College of Animal Sciences, Zhejiang University, Hangzhou 310058, China. 5Transplant Infectious Disease and Compromised Host Program, Massachusetts General Hospital, Boston, MA 02115, USA. 6Department of Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.
*Corresponding author. E-mail: [email protected]. harvard.edu (G.C.); [email protected] (L.Y.)
†These authors contributed equally to this work.
The shortage of organs for transplantation is a major barrier to the treatment of organ
Pig genomes contain from a few to several dozen copies of PERV elements (1). Unlike other zoonotic pathogens, PERVs cannot be eliminated by biosecure breeding (2). Prior strategies for reducing the risk of PERV transmission to humans have included small interfering RNAs (RNAi), vaccines (3–5), and PERV elimination using zinc finger nucleases (6) or TAL effector nucleases (7), but these have had limited success. Here we report the successful use of the CRISPR-Cas9 RNA-guided nuclease system (8–10) to inactivate all copies of the PERV pol gene and effect a 1000-fold reduction of PERV infectivity of human cells.
To design Cas9 guide RNAs (gRNAs) that specifically target PERVs, we analyzed the sequences of publicly available PERVs and other endogenous retroviruses in pigs (methods). Using droplet digital polymerase chain reaction (PCR), we identified a distinct clade of PERV elements (Fig. 1A) and determined that there were 62 copies of PERVs in PK15 cells (a porcine kidney epithelial cell line) (Fig. 1B). We then designed two Cas9 gRNAs that targeted the highly conserved catalytic center (11) of the pol gene on PERVs (Fig. 1C and fig. S1). The pol gene product functions as a reverse transcriptase (RT) and is thus essential for viral replication and infection. We determined that these gRNAs targeted all PERVs but no other endogenous retrovirus or other sequences in the pig genome (methods).
Initial experiments showed inefficient PERV editing when Cas9 and the gRNAs were transiently transfected (fig. S2). Thus, we used a PiggyBac transposon (12) system to deliver a doxycycline-inducible Cas9 and the two gRNAs
Science (ISSN 1095-9203) is published by the American Association for the Advancement of Science. 1200 New York Avenue NW,
Washington, DC 20005. The title Science is a registered trademark of AAAS.
Copyright © 2015, American Association for the Advancement of Science