A Hitchhiker's Guide To Single-Cell Transcriptomics and Data Analysis Pipelines


Genomics 113 (2021) 606–619

Contents lists available at ScienceDirect

Genomics
journal homepage: www.elsevier.com/locate/ygeno

Review

A hitchhiker’s guide to single-cell transcriptomics and data analysis pipelines
Richa Nayak, Yasha Hasija *
Department of Biotechnology, Delhi Technological University, Delhi 110042, India

ARTICLE INFO

Keywords: Single-cell transcriptomics; Single-cell RNA sequencing; Single-cell data analysis; Computational approach; Machine learning

ABSTRACT

Single-cell transcriptomics (SCT) is a tour de force in the era of big omics data that has led to the accumulation of massive cellular transcription data at an astounding resolution of single cells. It provides valuable insights into cells previously unachieved by bulk cell analysis and is proving crucial in uncovering cellular heterogeneity, identifying rare cell populations, distinct cell-lineage trajectories, and mechanisms involved in complex cellular processes. SCT data is highly complex and necessitates advanced statistical and computational methods for analysis. This review provides a comprehensive overview of the steps in a typical SCT workflow, starting from experimental protocol to data analysis, deliberating various pipelines used. We discuss recent trends, challenges, machine learning methods for data analysis, and future prospects. We conclude by listing the multitude of scRNA-seq data applications and how it shall revolutionize our understanding of cellular biology and diseases.

1. Introduction

As an elementary school textbook would exclaim, cells are the fundamental, structural, and functional unit of all living organisms. Understanding the biology of cells has been at the center of our pursuit to unravel the complexities that make an organism. Cell biology research has undergone a remarkable transformation in recent years with the advent of single-cell multi-omics technology. The genome structure of every cell is essentially the same for any given individual organism; however, the genome’s expression pattern determines the physiological fate of the cell. The observed diversity of phenotypes is due to the genotype and the varying expression pattern, abnormalities in which form the basis of various diseases. Mapping of this unique genotype-phenotype relationship requires transcriptome profiling, and recent progress made in high throughput sequencing technologies has enabled the measurement of transcriptomic information at an unprecedented resolution of single cells [1].

Transcriptome profiling has revealed that, for any given cell, the transcriptome reflects the activity of merely a subset of genes [2], and each cell type has a unique transcriptomic fingerprint. Earlier, transcriptome profiling was based on the assumption that all cells from any given tissue are homogeneous, and that bulk population sequencing followed by average expression analysis would provide sufficient information to understand gene expression in both standard and abnormal cell states. Increasing evidence suggests that even in similar cells, the gene expression pattern can be heterogeneous [3,4]. Although bulk expression analysis can simultaneously assess gene expression levels and differentiate between abundant known cell types, it can obscure the identification of rare cell types and subtypes and fail to distinguish cell-to-cell variability [5]. Thus, the understanding of stochastic cellular processes necessitated a more precise transcriptome analysis technique to overcome the averaging phenomenon inherent to bulk analysis. Unabated technological advancements in NGS, molecular biology, cell biology, and bioinformatics have fostered a new wave of profiling single cells at the genomics, transcriptomics, proteomics, and epigenomics level [6,7].

Single-cell transcriptomics (SCT) involves profiling the complete set of RNA transcripts of each individual cell for a given population of cells [8]. Transcriptome analysis of single cells was pioneered two decades ago, in two separate historical experiments, one by Norman N. Iscove [9], and one by James Eberwine and group [10,11], that laid the groundwork for single-cell transcriptome analysis based on high throughput sequencing technologies [12]. scRNA-sequencing is a fast-growing and promising technology for SCT [13] and has rendered microarrays and qPCR obsolete.

The volume and complexity of scRNA-seq data make it a paradigm of big data [1]. It has opened doors to a multitude of possibilities in biomedical research, but we have only tapped a fraction of the potential

* Corresponding author.
E-mail address: [email protected] (Y. Hasija).

https://doi.org/10.1016/j.ygeno.2021.01.007
Received 9 August 2020; Received in revised form 30 December 2020; Accepted 18 January 2021
Available online 22 January 2021
0888-7543/© 2021 Elsevier Inc. This article is made available under the Elsevier license (http://www.elsevier.com/open-access/userlicense/1.0/).

of such a large and versatile dataset. scRNA-seq transcriptome profiles have paved the way for identification of rare cell types in complex tissues [14], cell lineage relationships in early development [15], antigen specificity of immune cells [16], inferring cellular trajectory [17], determination of cell fate [18], distinguishing between normal and abnormal cells [19], understanding tumor heterogeneity [20], identifying regulatory signatures in cancer [21], deciphering immune repertoire for infectious diseases [22], and elucidating the mechanism for drug resistance and relapse in cancer treatment [23]. With better analysis methods, we are uncovering more applications.

scRNA-seq data, although highly potent, poses many challenges on various fronts owing to its big-data characteristics such as sophisticated data acquisition techniques, data storage, management, and analysis [24]. A single scRNA-seq experiment generates a larger volume of high-dimensional raw data than bulk sequencing methods as it retains the information of the stochastic expression of genes for individual cells. In addition, scRNA-seq experimental protocols have more steps compared to bulk sequencing, which gives rise to more technical biases and artifacts.

Experimental techniques for scRNA-seq have mushroomed and improved over time, which has led to the generation of a massive amount of data and an increasing demand for computational techniques for data analysis. That has led to a spike in developing new experimental protocols, algorithms, and tools to analyze the raw data. Several research groups and commercial companies have designed software tools and packages for data preprocessing and downstream analysis. Machine Learning (ML) approaches, preeminent in big data analysis, have been a noteworthy addition to the list of approaches used for the underlying computational challenges of dimensionality reduction, clustering, and differential expression (DE) analysis [25,26]. Furthermore, there are choices between various programming languages like R, Java, MATLAB, C++, and Python. The development of analysis tools is still in its infancy, and current tools have many shortcomings. The challenge of improving the trade-off between speed and accuracy in analysis remains. Despite the abundance of techniques, it is hard to establish a standard that can be used across disciplines, and it is crucial to make an informed decision while proceeding for analysis as it can have a tremendous impact on the findings. The choice of scRNA-seq analysis tool can influence the detection of a biological signal to a degree comparable to quadrupling the sample size [27].

This review outlines the general workflow involved in single-cell RNA-seq protocols and discusses the popular and promising new computational tools for analysis. It provides a comprehensive account of each step of the analysis, starting from data preprocessing and imputation of dropouts to tools used for pseudotime ordering and rare cell type identification. It also discusses ML approaches in the analysis steps, wherever applicable. The review concludes with a discussion of applications across fields in biological sciences, remaining challenges, and prospects.

2. Single-cell RNA sequencing technology

A wide range of scRNA-seq protocols has been developed to accommodate the high demand for improved techniques with high throughput. New methods are being developed to counter batch effects and technical noise since it is important to regulate the initial steps to alleviate the computational burden during data analysis.

scRNA-seq technologies currently in use can be divided into three broad classes based on transcript coverage approach [28]: (i) full-length transcript sequencing [example- MATQ-seq [29], SMART-seq2 [30], ICELL8 [31], SUPeR-seq [32]], (ii) 5′-end transcript sequencing [example- STRT-seq [33,34]], (iii) 3′-end transcript sequencing [example- Chromium [35] 10X Genomics, Fluidigm C1 [36], Drop-seq [37], inDrop [38]]. With the full-length transcript sequencing approach, there is an issue of resolution, speed, and sequencing cost. On the other hand, a major drawback of approaches that prioritize either the 5′ or the 3′ end of transcripts is that they are incapable of examining allele-specific expression or alternative splice forms. Some methods, such as MARS-seq, rely on FACS based sorting, which makes them reliant on a larger initial volume [39] and is disadvantageous when the initial volume is low, as in fine-needle aspirates. Another drawback of using FACS is the requirement of antibodies that target specific proteins for sorting; this poses problems while sorting rare cell subtypes [40]. Thus, each protocol has its set of advantages and disadvantages that determine the “depth” (reads/cell) of a given dataset, and it could ultimately affect the statistical and biological insight [41]. scRNA-seq is not a “one-size-fits-all” technique like the bulk sequencing approach since the depth can vary with the protocol being used, cell types being examined, capture method, sequencing technique, and alignment stringency during library construction [42]. We discuss a typical scRNA-seq experimental workflow and complementary technologies in use.

2.1. scRNA-seq workflow

2.1.1. Single-cell isolation

The generation of scRNA-seq data from a tissue sample involves multiple steps. First, the tissue is digested to ensure dissociation, which gives rise to the single-cell suspension from which single cells are isolated so that each cell’s mRNA can be profiled separately. The single-cell isolation techniques predominantly used for scRNA-seq are plate-based techniques and microfluidic-based techniques. Plate-based techniques involve capturing or sorting cells on multi-well plates or microfuge tubes, followed by FACS based sorting. Some full-length scRNA-seq techniques like SUPeR-seq, SMART-seq2, MATQ-seq, Cell-seq rely on plate-based techniques. However, there are many limitations, one being fewer cells per assay than droplet-based technologies. Microfluidic technology involves capturing cells in microfluidic droplets. Microfluidics-based techniques have swiftly gained popularity amongst single-cell isolation techniques as they require less initial volume, are cost-effective, and aid in massively parallel quantification of single-cell gene expression profiles [43,44]. Microfluidics techniques can be of two types, i.e. (i) continuous-flow microfluidics, like Fluidigm’s C1 Single-Cell Auto Preparation System, and (ii) droplet-based microfluidics like InDrop, Drop-seq, and 10X Genomics. Comparison between single-cell isolation techniques is well depicted by Chen X. et al. 2018 [45] and Ziegenhain C. et al. 2017 [46]. Droplet-based platforms are readily automated and easily optimizable to suit individual experimental needs depending upon the number of cells to be captured and sequenced. Recent studies suggest that droplet-based data have the added advantage of not requiring zero inflation, whereas plate-based techniques need zero inflation for accurate simulation [27].

Another technique for single-cell isolation from solid tissue is Laser Capture Microdissection, LCM-seq, a laser system-aided isolation of cells directly from solid tissue, coupled with in-situ RNA-sequencing techniques [47,48], which conserves spatial information of mRNA expression within the morphology of a tissue. This method enables isolation of rare cell types even in highly heterogeneous clinical samples, with increased accuracy and understanding of dynamic cellular systems.

In the lookout to overcome these techniques’ inefficiencies, nanowell based single-cell isolation like Seq-Well promises cost-effectiveness, throughput, and portability and requires only nanoliter sized initial volume [49]. Some newer techniques eliminate the single-cell isolation step, like SPLiT-seq (split-pool ligation-based transcriptome sequencing) and sci-RNA-seq (Single-cell Combinatorial Indexing RNA sequencing [50]). SPLiT-seq allows for simplified and low-cost transcriptome profiling compatible with fixed cells or nuclei and offers high resolution [51].

2.2. scRNA-seq library preparation

Much like any RNA library preparation, it roughly entails reverse transcription of captured mRNA into first-strand cDNA, second-strand synthesis, and cDNA amplification followed by sequencing [52]. But it is more challenging as the amount of RNA per cell is low compared to bulk RNA-seq experiments. A thorough analysis of scRNA-seq requires the profiling of a large number of represented individual cells, which is a task worthy of Sisyphus and significantly adds to the cost of carrying out sequencing. The use of Unique Molecular Identifiers (UMI) or cellular barcodes somewhat simplified the process. Cell barcodes are primarily designed to be able to distinguish between read transcripts originating from different cells. To fully determine the uniqueness of the reads, UMIs (short molecular tags composed of a unique random sequence) are added at the reverse transcription step (5′ end in template switching or 3′ end in oligo-dT primer) [53]. They constitute the second portion of a barcode and primarily detect and quantify unique mRNA transcripts [46] such that amplicons of the same transcript are only counted once. This allows for multiplexing of scRNA-seq, even for low abundant transcripts that show poor reproducibility with previous quantification methods based on the number of sequencing reads [54]. Still, current UMI based approaches are poorly suited for the identification of allele-specific expression or alternative splice variants [2]. Post library construction, cDNA libraries labeled with cellular barcodes are pooled for sequencing, mapping, and alignment. Current RNA-seq technologies rely on pooled sequencing in order to generate high throughput data. This allows for amplification and sequencing of multiple cells in parallel in the same pool, which generates batch-specific output data containing sequences from multiple cells.

3. Preprocessing scRNA-seq data

Sequencing generates reads (raw data) that need to undergo quality control (QC) before downstream analysis. scRNA-seq data contains many technical artifacts that may arise due to cell bursting leading to RNA leakage, multiple cells sticking together, and lowly expressed RNA leading to dropouts, amplification bias, transcriptional bursting, RNA degradation, and batch effect. Before performing downstream analysis, it is crucial to ensure that all the cellular barcode data obtained from scRNA-seq correspond to viable cells [55]. Another challenge is to prevent false interpretation of technical artifacts (cells that show technical noise that appears as a distinguishable gene expression pattern) as biological heterogeneity [56].

A typical scRNA-seq dataset consists of three files: genes quantified (gene IDs), cells quantified (cellular barcode data), and a count matrix, irrespective of technology or pipeline used. These files are crucial for building quality metrics for QC assessment. The barcodes are extracted and annotated, a step called demultiplexing or barcode extraction, followed by mapping and alignment of the read data using read processing pipelines. Post alignment, feature annotation and quantification are carried out on the data to generate gene expression matrices (N(cells) x M(genes)) indicative of the level of gene expression in each cell, based on the molecular counts or read counts (Fig. 1). These correspond to high mapping quality exonic loci [55]. A list of preprocessing and downstream analysis tools is shown in Table 1.

3.1. Quality control

QC filtering can be performed using a combination of strategies. Some common strategies are based on assigned barcodes: the number of counts per barcode and the number of genes per barcode [57]. Cells that show unique gene counts and genes expressed in very few cells are not always indicative of biological heterogeneity; low or high count-depth are indicative of quiescent/damaged cells or doublets/multiplets,

Fig. 1. Preprocessing. (a) The first step after obtaining the scRNA-seq data is demultiplexing FASTQ batch data. (b) The demultiplexed data (modified FASTQ file) is mapped against the respective genome using an alignment program. (c) Feature annotation is carried out using a gene annotation file that contains all information on genes, exons, introns and regions of interest (RefSeq, GENCODE). All the reads are filtered, keeping reads that align to the forward and reverse strand with less than three mismatches and mapped only once to the reference genome. (d) Once the gene names for specific reads are obtained, how many reads correspond to which genes can be determined. A tabular count matrix depicting genes/features as rows and cell labels as columns is generated. (e) Multiple single count matrices are joined together to form a combined matrix. (f) Not all batches use the same barcodes; thus barcodes are filtered for the combined matrix to ensure there is no cross-contamination, and a final QC-filtered count matrix is generated. (g) Feature selection and dimensionality reduction are carried out on the count matrix, followed by downstream analysis.


respectively. Another covariate used for QC is the fraction of mitochondrial genes per barcode. Elevated levels of mitochondrial genes (above 5–10%) in a cell are an indication that the cell may have broken and the cytoplasmic mRNA content has leaked. Furthermore, RNA spike-ins (synthetically generated short RNA polymers of known quantity) are used for calibration purposes, where a low mapping ratio between endogenous RNA and spike-ins is indicative of a low-quality library [58].

The quality metrics are visualized to determine the outlier cells. Low-quality cells are filtered out by setting appropriate thresholds. While filtering out outlier cells, multiple independent variables must be considered together rather than individually, as considering them in isolation can lead to misinterpretation of biological heterogeneity. Thresholds can be fixed or adaptive. Setting fixed thresholds requires experience as suitable thresholds may vary for each experimental protocol or biological system [59]. An alternative is adaptive thresholds that are decided based on the outlier peaks for the QC covariates. It is essential to reevaluate the QC metrics after filtering before proceeding to further analysis.

3.2. Normalization

UMI-based protocols inherently reduce amplification biases, and the addition of spike-ins enables assessment of sensitivity. However, cell-level (counts comparable between cells) and gene-level (counts comparable between genes) normalization is carried out to cater to sampling effects or technical biases that remain in the data due to variability in the protocols. Once the count matrix is obtained, normalization seeks to address the gene expression variability between cells in count data to prevent the highly expressed genes from influencing the analysis. Popular normalization methods have been derived from bulk RNA-seq analysis methods, such as DESeq and Trimmed Mean of M-values, and have been successfully applied to scRNA-seq data [60]. Popularly, raw scRNA-seq read library normalization is carried out using read count normalization/CPM (Counts Per Million) methods like RPKM (Reads Per Kilobase Million), TPM (Transcripts Per Million), and FPKM (Fragments Per Kilobase Million). The scaling factors in these methods are based on the assumption that the majority of the genes are not differentially expressed, so they might fail when the fold-change of DE genes is high across cell populations under study [60]. These library-based or global-scaling normalization methods derived from bulk RNA analysis, other than being computationally intensive, have their shortcomings when used for scRNA-seq data. Given the added complexity of scRNA-seq data due to data sparsity and high heterogeneity, it requires advanced normalization strategies to address specific biases.

SINCERA is a commonly used normalization pipeline, in which gene normalization is performed by z-score, while cell normalization is performed by trimmed mean. For example, during clustering, BISCUIT uses iterative normalization by learning features representing technical modifications. RaceID (Rare cell type identification) normalizes the total transcript count within each cell to the median transcript number across cells [61]. Clustering based on Transcript Compatibility Counts (TCC) uses equivalence classes in place of genes as parameters and normalizes each parameter by distributing the total count across all the cells. The sctransform pipeline interfaces with Seurat and uses Pearson residuals from regularized negative binomial regression. In the regression, sequencing depth is used as a covariate to eliminate technical artifacts [62]. SCnorm uses quantile regression to approximate the dependency of expression of a transcript on the depth of sequencing per gene [63]. Genes with similar dependency are clustered together, and then a second quantile regression is utilized to approximate scale parameters in every group. In-group correction is achieved by using the approximate scale parameters to deliver normalized estimations of expression.

The diversity of scRNA-seq protocols makes it difficult to standardize any one normalization method. It has been observed that different normalization methods perform optimally for different datasets, and the same goes for cell-level and gene-level normalization. Post normalization, log(x + 1) transformed count matrices are obtained that give a simplified account of expression levels in terms of log-fold changes and bring down the skewness of the data [55]. Downstream analyses that are based on the assumption that scRNA-seq data is normally distributed and perform analysis on log-transformed data may sometimes result in spurious DE effects. Thus, there is a pressing need to develop more precise and robust normalization methods designed explicitly for scRNA-seq data.

3.3. Data correction

Normalization successfully removes amplification and count depth biases; however, a few challenging technical and biological biases remain. Data correction deals with batch effects, dropouts, and biological effects. scRNA-seq data is prone to zero-inflated values, otherwise known as dropouts [64], resulting from low sensitivity of scRNA-seq protocols, inefficient capture of mRNA, low amounts of mRNA in cells, or transient gene expression [2]. The dependency of downstream analysis on the accuracy of gene expression profiles makes the imputation of dropouts a crucial step. Many analysis pipelines account for dropouts during analysis; however, recent findings can change how we look at dropouts. Imputation is carried out either by direct expression analysis or by model-based approaches. Newer and more robust ML-based algorithms have taken over popular imputation techniques like Markov Transition Matrix-based MAGIC [65], clustering-based DrImpute [66], and LASSO regression-based ScImpute [64]. SAVER (Single-cell Analysis Via Expression Recovery) uses gene-gene relationships to recover true expression levels of each cell [67]. RESCUE (REcovery of Single-Cell Under-detected Expression) enhances cell-type identification based on an ensemble-based method to minimize feature selection bias and count error and performs imputation by comparing gene expression levels of other cells with similar patterns [68]. SCRABBLE, a matrix regularization framework, uses bulk RNA-seq data as a constraint that improves the accuracy and estimation of gene expression distribution across cells compared to scRNA-seq analysis in isolation [69]. For datasets that suffer from imbalance and limited sample sizes, scHinter, with a hierarchical framework for random interpolation that leverages a minority oversampling technique [70], proves to be a more robust technique than its predecessors. A recent study explores the opposite view of dropouts, where instead of being imputed, they can be used as a signal. It was observed that binary dropout patterns prove almost equally informative as quantitative expression patterns of highly variable genes in cell type identification [71].

Furthermore, studies have been conducted to establish that the excess of dropouts is consistent with stochastic sampling of molecular counts, and any additional zero values may result from biological variation [72]. Such studies suggested that a negative binomial distribution model for UMI based scRNA-seq count data would suffice, and zero-inflation may not always be necessary [46,72,73]. It can also be

Table 1. List of some popular preprocessing pipelines and downstream analysis tools used for scRNA-seq data.

Analysis category | Pipeline | Environment | Description
------------------|----------|-------------|------------
Overall analysis | Seurat [78] | R | A comprehensive tool to perform QC analysis on diverse types of single-cell data. Spatial inference using in situ RNA patterns as a reference. Compatible with multimodal data. http://satijalab.org/seurat/
 | Scanpy [79] | Python | A scalable analytical framework for scRNA-seq analysis covering preprocessing, visualization, DE, clustering, and TI; works in tandem with AnnData-annotated data matrices. Fast and efficient for large data sets. https://github.com/theislab/scanpy
 | PyMINEr [80] | Python | An automated tool for cell-type identification, filtering enriched genes, network pathway analysis, and visualization of analysis. Non-cell type determining gene expression may influence cellular graphs. https://www.sciencescott.com/pyminer
Pre-processing | dropEst [81] | R | Accurate estimation, quality control, correcting composition bias, and sequencing error for droplet-based scRNA-seq data. Provides configuration options for accommodating different scRNA-seq protocols. More efficient with smaller datasets than larger datasets. https://github.com/hms-dbmi/dropEst
Visualization | Cerebro [82] | R | An interactive environment, compatible with Seurat objects. https://github.com/romanhaa/Cerebro
 | iSEE [83] | | Interactive visualization, reproducible, and compatible with existing R/Bioconductor packages. https://github.com/csoneson/iSEE
Imputation | bayNorm [73] | R | An integrated platform for normalization, imputation, and batch effect correction. Improves accuracy and sensitivity of DE analysis. https://bioconductor.org/packages/release/bioc/html/bayNorm.html
 | DeepImpute [84] | | Employs a deep neural network-based algorithm that allows for improved speed, accuracy, and scalability. It is also well suited for large ever-increasing datasets. https://github.com/lanagarmire/DeepImpute
Batch effect, merging | BERMUDA [85] | R | Deep autoencoders. Data is obtained from scRNA-seq data, and at least one common cell type is required across batches for the process. https://github.com/txWang/BERMUDA
 | MNN [86] | R | Mutual Nearest Neighbor for batch correction of single cells. Based on the established assumption that a batch is orthogonal to biology and that MNN exists between batches. https://github.com/MarioniLab/MNN2017/
Normalization | SCnorm [63] | R | Makes use of quantile regression to approximate the dependency of expression of a transcript on the depth of sequencing per gene. Genes with similar dependency are clustered together, and then a second quantile regression is utilized to approximate scale parameters in every group. In-group correction is then achieved by using the approximate scale parameters to deliver normalized estimations of expression. https://www.biostat.wisc.edu/~kendzior/SCNORM/
Dimensionality Reduction | DR-A [87] | Python | Dimensionality Reduction with Adversarial variational autoencoder
 | SAUCIE [88] | Python | Deep Multitasking Neural Networks for DR
Cell clustering | Refer to Table 3 | |
Cluster Annotation | scMAP [89] | R | Automated cluster annotation technique that follows a projection-based approach where scRNA-seq data is projected onto a previously annotated cell type or dataset.
Pseudotime trajectory inference/reconstruction | TinGa [90] | R | TI based on Growing Neural Gas. Scalable, time-efficient, and accurate on complex trajectories. Does not require prior specification of the topology by the user. https://github.com/Helena-todd/TinGa
 | ReCAT [91] | R | Hidden Markov model-based method for reconstructing cell cycle pseudotime in time-series data. https://github.com/tinglab/reCAT
Differential Expression | MAST | R | Uses a linear hurdle model to account for confounders, and DE is determined using the likelihood ratio test. https://github.com/RGLab/MAST
 | SCDE | R | Uses a Bayesian approach that incorporates an evidence-based approach to evaluate the likelihood of the average level of gene expression for individual cells and measure the fold changes. Highly sensitive. https://hms-dbmi.github.io/scde/index.html

Note: https://www.scrna-tools.org/ is a catalog of tools for analyzing single-cell RNA sequencing data.
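The global-scaling normalization of Section 3.2 can be illustrated with a small, hedged Python sketch: CPM scaling, scaling each cell to the median total count across cells (in the spirit of the RaceID description above, not its actual implementation), and the log(x + 1) transform. The function names and the toy cells-by-genes matrix are assumptions made for illustration.

```python
import math
from statistics import median


def cpm(counts):
    """Counts per million: scale each cell (row) so its counts sum to 1e6."""
    scaled = []
    for row in counts:
        total = sum(row)
        scaled.append([c * 1e6 / total if total else 0.0 for c in row])
    return scaled


def median_scale(counts):
    """Scale each cell's total count to the median total across cells."""
    totals = [sum(row) for row in counts]
    target = median(totals)
    return [
        [c * target / total if total else 0.0 for c in row]
        for row, total in zip(counts, totals)
    ]


def log1p_transform(matrix):
    """Apply log(x + 1) element-wise to reduce the skewness of the data."""
    return [[math.log1p(value) for value in row] for row in matrix]


# Toy count matrix: rows are cells, columns are genes (hypothetical values).
# Cells 0 and 1 share the same composition but differ tenfold in count depth.
counts = [
    [10, 0, 90],
    [1, 0, 9],
    [50, 50, 100],
]
cpm_matrix = cpm(counts)
median_matrix = median_scale(counts)
log_matrix = log1p_transform(median_matrix)
```

After either scaling, cells 0 and 1 become identical, which is the intended removal of count-depth differences. Zero counts stay zero under both the scaling and the log(x + 1) transform, so the sparsity of the matrix is preserved.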


inferred that the number of dropouts can be decreased by increasing the depth of sequencing or increasing global count with more efficient capturing methods.
Apart from dropouts, several other technical covariates, such as batch effects, and biological covariates need to be considered. Such biases must be removed simultaneously, as there might be a dependency between the covariates under consideration. Batch effects arise from data handling in different experiment batches or time points and are highly nonlinear variations. Batch effects can have a significant impact on DE analysis. Correction methods include aggregation-based methods [74], which pool cells from batches to form a pseudo-bulk sample and use bulk analysis approaches, and nested fixed-effect [75] and mixed-effect models [76], which treat batch effects as fixed effects nested within each group or as random effects shared between cells from each batch, respectively. A comparison between different batch correction methods is given in Chen et al., 2020 [77], and some popular batch correction methods are listed in Table 1. Upon removal of batch effects, data is merged for further unbiased analysis. Biological covariates may arise due to important biological processes like the cell cycle, which affects cell size and mRNA counts. Correcting for such variation helps reveal important biological signals and processes. Linear regression against a cell cycle score and correction for cell size during normalization are some ways to remove the effects of the cell cycle [55]. However, data correction for biological effects may not always be in the best interest of the analysis, and correction for one effect may mask another. Thus, it is advisable to first evaluate the study's objective and context before deciding on data correction measures.
Despite the numerous QC measures, it is hard to determine each step's stringency before assessing its effect on downstream analysis. Thus, a feedback system should be followed to regulate QC stringency alongside downstream analysis.

Table 2
Classes of clustering algorithms used in scRNA-seq following ways [92].
Class of clustering algorithm | Principle | Limitations | Pipeline
Distance-matrix | Unsupervised learning algorithms like k-means fall in this category. The algorithm first identifies k centroids or means iteratively, and data points are assigned to the cluster around the nearest centroid. During cluster allocation of data points, the within-cluster sum of squares is reduced, and the position of the centroids is iteratively optimized. Scalable and time-efficient. | Sensitive to outliers, biased towards data shape/cluster shape, and the number of clusters must be specified beforehand. | SCUBA [102], PCAKmeans, pcaReduce, SAIC [103], scVDMC [104]
Hierarchical clustering | Generates clusters in a hierarchical structure and is popular in gene expression analysis. It overcomes the limitations of k-means of specifying the number of clusters a priori and handling different shapes of clusters. No assumptions are made about the distribution of data points, and each cluster links to another by branches and is nested like a hierarchical tree in the form of a dendrogram. This representation facilitates meaningful data interpretation. | Time intensive | BackSPIN, cellTree [105], DendroSplit [106], CIDR [93]
Graph-based | Supervised-learning algorithms. Projects a graph representation of data in which the nodes correspond to datapoints/cells, and edges correspond to pairwise similarity between the datapoints, for example, K-Nearest Neighbor. Clusters are based on neighboring cell pairs. Graph-based clustering techniques have various subtypes like Louvain (continued) | Reliance on heuristic solutions sometimes leads to spurious results, and iteration sometimes masks small communities. | Seurat [78], scanpy [79], SNN-Cliq [108]

3.4. Dimensionality Reduction (DR)

scRNA-seq data is computationally intensive, noisy, and suffers from the curse of dimensionality. scRNA-seq expression matrices are of high dimension, but not all genes are required for meaningful classification of cellular expression profiles, and the data can practically be explained in fewer dimensions, focusing only on relevant biological signals. DR enables better data visualization and resolves the statistical issue of data sparsity. An effective low-dimensional representation should summarize the data in a few optimal dimensions that retain the underlying structure in the data to describe the variability of the dataset. Some of the popular techniques for DR are Principal Component Analysis (PCA), t-distributed Stochastic Neighbor Embedding (t-SNE), Uniform Manifold Approximation and Projection (UMAP), Self Organizing Maps (SOM), and model-embedded dimension reduction [92]. Some clustering pipelines, like CIDR [93], come integrated for both dropout imputation and DR. DR has two components: feature selection, where one selects a smaller subset from the original set of variables, and feature extraction, where the high-dimensional data is projected to a lower dimension. Feature selection is carried out based on the expression variability of genes, according to the assumption that genes showing high variability correspond to biological variation. Per-gene variation can be quantified by calculating the variance of the log-normalized values. A subset of highly variable genes (HVG), depending on the type of dataset, is selected for further analysis. Selecting a larger subset of HVGs may increase the noise but reduces the risk of discarding biologically relevant signals. Seurat performs feature selection by modeling the mean-variance relationship [94].
After feature selection, DR is carried out using linear or nonlinear techniques. PCA is a linear projection method that linearly transforms the original dataset into PCs ranked in decreasing order of variance, and the data's variance is maximized in the lower dimensional space. It is computationally efficient and removes redundant features, but since
scRNA-seq has a highly nonlinear structure, PCA alone is not best suited


for data visualization. However, t-SNE, a graph-based nonlinear technique, is much better suited for the task. It is often performed in tandem with PCA. t-SNE is based on a probabilistic distance model. It creates probability distributions to establish a relationship between two data points in high dimensional space and reconstructs it in a lower-dimensional space by optimizing using gradient descent. Although t-SNE is favorable for data visualization, good with nonlinear datasets, and preserves local structures in lower dimensions, it often ignores the global structure, which may lead to misinterpretation of differences between cell populations. Owing to these drawbacks, a superior manifold learning-based DR technique has emerged called Uniform Manifold Approximation and Projection (UMAP) [95]. It is principally similar to t-SNE but also preserves the global structure. It is also computationally more efficient but may sometimes result in spurious signals in smaller datasets. For complex dataset visualization, partition-based graph abstraction (PAGA) is used with UMAP. ZIFA [96] and ZINB-WaVE [97] are model-embedded dimension reduction algorithms carried out on zero-inflated data. ML methods, like Deep Learning and Autoencoders, have shown efficiency in DR problems. Latent Dirichlet Allocation (LDA), a Natural Language Processing-based algorithm, and SAUCIE [88], a neural network-based algorithm, are promising new approaches. DR-A, an autoencoder based framework, provides precise low dimensional representation, enhances downstream clustering performance, and could potentially be used for lineage estimation [87].

Table 2 (continued)
Class of clustering algorithm | Principle | Limitations | Pipeline
Graph-based (continued) | clustering that is used by popular scRNA-seq analysis pipelines. The Louvain algorithm is largely being taken over by the Leiden algorithm for cluster detection, which uses a smart local move approach, is faster, and shows more proficiency in detecting well-connected communities [107]. | |
Mixture Models | Clustering is based on the probability distribution of the datapoints. It is well suited for the identification of subpopulations and integrates prior knowledge as assumptions of probability distributions. | Computationally intensive and relies on the accuracy of assumptions of probability distribution. | BISCUIT [109], Seurat, TSCAN [110]
Density-based | Density-based algorithms assign clusters based on high-density regions of datapoints. It is a highly efficient clustering technique but is sensitive to parameters. | Sensitive to parameters, time-intensive | Monocle2 [111] (for the identification of outlier cells)
Neural Network | Supervised-learning methods, inspired by the neural network of the human nervous system. Highly efficient in performing clustering and classification tasks. Kohonen networks are bilayer networks that use competitive learning for clustering. Deep learning and autoencoders are also used. These are efficient, scalable, and information on relationships amongst clusters can be incorporated. | Sensitive to parameters | SAUCIE [88], scDeepCluster [112]
Ensemble clustering | Ensemble clustering algorithms are a solution to the lack of any one optimal algorithm, as they make use of multiple clustering algorithms on the same dataset, and a consensus result is obtained that is more precise than independent algorithms. | Disadvantages of individual algorithms add up | SAFE-clustering uses SC3, CIDR, Seurat, and t-SNE + k-means for clustering and then combines the results to obtain one consensus result by using a hypergraph partitioning algorithm [113]

4. Downstream analysis

The unique features of single-cell data make downstream analysis elaborate and diversified. There are different stages of preprocessed data such as log-transformed data, batch corrected data, feature selected data, dimensionality reduced data, etc. Depending on the requirement specifications of downstream analysis and availability, certain preprocessed data is chosen. Some open-source scRNA-seq repositories or reference databases are Human Cell Atlas [98], Broad Institute’s Single Cell Portal [99], EMBL-EBI’s Single Cell Expression Atlas [100], PanglaoDB [101], and OmicX Jingle Bells. 10× Genomics offers datasets at various preprocessed levels for downstream analysis.
Downstream analysis of scRNA-seq data can be at the cellular level or the gene level. This section will discuss various analysis tools and their applications that have led to prominent discoveries.

4.1. Cell-level analysis

Cell-level analysis is about understanding cell subtypes, cell differentiation patterns, cell lineage, and cellular trajectories, identifying novel cellular markers, and many unique cellular features. It helps characterize known properties of cells and uncover previously unidentified characteristics. Cell-level analysis is not exclusive of gene-level analysis. We briefly discuss important cell-level analysis techniques that have found major implications in advancing cell biology research.

4.1.1. Cluster analysis
Clustering entails categorizing cells into clusters to enable the
identification of cell types and subtypes. It is ideally performed on a


dimensionally reduced dataset, and clusters are made based on cell-specific molecular profiles. The results of a clustering analysis can themselves be of much significance or can serve as a covariate in other downstream analyses. There has been an attempt to develop robust cluster analysis algorithms (as shown in Table 2) that can address the vast heterogeneity of cells, but there is still a lack of an optimal algorithm that fares well across datasets.
After clustering, the clusters need to be annotated in order to give them biological relevance. Cluster annotation can be achieved either by thorough examination of literature and reference cell databases, or by identifying gene signatures or differentially expressed marker genes. CellAssign [114] and scMAP [89] rely on the former technique for cluster annotation, but since most marker genes have been identified by bulk analysis in the reference datasets, it is limited to giving a classical view of the cell types. Also, cell types in reference databases will not necessarily correspond to all the cell types present in the dataset under investigation. Analysis pipelines like Seurat, SC3, and scVDMC make use of a differential gene expression approach. Full gene expression profiles are used in DE analysis for marker gene identification and cluster annotation, which is performed using simple statistical tests. The quantitative levels in gene expression are measured amongst clusters and all the cells in the dataset. Marker gene sets, i.e., the top-ranked genes, are identified based on statistical tests like the Wilcoxon rank sum test (used in Seurat), Welch’s t-test, the Kruskal-Wallis test, etc. DE analysis could either be carried out in succession to clustering like in Seurat, simultaneously like in scVDMC and DendroSplit, or by using DE software like MAST [115], SCDE [116], and ZingeR. Gene set enrichment analysis is carried out against reference gene sets (sets of genes grouped as they share common chromosome location, biological function, or regulation) using statistical parameters like the Jaccard index, and clusters are annotated accordingly. The important thing to note here is the p-value, which is based on the assumption that the marker gene identified represents the biological phenomenon, but the p-value is often inflated and leads to an overestimation of marker genes. Most existing GSEA methods have been developed for bulk RNA-seq analysis and perform poorly in the case of scRNA-seq; thus Ma et al. came up with an integrative DE-GSE analysis technique called iDEA [117] that makes use of DE summary statistics, is thus easy to use with current DE methods, and efficiently produces well-assessed p-values for enriched gene set detection. Nowadays, automated cluster annotation techniques are becoming increasingly available, like scMAP [89], which follows a projection-based approach where scRNA-seq data is projected onto a previously annotated cell type or dataset. Garnett uses a supervised classification approach for rapid annotation [118]; scCATCH [119] makes use of the CellMatch reference database for annotation followed by evidence-based scoring for increased performance. Another important goal is to identify rare cell types that may appear as outliers in clustering results that only consider global differences in gene expression. RaceID [61] and GiniClust [120] are clustering algorithms sensitive to identifying rare cell types. RaceID is based on the assumption that a given cell type must express some genes that are specific to the cell type and appear as outliers, but if the focus is shifted from global to such cells and the technical and biological noise are accounted for by setting appropriate thresholds, it will enable the identification of rare cell types.
To an extent, the determination of cell types is dependent on user-defined criteria, since for different researchers, the level of cluster resolution may vary, and in some cases, sub-clustering of clusters may also be required. The choice of the resolution also has a significant effect on the results. DendroSplit, a clustering framework, allows the user to cluster using feature selection that enables identifying multiple levels of biologically meaningful cell populations in a dataset, and is also suitable for detecting rare cells [106]. Despite continual attempts at developing new algorithms, clustering and cluster annotation suffer from various challenges on both the biological and computational fronts. It is advisable to use a cocktail of automated and manual cluster annotation measures to get precise results. But since clustering is an unsupervised learning approach, several parameters are needed for reliable evaluation. Statistical and experimental validation is often needed. Transient biological states make it difficult to identify cell states. However, the generation of more comprehensive and extensive cell atlases will facilitate better clustering, cluster annotation results, and better computational approaches to help overcome technical challenges.

4.1.2. Trajectory inference
Trajectory Inference (TI), also known as pseudo-temporal ordering, is a process of characterization of underlying dynamic cellular processes. Although clustering successfully builds discrete clusters of cell types and subtypes, it does not account for the variability due to dynamic cellular processes like transient cell states in cell differentiation, cell cycle, or environmental effect. TI deals with this blind spot by ordering cells along a continuous path that minimizes transcriptional changes between successive cell pairs, called pseudotime (a one-dimensional manifold), that represents the progression of the cell through its dynamic processes measured in terms of transcriptional changes that a cell undergoes during a biological process. Some datasets have an expected temporal component, such as cells from developing embryos, immune cells during an immune response, tumor cells, progenitor stem cells, etc. Understanding of cell differentiation via bulk RNA analysis gave us the impression that cell differentiation occurs in discrete stages, but in reality, it is a continuous process that may appear chaotic but can be ordered along continuous trajectories. Following the cells along a pseudo-temporal trajectory and analyzing gene expression changes yields valuable insights into the cellular regulatory processes, dynamic states, and abnormal cell states. The progression of cells in a given cellular process is rarely synchronized. Capturing this asynchrony poses a unique challenge of deciphering the sequence of regulatory events. TI takes into account a snapshot view of these events and uses computational techniques to infer the order of the cells along their developmental trajectories. The derived trajectory topologies can be linear, bifurcating, multifurcating, complex tree structures, or graph structures [121]. TI is mostly used for cell-lineage construction. During cellular developmental stages, cells express unique cellular markers and various lineage marks that can be helpful for tracing lineage along pseudotime trajectories, such as somatic mutations, single nucleotide polymorphisms, copy number variants, microsatellites, transposons, and retroviral sequences.
Two broad strategies used for TI are DR-based methods (Monocle [122], Wishbone [123]), which use the reduced latent space as the first phase in inference and assign pseudotime to individual cells, and clustering-based methods (TSCAN [110], SCUBA [102], ÉCLAIR [124]), which build a network connecting clusters and apply pseudo-temporal ordering of clusters [125]. Monocle is a pioneering pseudo-temporal ordering algorithm that demonstrated how pseudotime analysis can reveal important cellular regulatory interactions [122]. It uses an unsupervised learning approach that constructs minimum spanning trees (MST) for ordering cells along pseudotime. Several other algorithms like Wishbone, TSCAN, and later versions of Monocle (currently Monocle 3) have been developed that are more robust and accurate, and a detailed comparative account can be found in Saelens et al. 2018 [121]. While most TI algorithms are unsupervised learning-based models, Ouija is a supervised learning-based algorithm that was developed keeping in mind that several confounding factors that affect biological processes, like cell cycle and apoptosis, sometimes need to be accounted for to get biologically plausible pseudotime trajectories [126]. It uses switch-like marker genes and can be used as a complementary method with existing methods owing to its consideration of gene-specific behaviors as opposed to unsupervised methods. Despite the availability of more than 70 TI methods, researchers find it challenging to determine which is best suited for their analysis. Selection depends on the task, such as the type of biological process being studied, whether it is a cell differentiation process (Wishbone), lineage trees (MerLOT [127]), or cell cycle (Cyclone [128], reCAT [91]), or on the downstream analysis, like inferring GRN (SCODE [129]) or DE. Each trajectory method has its pros and cons, and there is a lack of standardization. Dynverse is a collection of R packages specifically designed to address this issue so that researchers can perform TI, quantify it, or compare it to other available methods to decide on the best approach for their dataset [121]. Recent advancements have led to the use of time-series data in place of snapshot data. Tempora is an upcoming algorithm that may prove to be more biologically relevant than previous methods as it uses biological pathway information and identifies time-dependent pathways for ordering and inferring time-series data [130].

Table 3
Current and future applications of scRNA-seq analysis in major fields of biomedical sciences (a).
Field of study | Scope of analysis | Applications
Immunology | (i) Clustering of regional immune cells (ii) Trajectory analysis of individual immune lineages | Identification of novel immune cell subtypes [137], revealing immune microenvironment across tissues [138], understanding regional immunity in tumors [139]. Build developmental trajectory of immune cells, and gain mechanistic insights [140]. Immunology studies with the aid of SCT will help us develop better and targeted immunotherapies.

Undoubtedly, TI is becoming a popular tool for studying biological processes like cellular differentiation, immune response, tumor progression, and resistance, but inferring trajectories alone cannot be reliable as it needs to be validated using supporting biological evidence. With better algorithms and more time-series data availability, TI can be used to predict the pre-disease state of cells, which may help in the early detection of diseases.
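The MST-based pseudo-temporal ordering described in Section 4.1.2 can be illustrated with a minimal sketch. It assumes cells have already been projected into a low-dimensional space and that the user supplies a root cell; the function names (`minimum_spanning_tree`, `pseudotime`) and the toy coordinates are hypothetical, and this is a simplified illustration of the idea rather than Monocle's actual implementation.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two points in the reduced space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph over the cells.

    Returns an adjacency dict: node -> list of (neighbor, edge_weight).
    """
    n = len(points)
    adjacency = {i: [] for i in range(n)}
    visited = {0}
    heap = [(euclidean(points[0], points[j]), 0, j) for j in range(1, n)]
    heapq.heapify(heap)
    while heap and len(visited) < n:
        weight, i, j = heapq.heappop(heap)
        if j in visited:
            continue
        visited.add(j)
        adjacency[i].append((j, weight))
        adjacency[j].append((i, weight))
        for k in range(n):
            if k not in visited:
                heapq.heappush(heap, (euclidean(points[j], points[k]), j, k))
    return adjacency


def pseudotime(points, root):
    """Distance of each cell from the root along the MST, scaled to [0, 1]."""
    adjacency = minimum_spanning_tree(points)
    dist = {root: 0.0}
    stack = [root]
    while stack:  # tree traversal: each cell is reached by exactly one path
        node = stack.pop()
        for neighbor, weight in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + weight
                stack.append(neighbor)
    longest = max(dist.values()) or 1.0  # avoid dividing by zero for one cell
    return [dist[i] / longest for i in range(len(points))]


# Toy example: five cells in a 2D reduced space, listed in shuffled order.
cells = [(3.0, 2.0), (0.0, 0.0), (2.0, 1.0), (1.0, 0.2), (4.0, 3.5)]
pt = pseudotime(cells, root=1)  # cell 1 is assumed to be the starting state
order = sorted(range(len(cells)), key=lambda i: pt[i])
print(order)
```

On branching data the same tree distance still orders cells, but cells on different branches can receive similar pseudotime values, which is one reason dedicated TI tools model the topology explicitly.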
Table 3 (continued)
Field of study | Scope of analysis | Applications
Cancer biology | (i) Clustering of tumor microenvironment (ii) DE analysis of tumors (iii) Construction of gene regulatory maps | Researchers and clinicians have struggled with understanding tumor heterogeneity for a long time. scRNA-seq has revealed intra-tumor and inter-tumor heterogeneity and rare tumor subpopulations [141]. It will help understand cell-cell interactions in the tumor ecosystem, tumor resistance, refractory, and recurrence mechanisms. It can help elucidate genetic and non-genetic mechanisms for cancer and help devise better treatment regimes.
Cell-cell communication studies | (i) Integrating the count matrix generated from scRNA-seq with known ligand-receptor interaction matrix (ii) Construction of GRNs | Deciphering cell-cell interactions is crucial to understanding both cellular development and diseases. SCT has enabled the inference of ligand-receptor interactions at an unprecedented resolution. Employing SCT we can identify communication patterns and use them to predict functions of poorly studied pathways. Tools like NATMI [142] are being used to identify which cell-type pairs or cellular communities communicate more frequently or specifically and what ligand-receptor pairs are the most active within a network, and have offered insights into autocrine signaling in cell-cell communication.
Stem cell | (i) Trajectory analysis of stem cells (ii) DE analysis of progenitor cells | When used to construct hematopoietic lineage trees, SCT analysis revealed that the differentiation process is continuous instead of the traditional belief that it is a stepwise process [143]. It also helped reveal a novel pathway used by stem cells for self-renewal [144]. Another group of researchers developed a scRNA-seq based CRISPR interference technique to (continued)

4.2. Gene-level analysis

Gene-level analysis is an integral part of cell-level analysis for studying cellular structures and identities, but independently, gene-level analysis of single cells reveals a more comprehensive inference of cellular pathways and regulatory networks. It involves DE analysis, pathway analysis, gene regulatory networks (GRN), and gene set analysis. We shall be discussing some of the important gene-level analyses and the information they have revealed.

4.2.1. Differential Expression (DE)
We have discussed DE testing in previous sections while discussing clustering, but at the gene level, we focus more on the stochastic nature of gene expression and distinctive signatures only observed at the single-cell level. Although principally the same as bulk DE analysis, DE for single-cell data was developed to deal with artifacts inherent to scRNA-seq such as multimodality, dropouts, and heterogeneity. Among popular DE tools for scRNA-seq, MAST uses a linear hurdle model to account for confounders, and DE is determined using the likelihood ratio test [115]. SCDE uses a Bayesian approach that incorporates an evidence-based approach to evaluate the likelihood of the average level of gene expression for individual cells and measure the fold changes [116]. These techniques have shown higher sensitivity than other techniques. In some cases, bulk DE methods, when used with gene weights, outperform scRNA-seq specific DE methods, but at the price of being computationally intensive. Apart from having low detection accuracy for true DE genes, there is also a lack of agreement between various available tools. This demands better tools that account for the multimodality of scRNA-seq data and its artifacts, and identify true DE genes having biological relevance. Wang, Tianyu et al. performed a recent comparative analysis of DE tools that could guide researchers to evaluate DE tools, choose appropriate ones for their analysis, and improve upon existing techniques [131]. Trajectory-based DE methods such as tradeSeq have also been developed, enabling DE analysis between lineages and within lineages and providing a continuous resolution of gene expression changes through a dynamic process [132].
DE studies allow us to identify distinct expression profiles of cellular pathways that help us understand the effect of perturbations and the underlying mechanisms of disease pathologies.

4.2.2. Gene Regulatory Network (GRN)
The above discussion on gene expression poses another question on how gene expression is regulated in cells. Stochastic gene expression observed amongst single cells indicates that gene regulation that relies on transcription factors, signaling molecules, and co-factors is regulated in a specific way. Uncovering these GRNs will reveal the basis of gene expression stochasticity and provide mechanistic insights into normal
and abnormal cellular phenotypes. Several tools have been developed


for inferring GRN, some derived from bulk analysis methods and some designed explicitly for scRNA-seq data, but GRNs are incredibly complex to decipher. The SCENIC algorithm simultaneously constructs gene regulatory networks and performs clustering [133]. It bases the identification of stable cellular states on the activity of GRNs in each cell and is well suited for the discovery of cell states that are driven by transcription factors and cis-regulatory sequences. SCGRN is a supervised feature learning-based approach for inferring GRNs that employs three different ML techniques [134] and is promising compared with the unsupervised learning approaches used earlier. Another study by Qin et al. presents a toolkit, Scribe, for inferring causal GRNs from single-cell datasets and indicates that pseudotime data perform poorly compared to true time-series data [136]. Such insights will encourage newer studies that focus on deriving true biological insights from scRNA-seq data. Recently, researchers explored GRN inference algorithms and developed a framework called BEELINE, where they used synthetic networks with predictable trajectories, literature-curated Boolean models, and diverse transcriptional regulatory networks to assess the accuracy of GRN methods [135]. This study can be used as a benchmark for selecting GRN algorithms or for developing new strategies.
The study of GRNs at single-cell resolution will contribute significantly to systems biology approaches and help build more precise models. These models can further help discover dynamical network biomarkers that can predict disease and pre-disease states.

5. Applications and future prospects

The past decade has seen a momentous upsurge in SCT and its applications across neurology, microbiology, cell biology, molecular biology, immunology, cancer biology, bioinformatics, stem cell, biomedical sciences, and clinical and diagnostic applications. Recent research efforts evince its potential to be of use in translational research. To list all the applications is beyond the scope of this article. Some of the major areas that have benefited from SCT are listed in Table 3.

Table 3 (continued)
Field of study | Scope of analysis | Applications
Stem cell (continued) | | study transcription factors involved in human endoderm development that revealed underlying factors and effects of perturbation [145]. It can help devise novel methods for treating genetic diseases and developmental disorders and better stem-cell therapies.
Neurobiology | (i) Clustering of regional brain cells (ii) Pseudo temporal ordering (iii) DE analysis of partitioned brain functions | It has been used for identifying neuronal subtypes [146], understanding cellular programs involved in early development and cell populations in neuronal diseases, tracking the transcriptional landscape of aging, resolving the regional cell type landscape, and other nuances of brain function [147]. scRNA-seq has led to rapid accumulation of normal and tumor brain cell data, which can be further used to understand brain function and diseases.
Infection biology | (i) Classification of cellular and viral transcriptomes (ii) Genome profiling of host cell during infection | The recent COVID-19 pandemic has drawn all the attention to infectious diseases and host-pathogen interactions. It has also made us highly aware of the lack of robust techniques to understand infection mechanisms and devise treatment strategies. scRNA-seq analysis can help uncover host-pathogen relationships, map infection progression, and help identify druggable targets. The study of human microbiome interaction with immune cells will indicate the pathological state that develops when it is perturbed [148,149].
Diagnostics | (i) DE analysis and gene regulatory network construction for identification of novel diagnostic biomarkers. | scRNA-seq data is becoming increasingly available for pre-clinical and clinical samples of diseases. It enables the construction of high-resolution cellular maps for diseases, helps identify novel biomarkers, and unravels underlying disease mechanisms, which aids in developing better treatment regimes. It has been used in cerebrospinal fluid research and diagnostics [150]. A molecular diagnostic test using scRNA-seq analysis identified two gene sets involved in the autoimmune response that are suggestive of disease progression and could drive lupus nephritis treatment [151]. Researchers successfully identified and validated diagnostic and therapeutic biomarkers for rheumatoid arthritis for mice and humans, using scRNA-seq analysis and network tools [152]. In the future, it may also help in identifying dynamical network biomarkers that will help predict a pre-disease state.
Precision Medicine | (i) Longitudinal profiling with scRNA-seq. (ii) Pre- and post-treatment analysis | scRNA-seq helps identify the more notorious clones or malignant cells in cancer; thus tailored therapy could be designed to target this particular group of cells. Profiling patient samples can help monitor disease states, response to therapies, and mechanisms of resistance [153,154]. However, for this ambitious idea to become a reality, it would require more cost-effective and reliable analysis techniques than currently available.
(a) Applications in different fields are not mutually exclusive.
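Table 3 lists "integrating the count matrix generated from scRNA-seq with a known ligand-receptor interaction matrix" as one route to studying cell-cell communication. A minimal sketch of that idea follows, assuming cells are already annotated with a cell type and that a curated list of ligand-receptor pairs is available. The scoring rule (mean ligand expression in the sender type multiplied by mean receptor expression in the receiver type) is one simple choice made for illustration; the gene names, cell-type labels, and function names are hypothetical, and this is not the implementation of NATMI or any other published tool.

```python
from collections import defaultdict


def mean_expression_by_type(counts, cell_types):
    """Average expression of each gene within each annotated cell type.

    counts: {cell_id: {gene: expression_value}}
    cell_types: {cell_id: cell_type_label}
    Returns {cell_type: {gene: mean_expression}}.
    """
    sums = defaultdict(lambda: defaultdict(float))
    sizes = defaultdict(int)
    for cell, profile in counts.items():
        label = cell_types[cell]
        sizes[label] += 1
        for gene, value in profile.items():
            sums[label][gene] += value
    return {
        label: {gene: total / sizes[label] for gene, total in genes.items()}
        for label, genes in sums.items()
    }


def score_interactions(means, ligand_receptor_pairs):
    """Score every (sender, receiver, ligand, receptor) combination.

    Assumed rule: mean ligand level in sender * mean receptor level in receiver.
    Combinations with a zero score are omitted.
    """
    scores = {}
    for sender in means:
        for receiver in means:
            for ligand, receptor in ligand_receptor_pairs:
                score = (means[sender].get(ligand, 0.0)
                         * means[receiver].get(receptor, 0.0))
                if score > 0:
                    scores[(sender, receiver, ligand, receptor)] = score
    return scores


# Toy data: two cell types and one hypothetical ligand-receptor pair.
counts = {
    "c1": {"LIG1": 4.0, "REC1": 0.0},
    "c2": {"LIG1": 2.0, "REC1": 0.0},
    "c3": {"LIG1": 0.0, "REC1": 3.0},
    "c4": {"LIG1": 0.0, "REC1": 5.0},
}
cell_types = {"c1": "typeA", "c2": "typeA", "c3": "typeB", "c4": "typeB"}
pairs = [("LIG1", "REC1")]

means = mean_expression_by_type(counts, cell_types)
scores = score_interactions(means, pairs)
print(scores)
```

Including the case where sender and receiver are the same cell type lets the same loop surface candidate autocrine signaling, which Table 3 mentions as one of the insights such analyses have offered.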
scRNA transcriptomics can be used with multiple other technologies


to yield more comprehensive results for understanding cellular biology. Spatial
transcriptomics is one such technology: it preserves the spatial location of
gene expression during analysis and, when used with scRNA-seq, is an excellent
way to study the tissue microenvironment. The possibilities of what can be
achieved with this technology are countless.

6. Conclusion

Single-cell RNA sequencing technology has zoomed in on cellular biology like
never before. This review emphasizes that single-cell analysis is a multi-step
process in which no step is independent of the others; only when all the steps
are carefully monitored in tandem will they give precise results. Leveraging
SCT and multiple single-cell modalities at once holds remarkable promise for
understanding complex cellular processes and for capturing cellular
heterogeneity and disease states, and it opens new frontiers of research.
Efforts to characterize all cells in the human body, such as the
Chan-Zuckerberg Initiative Human Cell Atlas, serve as major resources around
which researchers from the biological to the computational sciences can come
together and on which they can build. There is an increasing demand for better
tools, techniques, analysis algorithms, and experimental validation measures
that can rapidly materialize the vision of understanding biological processes
at single-cell resolution.

Declaration of Competing Interest

The authors do not have any conflicts of interest to declare.

Acknowledgments

This work was supported by the project “Genetic Analysis of Dermatological
Disorders” (BT/PR5402/BID/7/408/2012 dated: 6/7/2017), Department of
Biotechnology, Government of India.

References

[1] P. Angerer, L. Simon, S. Tritschler, F.A. Wolf, D. Fischer, F.J. Theis, Single cells make big data: new challenges and opportunities in transcriptomics, Curr. Opin. Syst. Biol. (2017), https://doi.org/10.1016/j.coisb.2017.07.004.
[2] B. Hwang, J.H. Lee, D. Bang, Single-cell RNA sequencing technologies and bioinformatics pipelines, Exp. Mol. Med. (2018), https://doi.org/10.1038/s12276-018-0071-8.
[3] S. Huang, Non-genetic heterogeneity of cells in development: more than just noise, Development (2009), https://doi.org/10.1242/dev.035139.
[4] N. Li, H. Clevers, Coexistence of quiescent and active adult stem cells in mammals, Science 80 (2010), https://doi.org/10.1126/science.1180794.
[5] B.D. Aevermann, M. Novotny, T. Bakken, J.A. Miller, A.D. Diehl, D. Osumi-Sutherland, R.S. Lasken, E.S. Lein, R.H. Scheuermann, Cell type discovery using single-cell transcriptomics: implications for ontological representation, Hum. Mol. Genet. (2018), https://doi.org/10.1093/hmg/ddy100.
[6] S. Linnarsson, S.A. Teichmann, Single-cell genomics: Coming of age, Genome Biol. (2016), https://doi.org/10.1186/s13059-016-0960-x.
[7] E. Shapiro, T. Biezuner, S. Linnarsson, Single-cell sequencing-based technologies will revolutionize whole-organism science, Nat. Rev. Genet. (2013), https://doi.org/10.1038/nrg3542.
[8] P. Angerer, L. Simon, S. Tritschler, F.A. Wolf, D. Fischer, F.J. Theis, Single cells make big data: new challenges and opportunities in transcriptomics, Curr. Opin. Syst. Biol. 4 (2017) 85–91, https://doi.org/10.1016/j.coisb.2017.07.004.
[9] G. Brady, M. Barbara, N.N. Iscove, Representative in vitro cDNA amplification from individual hemopoietic cells and colonies, Methods Mol. Cell. Biol. 2 (1990) 17–25, 08987750.
[10] J. Eberwine, H. Yeh, K. Miyashiro, Y. Cao, S. Nair, R. Finnell, M. Zettel, P. Coleman, Analysis of gene expression in single live neurons, Proc. Natl. Acad. Sci. U. S. A. (1992), https://doi.org/10.1073/pnas.89.7.3010.
[11] F. Tang, K. Lao, M.A. Surani, Development and applications of single-cell transcriptome analysis, Nat. Methods (2011), https://doi.org/10.1038/nmeth.1557.
[12] F. Tang, C. Barbacioru, Y. Wang, E. Nordman, C. Lee, N. Xu, X. Wang, J. Bodeau, B.B. Tuch, A. Siddiqui, K. Lao, M.A. Surani, mRNA-Seq whole-transcriptome analysis of a single cell, Nat. Methods (2009), https://doi.org/10.1038/nmeth.1315.
[13] D. Hebenstreit, Methods, challenges and potentials of single cell RNA-seq, Biology (Basel) (2012), https://doi.org/10.3390/biology1030658.
[14] D. Grün, A. Lyubimova, L. Kester, K. Wiebrands, O. Basak, N. Sasaki, H. Clevers, A. Van Oudenaarden, Single-cell messenger RNA sequencing reveals rare intestinal cell types, Nature (2015), https://doi.org/10.1038/nature14966.
[15] S. Petropoulos, D. Edsgärd, B. Reinius, Q. Deng, S.P. Panula, S. Codeluppi, A. Plaza Reyes, S. Linnarsson, R. Sandberg, F. Lanner, Single-cell RNA-seq reveals lineage and x chromosome dynamics in human preimplantation embryos, Cell (2016), https://doi.org/10.1016/j.cell.2016.03.023.
[16] A.A. Tu, T.M. Gierahn, B. Monian, D.M. Morgan, N.K. Mehta, B. Ruiter, W.G. Shreffler, A.K. Shalek, J.C. Love, TCR sequencing paired with massively parallel 3′ RNA-seq reveals clonotypic T cell signatures, Nat. Immunol. (2019), https://doi.org/10.1038/s41590-019-0544-5.
[17] R.J. Miragaia, T. Gomes, A. Chomka, L. Jardine, A. Riedel, A.N. Hegazy, N. Whibley, A. Tucci, X. Chen, I. Lindeman, G. Emerton, T. Krausgruber, J. Shields, M. Haniffa, F. Powrie, S.A. Teichmann, Single-cell transcriptomics of regulatory T cells reveals trajectories of tissue adaptation, Immunity (2019), https://doi.org/10.1016/j.immuni.2019.01.001.
[18] M.J.T. Stubbington, T. Lönnberg, V. Proserpio, S. Clare, A.O. Speak, G. Dougan, S.A. Teichmann, T cell fate and clonality inference from single-cell transcriptomes, Nat. Methods (2016), https://doi.org/10.1038/nmeth.3800.
[19] A.K. Shalek, R. Satija, X. Adiconis, R.S. Gertner, J.T. Gaublomme, R. Raychowdhury, S. Schwartz, N. Yosef, C. Malboeuf, D. Lu, J.J. Trombetta, D. Gennert, A. Gnirke, A. Goren, N. Hacohen, J.Z. Levin, H. Park, A. Regev, Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells, Nature (2013), https://doi.org/10.1038/nature12172.
[20] J. Wagner, M.A. Rapsomaniki, S. Chevrier, T. Anzeneder, C. Langwieder, A. Dykgers, M. Rees, A. Ramaswamy, S. Muenst, S.D. Soysal, A. Jacobs, J. Windhager, K. Silina, M. van den Broek, K.J. Dedes, M. Rodríguez Martínez, W.P. Weber, B. Bodenmiller, A single-cell atlas of the tumor and immune ecosystem of human breast cancer, Cell (2019), https://doi.org/10.1016/j.cell.2019.03.005.
[21] J.M. Granja, S. Klemm, L.M. McGinnis, A.S. Kathiria, A. Mezger, M.R. Corces, B. Parks, E. Gars, M. Liedtke, G.X.Y. Zheng, H.Y. Chang, R. Majeti, W.J. Greenleaf, Single-cell multiomic analysis identifies regulatory programs in mixed-phenotype acute leukemia, Nat. Biotechnol. (2019), https://doi.org/10.1038/s41587-019-0332-7.
[22] C. Yao, H.W. Sun, N.E. Lacey, Y. Ji, E.A. Moseman, H.Y. Shih, E.F. Heuston, M. Kirby, S. Anderson, J. Cheng, O. Khan, R. Handon, J. Reilley, J. Fioravanti, J. Hu, S. Gossa, E.J. Wherry, L. Gattinoni, D.B. McGavern, J.J. O’Shea, P.L. Schwartzberg, T. Wu, Single-cell RNA-seq reveals TOX as a key regulator of CD8+ T cell persistence in chronic infection, Nat. Immunol. (2019), https://doi.org/10.1038/s41590-019-0403-4.
[23] S.M. Shaffer, M.C. Dunagin, S.R. Torborg, E.A. Torre, B. Emert, C. Krepler, M. Beqiri, K. Sproesser, P.A. Brafford, M. Xiao, E. Eggan, I.N. Anastopoulos, C.A. Vargas-Garcia, A. Singh, K.L. Nathanson, M. Herlyn, A. Raj, Rare cell variability and drug-induced reprogramming as a mode of cancer drug resistance, Nature (2017), https://doi.org/10.1038/nature22794.
[24] P. Yu, W. Lin, Single-cell transcriptome study as big data, Genom. Proteome. Bioinform. (2016), https://doi.org/10.1016/j.gpb.2016.01.005.
[25] J. Zheng, K. Wang, Emerging deep learning methods for single-cell RNA-seq data analysis, Quant. Biol. (2019), https://doi.org/10.1007/s40484-019-0189-2.
[26] R. Petegrosso, Z. Li, R. Kuang, Machine learning and statistical methods for clustering single-cell RNA-sequencing data, Brief. Bioinform. (2019), https://doi.org/10.1093/bib/bbz063.
[27] B. Vieth, S. Parekh, C. Ziegenhain, W. Enard, I. Hellmann, A systematic evaluation of single cell RNA-seq analysis pipelines, Nat. Commun. (2019), https://doi.org/10.1038/s41467-019-12266-7.
[28] G. Chen, B. Ning, T. Shi, Single-cell RNA-seq technologies and related computational data analysis, Front. Genet. (2019), https://doi.org/10.3389/fgene.2019.00317.
[29] K. Sheng, W. Cao, Y. Niu, Q. Deng, C. Zong, Effective detection of variation in single-cell transcriptomes using MATQ-seq, Nat. Methods (2017), https://doi.org/10.1038/nmeth.4145.
[30] S. Picelli, Å.K. Björklund, O.R. Faridani, S. Sagasser, G. Winberg, R. Sandberg, Smart-seq2 for sensitive full-length transcriptome profiling in single cells, Nat. Methods (2013), https://doi.org/10.1038/nmeth.2639.
[31] L.D. Goldstein, Y.J.J. Chen, J. Dunne, A. Mir, H. Hubschle, J. Guillory, W. Yuan, J. Zhang, J. Stinson, B. Jaiswal, K.B. Pahuja, I. Mann, T. Schaal, L. Chan, S. Anandakrishnan, C. Wah Lin, P. Espinoza, S. Husain, H. Shapiro, K. Swaminathan, S. Wei, M. Srinivasan, S. Seshagiri, Z. Modrusan, Massively parallel nanowell-based single-cell gene expression profiling, BMC Genomics (2017), https://doi.org/10.1186/s12864-017-3893-1.
[32] X. Fan, X. Zhang, X. Wu, H. Guo, Y. Hu, F. Tang, Y. Huang, Single-cell RNA-seq transcriptome analysis of linear and circular RNAs in mouse preimplantation embryos, Genome Biol. (2015), https://doi.org/10.1186/s13059-015-0706-1.
[33] S. Islam, U. Kjällquist, A. Moliner, P. Zajac, J.B. Fan, P. Lönnerberg, S. Linnarsson, Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq, Genome Res. (2011), https://doi.org/10.1101/gr.110882.110.
[34] S. Islam, U. Kjällquist, A. Moliner, P. Zajac, J.B. Fan, P. Lönnerberg, S. Linnarsson, Highly multiplexed and strand-specific single-cell RNA 5′ end sequencing, Nat. Protoc. (2012), https://doi.org/10.1038/nprot.2012.022.
[35] G.X.Y. Zheng, J.M. Terry, P. Belgrader, P. Ryvkin, Z.W. Bent, R. Wilson, S.B. Ziraldo, T.D. Wheeler, G.P. McDermott, J. Zhu, M.T. Gregory, J. Shuga, L. Montesclaros, J.G. Underwood, D.A. Masquelier, S.Y. Nishimura, M. Schnall-Levin, P.W. Wyatt, C.M. Hindson, R. Bharadwaj, A. Wong, K.D. Ness, L.W. Beppu, H.J. Deeg, C. McFarland, K.R. Loeb, W.J. Valente, N.G. Ericson, E.A. Stevens, J.P. Radich, T.S. Mikkelsen, B.J. Hindson, J.H. Bielas, Massively parallel digital
transcriptional profiling of single cells, Nat. Commun. (2017), https://doi.org/10.1038/ncomms14049.
[36] D.M. DeLaughter, The use of the fluidigm C1 for RNA expression analyses of single cells, Curr. Protoc. Mol. Biol. (2018), https://doi.org/10.1002/cpmb.55.
[37] E.Z. Macosko, A. Basu, R. Satija, J. Nemesh, K. Shekhar, M. Goldman, I. Tirosh, A.R. Bialas, N. Kamitaki, E.M. Martersteck, J.J. Trombetta, D.A. Weitz, J.R. Sanes, A.K. Shalek, A. Regev, S.A. McCarroll, Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets, Cell (2015), https://doi.org/10.1016/j.cell.2015.05.002.
[38] A.M. Klein, L. Mazutis, I. Akartuna, N. Tallapragada, A. Veres, V. Li, L. Peshkin, D.A. Weitz, M.W. Kirschner, Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells, Cell (2015), https://doi.org/10.1016/j.cell.2015.04.044.
[39] P. Hu, W. Zhang, H. Xin, G. Deng, Single cell isolation and analysis, Front. Cell Dev. Biol. (2016), https://doi.org/10.3389/fcell.2016.00116.
[40] A.E. Saliba, A.J. Westermann, S.A. Gorski, J. Vogel, Single-cell RNA-seq: advances and future challenges, Nucleic Acids Res. (2014), https://doi.org/10.1093/nar/gku555.
[41] V. Menon, Clustering single cells: a review of approaches on high-and low-depth single-cell RNA-seq data, Brief. Funct. Genomics (2018), https://doi.org/10.1093/bfgp/elx044.
[42] M.S. Cembrowski, Single-cell transcriptomics as a framework and roadmap for understanding the brain, J. Neurosci. Methods (2019), https://doi.org/10.1016/j.jneumeth.2019.108353.
[43] D.B. Weibel, G.M. Whitesides, Applications of microfluidics in chemical biology, Curr. Opin. Chem. Biol. (2006), https://doi.org/10.1016/j.cbpa.2006.10.016.
[44] J.S. Marcus, W.F. Anderson, S.R. Quake, Microfluidic single-cell mRNA isolation and analysis, Anal. Chem. (2006), https://doi.org/10.1021/ac0519460.
[45] X. Chen, S.A. Teichmann, K.B. Meyer, From tissues to cell types and back: single-cell gene expression analysis of tissue architecture, Annu. Rev. Biomed. Data Sci. (2018), https://doi.org/10.1146/annurev-biodatasci-080917-013452.
[46] C. Ziegenhain, B. Vieth, S. Parekh, B. Reinius, A. Guillaumet-Adkins, M. Smets, H. Leonhardt, H. Heyn, I. Hellmann, W. Enard, Comparative analysis of single-cell RNA sequencing methods, Mol. Cell (2017), https://doi.org/10.1016/j.molcel.2017.01.023.
[47] V. Espina, J.D. Wulfkuhle, V.S. Calvert, A. VanMeter, W. Zhou, G. Coukos, D.H. Geho, E.F. Petricoin, L.A. Liotta, Laser-capture microdissection, Nat. Protoc. (2006), https://doi.org/10.1038/nprot.2006.85.
[48] S. Nichterwitz, G. Chen, J. Aguila Benitez, M. Yilmaz, H. Storvall, M. Cao, R. Sandberg, Q. Deng, E. Hedlund, Laser capture microscopy coupled with smart-seq2 for precise spatial transcriptomic profiling, Nat. Commun. (2016), https://doi.org/10.1038/ncomms12139.
[49] T.M. Gierahn, M.H. Wadsworth, T.K. Hughes, B.D. Bryson, A. Butler, R. Satija, S. Fortune, J. Christopher Love, A.K. Shalek, Seq-well: portable, low-cost rna sequencing of single cells at high throughput, Nat. Methods 14 (2017) 395–398, https://doi.org/10.1038/nmeth.4179.
[50] J. Cao, J.S. Packer, V. Ramani, D.A. Cusanovich, C. Huynh, R. Daza, X. Qiu, C. Lee, S.N. Furlan, F.J. Steemers, A. Adey, R.H. Waterston, C. Trapnell, J. Shendure, Comprehensive single-cell transcriptional profiling of a multicellular organism, Science 80 (2017), https://doi.org/10.1126/science.aam8940.
[51] A.B. Rosenberg, C.M. Roco, R.A. Muscat, A. Kuchina, P. Sample, Z. Yao, L.T. Graybuck, D.J. Peeler, S. Mukherjee, W. Chen, S.H. Pun, D.L. Sellers, B. Tasic, G. Seelig, Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding, Science 80 (360) (2018) 176–182, https://doi.org/10.1126/science.aam8999.
[52] B. Hwang, J.H. Lee, D. Bang, Single-cell RNA sequencing technologies and bioinformatics pipelines, Exp. Mol. Med. 50 (2018), https://doi.org/10.1038/s12276-018-0071-8.
[53] T. Kivioja, A. Vähärautio, K. Karlsson, M. Bonke, M. Enge, S. Linnarsson, J. Taipale, Counting absolute numbers of molecules using unique molecular identifiers, Nat. Methods (2012), https://doi.org/10.1038/nmeth.1778.
[54] S. Islam, A. Zeisel, S. Joost, G. La Manno, P. Zajac, M. Kasper, P. Lönnerberg, S. Linnarsson, Quantitative single-cell RNA-seq with unique molecular identifiers, Nat. Methods (2014), https://doi.org/10.1038/nmeth.2772.
[55] M.D. Luecken, F.J. Theis, Current best practices in single-cell RNA-seq analysis: a tutorial, Mol. Syst. Biol. (2019), https://doi.org/10.15252/msb.20188746.
[56] O. Stegle, S.A. Teichmann, J.C. Marioni, Computational and analytical challenges in single-cell transcriptomics, Nat. Rev. Genet. (2015), https://doi.org/10.1038/nrg3833.
[57] T. Ilicic, J.K. Kim, A.A. Kolodziejczyk, F.O. Bagger, D.J. McCarthy, J.C. Marioni, S.A. Teichmann, Classification of low quality cells from single-cell RNA-seq data, Genome Biol. (2016), https://doi.org/10.1186/s13059-016-0888-1.
[58] P. Brennecke, S. Anders, J.K. Kim, A.A. Kołodziejczyk, X. Zhang, V. Proserpio, B. Baying, V. Benes, S.A. Teichmann, J.C. Marioni, M.G. Heisler, Accounting for technical noise in single-cell RNA-seq experiments, Nat. Methods (2013), https://doi.org/10.1038/nmeth.2645.
[59] Chapter 6 Quality Control, Orchestrating Single-Cell Analysis with Bioconductor, (n.d.), https://osca.bioconductor.org/quality-control.html (accessed 6 August 2020), 2020.
[60] C.A. Vallejos, D. Risso, A. Scialdone, S. Dudoit, J.C. Marioni, Normalizing single-cell RNA sequencing data: challenges and opportunities, Nat. Methods (2017), https://doi.org/10.1038/nmeth.4292.
[61] L. Wen, F. Tang, How to catch rare cell types, Nature (2015), https://doi.org/10.1038/nature15204.
[62] C. Hafemeister, R. Satija, Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression, Genome Biol. (2019), https://doi.org/10.1186/s13059-019-1874-1.
[63] R. Bacher, L.F. Chu, N. Leng, A.P. Gasch, J.A. Thomson, R.M. Stewart, M. Newton, C. Kendziorski, SCnorm: robust normalization of single-cell RNA-seq data, Nat. Methods (2017), https://doi.org/10.1038/nmeth.4263.
[64] W.V. Li, J.J. Li, An accurate and robust imputation method scImpute for single-cell RNA-seq data, Nat. Commun. (2018), https://doi.org/10.1038/s41467-018-03405-7.
[65] D. van Dijk, R. Sharma, J. Nainys, K. Yim, P. Kathail, A.J. Carr, C. Burdziak, K.R. Moon, C.L. Chaffer, D. Pattabiraman, B. Bierie, L. Mazutis, G. Wolf, S. Krishnaswamy, D. Pe’er, Recovering gene interactions from single-cell data using data diffusion, Cell (2018), https://doi.org/10.1016/j.cell.2018.05.061.
[66] W. Gong, I.Y. Kwak, P. Pota, N. Koyano-Nakagawa, D.J. Garry, DrImpute: imputing dropout events in single cell RNA sequencing data, BMC Bioinformatics (2018), https://doi.org/10.1186/s12859-018-2226-y.
[67] M. Huang, J. Wang, E. Torre, H. Dueck, S. Shaffer, R. Bonasio, J.I. Murray, A. Raj, M. Li, N.R. Zhang, SAVER: gene expression recovery for single-cell RNA sequencing, Nat. Methods (2018), https://doi.org/10.1038/s41592-018-0033-z.
[68] S. Tracy, G.C. Yuan, R. Dries, RESCUE: imputing dropout events in single-cell RNA-sequencing data, BMC Bioinformatics (2019), https://doi.org/10.1186/s12859-019-2977-0.
[69] T. Peng, Q. Zhu, P. Yin, K. Tan, SCRABBLE: Single-cell RNA-seq imputation constrained by bulk RNA-seq data, Genome Biol. (2019), https://doi.org/10.1186/s13059-019-1681-8.
[70] P. Ye, W. Ye, C. Ye, S. Li, L. Ye, G. Ji, X. Wu, scHinter: imputing dropout events for single-cell RNA-seq data with limited sample size, Bioinformatics (2020), https://doi.org/10.1093/bioinformatics/btz627.
[71] P. Qiu, Embracing the dropouts in single-cell RNA-seq data, bioRxiv (2018), https://doi.org/10.1101/468025.
[72] V. Svensson, Droplet scRNA-seq is not zero-inflated, Nat. Biotechnol. (2020), https://doi.org/10.1038/s41587-019-0379-5.
[73] W. Tang, F. Bertaux, P. Thomas, C. Stefanelli, M. Saint, S. Marguerat, V. Shahrezaei, BayNorm: bayesian gene expression recovery, imputation and normalization for single-cell RNA-sequencing data, Bioinformatics (2020), https://doi.org/10.1093/bioinformatics/btz726.
[74] A.T.L. Lun, J.C. Marioni, Overcoming confounding plate effects in differential expression analyses of single-cell RNA-seq data, Biostatistics (2017), https://doi.org/10.1093/biostatistics/kxw055.
[75] M.B. Cole, D. Risso, A. Wagner, D. DeTomaso, J. Ngai, E. Purdom, S. Dudoit, N. Yosef, Performance assessment and selection of normalization procedures for single-cell RNA-seq, Cell Syst. (2019), https://doi.org/10.1016/j.cels.2019.03.010.
[76] P.Y. Tung, J.D. Blischak, C.J. Hsiao, D.A. Knowles, J.E. Burnett, J.K. Pritchard, Y. Gilad, Batch effects and the effective design of single-cell gene expression studies, Sci. Rep. (2017), https://doi.org/10.1038/srep39921.
[77] W. Chen, S. Zhang, J. Williams, B. Ju, B. Shaner, J. Easton, G. Wu, X. Chen, A comparison of methods accounting for batch effects in differential expression analysis of UMI count based single cell RNA sequencing, Comput. Struct. Biotechnol. J. (2020), https://doi.org/10.1016/j.csbj.2020.03.026.
[78] R. Satija, J.A. Farrell, D. Gennert, A.F. Schier, A. Regev, Spatial reconstruction of single-cell gene expression data, Nat. Biotechnol. (2015), https://doi.org/10.1038/nbt.3192.
[79] F.A. Wolf, P. Angerer, F.J. Theis, SCANPY: large-scale single-cell gene expression data analysis, Genome Biol. (2018), https://doi.org/10.1186/s13059-017-1382-0.
[80] S.R. Tyler, P.G. Rotti, X. Sun, Y. Yi, W. Xie, M.C. Winter, M.J. Flamme-Wiese, B.A. Tucker, R.F. Mullins, A.W. Norris, J.F. Engelhardt, PyMINEr finds gene and autocrine-paracrine networks from human islet scRNA-seq, Cell Rep. (2019), https://doi.org/10.1016/j.celrep.2019.01.063.
[81] V. Petukhov, J. Guo, N. Baryawno, N. Severe, D.T. Scadden, M.G. Samsonova, P.V. Kharchenko, dropEst: pipeline for accurate estimation of molecular counts in droplet-based single-cell RNA-seq experiments, Genome Biol. (2018), https://doi.org/10.1186/s13059-018-1449-6.
[82] R. Hillje, P.G. Pelicci, L. Luzi, Cerebro: interactive visualization of scRNA-seq data, Bioinformatics (2019), https://doi.org/10.1093/bioinformatics/btz877.
[83] K. Rue-Albrecht, F. Marini, C. Soneson, A.T.L. Lun, iSEE: interactive summarizedexperiment explorer, F1000Research (2018), https://doi.org/10.12688/f1000research.14966.1.
[84] C. Arisdakessian, O. Poirion, B. Yunits, X. Zhu, L.X. Garmire, DeepImpute: an accurate, fast, and scalable deep neural network method to impute single-cell RNA-seq data, Genome Biol. (2019), https://doi.org/10.1186/s13059-019-1837-6.
[85] T. Wang, T.S. Johnson, W. Shao, Z. Lu, B.R. Helm, J. Zhang, K. Huang, BERMUDA: a novel deep transfer learning method for single-cell RNA sequencing batch correction reveals hidden high-resolution cellular subtypes, Genome Biol. (2019), https://doi.org/10.1186/s13059-019-1764-6.
[86] L. Haghverdi, A.T.L. Lun, M.D. Morgan, J.C. Marioni, Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors, Nat. Biotechnol. (2018), https://doi.org/10.1038/nbt.4091.
[87] E. Lin, S. Mukherjee, S. Kannan, A deep adversarial variational autoencoder model for dimensionality reduction in single-cell RNA sequencing analysis, BMC Bioinformatics 21 (2020) 64, https://doi.org/10.1186/s12859-020-3401-5.
[88] M. Amodio, D. van Dijk, K. Srinivasan, W.S. Chen, H. Mohsen, K.R. Moon, A. Campbell, Y. Zhao, X. Wang, M. Venkataswamy, A. Desai, V. Ravi, P. Kumar, R. Montgomery, G. Wolf, S. Krishnaswamy, Exploring single-cell data with deep
multitasking neural networks, Nat. Methods (2019), https://doi.org/10.1038/ heterogeneity in single-cell RNA sequencing data, Genome Biol. (2015), https://
s41592-019-0576-7. doi.org/10.1186/s13059-015-0844-5.
[89] V.Y. Kiselev, A. Yiu, M. Hemberg, Scmap: projection of single-cell RNA-seq data [116] P.V. Kharchenko, L. Silberstein, D.T. Scadden, Bayesian approach to single-cell
across data sets, Nat. Methods (2018), https://doi.org/10.1038/nmeth.4644. differential expression analysis, Nat. Methods (2014), https://doi.org/10.1038/
[90] H. Todorov, R. Cannoodt, W. Saelens, Y. Saeys, TinGa: fast and flexible trajectory nmeth.2967.
inference with growing neural gas, Bioinformatics (2020), https://doi.org/ [117] Y. Ma, S. Sun, X. Shang, E.T. Keller, M. Chen, X. Zhou, Integrative differential
10.1093/bioinformatics/btaa463. expression and gene set enrichment analysis using summary statistics for scRNA-
[91] Z. Liu, H. Lou, K. Xie, H. Wang, N. Chen, O.M. Aparicio, M.Q. Zhang, R. Jiang, seq studies, Nat. Commun. (2020), https://doi.org/10.1038/s41467-020-15298-
T. Chen, Reconstructing cell cycle pseudo time-series via single-cell transcriptome 6.
data, Nat. Commun. (2017), https://doi.org/10.1038/s41467-017-00039-z. [118] H.A. Pliner, J. Shendure, C. Trapnell, Supervised classification enables rapid
[92] R. Petegrosso, Z. Li, R. Kuang, Machine learning and statistical methods for annotation of cell atlases, Nat. Methods (2019), https://doi.org/10.1038/s41592-
clustering single-cell RNA-sequencing data, Brief. Bioinform. (2019), https://doi. 019-0535-3.
org/10.1093/bib/bbz063. [119] X. Shao, J. Liao, X. Lu, R. Xue, N. Ai, X. Fan, scCATCH: automatic annotation on
[93] P. Lin, M. Troup, J.W.K. Ho, CIDR: ultrafast and accurate clustering through cell types of Clusters from single-cell RNA sequencing data, iScience (2020),
imputation for single-cell RNA-seq data, Genome Biol. (2017), https://doi.org/ https://doi.org/10.1016/j.isci.2020.100882.
10.1186/s13059-017-1188-0. [120] L. Jiang, Rare cell type detection, en, Methods Mol. Biol. (2019), https://doi.org/
[94] A. Butler, P. Hoffman, P. Smibert, E. Papalexi, R. Satija, Integrating single-cell 10.1007/978-1-4939-9057-3_5.
transcriptomic data across different conditions, technologies, and species, Nat. [121] W. Saelens, R. Cannoodt, H. Todorov, Y. Saeys, A comparison of single-cell
Biotechnol. (2018), https://doi.org/10.1038/nbt.4096. trajectory inference methods, Nat. Biotechnol. (2019), https://doi.org/10.1038/
[95] L. McInnes, J. Healy, N. Saul, L. Großberger, UMAP: uniform manifold s41587-019-0071-9.
approximation and projection, J. Open Source Softw. (2018), https://doi.org/ [122] C. Trapnell, D. Cacchiarelli, J. Grimsby, P. Pokharel, S. Li, M. Morse, N.J. Lennon,
10.21105/joss.00861. K.J. Livak, T.S. Mikkelsen, J.L. Rinn, The dynamics and regulators of cell fate
[96] E. Pierson, C. Yau, ZIFA: dimensionality reduction for zero-inflated single-cell decisions are revealed by pseudotemporal ordering of single cells, Nat.
gene expression analysis, Genome Biol. (2015), https://doi.org/10.1186/s13059- Biotechnol. (2014), https://doi.org/10.1038/nbt.2859.
015-0805-z. [123] M. Setty, M.D. Tadmor, S. Reich-Zeliger, O. Angel, T.M. Salame, P. Kathail,
[97] D. Risso, F. Perraudeau, S. Gribkova, S. Dudoit, J.P. Vert, A general and flexible K. Choi, S. Bendall, N. Friedman, D. Pe’Er, Wishbone identifies bifurcating
method for signal extraction from single-cell RNA-seq data, Nat. Commun. developmental trajectories from single-cell data, Nat. Biotechnol. (2016), https://
(2018), https://doi.org/10.1038/s41467-017-02554-5. doi.org/10.1038/nbt.3569.
[98] Data Portal, Human Cell Atlas, (s.d.), https://www.humancellatlas.org/data-port [124] G. Giecold, E. Marco, S.P. Garcia, L. Trippa, G.C. Yuan, Robust lineage
al/ (accedit 18 març 2020), 2020. reconstruction from high-dimensional single-cell data, Nucleic Acids Res. (2016),
[99] Single Cell Portal, (s.d.). https://singlecell.broadinstitute.org/single_cell (accedit https://doi.org/10.1093/nar/gkw452.
18 març 2020). 2020. [125] J. Chen, L. Rénia, F. Ginhoux, Constructing cell lineages from single-cell
[100] Home, < Single Cell Expression Atlas < EMBL-EBI (s.d.). https://www.ebi.ac. transcriptomes, Mol. Asp. Med. (2018), https://doi.org/10.1016/j.
uk/gxa/sc/home (accedit 18 març 2020), 2020. mam.2017.10.004.
[101] Samples, PanglaoDB (s.d.). https://panglaodb.se/samples.html?species [126] K. Campbell, C. Yau, Ouija: incorporating prior knowledge in single-cell trajectory
=human&protocol=all protocols&sort=mostrecent (accedit 18 març 2020), learning using Bayesian nonlinear factor analysis, bioRxiv (2016), https://doi.
2020. org/10.1101/060442.
[102] M. Eugenio, R.L. Karp, G. Guo, P. Robson, A.H. Hart, L. Trippa, G.C. Yuan, [127] R. Gonzalo Parra, N. Papadopoulos, L. Ahumada-Arranz, J. El Kholtei,
Bifurcation analysis of single-cell gene expression data reveals epigenetic N. Mottelson, Y. Horokhovsky, B. Treutlein, J. Soeding, Reconstructing complex
landscape, Proc. Natl. Acad. Sci. U. S. A. (2014), https://doi.org/10.1073/ lineage trees from scRNA-seq data using MERLoT, Nucleic Acids Res. (2019),
pnas.1408993111. https://doi.org/10.1093/nar/gkz706.
[103] L. Yang, J. Liu, Q. Lu, A.D. Riggs, X. Wu, SAIC: an iterative clustering approach for [128] A. Scialdone, K.N. Natarajan, L.R. Saraiva, V. Proserpio, S.A. Teichmann,
analysis of single cell RNA-seq data, BMC Genomics (2017), https://doi.org/ O. Stegle, J.C. Marioni, F. Buettner, Computational assignment of cell-cycle stage
10.1186/s12864-017-4019-5. from single-cell transcriptome data, Methods (2015), https://doi.org/10.1016/j.
[104] H. Zhang, C.A.A. Lee, Z. Li, J.R. Garbe, C.R. Eide, R. Petegrosso, R. Kuang, ymeth.2015.06.021.
J. Tolar, A multitask clustering approach for single-cell RNA-seq analysis in [129] H. Matsumoto, H. Kiryu, C. Furusawa, M.S.H. Ko, S.B.H. Ko, N. Gouda,
recessive dystrophic epidermolysis bullosa, PLoS Comput. Biol. (2018), https:// T. Hayashi, I. Nikaido, SCODE: an efficient regulatory network inference
doi.org/10.1371/journal.pcbi.1006053. algorithm from single-cell RNA-Seq during differentiation, Bioinformatics. 33
[105] D.A. du Verle, S. Yotsukura, S. Nomura, H. Aburatani, K. Tsuda, CellTree: an R/ (2017) 2314–2321, https://doi.org/10.1093/bioinformatics/btx194.
bioconductor package to infer the hierarchical structure of cell populations from [130] T.N. Tran, G. Bader, Tempora: Cell Trajectory Inference Using Time-Series Single-
single-cell RNA-seq data, BMC Bioinformatics (2016), https://doi.org/10.1186/ Cell RNA Sequencing Data, bioRxiv, 2019, https://doi.org/10.1101/846907.
s12859-016-1175-6. [131] T. Wang, B. Li, C.E. Nelson, S. Nabavi, Comparative analysis of differential gene
[106] J.M. Zhang, J. Fan, H.C. Fan, D. Rosenfeld, D.N. Tse, An interpretable framework expression analysis tools for single-cell RNA sequencing data, BMC Bioinformatics
for clustering single-cell RNA-seq datasets, BMC Bioinformatics (2018), https:// (2019), https://doi.org/10.1186/s12859-019-2599-6.
doi.org/10.1186/s12859-018-2092-7. [132] K. Van den Berge, H. Roux de Bézieux, K. Street, W. Saelens, R. Cannoodt,
[107] V.A. Traag, L. Waltman, N.J. van Eck, From Louvain to Leiden: guaranteeing well- Y. Saeys, S. Dudoit, L. Clement, Trajectory-based differential expression analysis
connected communities, Sci. Rep. (2019), https://doi.org/10.1038/s41598-019- for single-cell sequencing data, Nat. Commun. (2020), https://doi.org/10.1038/
41695-z. s41467-020-14766-3.
[108] C. Xu, Z. Su, Identification of cell types from single-cell transcriptomes using a [133] S. Aibar, C.B. González-Blas, T. Moerman, V.A. Huynh-Thu, H. Imrichova,
novel clustering method, Bioinformatics (2015), https://doi.org/10.1093/ G. Hulselmans, F. Rambow, J.C. Marine, P. Geurts, J. Aerts, J. Van Den Oord, Z.
bioinformatics/btv088. K. Atak, J. Wouters, S. Aerts, SCENIC: single-cell regulatory network inference
[109] E. Azizi, A.J. Carr, G. Plitas, A.E. Cornish, C. Konopacki, S. Prabhakaran, and clustering, Nat. Methods (2017), https://doi.org/10.1038/nmeth.4463.
J. Nainys, K. Wu, V. Kiseliovas, M. Setty, K. Choi, R.M. Fromme, P. Dao, P. [134] T. Turki, Y.H. Taguchi, SCGRNs: Novel supervised inference of single-cell gene
T. McKenney, R.C. Wasti, K. Kadaveru, L. Mazutis, A.Y. Rudensky, D. Pe’er, regulatory networks of complex diseases, Comput. Biol. Med. (2020), https://doi.
Single-cell map of diverse immune phenotypes in the breast tumor org/10.1016/j.compbiomed.2020.103656.
microenvironment, Cell (2018), https://doi.org/10.1016/j.cell.2018.05.060. [135] A. Pratapa, A.P. Jalihal, J.N. Law, A. Bharadwaj, T.M. Murali, Benchmarking
[110] Z. Ji, H. Ji, TSCAN: pseudo-time reconstruction and evaluation in single-cell RNA- algorithms for gene regulatory network inference from single-cell transcriptomic
seq analysis, Nucleic Acids Res. (2016), https://doi.org/10.1093/nar/gkw430. data, Nat. Methods (2020), https://doi.org/10.1038/s41592-019-0690-6.
[111] X. Qiu, Q. Mao, Y. Tang, L. Wang, R. Chawla, H.A. Pliner, C. Trapnell, Reversed graph embedding resolves complex single-cell trajectories, Nat. Methods (2017), https://doi.org/10.1038/nmeth.4402.
[112] T. Tian, J. Wan, Q. Song, Z. Wei, Clustering single-cell RNA-seq data with a model-based deep learning approach, Nat. Mach. Intell. (2019), https://doi.org/10.1038/s42256-019-0037-0.
[113] Y. Yang, R. Huh, H.W. Culpepper, Y. Lin, M.I. Love, Y. Li, SAFE-clustering: single-cell aggregated (from Ensemble) clustering for single-cell RNA-seq data, Bioinformatics (2019), https://doi.org/10.1093/bioinformatics/bty793.
[114] A.W. Zhang, C.O. Flanagan, E.A. Chavez, J.L.P. Lim, N. Ceglia, A. Mcpherson, M. Wiens, P. Walters, T. Chan, B. Hewitson, D. Lai, A. Mottok, C. Sarkozy, L. Chong, T. Aoki, X. Wang, A.P. Weng, J.N. Mcalpine, S. Aparicio, C. Steidl, K.R. Campbell, S.P. Shah, RNA-seq for tumor microenvironment profiling, Nat. Methods (2019), https://doi.org/10.1038/s41592-019-0529-1.
[115] G. Finak, A. McDavid, M. Yajima, J. Deng, V. Gersuk, A.K. Shalek, C.K. Slichter, H.W. Miller, M.J. McElrath, M. Prlic, P.S. Linsley, R. Gottardo, MAST: a flexible statistical framework for assessing transcriptional changes and characterizing
[136] X. Qiu, A. Rahimzamani, L. Wang, B. Ren, Q. Mao, T. Durham, J.L. McFaline-Figueroa, L. Saunders, C. Trapnell, S. Kannan, Inferring causal gene regulatory networks from coupled single-cell expression dynamics using scribe, Cell Syst. (2020), https://doi.org/10.1016/j.cels.2020.02.003.
[137] P. Savas, B. Virassamy, C. Ye, A. Salim, C.P. Mintoff, F. Caramia, R. Salgado, D.J. Byrne, Z.L. Teo, S. Dushyanthen, A. Byrne, L. Wein, S.J. Luen, C. Poliness, S.S. Nightingale, A.S. Skandarajah, D.E. Gyorki, C.M. Thornton, P.A. Beavis, S.B. Fox, P.K. Darcy, T.P. Speed, L.K. MacKay, P.J. Neeson, S. Loi, Single-cell profiling of breast cancer T cells reveals a tissue-resident memory subset associated with improved prognosis, Nat. Med. (2018), https://doi.org/10.1038/s41591-018-0078-7.
[138] E. Papalexi, R. Satija, Single-cell RNA sequencing to explore immune cell heterogeneity, Nat. Rev. Immunol. (2018), https://doi.org/10.1038/nri.2017.76.
[139] X. Yu, Y.A. Chen, J.R. Conejo-Garcia, C.H. Chung, X. Wang, Estimation of immune cell content in tumor using single-cell RNA-seq reference data, BMC Cancer (2019), https://doi.org/10.1186/s12885-019-5927-3.
[140] A.L. Roy, Transcriptional regulation in the immune system: one cell at a time, Front. Immunol. 10 (2019) 1355, https://doi.org/10.3389/fimmu.2019.01355.


[141] M.L. Suvà, I. Tirosh, Single-cell RNA sequencing in cancer: lessons learned and emerging challenges, Mol. Cell (2019), https://doi.org/10.1016/j.molcel.2019.05.003.
[142] R. Hou, E. Denisenko, H.T. Ong, J.A. Ramilowski, A.R.R. Forrest, Predicting cell-to-cell communication networks using NATMI, Nat. Commun. (2020), https://doi.org/10.1038/s41467-020-18873-z.
[143] I.C. Macaulay, V. Svensson, C. Labalette, L. Ferreira, F. Hamey, T. Voet, S.A. Teichmann, A. Cvejic, Single-cell rna-sequencing reveals a continuous spectrum of differentiation in hematopoietic cells, Cell Rep. (2016), https://doi.org/10.1016/j.celrep.2015.12.082.
[144] K.S. Yan, C.Y. Janda, J. Chang, G.X.Y. Zheng, K.A. Larkin, V.C. Luca, L.A. Chia, A.T. Mah, A. Han, J.M. Terry, A. Ootani, K. Roelf, M. Lee, J. Yuan, X. Li, C.R. Bolen, J. Wilhelmy, P.S. Davies, H. Ueno, R.J. Von Furstenberg, P. Belgrader, S.B. Ziraldo, H. Ordonez, S.J. Henning, M.H. Wong, M.P. Snyder, I.L. Weissman, A.J. Hsueh, T.S. Mikkelsen, K.C. Garcia, C.J. Kuo, Non-equivalence of Wnt and R-spondin ligands during Lgr5+ intestinal stem-cell self-renewal, Nature (2017), https://doi.org/10.1038/nature22313.
[145] R.M.J. Genga, E.M. Kernfeld, K.M. Parsi, T.J. Parsons, M.J. Ziller, R. Maehr, Single-cell RNA-sequencing-based CRISPRi screening resolves molecular drivers of early human endoderm development, Cell Rep. (2019), https://doi.org/10.1016/j.celrep.2019.03.076.
[146] S. Darmanis, S.A. Sloan, Y. Zhang, M. Enge, C. Caneda, L.M. Shuer, M.G.H. Gephart, B.A. Barres, S.R. Quake, A survey of human brain transcriptome diversity at the single cell level, Proc. Natl. Acad. Sci. U. S. A. (2015), https://doi.org/10.1073/pnas.1507125112.
[147] Q. Mu, Y. Chen, J. Wang, Deciphering brain complexity using single-cell sequencing, Genom. Proteome. Bioinform. (2019), https://doi.org/10.1016/j.gpb.2018.07.007.
[148] A.C. Tolonen, R.J. Xavier, Dissecting the human microbiome with single-cell genomics, Genome Med. (2017), https://doi.org/10.1186/s13073-017-0448-7.
[149] P.M. Strzelecka, A.M. Ranzoni, A. Cvejic, Dissecting human disease with single-cell omics: application in model systems and in the clinic, DMM Dis. Model. Mech. (2018), https://doi.org/10.1242/dmm.036525.
[150] T.V. Lanz, A.K. Pröbstel, I. Mildenberger, M. Platten, L. Schirmer, Single-cell high-throughput technologies in cerebrospinal fluid research and diagnostics, Front. Immunol. (2019), https://doi.org/10.3389/fimmu.2019.01302.
[151] E. Der, H. Suryawanshi, P. Morozov, M. Kustagi, B. Goilav, S. Ranabathou, P. Izmirly, R. Clancy, H.M. Belmont, M. Koenigsberg, M. Mokrzycki, H. Rominieki, J.A. Graham, J.P. Rocca, N. Bornkamp, N. Jordan, E. Schulte, M. Wu, J. Pullman, K. Slowikowski, S. Raychaudhuri, J. Guthridge, J. James, J. Buyon, T. Tuschl, C. Putterman, J. Anolik, W. Apruzzese, A. Arazi, C. Berthier, M. Brenner, J. Buyon, R. Clancy, S. Connery, M. Cunningham, M. Dall’Era, A. Davidson, E. Der, A. Fava, C. Fonseka, R. Furie, D. Goldman, R. Gupta, J. Guthridge, N. Hacohen, D. Hildeman, P. Hoover, R. Hsu, J. James, R. Kado, K. Kalunian, D. Kamen, M. Kretzler, H. Maecker, E. Massarotti, W. McCune, M. McMahon, M. Park, F. Payan-Schober, W. Pendergraft, M. Petri, M. Pichavant, C. Putterman, D. Rao, S. Raychaudhuri, K. Slowikowski, H. Suryawanshi, T. Tuschl, P. Utz, D. Waguespack, D. Wofsy, F. Zhang, Tubular cell and keratinocyte single-cell transcriptomics applied to lupus nephritis reveal type I IFN and fibrosis relevant pathways, Nat. Immunol. (2019), https://doi.org/10.1038/s41590-019-0386-1.
[152] D.R. Gawel, J. Serra-Musach, S. Lilja, J. Aagesen, A. Arenas, B. Asking, M. Bengnér, J. Björkander, S. Biggs, J. Ernerudh, H. Hjortswang, J.E. Karlsson, M. Köpsen, E.J. Lee, A. Lentini, X. Li, M. Magnusson, D. Martínez-Enguita, A. Matussek, C.E. Nestor, S. Schäfer, O. Seifert, C. Sonmez, H. Stjernman, A. Tjärnberg, S. Wu, K. Åkesson, A.K. Shalek, M. Stenmarker, H. Zhang, M. Gustafsson, M. Benson, A validated single-cell-based strategy to identify diagnostic and therapeutic targets in complex diseases, Genome Med. (2019), https://doi.org/10.1186/s13073-019-0657-3.
[153] A.K. Shalek, M. Benson, Single-cell analyses to tailor treatments, Sci. Transl. Med. (2017), https://doi.org/10.1126/scitranslmed.aan4730.
[154] A.J. Wilk, A. Rustagi, N.Q. Zhao, J. Roque, G.J. Martínez-Colón, J.L. McKechnie, G.T. Ivison, T. Ranganath, R. Vergara, T. Hollis, L.J. Simpson, P. Grant, A. Subramanian, A.J. Rogers, C.A. Blish, A single-cell atlas of the peripheral immune response in patients with severe COVID-19, Nat. Med. 26 (2020) 1070–1076, https://doi.org/10.1038/s41591-020-0944-y.
