A Hitchhiker's Guide To Single-Cell Transcriptomics and Data Analysis Pipelines
Genomics
journal homepage: www.elsevier.com/locate/ygeno
Review
ARTICLE INFO
Keywords: Single-cell transcriptomics; Single-cell RNA sequencing; Single-cell data analysis; Computational approach; Machine learning
ABSTRACT
Single-cell transcriptomics (SCT) is a tour de force in the era of big omics data that has led to the accumulation of massive cellular transcription data at an astounding resolution of single cells. It provides valuable insights into cells previously unachieved by bulk cell analysis and is proving crucial in uncovering cellular heterogeneity, identifying rare cell populations, distinct cell-lineage trajectories, and mechanisms involved in complex cellular processes. SCT data is highly complex and necessitates advanced statistical and computational methods for analysis. This review provides a comprehensive overview of the steps in a typical SCT workflow, starting from experimental protocol to data analysis, deliberating various pipelines used. We discuss recent trends, challenges, machine learning methods for data analysis, and future prospects. We conclude by listing the multitude of scRNA-seq data applications and how it shall revolutionize our understanding of cellular biology and diseases.
* Corresponding author.
E-mail address: [email protected] (Y. Hasija).
https://doi.org/10.1016/j.ygeno.2021.01.007
Received 9 August 2020; Received in revised form 30 December 2020; Accepted 18 January 2021
Available online 22 January 2021
0888-7543/© 2021 Elsevier Inc. This article is made available under the Elsevier license (http://www.elsevier.com/open-access/userlicense/1.0/).
R. Nayak and Y. Hasija Genomics 113 (2021) 606–619
1. Introduction
As an elementary school textbook would exclaim, cells are the fundamental structural and functional units of all living organisms. Understanding the biology of cells has been at the center of our pursuit to unravel the complexities that make an organism. Cell biology research has undergone a remarkable transformation in recent years with the advent of single-cell multi-omics technology. The genome structure of every cell is essentially the same for any given individual organism; however, the genome’s expression pattern determines the physiological fate of the cell. The observed diversity of phenotypes is due to the genotype and the varying expression pattern, abnormalities in which form the basis of various diseases. Mapping of this unique genotype-phenotype relationship requires transcriptome profiling, and recent progress made in high throughput sequencing technologies has enabled the measurement of transcriptomic information at an unprecedented resolution of single cells [1].
Transcriptome profiling has revealed that, for any given cell, only a subset of genes is active [2], and each cell type has a unique transcriptomic fingerprint. Earlier, transcriptome profiling was based on the assumption that all cells from any given tissue are homogeneous, and that bulk population sequencing followed by average expression analysis would provide sufficient information to understand gene expression in both standard and abnormal cell states. Increasing evidence suggests that even in similar cells, the gene expression pattern can be heterogeneous [3,4]. Although bulk expression analysis can simultaneously assess gene expression levels and differentiate between abundant known cell types, it can obscure rare cell types and subtypes and fails to distinguish cell-to-cell variability [5]. Thus, the understanding of stochastic cellular processes necessitated a more precise transcriptome analysis technique to overcome the averaging phenomenon inherent to bulk analysis. Unabated technological advancements in NGS, molecular biology, cell biology, and bioinformatics have fostered a new wave of profiling single cells at the genomic, transcriptomic, proteomic, and epigenomic levels [6,7].
Single-cell transcriptomics (SCT) involves profiling the complete set of RNA transcripts of each individual cell for a given population of cells [8]. Transcriptome analysis of single cells was pioneered two decades ago in two separate historical experiments, one by Norman N. Iscove [9] and the other by James Eberwine and group [10,11], that laid the groundwork for single-cell transcriptome analysis based on high throughput sequencing technologies [12]. scRNA-sequencing is a fast-growing and promising technology for SCT [13] and has rendered microarrays and qPCR obsolete.
The volume and complexity of scRNA-seq data make it a paradigm of big data [1]. It has opened doors to a multitude of possibilities in biomedical research, but we have only tapped a fraction of the potential
of such a large and versatile dataset. scRNA-seq transcriptome profiles have paved the way for identification of rare cell types in complex tissues [14], cell lineage relationships in early development [15], antigen specificity of immune cells [16], inferring cellular trajectory [17], determination of cell fate [18], distinguishing between normal and abnormal cells [19], understanding tumor heterogeneity [20], identifying regulatory signatures in cancer [21], deciphering immune repertoire for infectious diseases [22], and elucidating the mechanism for drug resistance and relapse in cancer treatment [23]. With better analysis methods, we are uncovering more applications.
scRNA-seq data, although highly potent, poses many challenges on various fronts owing to its big-data characteristics, such as sophisticated data acquisition techniques, data storage, management, and analysis [24]. A single scRNA-seq experiment generates a larger volume of high-dimensional raw data than bulk sequencing methods as it retains the information of the stochastic expression of genes for individual cells. In addition, scRNA-seq experimental protocols have more steps compared to bulk sequencing, which gives rise to more technical biases and artifacts.
Experimental techniques for scRNA-seq have mushroomed and improved over time, which has led to the generation of a massive amount of data and an increasing demand for computational techniques for data analysis. That has led to a spike in developing new experimental protocols, algorithms, and tools to analyze the raw data. Several research groups and commercial companies have designed software tools and packages for data preprocessing and downstream analysis. Machine Learning (ML) approaches, preeminent in big data analysis, have been a noteworthy addition to the list of approaches used for the underlying computational challenges of dimensionality reduction, clustering, and differential expression (DE) analysis [25,26]. Furthermore, there are choices between various programming languages like R, Java, MATLAB, C++, and Python. The development of analysis tools is still in its infancy, and current tools have many shortcomings. The challenge of improving the trade-off between speed and accuracy in analysis remains. Despite the abundance of techniques, it is hard to establish a standard that can be used across disciplines, and it is crucial to make an informed decision while proceeding with analysis as it can have a tremendous impact on the findings. The choice of scRNA-seq analysis tool can affect the detection of a biological signal to a degree comparable to quadrupling the sample size [27].
This review outlines the general workflow involved in single-cell RNA-seq protocols and discusses the popular and promising new computational tools for analysis. It provides a comprehensive account of each step of the analysis, starting from data preprocessing and imputation of dropouts to tools used for pseudotime ordering and rare cell type identification. It also discusses ML approaches in the analysis steps, wherever applicable. The review concludes with a discussion of applications across fields in biological sciences, remaining challenges, and prospects.
2. Single-cell RNA sequencing technology
A wide range of scRNA-seq protocols has been developed to accommodate the high demand for improved techniques with high throughput. New methods are being developed to counter batch effects and technical noise since it is important to regulate the initial steps to alleviate the computational burden during data analysis.
scRNA-seq technologies currently in use can be divided into three broad classes based on transcript coverage approach [28]: (i) full-length transcript sequencing (e.g., MATQ-seq [29], SMART-seq2 [30], ICELL8 [31], SUPeR-seq [32]), (ii) 5′-end transcript sequencing (e.g., STRT-seq [33,34]), (iii) 3′-end transcript sequencing (e.g., Chromium [35] 10X Genomics, Fluidigm C1 [36], Drop-seq [37], inDrop [38]). With the full-length transcript sequencing approach, there is an issue of resolution, speed, and sequencing cost. On the other hand, a major drawback of cDNA sequencing that prioritizes either the 5′ or 3′ end of transcripts is that it is incapable of examining allele-specific expression or alternative splice forms. Some methods, such as MARS-seq, rely on FACS-based sorting, which makes them reliant on a larger initial volume [39]; this is disadvantageous when the initial volume is low, as in fine-needle aspirates. Another drawback of using FACS is the requirement of antibodies that target specific proteins for sorting; this poses problems while sorting rare cell subtypes [40]. Thus, each protocol has its set of advantages and disadvantages that determine the “depth” (reads/cell) of a given dataset, and it could ultimately affect the statistical and biological insight [41]. scRNA-seq is not a “one-size-fits-all” technique like the bulk sequencing approach since the depth can vary with the protocol being used, cell types being examined, capture method, sequencing technique, and alignment stringency during library construction [42]. We discuss a typical scRNA-seq experimental workflow and complementary technologies in use.
2.1. scRNA-seq workflow
2.1.1. Single-cell isolation
The generation of scRNA-seq data from a tissue sample involves multiple steps. First, the tissue is digested to ensure dissociation, which gives rise to the single-cell suspension from which single cells are isolated so that each cell’s mRNA can be profiled separately. The single-cell isolation techniques predominantly used for scRNA-seq are plate-based techniques and microfluidic-based techniques. Plate-based techniques involve capturing or sorting cells on multi-well plates or microfuge tubes, followed by FACS-based sorting. Some full-length scRNA-seq techniques like SUPeR-seq, SMART-seq2, MATQ-seq, and CEL-seq rely on plate-based techniques. However, there are many limitations, one being fewer cells per assay than droplet-based technologies. Microfluidic technology involves capturing cells in microfluidic droplets. Microfluidics-based techniques have swiftly gained popularity amongst single-cell isolation techniques as they require less initial volume, are cost-effective, and aid in massively parallel quantification of single-cell gene expression profiles [43,44]. Microfluidics techniques can be of two types, i.e. (i) continuous-flow microfluidics, like Fluidigm’s C1 Single-Cell Auto Preparation System, and (ii) droplet-based microfluidics like inDrop, Drop-seq, and 10X Genomics. Comparison between single-cell isolation techniques is well depicted by Chen X. et al. 2018 [45] and Ziegenhain C. et al. 2017 [46]. Droplet-based platforms are readily automated and easily optimizable to suit individual experimental needs depending upon the number of cells to be captured and sequenced. Recent studies suggest an added advantage: data from droplet-based techniques do not require zero inflation, whereas plate-based techniques need zero inflation for accurate simulation [27].
Another technique for single-cell isolation from solid tissue is Laser Capture Microdissection, LCM-seq, a laser system-aided isolation of cells directly from solid tissue, coupled with in-situ RNA-sequencing techniques [47,48], which conserves spatial information of mRNA expression within the morphology of a tissue. This method enables isolation of rare cell types even in highly heterogeneous clinical samples, with increased accuracy and understanding of dynamic cellular systems.
In an effort to overcome these techniques’ inefficiencies, nanowell-based single-cell isolation like Seq-Well promises cost-effectiveness, throughput, and portability and requires only a nanoliter-sized initial volume [49]. Some newer techniques eliminate the single-cell isolation step, as in SPLiT-seq (split-pool ligation-based transcriptome sequencing) and sci-RNA-seq (Single-cell Combinatorial Indexing RNA sequencing [50]). SPLiT-seq allows for simplified and low-cost transcriptome profiling compatible with fixed cells or nuclei and offers high resolution [51].
2.2. scRNA-seq library preparation
Much like any RNA library preparation, it roughly entails reverse transcription of captured mRNA into first-strand cDNA synthesis,
second-strand synthesis, and cDNA amplification followed by sequencing [52]. But it is more challenging as the amount of RNA per cell is low compared to bulk RNA-seq experiments. A thorough analysis of scRNA-seq requires the profiling of a large number of representative individual cells, which is a task worthy of Sisyphus and significantly adds to the cost of carrying out sequencing. The use of Unique Molecular Identifiers (UMI) or cellular barcodes has somewhat simplified the process. Cell barcodes are primarily designed to distinguish between reads originating from different cells. To fully determine the uniqueness of the reads, UMIs (short molecular tags composed of a unique random sequence) are added at the reverse transcription step (5′ end in template switching or 3′ end in oligo-dT primer) [53]. They constitute the second portion of a barcode and primarily detect and quantify unique mRNA transcripts [46] such that amplicons of the same transcript are only counted once. This allows for multiplexing of scRNA-seq, even for low abundant transcripts that show poor reproducibility with previous quantification methods based on the number of sequencing reads [54]. Still, current UMI-based approaches are poorly suited for the identification of allele-specific expression or alternative splice variants [2]. Post library construction, cDNA libraries labeled with cellular barcodes are pooled for sequencing, mapping, and alignment. Current RNA-seq technologies rely on pooled sequencing in order to generate high throughput data. This allows for amplification and sequencing of multiple cells in parallel in the same pool, which generates batch-specific output data containing sequences from multiple cells.
3. Preprocessing scRNA-seq data
Sequencing generates reads (raw data) that need to undergo quality control (QC) before downstream analysis. scRNA-seq data contains many technical artifacts that may arise due to cell bursting leading to RNA leakage, multiple cells sticking together, lowly expressed RNA leading to dropouts, amplification bias, transcriptional bursting, RNA degradation, and batch effects. Before performing downstream analysis, it is crucial to ensure that all the cellular barcode data obtained from scRNA-seq correspond to viable cells [55]. Another challenge is to prevent false interpretation of technical artifacts (cells that show technical noise that appears as a distinguishable gene expression pattern) as biological heterogeneity [56].
A typical scRNA-seq dataset consists of three files: genes quantified (gene IDs), cells quantified (cellular barcode data), and a count matrix, irrespective of the technology or pipeline used. These files are crucial for building quality metrics for QC assessment. The barcodes are extracted and annotated, a step called demultiplexing or barcode extraction, followed by mapping and alignment of the read data using read processing pipelines. Post alignment, feature annotation and quantification are carried out on the data to generate gene expression matrices (N cells × M genes) indicative of the level of gene expression in each cell, based on the molecular counts or read counts (Fig. 1). These correspond to high mapping quality exonic loci [55]. A list of preprocessing and downstream analysis tools is shown in Table 1.
3.1. Quality control
QC filtering can be performed using a combination of strategies. Some common strategies are based on assigned barcodes: the number of counts per barcode and the number of genes per barcode [57]. Cells that show unique gene counts and genes expressed in very few cells are not always indicative of biological heterogeneity; low or high count depth is indicative of quiescent/damaged cells or doublets/multiplets,
respectively. Another covariate used for QC is the fraction of mitochondrial genes per barcode. Elevated levels of mitochondrial genes (above 5–10%) in a cell are an indication that the cell may have broken and the cytoplasmic mRNA content has leaked. Furthermore, RNA spike-ins (synthetically generated short RNA polymers of known quantity) are used for calibration purposes, where a low mapping ratio between endogenous RNA and spike-ins is indicative of a low-quality library [58].
The quality metrics are visualized to determine the outlier cells. Low-quality cells are filtered out by setting appropriate thresholds. While filtering out outlier cells, multiple independent variables must be considered together rather than individually, as considering them in isolation can lead to misinterpretation of biological heterogeneity. Thresholds can be fixed or adaptive. Setting fixed thresholds requires experience as suitable thresholds may vary for each experimental protocol or biological system [59]. An alternative is adaptive thresholds that are decided based on the outlier peaks for the QC covariates. It is essential to reevaluate the QC metrics after filtering before proceeding to further analysis.

Table 1 (continued)
Analysis category | Pipeline | Environment | Description
 |  |  | https://github.com/RGLab/MAST
 | SCDE | R | Uses a Bayesian approach that incorporates an evidence-based approach to evaluate the likelihood of the average level of gene expression for individual cells and measure the fold changes. Highly sensitive. https://hms-dbmi.github.io/scde/index.html
Note: https://www.scrna-tools.org/ is a catalog of tools for analyzing single-cell RNA sequencing data.

3.2. Normalization
UMI-based protocols inherently reduce amplification biases, and the addition of spike-ins enables assessment of sensitivity. However, cell-level (counts comparable between cells) and gene-level (counts comparable between genes) normalization is carried out to cater to sampling effects or technical biases that remain in the data due to variability in the protocols. Once the count matrix is obtained, normalization seeks to address the gene expression variability between cells in count data to prevent the highly expressed genes from influencing the analysis. Popular normalization methods, such as DESeq and Trimmed Mean of M-values, have been derived from bulk RNA-seq analysis methods and have been successfully applied to scRNA-seq data [60]. Popularly, raw scRNA-seq read library normalization is carried out using read count normalization/CPM (Counts Per Million) methods like RPKM (Reads Per Kilobase Million), TPM (Transcripts Per Million), and FPKM (Fragments Per Kilobase Million). The scaling factors in these methods are based on the assumption that the majority of the genes are not differentially expressed, so they might fail when the fold-change of DE genes is high across the cell populations under study [60]. These library-based or global-scaling normalization methods derived from bulk RNA analysis, other than being computationally intensive, have their shortcomings when used for scRNA-seq data. Given the added complexity of scRNA-seq data due to data sparsity and high heterogeneity, advanced normalization strategies are required to address specific biases.
SINCERA is a commonly used normalization pipeline, in which gene normalization is performed by z-score, while cell normalization is performed by trimmed mean. BISCUIT, for example, uses iterative normalization during clustering by learning features representing technical modifications. RaceID (Rare cell type identification) normalizes the total transcript count within each cell to the median transcript number across cells [61]. Clustering based on Transcript Compatibility Counts (TCC) uses equivalence classes in place of genes as parameters and normalizes each parameter by distributing the total count across all the cells. The sctransform pipeline interfaces with Seurat and uses Pearson residuals from regularized negative binomial regression. In the regression, sequencing depth is used as a covariate to eliminate technical artifacts [62]. SCnorm uses quantile regression to approximate the dependence of transcript expression on sequencing depth for each gene [63]. Genes with similar dependence are grouped together, and then a second quantile regression is utilized to approximate scale parameters in every group. In-group correction is achieved by using the approximate scale parameters to deliver normalized estimates of expression.
The diversity of scRNA-seq protocols makes it difficult to standardize any one normalization method. It has been observed that different normalization methods perform optimally for different datasets, and the same goes for cell-level and gene-level normalization. Post normalization, log(x + 1) transformed count matrices are obtained that give a simplified account of expression levels in terms of log-fold changes and bring down the skewness of the data [55]. Downstream analyses that are based on the assumption that scRNA-seq data is normally distributed and perform analysis on log-transformed data may sometimes result in spurious DE effects. Thus, there is a pressing need to develop more precise and robust normalization methods designed explicitly for scRNA-seq data.
3.3. Data correction
Normalization successfully removes amplification and count depth biases; however, a few challenging technical and biological biases remain. Data correction deals with batch effects, dropouts, and biological effects. scRNA-seq data is prone to zero-inflated values, otherwise known as dropouts [64], resulting from low sensitivity of scRNA-seq protocols, inefficient capture of mRNA, low amounts of mRNA in cells, or transient gene expression [2]. The dependency of downstream analysis on the accuracy of gene expression profiles makes the imputation of dropouts a crucial step. Many analysis pipelines account for dropouts during analysis; however, recent findings can change how we look at dropouts. Imputation is carried out either by direct expression analysis or by model-based methods. Newer and more robust ML-based algorithms have taken over popular imputation techniques like Markov Transition Matrix-based MAGIC [65], clustering-based DrImpute [66], and LASSO regression-based scImpute [64]. SAVER (Single-cell Analysis Via Expression Recovery) uses gene-gene relationships to recover true expression levels of each cell [67]. RESCUE (REcovery of Single-Cell Under-detected Expression) enhances cell-type identification based on an ensemble-based method to minimize feature selection bias and count error and performs imputation by comparing gene expression levels of other cells with similar patterns [68]. SCRABBLE, a matrix regularization framework, uses bulk RNA-seq data as a constraint that improves the accuracy and estimation of gene expression distribution across cells compared to scRNA-seq analysis in isolation [69]. For datasets that suffer from imbalance and limited sample sizes, scHinter, with a hierarchical framework for random interpolation by leveraging the minority oversampling technique [70], proves to be a more robust technique than its predecessors. A recent study explores the opposite view of dropouts, where instead of being imputed, they can be used as a signal. It was observed that binary dropout patterns prove almost equally informative as quantitative expression patterns of highly variable genes in cell type identification [71].
Furthermore, studies have been conducted to establish that the excess of dropouts is consistent with stochastic sampling of molecular counts, and any additional zero values may result from biological variation [72]. Such studies suggested that a negative binomial distribution model for UMI-based scRNA-seq count data would suffice, and zero-inflation may not always be necessary [46,72,73]. It can also be
inferred that the number of dropouts can be decreased by increasing the depth of sequencing or increasing global count with more efficient capturing methods.
Apart from dropouts, several other technical covariates like batch effects and biological covariates need to be considered. Removal of such biases must be carried out simultaneously as there might be a dependency between multiple covariates under consideration. Batch effects arise from data handling in different experiment batches or time points and are highly nonlinear variations. Batch effects can have a significant impact on DE analysis. Some methods include aggregation-based methods [74] that pool cells from batches to form a pseudo-bulk sample and use bulk analysis approaches, or nested fixed [75] and mixed effect models [76] that treat batch effects as fixed effects nested within each group or random effects shared between cells from each batch, respectively. A comparison between different batch correction methods is given in Chen et al., 2020 [77], and some popular batch correction methods are listed in Table 1. Upon removal of batch effects, data is merged for further unbiased analysis. Biological covariates may arise due to important biological processes like cell cycle effects that affect cell size and mRNA counts. Correcting for such variation helps reveal important biological signals and processes. Linear regression against a cell cycle score and correction for cell size during normalization are some ways to remove the effects of the cell cycle [55]. However, data correction for biological effects may not always be in the best interest, and correction for one effect may mask another. Thus, it is advised first to evaluate the study’s objective and context before deciding on data correction measures.
Despite the numerous QC measures, it is hard to determine each step’s stringency before assessing its effect on downstream analysis. Thus, a feedback system should be followed to regulate QC stringency alongside downstream analysis.
3.4. Dimensionality Reduction (DR)
scRNA-seq data is computationally intensive, noisy, and suffers from the curse of dimensionality. scRNA-seq expression matrices are of high dimension, but not all genes are required for meaningful classification of cellular expression profiles, and the data can practically be explained in fewer dimensions, focusing only on relevant biological signals. DR enables better data visualization and resolves the statistical issue of data sparsity. An effective low-dimensional representation should summarize the data in a few optimal dimensions that retain the underlying structure in the data to describe the variability of the dataset. Some of the popular techniques for DR are Principal Component Analysis (PCA), t-distributed Stochastic Neighbor Embedding (t-SNE), Uniform Manifold Approximation and Projection (UMAP), Self Organizing Maps (SOM), and model-embedded dimension reduction [92]. Some clustering pipelines, like CIDR, come integrated for both dropout imputation and DR [93]. DR has two components: feature selection, where one selects a smaller subset from the original set of variables, and feature extraction, where the high-dimensional data is projected to a lower dimension. Feature selection is carried out based on the expression variability of genes according to the assumption that genes showing high variability correspond to biological variation. Per-gene variation can be quantified by calculating the variance of the log-normalized values. A subset of highly variable genes (HVG), depending on the type of dataset, is selected for further analysis. Selecting a larger subset of HVGs may increase the noise but reduces the risk of discarding biologically relevant signals. Seurat performs feature selection by modeling the mean-variance relationship [94].
After feature selection, DR is carried out using linear or nonlinear techniques. PCA is a linear projection method that linearly transforms the original dataset into PCs ranked in decreasing order of variance, and the data’s variance is maximized in the lower dimensional space. It is computationally efficient and removes redundant features, but since scRNA-seq has a highly nonlinear structure, PCA alone is not best suited

Table 2
Classes of clustering algorithms used in scRNA-seq [92].
Class of clustering algorithm | Principle | Limitations | Pipeline
Distance-matrix | Unsupervised learning algorithms like k-means fall in this category. This algorithm first identifies k centroids or means iteratively, and data points are assigned to the cluster around the nearest centroid. During cluster allocation of data points, the in-cluster sum of squares is reduced, and the position of centroids is iteratively optimized. Scalable and time-efficient. | Sensitive to outliers, biased towards data shape/cluster shape, and the number of clusters must be specified beforehand. | SCUBA [102], PCAKmeans, pcaReduce, SAIC [103], scVDMC [104]
Hierarchical clustering | Generates clusters into a hierarchical structure and is popular in gene expression analysis. It overcomes the limitation of k-means of specifying the number of clusters a priori and handling different shapes of clusters. No assumptions are made about the distribution of data points, and each cluster links to another by branches and is nested like a hierarchical tree in the form of a dendrogram. This representation facilitates meaningful data interpretation. | Time intensive | BackSPIN, cellTree [105], DendroSplit [106], CIDR [93]
Graph-based | Supervised-learning algorithms. Projects a graph representation of data in which the nodes correspond to data points/cells, and edges correspond to pairwise similarity between the data points, for example, K-Nearest Neighbor. Clusters are based on neighboring cell pairs. Graph-based clustering techniques have various subtypes like Louvain | Reliance on heuristic solutions sometimes leads to spurious results, and iteration sometimes masks small communities. | Seurat [78], scanpy [79], SNN-Cliq [108]
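The feature-selection step of Section 3.4 and the k-means principle summarized in Table 2 can be illustrated with a minimal sketch in plain Python. The toy count matrix, the scale factor of 10,000, the number of HVGs, the farthest-point initialisation, and all function names are assumptions made for illustration; they are not taken from any of the pipelines discussed in this review. Feature extraction (for example, PCA) is left out to keep the sketch free of external dependencies.

```python
import math


def log_normalize(counts, scale=10_000):
    """Scale each cell (row) to `scale` total counts, then apply log(x + 1)."""
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            raise ValueError("cell with zero counts; remove it during QC")
        normalized.append([math.log1p(x * scale / total) for x in cell])
    return normalized


def top_variable_genes(matrix, n_top):
    """Return the (sorted) column indices of the n_top highest-variance genes."""
    n_cells = len(matrix)
    variances = []
    for g in range(len(matrix[0])):
        column = [row[g] for row in matrix]
        mean = sum(column) / n_cells
        variances.append(sum((v - mean) ** 2 for v in column) / (n_cells - 1))
    ranked = sorted(range(len(variances)), key=lambda g: variances[g], reverse=True)
    return sorted(ranked[:n_top])


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    labels = None
    for _ in range(n_iter):
        # Assignment step: each cell joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids no longer move
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy count matrix: 6 cells (rows) x 5 genes (columns).
# Genes 0-1 mark the first three cells, genes 2-3 the last three,
# and gene 4 is expressed at a similar level everywhere.
counts = [
    [90, 80, 1, 0, 30],
    [85, 95, 0, 2, 28],
    [100, 70, 2, 1, 31],
    [1, 0, 88, 92, 29],
    [0, 3, 97, 81, 30],
    [2, 1, 79, 99, 32],
]

log_counts = log_normalize(counts)
hvg = top_variable_genes(log_counts, n_top=4)
reduced = [[row[g] for g in hvg] for row in log_counts]
labels = kmeans(reduced, k=2)
print("highly variable genes:", hvg)
print("cluster labels:", labels)
```

On this toy input the uniformly expressed gene is dropped by the variance filter, and the first three and last three cells fall into separate clusters. As Table 2 notes, k must be chosen beforehand and the result is sensitive to outliers, so on real data a dedicated pipeline would normally be preferred over a hand-written loop like this one.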
dimensionally reduced dataset, and clusters are made based on cell-specific molecular profiles. Results of a clustering analysis can themselves be of much significance or can serve as a covariate in other downstream analyses. There has been an attempt to develop robust cluster analysis algorithms (as shown in Table 2) that can address the vast heterogeneity of cells, but there is still a lack of an optimal algorithm that fares well across datasets.
After clustering, the clusters need to be annotated in order to give them biological relevance. Cluster annotation can be achieved either by thorough examination of literature, reference cell databases, or by identifying gene signatures or differentially expressed marker genes. CellAssign [114] and scMAP [89] rely on the former technique for cluster annotation, but since most marker genes have been identified by bulk analysis in the reference datasets, it is limited to giving a classical view of the cell types. Also, cell types in reference databases will not necessarily correspond to all the cell types present in the dataset under investigation. Analysis pipelines like Seurat, SC3, and scVDMC make use of the differential gene expression approach. Full gene expression profiles are used in DE analysis for marker gene identification and cluster annotation, which is performed using simple statistical tests. The quantitative levels of gene expression are compared between each cluster and all the cells in the dataset. Based on statistical tests like the Wilcoxon rank sum test (used in Seurat), Welch’s t-test, Kruskal-Wallis test, etc., marker gene sets are identified, i.e., the top-ranked genes from these tests. DE analysis could either be carried out in succession to clustering like in Seurat, simultaneously like in scVDMC and DendroSplit, or by using DE software like MAST [115], SCDE [116], and ZingeR. Gene set enrichment analysis is carried out against reference gene sets (sets of genes grouped as they share common chromosome location, biological function, or regulation) using statistical parameters like the Jaccard index, and clusters are annotated accordingly. The important thing to note here is the p-value, which is based on the assumption that the marker gene identified represents the biological phenomenon, but the p-value is often inflated and leads to an overestimation of marker genes. Most existing GSEA methods have been developed for bulk RNA-seq analysis and perform poorly in the case of scRNA-seq; thus Ma et al. came up with an integrative DE-GSE analysis technique called iDEA [117] that makes use of DE summary statistics, is therefore easy to use with current DE methods, and efficiently produces well-assessed p-values for enriched gene set detection. Nowadays, automated cluster annotation techniques are becoming increasingly available, like scMAP [89], which follows a projection-based approach where scRNA-seq data is projected onto a previously annotated cell type or dataset. Garnett uses a supervised classification approach for rapid annotation [118], and scCATCH [119] makes use of the CellMatch reference database for annotation followed by evidence-based scoring for increased performance. Another important goal is to identify rare cell types that may appear as outliers in clustering results that only consider global differences in gene expression. RaceID [61] and GiniClust [120] are clustering algorithms sensitive to identifying rare cell types. RaceID is based on the assumption that a given cell type must express some genes that are specific to the cell type and appear as outliers, but if the focus is shifted from global to such cells and the technical and the biological noise are accounted for by setting appropriate thresholds, it will enable the identification of rare cell types. To an extent, the determination of cell types is dependent on user-defined criteria, since for different researchers, the level of cluster resolution may vary, and in some cases, subclustering of clusters may also be required. The choice of the resolution also has a significant effect on the results. DendroSplit, a clustering framework, allows the user to cluster using feature selection that enables identifying multiple levels of biologically meaningful cell populations in a dataset, and is also suitable for detecting rare cells [106]. Despite continual attempts at developing new algorithms, clustering and cluster annotation suffer from various challenges on both the biological and computational fronts. It is advisable to use a cocktail of automated and manual cluster annotation measures to get precise results. But since clustering is an unsupervised learning approach, several parameters are needed for reliable evaluation. Statistical and experimental validation is often needed. Transient biological states make it difficult to identify cell states. However, the generation of more comprehensive and extensive cell atlases will facilitate better clustering, cluster annotation results, and better computational approaches to help overcome technical challenges.

4.1.2. Trajectory inference
Trajectory Inference (TI), also known as pseudo-temporal ordering, is a process of characterization of underlying dynamic cellular processes. Although clustering successfully builds discrete clusters of cell types and subtypes, it does not account for the variability due to dynamic cellular processes like transient cell states in cell differentiation, cell cycle, or environmental effect. TI deals with this blind spot by ordering cells along a continuous path that minimizes transcriptional changes between successive cell pairs, called pseudotime (a one-dimensional manifold), that represents the progression of the cell through its dynamic processes measured in terms of transcriptional changes that a cell undergoes during a biological process. Some datasets have an expected temporal component such as cells from developing embryos, immune cells during an immune response, tumor cells, progenitor stem cells, etc. Understanding of cell differentiation via bulk RNA analysis gave us the impression that cell differentiation occurs in discrete stages, but in reality, it is a continuous process that may appear chaotic but can be ordered along continuous trajectories. Following the cells along a pseudo-temporal trajectory and analyzing gene expression changes yields valuable insights into the cellular regulatory processes, dynamic states, and abnormal cell states. The progression of cells in a given cellular process is rarely synchronized. Capturing this asynchrony poses a unique challenge of deciphering the sequence of regulatory events. TI takes into account a snapshot view of these events and uses computational techniques to infer the order of the cells along their developmental trajectories. The derived trajectory topologies can be linear, bifurcating, multifurcating, complex tree structures, or graph structures [121]. TI is mostly used for cell-lineage construction. During cellular developmental stages, cells express unique cellular markers and various lineage marks that can prove helpful for tracing lineage along pseudotime trajectories, such as somatic mutations, single nucleotide polymorphisms, copy number variants, microsatellites, transposons, and retroviral sequences.
Two broad strategies used for TI are DR-based methods (Monocle [122], Wishbone [123]), which use the reduced latent space as the first phase in inference and assign pseudotime to individual cells, and clustering-based methods (TSCAN [110], SCUBA [102], ÉCLAIR [124]), which build a network connecting clusters and apply pseudo-temporal ordering of clusters [125]. Monocle is a pioneering pseudo-temporal ordering algorithm that demonstrated how pseudotime analysis can reveal important cellular regulatory interactions [122]. It uses an unsupervised learning approach that constructs minimum spanning trees (MST) for ordering cells along pseudotime. Several other algorithms like Wishbone, TSCAN, and later versions of Monocle (currently Monocle 3) have been developed that are more robust and accurate, and a detailed comparative account can be found in Saelens et al. 2018 [121]. While most TI algorithms are unsupervised learning-based models, Ouija is a supervised learning-based algorithm that was developed keeping in mind that several confounding factors that affect biological processes, like cell cycle and apoptosis, sometimes need to be accounted for to get biologically plausible pseudotime trajectories [126]. It uses switch-like marker genes and can be used as a complementary method with existing methods owing to its consideration of gene-specific behaviors as opposed to unsupervised methods. Despite the availability of more than 70 TI methods, researchers find it challenging to determine which is best suited for their analysis. Selection depends on the task, like the type of biological process being studied, whether it is a cell differentiation process (Wishbone), lineage trees (MerLOT [127]), cell cycle (Cyclone [128], reCAT [91]), or downstream analysis, like inferring
GRN (SCODE [129]), DE. Each trajectory method has its pros and cons, and there is a lack of standardization. Dynverse is a collection of R packages specifically designed to address this issue so that researchers can perform TI, quantify it, or compare it to other available methods to decide on the best approach for their dataset [121]. Recent advancements have led to the use of time-series data in place of snapshot data. Tempora is an upcoming algorithm that may prove to be more biologically relevant than previous methods as it uses biological pathway information and identifies time-dependent pathways for ordering and inferring time-series data [130].
Undoubtedly, TI is becoming a popular tool for studying biological processes like cellular differentiation, immune response, tumor progression, and resistance, but inferred trajectories alone cannot be considered reliable, as they need to be validated using supporting biological evidence. With better algorithms and more time-series data availability, TI can be used to predict the pre-disease state of cells, which may help in the early detection of diseases.

4.2. Gene-level analysis

Gene-level analysis is an integral part of cell-level analysis for studying cellular structures and identities, but independently, gene-level analysis of single cells reveals a more comprehensive inference of cellular pathways and regulatory networks. It involves DE analysis, pathway analysis, gene regulatory networks (GRN), and gene set analysis. We shall be discussing some of the important gene-level analyses and the information they have revealed.

4.2.1. Differential Expression (DE)
We have discussed DE testing in previous sections while discussing clustering, but at the gene level, we focus more on the stochastic nature of gene expression and distinctive signatures only observed at the single-cell level. Although principally the same as bulk DE analysis, DE for single-cell data was developed to deal with artifacts inherent to scRNA-seq such as multimodality, dropouts, and heterogeneity. Among popular DE tools for scRNA-seq, MAST uses a linear hurdle model to account for confounders, and DE is determined using the likelihood ratio test [115]. SCDE uses a Bayesian approach that incorporates an evidence-based approach to evaluate the likelihood of the average level of gene expression for individual cells and measure the fold changes [116]. These techniques have shown higher sensitivity than other techniques. In some cases, bulk DE methods, when used with gene weights, outperform scRNA-seq specific DE methods, but at the price of being computationally intensive. Apart from having low detection accuracy for true DE genes, there is also a lack of agreement between various available tools. This demands better tools that account for the multimodality of scRNA-seq data and its artifacts, and identify true DE genes having biological relevance. Wang et al. performed a recent comparative analysis of DE tools that could guide researchers to evaluate DE tools, choose appropriate ones for their analysis, and improve upon existing techniques [131]. Trajectory-based DE methods such as tradeSeq have also been developed, enabling between-lineage and within-lineage DE analysis and providing a continuous resolution of gene expression changes through a dynamic process [132].
DE studies allow us to identify distinct expression profiles of cellular pathways that help us understand the effect of perturbations and the underlying mechanisms of disease pathologies.

4.2.2. Gene Regulatory Network (GRN)
The above discussion on gene expression poses another question on how gene expression is regulated in cells. Stochastic gene expression observed amongst single cells indicates that gene regulation that relies on transcription factors, signaling molecules, and co-factors is regulated in a specific way. Uncovering these GRNs will reveal the basis of gene expression stochasticity and provide mechanistic insights into normal and abnormal cellular phenotypes. Several tools have been developed

Table 3
Current and future applications of scRNA-seq analysis in major fields of biomedical sciences.^a

Field of study: Immunology
Scope of analysis: (i) Clustering of regional immune cells; (ii) Trajectory analysis of individual immune lineages
Applications: Identification of novel immune cell subtypes [137], revealing immune microenvironment across tissues [138], understanding regional immunity in tumors [139]. Build developmental trajectory of immune cells, and gain mechanistic insights [140]. Immunology studies with the aid of SCT will help us develop better and targeted immunotherapies.

Field of study: Cancer biology
Scope of analysis: (i) Clustering of tumor microenvironment; (ii) DE analysis of tumors; (iii) Construction of gene regulatory maps
Applications: Researchers and clinicians have struggled with understanding tumor heterogeneity for a long time. scRNA-seq has revealed intra-tumor, inter-tumor heterogeneity, and rare tumor subpopulations [141]. It will help understand cell-cell interactions in the tumor ecosystem, tumor resistance, refractory, and recurrence mechanisms. It can help elucidate genetic and non-genetic mechanisms for cancer and help devise better treatment regimes.

Field of study: Cell-cell communication studies
Scope of analysis: (i) Integrating the count matrix generated from scRNA-seq with known ligand-receptor interaction matrix; (ii) Construction of GRNs
Applications: Deciphering cell-cell interactions is crucial to understanding both cellular development and diseases. SCT has enabled the inference of ligand-receptor interactions at an unprecedented resolution. Employing SCT we can identify communication patterns and use them to predict functions of poorly studied pathways. Tools like NATMI [142] are being used to identify which cell-type pairs or cellular communities communicate more frequently or specifically and what ligand-receptor pairs are the most active within a network, and have offered insights into autocrine signaling in cell-cell communication.

Field of study: Stem cell
Scope of analysis: (i) Trajectory analysis of stem cells; (ii) DE analysis of progenitor cells
Applications: When used to construct hematopoietic lineage trees, SCT analysis revealed that the differentiation process is continuous instead of the traditional belief that it is a stepwise process [143]. Also helped reveal a novel pathway used by stem cells for self-renewal [144]. Another group of researchers developed a scRNA-seq based CRISPR interference technique to (continued on next page)
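
To make the rank-based DE testing mentioned in Section 4.2.1 and in the cluster annotation discussion more concrete, the sketch below scores each gene with a Wilcoxon rank-sum statistic comparing one cluster against all remaining cells, then ranks genes as candidate markers. It is a minimal sketch under stated assumptions, not the implementation used by Seurat, MAST, or SCDE: the function names (`average_ranks`, `rank_sum_z`, `rank_markers`) and the toy expression values are hypothetical, the z-score uses the normal approximation without a tie correction, and no p-values or multiple-testing adjustment are computed.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_z(in_cluster, rest):
    """Wilcoxon rank-sum z-score (normal approximation, no tie correction)."""
    n1, n2 = len(in_cluster), len(rest)
    ranks = average_ranks(list(in_cluster) + list(rest))
    w = sum(ranks[:n1])  # rank sum of the cluster of interest
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean_w) / sd_w


def rank_markers(expr, labels, cluster):
    """Order genes from most to least up-regulated in `cluster`.

    expr: dict mapping gene name to a list of per-cell expression values.
    labels: list of cluster labels, one per cell.
    """
    scores = {}
    for gene, values in expr.items():
        inside = [v for v, lab in zip(values, labels) if lab == cluster]
        outside = [v for v, lab in zip(values, labels) if lab != cluster]
        scores[gene] = rank_sum_z(inside, outside)
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (hypothetical): three cells in cluster A, three in cluster B.
toy_labels = ["A", "A", "A", "B", "B", "B"]
toy_expr = {
    "g_marker": [5.0, 6.0, 7.0, 0.0, 1.0, 0.5],
    "g_flat": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    "g_other": [0.0, 0.2, 0.1, 4.0, 5.0, 6.0],
}
markers_for_a = rank_markers(toy_expr, toy_labels, "A")
```

Genes at the top of the returned list are candidate markers for the chosen cluster. As the text cautions, significance values from tests run on clusters derived from the same data tend to be inflated, so a ranking like this is a starting point for validation rather than a final answer.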
to yield more comprehensive results for understanding cellular biology. Spatial transcriptomics is one such technology that preserves the spatial location of gene expression in cells during analysis, and when used with scRNA-seq, is an excellent way to study tissue microenvironment. The possibilities of what can be achieved with this technology are countless.

6. Conclusion

Single-cell RNA sequencing technology has zoomed in on cellular biology like never before. This review objectively points out that single-cell analysis is a multi-step process, and no one step is exclusive of the other. All the steps, only when carefully monitored in tandem, will give precise results. Leveraging SCT and multiple single-cell modalities at once bears a remarkable ingenuity in understanding complex cellular processes, capturing cellular heterogeneity, and disease states. It opens new frontiers of research. Efforts to characterize all cells in a human body, such as the Chan-Zuckerberg Initiative- Human Cell Atlas, serve as major reservoirs for researchers across several fields from biological sciences to computational sciences to come together and develop from. There is an increasing demand for better tools, techniques, analysis algorithms, and experimental validation measures that can rapidly materialize the vision of understanding biological processes at a single-cell resolution.

Declaration of Competing Interest

The authors do not have any conflicts of interests to declare.

Acknowledgments

This work was supported by the project “Genetic Analysis of Dermatological Disorders” (BT/PR5402/BID/7/408/2012 dated: 6/7/2017), Department of Biotechnology, Government of India.

References

[1] P. Angerer, L. Simon, S. Tritschler, F.A. Wolf, D. Fischer, F.J. Theis, Single cells make big data: new challenges and opportunities in transcriptomics, Curr. Opin. Syst. Biol. (2017), https://doi.org/10.1016/j.coisb.2017.07.004.
[2] B. Hwang, J.H. Lee, D. Bang, Single-cell RNA sequencing technologies and bioinformatics pipelines, Exp. Mol. Med. (2018), https://doi.org/10.1038/s12276-018-0071-8.
[3] S. Huang, Non-genetic heterogeneity of cells in development: more than just noise, Development (2009), https://doi.org/10.1242/dev.035139.
[4] N. Li, H. Clevers, Coexistence of quiescent and active adult stem cells in mammals, Science (2010), https://doi.org/10.1126/science.1180794.
[5] B.D. Aevermann, M. Novotny, T. Bakken, J.A. Miller, A.D. Diehl, D. Osumi-Sutherland, R.S. Lasken, E.S. Lein, R.H. Scheuermann, Cell type discovery using single-cell transcriptomics: implications for ontological representation, Hum. Mol. Genet. (2018), https://doi.org/10.1093/hmg/ddy100.
[6] S. Linnarsson, S.A. Teichmann, Single-cell genomics: Coming of age, Genome Biol. (2016), https://doi.org/10.1186/s13059-016-0960-x.
[7] E. Shapiro, T. Biezuner, S. Linnarsson, Single-cell sequencing-based technologies will revolutionize whole-organism science, Nat. Rev. Genet. (2013), https://doi.org/10.1038/nrg3542.
[8] P. Angerer, L. Simon, S. Tritschler, F.A. Wolf, D. Fischer, F.J. Theis, Single cells make big data: new challenges and opportunities in transcriptomics, Curr. Opin. Syst. Biol. 4 (2017) 85–91, https://doi.org/10.1016/j.coisb.2017.07.004.
[9] G. Brady, M. Barbara, N.N. Iscove, Representative in vitro cDNA amplification from individual hemopoietic cells and colonies, Methods Mol. Cell. Biol. 2 (1990) 17–25, 08987750.
[10] J. Eberwine, H. Yeh, K. Miyashiro, Y. Cao, S. Nair, R. Finnell, M. Zettel, P. Coleman, Analysis of gene expression in single live neurons, Proc. Natl. Acad. Sci. U. S. A. (1992), https://doi.org/10.1073/pnas.89.7.3010.
[11] F. Tang, K. Lao, M.A. Surani, Development and applications of single-cell transcriptome analysis, Nat. Methods (2011), https://doi.org/10.1038/nmeth.1557.
[12] F. Tang, C. Barbacioru, Y. Wang, E. Nordman, C. Lee, N. Xu, X. Wang, J. Bodeau, B.B. Tuch, A. Siddiqui, K. Lao, M.A. Surani, mRNA-Seq whole-transcriptome analysis of a single cell, Nat. Methods (2009), https://doi.org/10.1038/nmeth.1315.
[13] D. Hebenstreit, Methods, challenges and potentials of single cell RNA-seq, Biology (Basel) (2012), https://doi.org/10.3390/biology1030658.
[14] D. Grün, A. Lyubimova, L. Kester, K. Wiebrands, O. Basak, N. Sasaki, H. Clevers, A. Van Oudenaarden, Single-cell messenger RNA sequencing reveals rare intestinal cell types, Nature (2015), https://doi.org/10.1038/nature14966.
[15] S. Petropoulos, D. Edsgärd, B. Reinius, Q. Deng, S.P. Panula, S. Codeluppi, A. Plaza Reyes, S. Linnarsson, R. Sandberg, F. Lanner, Single-cell RNA-seq reveals lineage and x chromosome dynamics in human preimplantation embryos, Cell (2016), https://doi.org/10.1016/j.cell.2016.03.023.
[16] A.A. Tu, T.M. Gierahn, B. Monian, D.M. Morgan, N.K. Mehta, B. Ruiter, W.G. Shreffler, A.K. Shalek, J.C. Love, TCR sequencing paired with massively parallel 3′ RNA-seq reveals clonotypic T cell signatures, Nat. Immunol. (2019), https://doi.org/10.1038/s41590-019-0544-5.
[17] R.J. Miragaia, T. Gomes, A. Chomka, L. Jardine, A. Riedel, A.N. Hegazy, N. Whibley, A. Tucci, X. Chen, I. Lindeman, G. Emerton, T. Krausgruber, J. Shields, M. Haniffa, F. Powrie, S.A. Teichmann, Single-cell transcriptomics of regulatory T cells reveals trajectories of tissue adaptation, Immunity (2019), https://doi.org/10.1016/j.immuni.2019.01.001.
[18] M.J.T. Stubbington, T. Lönnberg, V. Proserpio, S. Clare, A.O. Speak, G. Dougan, S.A. Teichmann, T cell fate and clonality inference from single-cell transcriptomes, Nat. Methods (2016), https://doi.org/10.1038/nmeth.3800.
[19] A.K. Shalek, R. Satija, X. Adiconis, R.S. Gertner, J.T. Gaublomme, R. Raychowdhury, S. Schwartz, N. Yosef, C. Malboeuf, D. Lu, J.J. Trombetta, D. Gennert, A. Gnirke, A. Goren, N. Hacohen, J.Z. Levin, H. Park, A. Regev, Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells, Nature (2013), https://doi.org/10.1038/nature12172.
[20] J. Wagner, M.A. Rapsomaniki, S. Chevrier, T. Anzeneder, C. Langwieder, A. Dykgers, M. Rees, A. Ramaswamy, S. Muenst, S.D. Soysal, A. Jacobs, J. Windhager, K. Silina, M. van den Broek, K.J. Dedes, M. Rodríguez Martínez, W.P. Weber, B. Bodenmiller, A single-cell atlas of the tumor and immune ecosystem of human breast cancer, Cell (2019), https://doi.org/10.1016/j.cell.2019.03.005.
[21] J.M. Granja, S. Klemm, L.M. McGinnis, A.S. Kathiria, A. Mezger, M.R. Corces, B. Parks, E. Gars, M. Liedtke, G.X.Y. Zheng, H.Y. Chang, R. Majeti, W.J. Greenleaf, Single-cell multiomic analysis identifies regulatory programs in mixed-phenotype acute leukemia, Nat. Biotechnol. (2019), https://doi.org/10.1038/s41587-019-0332-7.
[22] C. Yao, H.W. Sun, N.E. Lacey, Y. Ji, E.A. Moseman, H.Y. Shih, E.F. Heuston, M. Kirby, S. Anderson, J. Cheng, O. Khan, R. Handon, J. Reilley, J. Fioravanti, J. Hu, S. Gossa, E.J. Wherry, L. Gattinoni, D.B. McGavern, J.J. O’Shea, P.L. Schwartzberg, T. Wu, Single-cell RNA-seq reveals TOX as a key regulator of CD8+ T cell persistence in chronic infection, Nat. Immunol. (2019), https://doi.org/10.1038/s41590-019-0403-4.
[23] S.M. Shaffer, M.C. Dunagin, S.R. Torborg, E.A. Torre, B. Emert, C. Krepler, M. Beqiri, K. Sproesser, P.A. Brafford, M. Xiao, E. Eggan, I.N. Anastopoulos, C.A. Vargas-Garcia, A. Singh, K.L. Nathanson, M. Herlyn, A. Raj, Rare cell variability and drug-induced reprogramming as a mode of cancer drug resistance, Nature (2017), https://doi.org/10.1038/nature22794.
[24] P. Yu, W. Lin, Single-cell transcriptome study as big data, Genom. Proteome. Bioinform. (2016), https://doi.org/10.1016/j.gpb.2016.01.005.
[25] J. Zheng, K. Wang, Emerging deep learning methods for single-cell RNA-seq data analysis, Quant. Biol. (2019), https://doi.org/10.1007/s40484-019-0189-2.
[26] R. Petegrosso, Z. Li, R. Kuang, Machine learning and statistical methods for clustering single-cell RNA-sequencing data, Brief. Bioinform. (2019), https://doi.org/10.1093/bib/bbz063.
[27] B. Vieth, S. Parekh, C. Ziegenhain, W. Enard, I. Hellmann, A systematic evaluation of single cell RNA-seq analysis pipelines, Nat. Commun. (2019), https://doi.org/10.1038/s41467-019-12266-7.
[28] G. Chen, B. Ning, T. Shi, Single-cell RNA-seq technologies and related computational data analysis, Front. Genet. (2019), https://doi.org/10.3389/fgene.2019.00317.
[29] K. Sheng, W. Cao, Y. Niu, Q. Deng, C. Zong, Effective detection of variation in single-cell transcriptomes using MATQ-seq, Nat. Methods (2017), https://doi.org/10.1038/nmeth.4145.
[30] S. Picelli, Å.K. Björklund, O.R. Faridani, S. Sagasser, G. Winberg, R. Sandberg, Smart-seq2 for sensitive full-length transcriptome profiling in single cells, Nat. Methods (2013), https://doi.org/10.1038/nmeth.2639.
[31] L.D. Goldstein, Y.J.J. Chen, J. Dunne, A. Mir, H. Hubschle, J. Guillory, W. Yuan, J. Zhang, J. Stinson, B. Jaiswal, K.B. Pahuja, I. Mann, T. Schaal, L. Chan, S. Anandakrishnan, C. Wah Lin, P. Espinoza, S. Husain, H. Shapiro, K. Swaminathan, S. Wei, M. Srinivasan, S. Seshagiri, Z. Modrusan, Massively parallel nanowell-based single-cell gene expression profiling, BMC Genomics (2017), https://doi.org/10.1186/s12864-017-3893-1.
[32] X. Fan, X. Zhang, X. Wu, H. Guo, Y. Hu, F. Tang, Y. Huang, Single-cell RNA-seq transcriptome analysis of linear and circular RNAs in mouse preimplantation embryos, Genome Biol. (2015), https://doi.org/10.1186/s13059-015-0706-1.
[33] S. Islam, U. Kjällquist, A. Moliner, P. Zajac, J.B. Fan, P. Lönnerberg, S. Linnarsson, Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq, Genome Res. (2011), https://doi.org/10.1101/gr.110882.110.
[34] S. Islam, U. Kjällquist, A. Moliner, P. Zajac, J.B. Fan, P. Lönnerberg, S. Linnarsson, Highly multiplexed and strand-specific single-cell RNA 5′ end sequencing, Nat. Protoc. (2012), https://doi.org/10.1038/nprot.2012.022.
[35] G.X.Y. Zheng, J.M. Terry, P. Belgrader, P. Ryvkin, Z.W. Bent, R. Wilson, S.B. Ziraldo, T.D. Wheeler, G.P. McDermott, J. Zhu, M.T. Gregory, J. Shuga, L. Montesclaros, J.G. Underwood, D.A. Masquelier, S.Y. Nishimura, M. Schnall-Levin, P.W. Wyatt, C.M. Hindson, R. Bharadwaj, A. Wong, K.D. Ness, L.W. Beppu, H.J. Deeg, C. McFarland, K.R. Loeb, W.J. Valente, N.G. Ericson, E.A. Stevens, J.P. Radich, T.S. Mikkelsen, B.J. Hindson, J.H. Bielas, Massively parallel digital
transcriptional profiling of single cells, Nat. Commun. (2017), https://doi.org/10.1038/ncomms14049.
[36] D.M. DeLaughter, The use of the fluidigm C1 for RNA expression analyses of single cells, Curr. Protoc. Mol. Biol. (2018), https://doi.org/10.1002/cpmb.55.
[37] E.Z. Macosko, A. Basu, R. Satija, J. Nemesh, K. Shekhar, M. Goldman, I. Tirosh, A.R. Bialas, N. Kamitaki, E.M. Martersteck, J.J. Trombetta, D.A. Weitz, J.R. Sanes, A.K. Shalek, A. Regev, S.A. McCarroll, Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets, Cell (2015), https://doi.org/10.1016/j.cell.2015.05.002.
[38] A.M. Klein, L. Mazutis, I. Akartuna, N. Tallapragada, A. Veres, V. Li, L. Peshkin, D.A. Weitz, M.W. Kirschner, Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells, Cell (2015), https://doi.org/10.1016/j.cell.2015.04.044.
[39] P. Hu, W. Zhang, H. Xin, G. Deng, Single cell isolation and analysis, Front. Cell Dev. Biol. (2016), https://doi.org/10.3389/fcell.2016.00116.
[40] A.E. Saliba, A.J. Westermann, S.A. Gorski, J. Vogel, Single-cell RNA-seq: advances and future challenges, Nucleic Acids Res. (2014), https://doi.org/10.1093/nar/gku555.
[41] V. Menon, Clustering single cells: a review of approaches on high-and low-depth single-cell RNA-seq data, Brief. Funct. Genomics (2018), https://doi.org/10.1093/bfgp/elx044.
[42] M.S. Cembrowski, Single-cell transcriptomics as a framework and roadmap for understanding the brain, J. Neurosci. Methods (2019), https://doi.org/10.1016/j.jneumeth.2019.108353.
[43] D.B. Weibel, G.M. Whitesides, Applications of microfluidics in chemical biology, Curr. Opin. Chem. Biol. (2006), https://doi.org/10.1016/j.cbpa.2006.10.016.
[44] J.S. Marcus, W.F. Anderson, S.R. Quake, Microfluidic single-cell mRNA isolation and analysis, Anal. Chem. (2006), https://doi.org/10.1021/ac0519460.
[45] X. Chen, S.A. Teichmann, K.B. Meyer, From tissues to cell types and back: single-cell gene expression analysis of tissue architecture, Annu. Rev. Biomed. Data Sci. (2018), https://doi.org/10.1146/annurev-biodatasci-080917-013452.
[46] C. Ziegenhain, B. Vieth, S. Parekh, B. Reinius, A. Guillaumet-Adkins, M. Smets, H. Leonhardt, H. Heyn, I. Hellmann, W. Enard, Comparative analysis of single-cell RNA sequencing methods, Mol. Cell (2017), https://doi.org/10.1016/j.molcel.2017.01.023.
[47] V. Espina, J.D. Wulfkuhle, V.S. Calvert, A. VanMeter, W. Zhou, G. Coukos, D.H. Geho, E.F. Petricoin, L.A. Liotta, Laser-capture microdissection, Nat. Protoc. (2006), https://doi.org/10.1038/nprot.2006.85.
[48] S. Nichterwitz, G. Chen, J. Aguila Benitez, M. Yilmaz, H. Storvall, M. Cao, R. Sandberg, Q. Deng, E. Hedlund, Laser capture microscopy coupled with smart-seq2 for precise spatial transcriptomic profiling, Nat. Commun. (2016), https://doi.org/10.1038/ncomms12139.
[49] T.M. Gierahn, M.H. Wadsworth, T.K. Hughes, B.D. Bryson, A. Butler, R. Satija, S. Fortune, J. Christopher Love, A.K. Shalek, Seq-well: portable, low-cost rna sequencing of single cells at high throughput, Nat. Methods 14 (2017) 395–398, https://doi.org/10.1038/nmeth.4179.
[50] J. Cao, J.S. Packer, V. Ramani, D.A. Cusanovich, C. Huynh, R. Daza, X. Qiu, C. Lee, S.N. Furlan, F.J. Steemers, A. Adey, R.H. Waterston, C. Trapnell, J. Shendure, Comprehensive single-cell transcriptional profiling of a multicellular organism, Science (2017), https://doi.org/10.1126/science.aam8940.
[51] A.B. Rosenberg, C.M. Roco, R.A. Muscat, A. Kuchina, P. Sample, Z. Yao, L.T. Graybuck, D.J. Peeler, S. Mukherjee, W. Chen, S.H. Pun, D.L. Sellers, B. Tasic, G. Seelig, Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding, Science 360 (2018) 176–182, https://doi.org/10.1126/science.aam8999.
[52] B. Hwang, J.H. Lee, D. Bang, Single-cell RNA sequencing technologies and bioinformatics pipelines, Exp. Mol. Med. 50 (2018), https://doi.org/10.1038/s12276-018-0071-8.
[53] T. Kivioja, A. Vähärautio, K. Karlsson, M. Bonke, M. Enge, S. Linnarsson, J. Taipale, Counting absolute numbers of molecules using unique molecular identifiers, Nat. Methods (2012), https://doi.org/10.1038/nmeth.1778.
[54] S. Islam, A. Zeisel, S. Joost, G. La Manno, P. Zajac, M. Kasper, P. Lönnerberg, S. Linnarsson, Quantitative single-cell RNA-seq with unique molecular identifiers, Nat. Methods (2014), https://doi.org/10.1038/nmeth.2772.
[62] C. Hafemeister, R. Satija, Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression, Genome Biol. (2019), https://doi.org/10.1186/s13059-019-1874-1.
[63] R. Bacher, L.F. Chu, N. Leng, A.P. Gasch, J.A. Thomson, R.M. Stewart, M. Newton, C. Kendziorski, SCnorm: robust normalization of single-cell RNA-seq data, Nat. Methods (2017), https://doi.org/10.1038/nmeth.4263.
[64] W.V. Li, J.J. Li, An accurate and robust imputation method scImpute for single-cell RNA-seq data, Nat. Commun. (2018), https://doi.org/10.1038/s41467-018-03405-7.
[65] D. van Dijk, R. Sharma, J. Nainys, K. Yim, P. Kathail, A.J. Carr, C. Burdziak, K.R. Moon, C.L. Chaffer, D. Pattabiraman, B. Bierie, L. Mazutis, G. Wolf, S. Krishnaswamy, D. Pe’er, Recovering gene interactions from single-cell data using data diffusion, Cell (2018), https://doi.org/10.1016/j.cell.2018.05.061.
[66] W. Gong, I.Y. Kwak, P. Pota, N. Koyano-Nakagawa, D.J. Garry, DrImpute: imputing dropout events in single cell RNA sequencing data, BMC Bioinformatics (2018), https://doi.org/10.1186/s12859-018-2226-y.
[67] M. Huang, J. Wang, E. Torre, H. Dueck, S. Shaffer, R. Bonasio, J.I. Murray, A. Raj, M. Li, N.R. Zhang, SAVER: gene expression recovery for single-cell RNA sequencing, Nat. Methods (2018), https://doi.org/10.1038/s41592-018-0033-z.
[68] S. Tracy, G.C. Yuan, R. Dries, RESCUE: imputing dropout events in single-cell RNA-sequencing data, BMC Bioinformatics (2019), https://doi.org/10.1186/s12859-019-2977-0.
[69] T. Peng, Q. Zhu, P. Yin, K. Tan, SCRABBLE: Single-cell RNA-seq imputation constrained by bulk RNA-seq data, Genome Biol. (2019), https://doi.org/10.1186/s13059-019-1681-8.
[70] P. Ye, W. Ye, C. Ye, S. Li, L. Ye, G. Ji, X. Wu, scHinter: imputing dropout events for single-cell RNA-seq data with limited sample size, Bioinformatics (2020), https://doi.org/10.1093/bioinformatics/btz627.
[71] P. Qiu, Embracing the dropouts in single-cell RNA-seq data, bioRxiv (2018), https://doi.org/10.1101/468025.
[72] V. Svensson, Droplet scRNA-seq is not zero-inflated, Nat. Biotechnol. (2020), https://doi.org/10.1038/s41587-019-0379-5.
[73] W. Tang, F. Bertaux, P. Thomas, C. Stefanelli, M. Saint, S. Marguerat, V. Shahrezaei, BayNorm: bayesian gene expression recovery, imputation and normalization for single-cell RNA-sequencing data, Bioinformatics (2020), https://doi.org/10.1093/bioinformatics/btz726.
[74] A.T.L. Lun, J.C. Marioni, Overcoming confounding plate effects in differential expression analyses of single-cell RNA-seq data, Biostatistics (2017), https://doi.org/10.1093/biostatistics/kxw055.
[75] M.B. Cole, D. Risso, A. Wagner, D. DeTomaso, J. Ngai, E. Purdom, S. Dudoit, N. Yosef, Performance assessment and selection of normalization procedures for single-cell RNA-seq, Cell Syst. (2019), https://doi.org/10.1016/j.cels.2019.03.010.
[76] P.Y. Tung, J.D. Blischak, C.J. Hsiao, D.A. Knowles, J.E. Burnett, J.K. Pritchard, Y. Gilad, Batch effects and the effective design of single-cell gene expression studies, Sci. Rep. (2017), https://doi.org/10.1038/srep39921.
[77] W. Chen, S. Zhang, J. Williams, B. Ju, B. Shaner, J. Easton, G. Wu, X. Chen, A comparison of methods accounting for batch effects in differential expression analysis of UMI count based single cell RNA sequencing, Comput. Struct. Biotechnol. J. (2020), https://doi.org/10.1016/j.csbj.2020.03.026.
[78] R. Satija, J.A. Farrell, D. Gennert, A.F. Schier, A. Regev, Spatial reconstruction of single-cell gene expression data, Nat. Biotechnol. (2015), https://doi.org/10.1038/nbt.3192.
[79] F.A. Wolf, P. Angerer, F.J. Theis, SCANPY: large-scale single-cell gene expression data analysis, Genome Biol. (2018), https://doi.org/10.1186/s13059-017-1382-0.
[80] S.R. Tyler, P.G. Rotti, X. Sun, Y. Yi, W. Xie, M.C. Winter, M.J. Flamme-Wiese, B.A. Tucker, R.F. Mullins, A.W. Norris, J.F. Engelhardt, PyMINEr finds gene and autocrine-paracrine networks from human islet scRNA-seq, Cell Rep. (2019), https://doi.org/10.1016/j.celrep.2019.01.063.
[81] V. Petukhov, J. Guo, N. Baryawno, N. Severe, D.T. Scadden, M.G. Samsonova, P.V. Kharchenko, dropEst: pipeline for accurate estimation of molecular counts in droplet-based single-cell RNA-seq experiments, Genome Biol. (2018), https://doi.org/10.1186/s13059-018-1449-6.
[55] M.D. Luecken, F.J. Theis, Current best practices in single-cell RNA-seq analysis: a [82] R. Hillje, P.G. Pelicci, L. Luzi, Cerebro: interactive visualization of scRNA-seq
tutorial, Mol. Syst. Biol. (2019), https://doi.org/10.15252/msb.20188746. data, Bioinformatics (2019), https://doi.org/10.1093/bioinformatics/btz877.
[56] O. Stegle, S.A. Teichmann, J.C. Marioni, Computational and analytical challenges [83] K. Rue-Albrecht, F. Marini, C. Soneson, A.T.L. Lun, iSEE: interactive
in single-cell transcriptomics, Nat. Rev. Genet. (2015), https://doi.org/10.1038/ summarizedexperiment explorer, F1000Research (2018), https://doi.org/
nrg3833. 10.12688/f1000research.14966.1.
[57] T. Ilicic, J.K. Kim, A.A. Kolodziejczyk, F.O. Bagger, D.J. McCarthy, J.C. Marioni, [84] C. Arisdakessian, O. Poirion, B. Yunits, X. Zhu, L.X. Garmire, DeepImpute: an
S.A. Teichmann, Classification of low quality cells from single-cell RNA-seq data, accurate, fast, and scalable deep neural network method to impute single-cell
Genome Biol. (2016), https://doi.org/10.1186/s13059-016-0888-1. RNA-seq data, Genome Biol. (2019), https://doi.org/10.1186/s13059-019-1837-
[58] P. Brennecke, S. Anders, J.K. Kim, A.A. Kołodziejczyk, X. Zhang, V. Proserpio, 6.
B. Baying, V. Benes, S.A. Teichmann, J.C. Marioni, M.G. Heisler, Accounting for [85] T. Wang, T.S. Johnson, W. Shao, Z. Lu, B.R. Helm, J. Zhang, K. Huang,
technical noise in single-cell RNA-seq experiments, Nat. Methods (2013), https:// BERMUDA: a novel deep transfer learning method for single-cell RNA sequencing
doi.org/10.1038/nmeth.2645. batch correction reveals hidden high-resolution cellular subtypes, Genome Biol.
[59] Chapter 6 Quality Control, Orchestrating Single-Cell Analysis with Bioconductor, (2019), https://doi.org/10.1186/s13059-019-1764-6.
(s.d.), https://osca.bioconductor.org/quality-control.html (accedit 6 agost 2020), [86] L. Haghverdi, A.T.L. Lun, M.D. Morgan, J.C. Marioni, Batch effects in single-cell
2020. RNA-sequencing data are corrected by matching mutual nearest neighbors, Nat.
[60] C.A. Vallejos, D. Risso, A. Scialdone, S. Dudoit, J.C. Marioni, Normalizing single- Biotechnol. (2018), https://doi.org/10.1038/nbt.4091.
cell RNA sequencing data: challenges and opportunities, Nat. Methods (2017), [87] E. Lin, S. Mukherjee, S. Kannan, A deep adversarial variational autoencoder
https://doi.org/10.1038/nmeth.4292. model for dimensionality reduction in single-cell RNA sequencing analysis, BMC
[61] L. Wen, F. Tang, How to catch rare cell types, Nature (2015), https://doi.org/ Bioinformatics 21 (2020) 64, https://doi.org/10.1186/s12859-020-3401-5.
10.1038/nature15204. [88] M. Amodio, D. van Dijk, K. Srinivasan, W.S. Chen, H. Mohsen, K.R. Moon,
A. Campbell, Y. Zhao, X. Wang, M. Venkataswamy, A. Desai, V. Ravi, P. Kumar,
R. Montgomery, G. Wolf, S. Krishnaswamy, Exploring single-cell data with deep
617
R. Nayak and Y. Hasija Genomics 113 (2021) 606–619
multitasking neural networks, Nat. Methods (2019), https://doi.org/10.1038/ heterogeneity in single-cell RNA sequencing data, Genome Biol. (2015), https://
s41592-019-0576-7. doi.org/10.1186/s13059-015-0844-5.
[89] V.Y. Kiselev, A. Yiu, M. Hemberg, Scmap: projection of single-cell RNA-seq data [116] P.V. Kharchenko, L. Silberstein, D.T. Scadden, Bayesian approach to single-cell
across data sets, Nat. Methods (2018), https://doi.org/10.1038/nmeth.4644. differential expression analysis, Nat. Methods (2014), https://doi.org/10.1038/
[90] H. Todorov, R. Cannoodt, W. Saelens, Y. Saeys, TinGa: fast and flexible trajectory nmeth.2967.
inference with growing neural gas, Bioinformatics (2020), https://doi.org/ [117] Y. Ma, S. Sun, X. Shang, E.T. Keller, M. Chen, X. Zhou, Integrative differential
10.1093/bioinformatics/btaa463. expression and gene set enrichment analysis using summary statistics for scRNA-
[91] Z. Liu, H. Lou, K. Xie, H. Wang, N. Chen, O.M. Aparicio, M.Q. Zhang, R. Jiang, seq studies, Nat. Commun. (2020), https://doi.org/10.1038/s41467-020-15298-
T. Chen, Reconstructing cell cycle pseudo time-series via single-cell transcriptome 6.
data, Nat. Commun. (2017), https://doi.org/10.1038/s41467-017-00039-z. [118] H.A. Pliner, J. Shendure, C. Trapnell, Supervised classification enables rapid
[92] R. Petegrosso, Z. Li, R. Kuang, Machine learning and statistical methods for annotation of cell atlases, Nat. Methods (2019), https://doi.org/10.1038/s41592-
clustering single-cell RNA-sequencing data, Brief. Bioinform. (2019), https://doi. 019-0535-3.
org/10.1093/bib/bbz063. [119] X. Shao, J. Liao, X. Lu, R. Xue, N. Ai, X. Fan, scCATCH: automatic annotation on
[93] P. Lin, M. Troup, J.W.K. Ho, CIDR: ultrafast and accurate clustering through cell types of Clusters from single-cell RNA sequencing data, iScience (2020),
imputation for single-cell RNA-seq data, Genome Biol. (2017), https://doi.org/ https://doi.org/10.1016/j.isci.2020.100882.
10.1186/s13059-017-1188-0. [120] L. Jiang, Rare cell type detection, en, Methods Mol. Biol. (2019), https://doi.org/
[94] A. Butler, P. Hoffman, P. Smibert, E. Papalexi, R. Satija, Integrating single-cell 10.1007/978-1-4939-9057-3_5.
transcriptomic data across different conditions, technologies, and species, Nat. [121] W. Saelens, R. Cannoodt, H. Todorov, Y. Saeys, A comparison of single-cell
Biotechnol. (2018), https://doi.org/10.1038/nbt.4096. trajectory inference methods, Nat. Biotechnol. (2019), https://doi.org/10.1038/
[95] L. McInnes, J. Healy, N. Saul, L. Großberger, UMAP: uniform manifold s41587-019-0071-9.
approximation and projection, J. Open Source Softw. (2018), https://doi.org/ [122] C. Trapnell, D. Cacchiarelli, J. Grimsby, P. Pokharel, S. Li, M. Morse, N.J. Lennon,
10.21105/joss.00861. K.J. Livak, T.S. Mikkelsen, J.L. Rinn, The dynamics and regulators of cell fate
[96] E. Pierson, C. Yau, ZIFA: dimensionality reduction for zero-inflated single-cell decisions are revealed by pseudotemporal ordering of single cells, Nat.
gene expression analysis, Genome Biol. (2015), https://doi.org/10.1186/s13059- Biotechnol. (2014), https://doi.org/10.1038/nbt.2859.
015-0805-z. [123] M. Setty, M.D. Tadmor, S. Reich-Zeliger, O. Angel, T.M. Salame, P. Kathail,
[97] D. Risso, F. Perraudeau, S. Gribkova, S. Dudoit, J.P. Vert, A general and flexible K. Choi, S. Bendall, N. Friedman, D. Pe’Er, Wishbone identifies bifurcating
method for signal extraction from single-cell RNA-seq data, Nat. Commun. developmental trajectories from single-cell data, Nat. Biotechnol. (2016), https://
(2018), https://doi.org/10.1038/s41467-017-02554-5. doi.org/10.1038/nbt.3569.
[98] Data Portal, Human Cell Atlas, (s.d.), https://www.humancellatlas.org/data-port [124] G. Giecold, E. Marco, S.P. Garcia, L. Trippa, G.C. Yuan, Robust lineage
al/ (accedit 18 març 2020), 2020. reconstruction from high-dimensional single-cell data, Nucleic Acids Res. (2016),
[99] Single Cell Portal, (s.d.). https://singlecell.broadinstitute.org/single_cell (accedit https://doi.org/10.1093/nar/gkw452.
18 març 2020). 2020. [125] J. Chen, L. Rénia, F. Ginhoux, Constructing cell lineages from single-cell
[100] Home, < Single Cell Expression Atlas < EMBL-EBI (s.d.). https://www.ebi.ac. transcriptomes, Mol. Asp. Med. (2018), https://doi.org/10.1016/j.
uk/gxa/sc/home (accedit 18 març 2020), 2020. mam.2017.10.004.
[101] Samples, PanglaoDB (s.d.). https://panglaodb.se/samples.html?species [126] K. Campbell, C. Yau, Ouija: incorporating prior knowledge in single-cell trajectory
=human&protocol=all protocols&sort=mostrecent (accedit 18 març 2020), learning using Bayesian nonlinear factor analysis, bioRxiv (2016), https://doi.
2020. org/10.1101/060442.
[102] M. Eugenio, R.L. Karp, G. Guo, P. Robson, A.H. Hart, L. Trippa, G.C. Yuan, [127] R. Gonzalo Parra, N. Papadopoulos, L. Ahumada-Arranz, J. El Kholtei,
Bifurcation analysis of single-cell gene expression data reveals epigenetic N. Mottelson, Y. Horokhovsky, B. Treutlein, J. Soeding, Reconstructing complex
landscape, Proc. Natl. Acad. Sci. U. S. A. (2014), https://doi.org/10.1073/ lineage trees from scRNA-seq data using MERLoT, Nucleic Acids Res. (2019),
pnas.1408993111. https://doi.org/10.1093/nar/gkz706.
[103] L. Yang, J. Liu, Q. Lu, A.D. Riggs, X. Wu, SAIC: an iterative clustering approach for [128] A. Scialdone, K.N. Natarajan, L.R. Saraiva, V. Proserpio, S.A. Teichmann,
analysis of single cell RNA-seq data, BMC Genomics (2017), https://doi.org/ O. Stegle, J.C. Marioni, F. Buettner, Computational assignment of cell-cycle stage
10.1186/s12864-017-4019-5. from single-cell transcriptome data, Methods (2015), https://doi.org/10.1016/j.
[104] H. Zhang, C.A.A. Lee, Z. Li, J.R. Garbe, C.R. Eide, R. Petegrosso, R. Kuang, ymeth.2015.06.021.
J. Tolar, A multitask clustering approach for single-cell RNA-seq analysis in [129] H. Matsumoto, H. Kiryu, C. Furusawa, M.S.H. Ko, S.B.H. Ko, N. Gouda,
recessive dystrophic epidermolysis bullosa, PLoS Comput. Biol. (2018), https:// T. Hayashi, I. Nikaido, SCODE: an efficient regulatory network inference
doi.org/10.1371/journal.pcbi.1006053. algorithm from single-cell RNA-Seq during differentiation, Bioinformatics. 33
[105] D.A. du Verle, S. Yotsukura, S. Nomura, H. Aburatani, K. Tsuda, CellTree: an R/ (2017) 2314–2321, https://doi.org/10.1093/bioinformatics/btx194.
bioconductor package to infer the hierarchical structure of cell populations from [130] T.N. Tran, G. Bader, Tempora: Cell Trajectory Inference Using Time-Series Single-
single-cell RNA-seq data, BMC Bioinformatics (2016), https://doi.org/10.1186/ Cell RNA Sequencing Data, bioRxiv, 2019, https://doi.org/10.1101/846907.
s12859-016-1175-6. [131] T. Wang, B. Li, C.E. Nelson, S. Nabavi, Comparative analysis of differential gene
[106] J.M. Zhang, J. Fan, H.C. Fan, D. Rosenfeld, D.N. Tse, An interpretable framework expression analysis tools for single-cell RNA sequencing data, BMC Bioinformatics
for clustering single-cell RNA-seq datasets, BMC Bioinformatics (2018), https:// (2019), https://doi.org/10.1186/s12859-019-2599-6.
doi.org/10.1186/s12859-018-2092-7. [132] K. Van den Berge, H. Roux de Bézieux, K. Street, W. Saelens, R. Cannoodt,
[107] V.A. Traag, L. Waltman, N.J. van Eck, From Louvain to Leiden: guaranteeing well- Y. Saeys, S. Dudoit, L. Clement, Trajectory-based differential expression analysis
connected communities, Sci. Rep. (2019), https://doi.org/10.1038/s41598-019- for single-cell sequencing data, Nat. Commun. (2020), https://doi.org/10.1038/
41695-z. s41467-020-14766-3.
[108] C. Xu, Z. Su, Identification of cell types from single-cell transcriptomes using a [133] S. Aibar, C.B. González-Blas, T. Moerman, V.A. Huynh-Thu, H. Imrichova,
novel clustering method, Bioinformatics (2015), https://doi.org/10.1093/ G. Hulselmans, F. Rambow, J.C. Marine, P. Geurts, J. Aerts, J. Van Den Oord, Z.
bioinformatics/btv088. K. Atak, J. Wouters, S. Aerts, SCENIC: single-cell regulatory network inference
[109] E. Azizi, A.J. Carr, G. Plitas, A.E. Cornish, C. Konopacki, S. Prabhakaran, and clustering, Nat. Methods (2017), https://doi.org/10.1038/nmeth.4463.
J. Nainys, K. Wu, V. Kiseliovas, M. Setty, K. Choi, R.M. Fromme, P. Dao, P. [134] T. Turki, Y.H. Taguchi, SCGRNs: Novel supervised inference of single-cell gene
T. McKenney, R.C. Wasti, K. Kadaveru, L. Mazutis, A.Y. Rudensky, D. Pe’er, regulatory networks of complex diseases, Comput. Biol. Med. (2020), https://doi.
Single-cell map of diverse immune phenotypes in the breast tumor org/10.1016/j.compbiomed.2020.103656.
microenvironment, Cell (2018), https://doi.org/10.1016/j.cell.2018.05.060. [135] A. Pratapa, A.P. Jalihal, J.N. Law, A. Bharadwaj, T.M. Murali, Benchmarking
[110] Z. Ji, H. Ji, TSCAN: pseudo-time reconstruction and evaluation in single-cell RNA- algorithms for gene regulatory network inference from single-cell transcriptomic
seq analysis, Nucleic Acids Res. (2016), https://doi.org/10.1093/nar/gkw430. data, Nat. Methods (2020), https://doi.org/10.1038/s41592-019-0690-6.
[111] X. Qiu, Q. Mao, Y. Tang, L. Wang, R. Chawla, H.A. Pliner, C. Trapnell, Reversed [136] X. Qiu, A. Rahimzamani, L. Wang, B. Ren, Q. Mao, T. Durham, J.L. McFaline-
graph embedding resolves complex single-cell trajectories, Nat. Methods (2017), Figueroa, L. Saunders, C. Trapnell, S. Kannan, Inferring causal gene regulatory
https://doi.org/10.1038/nmeth.4402. networks from coupled single-cell expression dynamics using scribe, Cell Syst.
[112] T. Tian, J. Wan, Q. Song, Z. Wei, Clustering single-cell RNA-seq data with a (2020), https://doi.org/10.1016/j.cels.2020.02.003.
model-based deep learning approach, Nat. Mach. Intell. (2019), https://doi.org/ [137] P. Savas, B. Virassamy, C. Ye, A. Salim, C.P. Mintoff, F. Caramia, R. Salgado, D.
10.1038/s42256-019-0037-0. J. Byrne, Z.L. Teo, S. Dushyanthen, A. Byrne, L. Wein, S.J. Luen, C. Poliness, S.
[113] Y. Yang, R. Huh, H.W. Culpepper, Y. Lin, M.I. Love, Y. Li, SAFE-clustering: single- S. Nightingale, A.S. Skandarajah, D.E. Gyorki, C.M. Thornton, P.A. Beavis, S.
cell aggregated (from Ensemble) clustering for single-cell RNA-seq data, B. Fox, P.K. Darcy, T.P. Speed, L.K. MacKay, P.J. Neeson, S. Loi, Single-cell
Bioinformatics (2019), https://doi.org/10.1093/bioinformatics/bty793. profiling of breast cancer T cells reveals a tissue-resident memory subset
[114] A.W. Zhang, C.O. Flanagan, E.A. Chavez, J.L.P. Lim, N. Ceglia, A. Mcpherson, associated with improved prognosis, Nat. Med. (2018), https://doi.org/10.1038/
M. Wiens, P. Walters, T. Chan, B. Hewitson, D. Lai, A. Mottok, C. Sarkozy, s41591-018-0078-7.
L. Chong, T. Aoki, X. Wang, A.P. Weng, J.N. Mcalpine, S. Aparicio, C. Steidl, K. [138] E. Papalexi, R. Satija, Single-cell RNA sequencing to explore immune cell
R. Campbell, S.P. Shah, RNA-seq for tumor microenvironment profiling, Nat. heterogeneity, Nat. Rev. Immunol. (2018), https://doi.org/10.1038/nri.2017.76.
Methods (2019), https://doi.org/10.1038/s41592-019-0529-1. [139] X. Yu, Y.A. Chen, J.R. Conejo-Garcia, C.H. Chung, X. Wang, Estimation of immune
[115] G. Finak, A. McDavid, M. Yajima, J. Deng, V. Gersuk, A.K. Shalek, C.K. Slichter, H. cell content in tumor using single-cell RNA-seq reference data, BMC Cancer
W. Miller, M.J. McElrath, M. Prlic, P.S. Linsley, R. Gottardo, MAST: a flexible (2019), https://doi.org/10.1186/s12885-019-5927-3.
statistical framework for assessing transcriptional changes and characterizing [140] A.L. Roy, Transcriptional regulation in the immune system: one cell at a time,
Front. Immunol. 10 (2019) 1355, https://doi.org/10.3389/fimmu.2019.01355.
618
R. Nayak and Y. Hasija Genomics 113 (2021) 606–619
[141] M.L. Suvà, I. Tirosh, Single-cell RNA sequencing in cancer: lessons learned and [150] T.V. Lanz, A.K. Pröbstel, I. Mildenberger, M. Platten, L. Schirmer, Single-cell high-
emerging challenges, Mol. Cell (2019), https://doi.org/10.1016/j. throughput technologies in cerebrospinal fluid research and diagnostics, Front.
molcel.2019.05.003. Immunol. (2019), https://doi.org/10.3389/fimmu.2019.01302.
[142] R. Hou, E. Denisenko, H.T. Ong, J.A. Ramilowski, A.R.R. Forrest, Predicting cell- [151] E. Der, H. Suryawanshi, P. Morozov, M. Kustagi, B. Goilav, S. Ranabathou,
to-cell communication networks using NATMI, Nat. Commun. (2020), https:// P. Izmirly, R. Clancy, H.M. Belmont, M. Koenigsberg, M. Mokrzycki,
doi.org/10.1038/s41467-020-18873-z. H. Rominieki, J.A. Graham, J.P. Rocca, N. Bornkamp, N. Jordan, E. Schulte,
[143] I.C. Macaulay, V. Svensson, C. Labalette, L. Ferreira, F. Hamey, T. Voet, S. M. Wu, J. Pullman, K. Slowikowski, S. Raychaudhuri, J. Guthridge, J. James,
A. Teichmann, A. Cvejic, Single-cell rna-sequencing reveals a continuous J. Buyon, T. Tuschl, C. Putterman, J. Anolik, W. Apruzzese, A. Arazi, C. Berthier,
spectrum of differentiation in hematopoietic cells, Cell Rep. (2016), https://doi. M. Brenner, J. Buyon, R. Clancy, S. Connery, M. Cunningham, M. Dall’Era,
org/10.1016/j.celrep.2015.12.082. A. Davidson, E. Der, A. Fava, C. Fonseka, R. Furie, D. Goldman, R. Gupta,
[144] K.S. Yan, C.Y. Janda, J. Chang, G.X.Y. Zheng, K.A. Larkin, V.C. Luca, L.A. Chia, A. J. Guthridge, N. Hacohen, D. Hildeman, P. Hoover, R. Hsu, J. James, R. Kado,
T. Mah, A. Han, J.M. Terry, A. Ootani, K. Roelf, M. Lee, J. Yuan, X. Li, C.R. Bolen, K. Kalunian, D. Kamen, M. Kretzler, H. Maecker, E. Massarotti, W. McCune,
J. Wilhelmy, P.S. Davies, H. Ueno, R.J. Von Furstenberg, P. Belgrader, S. M. McMahon, M. Park, F. Payan-Schober, W. Pendergraft, M. Petri, M. Pichavant,
B. Ziraldo, H. Ordonez, S.J. Henning, M.H. Wong, M.P. Snyder, I.L. Weissman, A. C. Putterman, D. Rao, S. Raychaudhuri, K. Slowikowski, H. Suryawanshi,
J. Hsueh, T.S. Mikkelsen, K.C. Garcia, C.J. Kuo, Non-equivalence of Wnt and R- T. Tuschl, P. Utz, D. Waguespack, D. Wofsy, F. Zhang, Tubular cell and
spondin ligands during Lgr5 + intestinal stem-cell self-renewal, Nature (2017), keratinocyte single-cell transcriptomics applied to lupus nephritis reveal type I
https://doi.org/10.1038/nature22313. IFN and fibrosis relevant pathways, Nat. Immunol. (2019), https://doi.org/
[145] R.M.J. Genga, E.M. Kernfeld, K.M. Parsi, T.J. Parsons, M.J. Ziller, R. Maehr, 10.1038/s41590-019-0386-1.
Single-cell RNA-sequencing-based CRISPRi screening resolves molecular drivers [152] D.R. Gawel, J. Serra-Musach, S. Lilja, J. Aagesen, A. Arenas, B. Asking,
of early human endoderm development, Cell Rep. (2019), https://doi.org/ M. Bengnér, J. Björkander, S. Biggs, J. Ernerudh, H. Hjortswang, J.E. Karlsson,
10.1016/j.celrep.2019.03.076. M. Köpsen, E.J. Lee, A. Lentini, X. Li, M. Magnusson, D. Martínez-Enguita,
[146] S. Darmanis, S.A. Sloan, Y. Zhang, M. Enge, C. Caneda, L.M. Shuer, M.G. A. Matussek, C.E. Nestor, S. Schäfer, O. Seifert, C. Sonmez, H. Stjernman,
H. Gephart, B.A. Barres, S.R. Quake, A survey of human brain transcriptome A. Tjärnberg, S. Wu, K. Åkesson, A.K. Shalek, M. Stenmarker, H. Zhang,
diversity at the single cell level, Proc. Natl. Acad. Sci. U. S. A. (2015), https://doi. M. Gustafsson, M. Benson, A validated single-cell-based strategy to identify
org/10.1073/pnas.1507125112. diagnostic and therapeutic targets in complex diseases, Genome Med. (2019),
[147] Q. Mu, Y. Chen, J. Wang, Deciphering brain complexity using single-cell https://doi.org/10.1186/s13073-019-0657-3.
sequencing, Genom. Proteome. Bioinform. (2019), https://doi.org/10.1016/j. [153] A.K. Shalek, M. Benson, Single-cell analyses to tailor treatments, Sci. Transl. Med.
gpb.2018.07.007. (2017), https://doi.org/10.1126/scitranslmed.aan4730.
[148] A.C. Tolonen, R.J. Xavier, Dissecting the human microbiome with single-cell [154] A.J. Wilk, A. Rustagi, N.Q. Zhao, J. Roque, G.J. Martínez-Colón, J.L. McKechnie,
genomics, Genome Med. (2017), https://doi.org/10.1186/s13073-017-0448-7. G.T. Ivison, T. Ranganath, R. Vergara, T. Hollis, L.J. Simpson, P. Grant,
[149] P.M. Strzelecka, A.M. Ranzoni, A. Cvejic, Dissecting human disease with single- A. Subramanian, A.J. Rogers, C.A. Blish, A single-cell atlas of the peripheral
cell omics: application in model systems and in the clinic, DMM Dis. Model. Mech. immune response in patients with severe COVID-19, Nat. Med. 26 (2020)
(2018), https://doi.org/10.1242/dmm.036525. 1070–1076, https://doi.org/10.1038/s41591-020-0944-y.
619