
Journal of Biotechnology 364 (2023) 13–22

Contents lists available at ScienceDirect

Journal of Biotechnology
journal homepage: www.elsevier.com/locate/jbiotec

Single-cell RNA sequencing reveals homogeneous transcriptome patterns and low variance in a suspension CHO-K1 and an adherent HEK293FT cell line in culture conditions☆
Giulia Borsi a, Krishna Motheramgari b, Heena Dhiman b, Martina Baumann b, Elly Sinkala c, Max Sauerland c, Julian Riba c, Nicole Borth a,*
a BOKU University of Natural Resources and Life Sciences, Institute of Animal Cell Technology and Systems Biology, Muthgasse 18, 1190, Vienna, Austria
b Austrian Centre of Industrial Biotechnology (acib GmbH), Muthgasse 11, 1190, Vienna, Austria
c CYTENA GmbH, Germany

ARTICLE INFO

Keywords: CHO cells; Single-cell RNA-seq; Cell-to-cell variation; Gene expression; Culture condition

ABSTRACT

Recombinant mammalian host cell lines, in particular CHO and HEK293 cells, are used for the industrial production of therapeutic proteins. Despite their well-known genomic instability, the control mechanisms that enable cells to respond to changes in the environmental conditions are not yet fully understood, nor do we have a good understanding of the factors that lead to phenotypic shifts in long-term cultures. A contributing factor could be inherent diversity in transcriptomes within a population. In this study, we used a full-length coverage single-cell RNA sequencing (scRNA-seq) approach to investigate and compare cell-to-cell variability and the impact of standardized and homogeneous culture conditions on the diversity of individual cell transcriptomes, comparing suspension CHO-K1 and adherent HEK293FT cells. Our data showed a critical batch effect from the sequencing of four 96-well plates of CHO-K1 single cells stored for different periods of time, which was therefore identified as a technical variable to consider in experimental planning. Furthermore, in an artificial and controlled culture environment such as that used in routine cell culture technology, the gene expression pattern of a given population does not reveal any marker gene capable of disclosing relevant cell population substructures, for either CHO-K1 or HEK293FT cells. The variation observed is primarily driven by the cell cycle.

1. Introduction

The transcriptome, among a broad set of variables, is one of the most relevant factors in determining the cellular phenotype (Kim and Eberwine, 2010). Studies have examined the variation in gene expression levels in single cells with an assortment of methods (Alessio et al., 2020; Battich et al., 2015; Cornelison and Wold, 1997; Miyashiro et al., 1994), considering mainly tissues in fields such as developmental biology (Scialdone et al., 2016; Treutlein et al., 2014), neuroscience (La Manno et al., 2016; Marques et al., 2016), cancer (Patel et al., 2014; Tirosh et al., 2016), or immunology (Mahata et al., 2014; Shalek et al., 2013). Research using single-cell RNA sequencing has been able to reveal the diversity of transcriptomes within the individual cells of such tissue-derived samples (Buckley et al., 2011; Raser and O’Shea, 2005), and has aimed at generating an understanding of the molecular mechanisms that underlie it (Bajić and Poyatos, 2012; Blake et al., 2006; Munsky et al., 2012). The age of a cell or its differentiation status, different environmental stimuli, the physiological conditions, and random chance all contribute to modulated gene expression levels as reflected in a cell’s transcriptome (Kim and Eberwine, 2010). In the context of recombinant cell lines used for industrial production of biotherapeutics, this can result in significant alterations in cell behavior and productivity of producer cell lines (Pilbrough et al., 2009; Weinguny et al., 2021). On the other hand, while tissues typically consist of multiple types of cells, a selected subclone is used for protein manufacturing. Thus one would expect a lower divergence and variation in transcriptome patterns under such conditions.

Among mammalian expression systems (Gils et al., 2017), Chinese hamster ovary (CHO) cells are the most preferred ones for the production of biotherapeutics, owing to their manufacturing adaptability and


* Corresponding author.
E-mail address: [email protected] (N. Borth).

https://doi.org/10.1016/j.jbiotec.2023.01.006
Received 3 May 2022; Received in revised form 15 January 2023; Accepted 21 January 2023
Available online 26 January 2023
0168-1656/© 2023 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
G. Borsi et al. Journal of Biotechnology 364 (2023) 13–22

their specific glycan pattern which is non-immunogenic to humans and compatible with their clinical use (Dorner et al., 1987). The second most frequently used cell line is HEK293, which is more prominent, however, in transient rather than stable recombinant production approaches. Both cell types require adequate growth conditions to generate and secrete sufficient amounts of the appropriate recombinant protein, and they are well known to be prone to inherent genomic instability (Barnes et al., 2003; Dahodwala and Lee, 2019). Control mechanisms that can trigger shifts in response to environmental conditions are still not well understood, and most transcriptome studies so far have been comparative, looking at the difference between different environmental states, exponential and stationary phase, or defined cell samples at population level (Clarke et al., 2011; Colak et al., 2020; Doolan et al., 2013; Hernandez et al., 2019; Hsu et al., 2017; Sudhagar et al., 2018). The emergence of single-cell RNA sequencing techniques now offers new tools to explore gene expression profiles at the resolution of single cells (Gaiti et al., 2019; Nagano et al., 2017; Nowotschin et al., 2019; Tanay and Regev, 2017) which, if performed at sufficient coverage and resolution, can open up a more detailed understanding of the source of variation in cell lines and subclones, be it genomic, epigenetic, driven by culture conditions or stochastic.

Recently, first studies have become available that explore the heterogeneity of the transcriptome and the cell-to-cell variability of CHO cell lines, whether in suspension or adherent culture, as used in the biotherapeutic industry. In the study by Ogata et al. (2021), according to the data collected from short gene coverage sequencing, the single-cell transcriptome analysis in CHO-K1 suspension cultures did not allow the identification of clear substructures (except for the hidden populations with low and high expression of enolase) and clusters dependent on cell cycle. However, it must be specified that, unlike our analysis where we estimated the cell cycle directly from the transcriptome, the cell cycle annotation of the single-cell samples used in Ogata’s work was estimated by staining with the Hoechst method. In Tzani et al. (2021), an mRNA Whole Transcriptome Analysis (WTA) of a CHO-K1 suspension culture was performed; cell cycle dependent subpopulations were not detected, and the heterogeneity of the cell population seemed to increase over time, which would indicate stress-related divergence. The correlation between gene expression and the cell cycle phase of adherent CHO cells was, however, identified during the analyses of CHO adherent cultures by performing full-length gene coverage sequencing of cells sorted according to their cell cycle state (Ogata et al., 2021). Furthermore, the single-cell study of Tzani et al. (2021) sheds light on the phenotypic and transcriptomic drift of a clonally derived protein-producing CHO cell line as production instability of the process and cell line advances over time.

From the data presented in the two studies discussed above, it would appear that, in comparison to tissue samples, the transcriptomes of cells cultured under controlled and homogeneous in vitro conditions are far more homogeneous. To look into this further, we here present data from 141 individual cells from a CHO-K1 cell line suspension culture and 61 single cells from HEK293FT cells grown in adherent culture, using single-cell RNA sequencing (scRNA-seq) with the Smart-seq2 (SS2) v2 protocol (Picelli et al., 2014) and SMART-Seq (Ramsköld et al., 2012). Both protocols are full-length gene coverage approaches that capture the entire transcriptome. Our results indicate that in an artificial and controlled culture environment such as that used in routine cell culture technology, the gene expression pattern of a given population is reasonably homogeneous. Some slight difference in heterogeneity was observed between suspension and adherent cultures. The higher transcriptome homogeneity of suspension culture might be due to the better mixing in shaken suspension culture, and to local density effects in the adherent population. Despite the comparative transcriptome analyses carried out, we cannot exclude that the modest number of samples used for this study might be the source of the homogeneity of the gene expression pattern, and we therefore favor the generation of larger sample numbers in future studies.

2. Materials and methods

2.1. Cell lines and cell culture conditions (CHO-K1 and HEK293FT)

Suspension CHO-K1 cells (ECACC 85051005, adapted in-house to grow in suspension) were cultured in CD-CHO (Thermo Fisher Gibco) supplemented with 8 mM Glutamine (Sigma-Aldrich) and 1 mL of Anti-Clumping Agent (Thermo Fisher) per 500 mL of media. Cells were cultured in 125 mL Erlenmeyer cell culture flasks shaking at 120 RPM with a 125 mm diameter in a Kühner incubator maintained at 37 °C, 5% CO2, and 85% humidity and were passaged twice a week to 2 × 10^5 cells/mL. HEK293FT cells (Thermo Fisher) were cultured in DMEM high-glucose (Sigma-Aldrich) supplemented with 10% fetal bovine serum (Gibco) and 1% of the following concentrates: 10 mM NEAA (Gibco), 200 mM GlutaMAX (Gibco), and 10,000 U/mL Penicillin-Streptomycin (Gibco) and were cultured at 37 °C at 5% CO2 in a static incubator.

2.2. Cell harvesting, preparation and single-cell dispensing

Prior to single-cell isolation, an aliquot of CHO-K1 cells was spun down and the spent supernatant was removed. Cells were washed once with PBS and resuspended in PBS at a concentration of 1 × 10^6 cells/mL. Cell viability was determined and only samples that had a cell viability > 90% were processed. The cell suspension remained on ice prior to use. 50 µl of the cell suspension was added to a dispensing cartridge and placed on a single-cell dispenser (SCP or F.SIGHT) (CYTENA) for cell dispensing. Sample preparation for HEK293 was identical to CHO-K1 except that cells were harvested with TrypLE (Gibco) prior to the PBS wash steps. TrypLE (Gibco) is a recombinant enzyme utilized to detach and dissociate adherent mammalian cells. For the CHO-K1 cells, during dispensing, the selection criteria were a cell roundness of 0.5–1 and a cell size of 10–30 µm. The HEK293 cells had similar roundness criteria with a cell size of 10–25 µm. The roundness describes how closely a cell approximates a sphere; it can be quantified from two-dimensional pictures of a cell, capturing the overall structural shape. Size is the average diameter of the minor and major half axes of the observed sphere.

2.3. Library preparation and sequencing

Library preparation for the CHO-K1 single cells was performed as described in Picelli et al. (2014) using the SMART-Seq v2 protocol. The I.DOT liquid handler was used to perform scRNA-seq library prep with 1/10th of the original reaction volume of Takara Bio’s SMART-seq v4 Ultra Low Input RNA Kit. Detailed information is available as supplementary materials.

2.4. Data preprocessing

Cutadapt v2.0 (Martin, 2011) was used to process raw scRNA-seq reads, with a minimum read length cut-off of 20 and a maximum allowed error rate of 0.05. Mapping quality was visually inspected using FastQC v0.11.5 on the BAM files and summarized using multiQC (Ewels et al., 2016).

2.5. scRNA-seq CHO-K1 and HEK: mapping and normalisation

scRNA-seq CHO-K1 reads were aligned using Hisat2 v.2.1.0 in single-end mode and mapped to the Chinese hamster genome assembly (CriGri-PICR, RefSeq Accession: GCF_003668045.1); scRNA-seq HEK preprocessed reads were aligned using Hisat2 v.2.1.0 in single-end mode and mapped to the Human genome assembly (GRCh37, RefSeq Accession: GCF_000001405.1). The reads were mapped using the -U flag, which is specific for unpaired reads; the samtools view -bS flags specified BAM as the output format; and the reads were sorted by coordinates using samtools


sort (v1.9). Mapping percentages and expression quantification were calculated by HTSeq-count v.0.11.0. The normalisation was performed using scran v.1.12.1 and scater v.1.20.1 in R v.3.6.3 following the deconvolution approach of Lun et al. (2016). This method divides cells into pools, normalises across cells in each pool, then uses the resulting system of linear equations to define individual cell-based factors (Lytal et al., 2020). For the CHO dataset, we calculated the size factors for endogenous genes using the computeSumFactors function, then we computed a separate set of size factors for the ERCC transcripts using the computeSpikeFactors function. Using the logNormCounts function we generated the normalised log-expression values. Each value is defined as the log ratio of each count to the size factor for the corresponding cell, after adding a prior count of 1 to avoid undefined values for zero counts (Lun et al., 2016). For the HEK dataset, we performed the same normalisation, but without computing the size factors of the ERCC transcripts, since they are not present in the samples.

2.6. Computational workflow in R

The statistical analyses, detection of Highly Variable Genes (HVGs) and clustering analysis were performed in R v.3.6.3, using the packages SingleCellExperiment v.1.6.0, limma v.3.40.6, scater v.1.12.2, scran v.1.12.1, ggplot2 v.3.3.2, Seurat v.3.2.2 and following the workflow from Lun et al. (2016). Detailed descriptions of the pipeline and related plots are presented as supplementary materials.

2.7. Classification of cell cycle phase

The cell cycle phase of each cell was estimated from the transcriptome using the CellCycleScoring function from Seurat v.3.3.2 and stored as metadata (Stuart et al., 2019). This function assigns each cell a phase score, based on the expression of G2/M and S phase markers. Cells that express none of those markers are annotated as being in G1 phase, i.e. not cycling. The scoring strategy is based on the set of 43 G1/S and 55 G2/M phase marker genes from the study of Tirosh et al. (2016) (Suppl. 1H and 1I).

3. Results

3.1. Data generation and processing

For the CHO-K1 cell line, 4 plates were printed, of which two were immediately sent for sequencing. The other two plates were stored at −80 °C and sequenced 6 months later, after preliminary analysis of the initial dataset. We noticed a significant plate effect, presumably due to the longer storage times of plates 3 and 4. Specifically, the CHO-K1 single-cell dataset contains 13,334 genes. For each gene, we plotted the percentage of the variance of the normalised log-expression values across cells that is explained by i) the effect of the plate, ii) log-transformed External RNA Control Consortium (ERCC) counts and iii) cell cycle phase (Suppl. 2A). ERCC are exogenous spike-ins, namely a set of polyadenylated transcripts that mimic natural eukaryotic mRNAs (Baker et al., 2005; Devonshire et al., 2010), which make it possible to compare the accuracy of the quantification of RNA levels and to establish trustworthy experimental results: cells with a high level of spike-in RNAs have a low starting amount of RNA, likely due to the cell being dead or stressed. The plate effect is the technical factor from sample handling and sequencing that can considerably influence the heterogeneity of gene expression in such a way as to increase variance, cause false correlations or bring out plate-specific biases. Each curve corresponds to one factor and represents the distribution of percentages across all genes: the majority of the variance (~40%) is explained by the plate of origin and only a minor percentage of variance (< 10%) seems to be related to the cell cycle phase, which is also shown in Suppl. 2B, where cells are clearly spread along the first PC according to the plates the samples were generated from. The impact of prolonged storage time is also supported by the significant differences in the common quality control metrics of the raw sequencing data (Suppl. 3). The samples belonging to plates 1 and 2 show a higher library size (Suppl. 3A), an increased count of expressed genes (Suppl. 3B) and a lower proportion of reads mapped to spike-in transcripts (Suppl. 3C): as the same amount of ERCC has been added to each cell, any enrichment in spike-in counts is symptomatic of a loss of endogenous RNA. The inferior quality of the sequencing data of plates 3 and 4 is evident in the library size (Suppl. 3D), the number of expressed genes (Suppl. 3E), and the ERCC proportion (Suppl. 3F), even after identifying and filtering out the lowest-quality cells.

We additionally analysed the two pairs of CHO-K1 plates separately. The respective PCAs show minimal batch effects. PC1 and PC2 of plates 1 and 2 shift from explaining 17% and 3% of the variance before removing the batch effect (Suppl. 2C and 2D) to explaining 15% and 3% of the variance after removing it (Suppl. 4B and 4C). That most of the biological variance within the population results from the cell cycle is also shown by the density plot in Suppl. 4A, where the plate effect is no longer the technical factor influencing the distribution of the population. This last assumption is also valid for plates 3 and 4, as shown by the density plot in Suppl. 4D. We observed a decrease in the variance explained by the PCs, from 31% (PC1) and 2% (PC2) before removing the batch effect (Suppl. 2E and 2F) to 24% (PC1) and 2% (PC2) after removing it (Suppl. 4E and 4F).

Given the high and systematic technical differences between the two batches of sequencing plates, we decided to proceed only with the data of the first two plates for CHO-K1 and compared them with those generated from HEK293FT cells grown adherent in FCS-containing medium.

The HEK293 scRNA dataset is composed of 12,816 expressed genes, which is comparable to the number of genes from the CHO-K1 set. In this dataset, derived from a single plate and sequencing run, the PC1 calculated prior to corrections describes 13% of the total variance (Suppl. 5B) and the cells show a pattern that follows the cell cycle stage with which they were annotated, even though again no clear subpopulation was revealed: the blue curve plotted in the density plot (Suppl. 5A) suggests that most of the variance, which is minimal, is biologically due to the cell cycle phase.

3.2. Identification of highly variable genes in CHO-K1 (Plates 1 and 2) and HEK293 datasets

As previously explained in paragraph 3.1 and shown in Suppl. 3, we decided to proceed with the downstream analysis of 141 CHO-K1 single cells from only plates 1 and 2. This selection serves to avoid any technical bias caused by the prolonged storage of the second sequencing batch of the CHO-K1 samples. We identified the highly variable genes (HVGs) of the CHO-K1 and HEK293 cell lines separately, to investigate the respective sources of variation. Subsequently, the corresponding sets of HVGs underwent a Gene Ontology (GO) enrichment analysis.

For CHO cells, after normalising the expression profiles by log transformation, we modeled the squared coefficient of variation (CV²) for each gene using the modelGeneCV2 function, which fits a trend to assess the mean-variance relationship across them. We only considered a gene to be highly variable if it had a false discovery rate (FDR) less than 0.01 and a mean normalised expression per gene greater than 6, resulting in 622 HVGs in total (Fig. 1A). We applied dimensionality reduction to visualize the relationship between cells, constructing a t-distributed stochastic neighbor embedding (t-SNE) plot with a perplexity of 30 from the normalised log-expression values of the HVGs. Constructing the dimensionality reduction plots from the HVGs facilitates resolving the cell population into substructures, if any. Noticeably dissimilar cells arranged in space to form clusters are not detected. Conversely, the low dimensional and non-linear representations as in the t-SNE (Fig. 1B) arranged the transcriptome samples in such a way that a pattern of density of cells in G2M phase, labelled according to the cell-cycle


Fig. 1. Mean-dependent trend fitted to the squared coefficient of variation for each gene in the CHO dataset A) and in the HEK dataset D). Blue dots represent genes
with FDR < 0.01 and mean < 6. Red dots represent genes with FDR < 0.01 and mean > 6. t-SNE from normalised log-expression values of HVGs in CHO dataset B)
and in HEK dataset E), before cell cycle regression. t-SNE after regressing out cell cycle effect in CHO C) and HEK F). Each point is a cell and is coloured by its inferred
cell cycle annotation.
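
The colour coding in Fig. 1A and 1D amounts to a two-threshold rule on each gene's FDR and mean normalised log-expression. The following Python sketch illustrates that rule only; it is not the authors' R code. The gene names and values are invented, the class and function names are hypothetical, and treating a mean of exactly 6 as "blue" is an assumption, since the caption does not specify that case.

```python
from dataclasses import dataclass


@dataclass
class GeneStat:
    """Per-gene summary; every value used below is invented for illustration."""
    name: str
    mean_log_expr: float  # mean normalised log-expression across cells
    fdr: float            # FDR of the test against the fitted CV^2 trend


def colour_class(gene, fdr_cutoff=0.01, mean_cutoff=6.0):
    """Return the colour class implied by the two thresholds in the caption."""
    if gene.fdr >= fdr_cutoff:
        return "unmarked"          # not significantly variable
    if gene.mean_log_expr > mean_cutoff:
        return "red"               # significant and high mean: counted as HVG
    return "blue"                  # significant, low mean (mean == cutoff: assumption)


def select_hvgs(genes):
    """Keep only the names of genes that fall into the red class."""
    return [g.name for g in genes if colour_class(g) == "red"]


genes = [
    GeneStat("geneA", 8.2, 0.001),  # passes both thresholds
    GeneStat("geneB", 3.1, 0.004),  # significant, but mean too low
    GeneStat("geneC", 9.0, 0.200),  # high mean, but not significant
]
hvgs = select_hvgs(genes)
```

Only genes in the red class would enter the downstream t-SNE and GO enrichment steps under this reading of the caption.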

Fig. 2. Top 20 enriched biological pathways associated with the HVGs of the CHO-K1 A) and HEK293 B) datasets. Top 20 enriched biological pathways associated with the HVGs of the CHO-K1 C) and HEK293 D) datasets after removal of the HVGs related to cell cycle progression that are listed in Suppl. 4. The adjusted p-values of all statistically significant terms were negative log10-transformed. GO enrichment analysis of HVGs was retrieved using gProfiler.


annotation and regression, was observable.

In the HEK dataset, 497 HVGs that have an FDR less than 0.01 and a mean normalised expression per gene greater than 6 (Fig. 1D) were detected. As reported for the CHO-K1 dataset, in the t-SNE (Fig. 1E) cells are spread in such a way as to visually capture the distribution according to the progression through the cell cycle: G2M cells are closer to each other, as are those in S or G1 phase. To improve the resolution of other biological processes beyond the variation due to the cell cycle and to survey the presence of stronger factors that can drive the distribution of the transcriptome in the low dimensional space, we regressed out the cell cycle phase for each cell line. We utilized the removeBatchEffect function from limma (Ritchie et al., 2015). The linear model treats each phase as a separate batch, adjusting the uninteresting phase factor to zero, and then estimates the observations without the phase effect. The t-SNE plots of CHO-K1 (Fig. 1C) and of HEK (Fig. 1F) after the cell cycle correction show that the samples are distributed in a more homogeneous way, regardless of the cell cycle phase annotation.

3.3. Pathway enrichment analyses of CHO-K1 and HEK HVGs

The Gene Ontology (GO) enrichment analyses were conducted to explore the functional characteristics of the HVGs, first including and then excluding the highly variable genes related to the cell cycle from each dataset. The GO analysis results revealed that the CHO-K1 HVGs were significantly enriched in ‘cell division’, ‘cell cycle’, ‘mitotic cell cycle process’, ‘cell migration’, ‘regulation of molecular function’, ‘cellular response to chemical stimulus’, ‘positive regulation of cellular process’ and ‘regulation of developmental process’ in terms of biological process as represented in Fig. 2A. Regarding the HEK HVGs, significant enrichment was observed especially in ‘nuclear chromosome segregation’, ‘mitotic sister chromatid segregation’, ‘nuclear division’, ‘mitotic cell cycle’, ‘cell division’ and ‘regulation of cell cycle’ as shown in Fig. 2B.

The results of the GO enrichment analysis are consistent with the tendency of the cells of both datasets to distribute themselves according to their progression through the cell cycle in the dimensionality reduction visualizations (Fig. 1B and E). Hence, it is not unexpected that most of the variance in the transcriptome is related to pathways like cell division, chromatid segregation, or DNA replication. In order to evaluate the most enriched pathways that lie hidden underneath those related to cell division, we removed the genes that were used for cell cycle annotation (Tirosh et al., 2016) from the set of HVGs (Suppl. 4).

For the remaining HVGs, a new GO enrichment analysis was performed. In the CHO-K1 dataset, enriched pathways are related to ‘cellular response to chemical stimulus’, ‘system development’, ‘cell migration’, ‘negative regulation of cellular biological process’, ‘regulation of molecular function’, ‘localization of cell’, ‘cell motility’, and ‘regulation of phosphorylation’ in terms of biological process as represented in Fig. 2C. For the HEK HVGs, significant enrichment was observed especially in ‘cellular metabolic process’, ‘nitrogen compound metabolic process’, ‘protein modification process’, ‘chromosome segregation’, ‘transcription by RNA polymerase II’, and ‘heterocycle metabolic process’ pathways (Fig. 2D).

The removal of highly variable cell cycle markers from the list of HVGs detected in the CHO-K1 and HEK datasets identified enriched pathways hidden below these dominant HVGs. Removal of these cell cycle markers showed that the dominant HVGs are instead related to the regulation of metabolic processes and transcription by RNA polymerase II in HEK293 cells, while in CHO-K1 cells, pathways associated with ‘cellular response to chemical stimulus’ and ‘negative regulation of biological processes’ were significant. Manual inspection and visualization of the ontologies found in CHO-K1 showed that ‘positive regulation of transcription from RNA polymerase promoter involved in the cellular response to chemical stimulus’ (GO:1901522) is a part of the ‘cellular response to chemical stimulus’ pathway (GO:0070887), according to how the relationships between GO terms are categorized and defined.

3.4. Pathway enrichment analysis of HVGs in common between CHO-K1 and HEK dataset

We applied dimensionality reduction techniques to visualize the relationships between the CHO-K1 and HEK293 cell lines. This was done by projecting the transcriptome samples together into a t-SNE plot (Fig. 3E), which located the two cell lines far apart from each other, confirming a marked segregation between the cell lines. This result is consistent with the lack of similarities between two cell lines of different species. However, this natural transcriptome divergence between CHO-K1 and HEK293 was minimized when the t-SNE plots were constructed only with the 53 HVGs that were found to be shared between the two cell lines (Fig. 3F).

The bar plot in Fig. 3A represents the results of the GO enrichment analysis on the set of HVGs that are shared between the cell lines. The enriched pathways were related mainly to ‘chromosome segregation’, ‘nuclear division’, ‘cell cycle process’, ‘spindle organization’, and ‘regulation of mitotic metaphase/anaphase transition’ in terms of biological process. The enriched pathways identified by the GO enrichment analysis of the HVGs unique to the CHO-K1 and HEK datasets (Figs. 3B and 3C) are comparable to the results for the corresponding HVGs after removal of the cell cycle genes (Fig. 2C and 2D).

3.5. Clustering analysis reveals homogeneous populations

Clustering analysis of CHO-K1 cells yields three distinct clusters (Fig. 4A). Neither the FDR nor the summary.logFC metrics of the top 10 candidate marker genes for each cluster (Suppl. 7, Tables 1, 2 and 3) indicate statistical significance (FDR too large and/or summary.logFC too small). For this reason, we cannot consider them authentic marker genes, nor meaningful enough to drive any pertinent intra-cluster heterogeneity. Figs. 4B, 4C, and 4D show violin plots of the 6 top-ranked candidate marker genes for CHO-K1 clusters 1, 2, and 3 respectively. As we can see in Fig. 4B, the distribution of the normalised log-expression values of the top 6 candidate marker genes of Cluster 1 is equivalent to the expression values that the same genes have in Clusters 2 and 3. The same holds for the top 6 markers of Cluster 2 when compared to Clusters 1 and 3 (Fig. 4C), and for those of Cluster 3 when compared to Clusters 1 and 2 (Fig. 4D). The 6 top-ranked candidate marker genes do not show strong differences in the proportion of expressing cells in each cluster compared to the others (Fig. 4B, C and D). Unlike the CHO-K1 dataset, the t-SNE in Fig. 5A indicates that the HEK293 population has been divided into two distinct clusters. Tables 4 and 5 (Suppl. 7) report the top 10 genes with the lowest p-value for each comparison within the cell population. In Cluster 1, the FDR and the summary.logFC metrics of each top gene are not statistically significant, and as shown by the violin plots in Fig. 5B, the distribution of the normalised log-expression values of the top 6 candidate marker genes of Cluster 1 is analogous to the expression values that the same genes exhibit in Cluster 2. Therefore they cannot be considered marker candidates. We identified two marker genes in Cluster 2: Aurkb (FDR = 0.034 and summary.logFC = 2.684) and Cep55 (FDR = 0.041 and summary.logFC = 3.098). The expression distribution of these two genes in Cluster 2 is not comparable to the distribution in Cluster 1, and this is visually detectable from the violin plots in Fig. 5C.

4. Discussion

In this study, we explored the gene expression pattern of two cell lines, CHO-K1 and HEK293FT, under controlled culture conditions.

The SMART-seq v2 and the SMART-seq v4 protocols can detect more genes and isolate low abundance transcripts, but these advantages are offset by a costly and time-consuming protocol, thereby limiting the total number of biological samples that can be analyzed in an experiment (See


Fig. 3. Top 20 enriched biological pathways associated with the 53 HVGs in common between the CHO-K1 and the HEK datasets A), with the 569 HVGs unique to the CHO-K1 dataset B), and with the 444 HVGs unique to the HEK dataset C). The adjusted p-values of all statistically significant terms were negative log10-transformed. GO enrichment analysis of HVGs was retrieved using gProfiler. D) Venn diagram of the HVGs of CHO-K1 and HEK. The overlapping area represents the HVGs shared between the two cell lines. E) t-SNE of the CHO-K1 and HEK293 datasets. Each point represents a cell and is coloured and shaped according to its cell line. F) t-SNE constructed from normalised log-expression values of the 53 HVGs in common between the CHO-K1 and HEK293 populations.
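
The counts in this caption can be cross-checked against the HVG totals reported in the text (622 for CHO-K1, 497 for HEK): 622 − 53 = 569 and 497 − 53 = 444. The short Python sketch below reproduces the Venn partition with set operations. The gene identifiers are placeholders sized to match the reported counts, not real gene names, and `venn_counts` is a hypothetical helper.

```python
def venn_counts(set_a, set_b):
    """Return (only in A, shared, only in B) for a two-set Venn diagram."""
    shared = set_a & set_b
    return len(set_a - set_b), len(shared), len(set_b - set_a)


# Placeholder identifiers sized to the counts reported in the paper.
shared_ids = {f"shared_{i}" for i in range(53)}
cho_hvgs = shared_ids | {f"cho_{i}" for i in range(569)}
hek_hvgs = shared_ids | {f"hek_{i}" for i in range(444)}

cho_only, common, hek_only = venn_counts(cho_hvgs, hek_hvgs)

# The partition must add back up to the per-cell-line totals.
cho_total = cho_only + common   # 622 HVGs reported for CHO-K1
hek_total = hek_only + common   # 497 HVGs reported for HEK
```

In a real analysis the two sets would hold orthologue-matched gene identifiers, since hamster and human gene names have to be mapped onto each other before they can be intersected; how the authors performed that mapping is not described in this section.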

et al., 2018; Vieth et al., 2019; Ziegenhain et al., 2017). The SMART-seq in volume and are thus unlikely to impact the biochemistry of small
protocol has implemented exogenous RNA spike-ins that were devel­ reaction volumes and minimize the risk of cross-contamination from
oped by the External RNA Controls Consortium (ERCC). On the other free-floating nuclear acids. Sample cross-contamination is moreover
hand, unlike tag-based protocols, SMART-seq lacks Unique Molecular avoided by using sterile single-use cartridges that can be loaded with
Identifiers (UMIs), short (4–10 bp) random barcodes usually added to minimal sample volumes as low as 5 µl. Another advantage over
transcripts during reverse-transcription to improve the ability to mea­ single-cell isolation via FACS is that the image data that is captured and
sure sources of amplification bias (Parekh et al., 2016). Via the SMART stored by the device during each experiment not only facilitate the direct
method, PCR duplicates that arise during the pre-amplification step assessment of basic morphological properties such as size and round­
cannot be detected by their mapping positions (Parekh et al., 2016). ness, but it also allows for individual visual verification of true
Even so, there is no evidence that accuracy and precision can be single-cell events and exclusion of potential double-cell events from
improved by eliminating PCR bias and noise or that they can be nega­ downstream analysis. In fact, the F.SIGHT’s software monitors different
tively impacted by discarding useful information (Parekh et al., 2016). control parameters for the classification of the detected objects in
The first and critical step in the SMART-seq2 workflow is the isola­ real-time to determine which can be considered true single-cells events.
tion of single cells into PCR plates for processing. For this purpose, we The size and roundness of the objects are categorized according to the
utilized the Cytena single-cell dispenser (F.SIGHT). The Cytena device is user-selected upper and lower thresholds so that only cells of roundness
an easy-to-use device that facilitates flexible, highly automated, and and size within the predefined criteria are considered for dispension and
precise isolation of single cells (Riba et al., 2016). Although FACS is all other objects, e.g., debris, dividing cells, or clusters of cells, can be
more widely available and facilitates highly complex multi-parameter excluded. Moreover, only if a single object is detected within the region
sorting of heterogeneous cell populations, single cell isolation with the of interest the droplet is ejected from the nozzle and targeted to the well,
F.SIGHT requires only minimal setup time without the need for manual efficiently excluding double or triple-cell events.
fine-tuning of sorting parameters. The system is also equipped with a The thresholds of morphological properties (roundness of 0.5 – 1 and
module for Automated dispenser Offset Correction (AOC) circumventing a cell size of 10–30 µm for CHO, 10–25 µm for HEK) that have been used
the need for manual droplet alignment to the substrate target positions. to isolate single cells for this study are based on the fact that the cells
In addition, potential electrostatic charging of the target plate is that fall within that range are part of the > 95% viable cell population.
neutralized using ionized air which effectively prevents droplet deflec­ In mammalian cell culture producing biotherapeutics good viability is
tion and further contributes to a highly precise deposition onto the essential in extending the duration of productive culture and main­
center positions of conical 384 well PCR plates. This allows cell lysis in taining product quality. The loss of membrane integrity and its
minute amounts of lysis buffer (0.5–1 µl) which is the prerequisite for smoothness are inextricably linked to cell death (Cooper and McNeil,
the downscaling of reaction volumes in plate-based workflows such as 2015; Draeger et al., 2011; Koerdt et al., 2019; Walker et al., 1988),
single-cell RNA seq. This in turn enables the user to cut down reagent hence it is used to investigate the cell population viability. All passively
consumption and costs and / or increase throughput. Moreover, the destroyed cells have a ruptured plasma membrane, which is a charac­
single-cell-containing droplets generated by the system are only 160 pl teristic of a necrotic phenotype (Koerdt et al., 2019; Zhang et al., 2018).
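
The dispensing criteria described above (size and roundness within user-selected thresholds, and exactly one object in the region of interest) can be sketched as a simple gate. This is only an illustration under stated assumptions: the instrument's actual classification logic is not described in this text, so the names `DetectedObject`, `THRESHOLDS`, and `should_dispense`, as well as the inclusive bounds, are hypothetical. Only the numeric thresholds come from the study.

```python
from dataclasses import dataclass

# Thresholds quoted in the text; the dictionary layout itself is an assumption.
THRESHOLDS = {
    "CHO": {"size_um": (10.0, 30.0), "roundness": (0.5, 1.0)},
    "HEK": {"size_um": (10.0, 25.0), "roundness": (0.5, 1.0)},
}

@dataclass
class DetectedObject:
    size_um: float    # object diameter in micrometres
    roundness: float  # 1.0 corresponds to a perfect circle

def within(value, bounds):
    """True if value lies inside the inclusive (low, high) interval."""
    low, high = bounds
    return low <= value <= high

def should_dispense(objects_in_roi, cell_line):
    """Dispense only if exactly one object is in the ROI and it passes both gates."""
    if len(objects_in_roi) != 1:
        return False  # empty droplet, or a double/triple-cell event
    obj = objects_in_roi[0]
    limits = THRESHOLDS[cell_line]
    return within(obj.size_um, limits["size_um"]) and within(obj.roundness, limits["roundness"])
```

For example, a single 28 µm object with roundness 0.9 would pass the CHO gate but fail the narrower HEK size gate, and any droplet containing two detected objects would be rejected regardless of their morphology.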


Fig. 4. A) t-SNE plot of the CHO-K1 dataset, where each point represents a cell and is coloured and shaped according to the identity of the assigned cluster. The violin plots describe the distribution of expression values of the 6 top marker genes identified in CHO-K1 cluster 1 B), cluster 2 C), and cluster 3 D). The y-axis indicates the log-normalised expression values of the genes. The x-axis separates the clusters. Each point represents a cell and is coloured according to the cluster it belongs to.

Fig. 5. A) t-SNE plot of the HEK293 dataset, where each point represents a cell and is coloured according to the identity of the assigned cluster and shaped according to the cell cycle phase. The violin plots describe the distribution of expression values of the 6 top marker genes identified in HEK293 cluster 1 B) and cluster 2 C). The y-axis indicates the log-normalised expression values of the genes. The x-axis separates the clusters. Each point represents a cell and is coloured according to the cluster it belongs to.
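
The log-normalised expression values plotted in Figs. 4 and 5 can be illustrated with a generic library-size normalisation. This is a sketch under assumptions, since the normalisation method is not specified in this part of the text: it scales each cell by a size factor proportional to its total counts (mean factor of 1) and applies log2 with a pseudocount of 1. The function names and the toy count matrix are hypothetical.

```python
import math

def library_size_factors(counts):
    """counts: list of per-cell count lists (cells x genes).

    Returns one size factor per cell, scaled so the factors average to 1.
    Assumes every cell has at least one count.
    """
    totals = [sum(cell) for cell in counts]
    mean_total = sum(totals) / len(totals)
    return [total / mean_total for total in totals]

def log_normalise(counts, pseudocount=1.0):
    """Divide each cell by its size factor, then apply log2(x + pseudocount)."""
    factors = library_size_factors(counts)
    return [
        [math.log2(c / f + pseudocount) for c in cell]
        for cell, f in zip(counts, factors)
    ]

# Toy data: the second cell was sequenced twice as deeply as the first.
counts = [[10, 0, 30], [20, 0, 60]]
logexpr = log_normalise(counts)
```

In this toy case the two cells differ only in sequencing depth, so their normalised profiles coincide, which is the behaviour one wants before comparing expression distributions across clusters.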

The product released from necrotic cells can be damaged, with alteration in glycosylation profiles, or can disturb downstream purification (Goldman et al., 1997; Gramer and Goochee, 1993; Hansen et al., 1997) and, more to the purpose of this study, the transcriptome profiles obtained from dying or dead cells are more likely to be of poor quality (Gallego Romero et al., 2014).

Differences found in single-cell data can often be traced back to independent operators performing the experiments, different plate sequencing times, various lots of reagents, or even distinct technology platforms. Not infrequently these differences lead to large variations or batch effects, which can be confused with interesting biological variation during downstream analysis. For this reason, the removal of these technical factors is essential for adequate data analysis. In the CHO-K1 single-cell dataset, a first PCA plot (Suppl. 2 A) shows that PC1 describes 42% of the variance and clearly separates the data of the first and second plates, processed at the same time, from those of the third and fourth plates, processed 6 months later. The longer freezing time of plates 3 and 4 may have caused this remarkable batch effect, possibly by degradation of RNA during storage and an associated decrease of the common cell-quality metrics. This is an essential aspect to consider in experimental planning, in particular if samples are to be collected over prolonged periods of time. Hence, given the pervasive technical source of variation, we chose to rely only on the transcriptome from plates 1 and 2 of CHO-K1 single cells and compare it directly to the HEK transcriptome.

From the t-SNE plot, the variation of the CHO-K1 single cells is quite low and the cell cycle factor seems to spread the population through the two dimensions: in particular, we noticed a broader distribution of the G2M phase cells. In the study by Ogata et al. (2021) the lack of clear clusters in CHO-K1 suspension cultures was also apparent, except for the hidden populations that showed low and high enolase expression. In our study, 36 cell cycle marker genes were classified as highly variable, and the most enriched pathways of the HVG set are indeed linked to the regulation of the cell cycle. We used a linear model to regress out any effect associated with the annotated phases and checked the more homogeneous distribution of the samples in the high-dimensional spaces.

Similarly, for the single cells of the HEK293FT dataset, the t-SNE plot does not visually generate any distinct cluster. As already foreseen in Suppl. 5 A and more evidently from their functional enrichment analysis (Fig. 2B), the HEK HVGs are involved in the cell cycle process and regulation, spindle organization, chromosome segregation, and DNA replication. Previous studies have already shown that transcriptome heterogeneity in adherent cultures is mainly correlated to the cell cycle. As for CHO, we used a linear model to regress out any effect associated with the cell cycle phases and checked their more uniform distribution in a t-SNE plot.

The removal of highly variable cell cycle markers from the list of HVGs detected from the CHO-K1 and HEK datasets identified enriched pathways hidden below these dominant HVGs. Pathways connected to the action of RNA polymerase II appear in both CHO-K1 and HEK cultures: this is an enzyme found only in the nucleoplasm that synthesizes small nuclear RNAs (snRNAs) and messenger RNAs (mRNAs). Such a fundamental regulatory process is the next highest source of variation after the cell cycle. Similarly, Tzani et al. (2021) already demonstrated that the heterogeneity in a clonally derived cell line could arise from the stress induced as a result of recombinant protein production, which can lead to production instability.

Based on these HVGs, we visualized and compared the transcriptomes of CHO-K1 and HEK293 single cells in a t-SNE plot (Fig. 3E). The samples were arranged according to their specific cell line. As expected, the distance between the two populations was dramatically reduced in the dimensional space when we constructed the same plot using only the 53 HVGs that have been identified as being commonly shared between the two cell lines. Although there is no safe interpretation of t-SNE (Wattenberg et al., 2016), we can infer that the 53 shared HVGs are capable of displaying the transcriptomes as close neighbours in the two-dimensional embedding (Fig. 3F). This indicates that the cells were nearest neighbours also in the high-dimensional space. Nevertheless, it is evident that a general substructure is maintained: this might be due to the inter-species variation in the expression level of this set of genes, since the CHO-K1 population possesses 569 unique HVGs, while the HEK has 444 unique HVGs. The GO analysis of the 53 shared HVGs describes a strong representation of pathways related to cell cycle regulation and progression (Orford and Scadden, 2008). Regardless of the natural differences between the two cell lines, the capability of an organism to maintain cellular homeostasis is dependent upon an equilibrium of cell proliferation, differentiation, and death and is one of the fundamental biological activities (Fabbro et al., 2005).

Cluster analysis has been performed to evaluate, describe and confirm the homogeneity of CHO-K1 and HEK293 cell populations. Cell clusters should be internally homogeneous but heterogeneous between themselves, so we searched for cluster-specific candidate marker genes in order to be able to assign them a biological explanation. Examination of the values of FDR and summary.logFC allowed a quick evaluation of the suitability of a candidate marker: bearing in mind that the lack of evidence is not necessarily evidence of lack of differential expression, the statistical quality of FDR and summary.logFC was so poor that in the CHO-K1 dataset we could not identify a reliable set of genes capable of distinguishing one cluster from all the others. Despite the attempt to observe in detail the presence of hidden cellular subpopulations, and therefore to find a source of heterogeneity, the top 6 candidate markers of each cluster did not vary significantly across the clusters (Figs. 4B, 4C, and 4D).

A different story, however, emerged from the candidate marker genes of cluster 2 of the HEK dataset. The difference in expression levels between the two clusters is statistically significant for two genes, namely Cep55 and Aurkb (Suppl. 7, Table 5). Cep55, centrosomal protein 55, plays a role in mitotic exit and is required for a successful completion of cytokinesis (Fabbro et al., 2005; Morita et al., 2007). Aurkb is a serine/threonine-protein kinase component of the Chromosomal Passenger Complex (CPC), which acts as a pivotal regulator of mitosis and ensures proper chromosome alignment and segregation (Dabbeekeh et al., 2007; Pouwels et al., 2007). In this regard, as visible from the t-SNE plot in Fig. 5A, where cell cycle phases are distinguished by different shapes, many of the cells annotated in the G2M and S phase were found to be part of cluster 2. Thus, for the limited population of HEK293 cells examined, the minor source of heterogeneity was attributable in large part to the cell cycle. That this is not the case for CHO cells might conceivably be a consequence of their adaptation to suspension culture, where the large morphological changes that adherent cells go through during mitosis and division are not as evident.

Overall, a major driver to initiate this study was the fact that CHO cells have been extensively characterized for their genetic variation and malleability (Dhiman et al., 2019). This might conceivably lead to comparable heterogeneity in single-cell transcriptomes within a given population. Nevertheless, under controlled and homogeneous culture conditions such heterogeneity cannot be observed in an unstressed, nonproducing host cell line, either under adherent conditions (which might have been expected to have a higher heterogeneity due to local differences in the microenvironment that cells encounter) or in suspension culture.

5. Conclusions

In conclusion, the storage time of the CHO-K1 plates might affect the variance in the data sets and the quality of the raw data. This can be a fundamental aspect to consider in future studies, although it needs further investigation. Apart from that, the cell-to-cell variation observed in the samples from the first two plates of the CHO-K1 cell line is comparable to the one detected in the HEK293FT dataset, with the minimal exception of progression through the cell cycle, which runs its biological course in each cell individually, irrespective of where the other cells are at that time. This behavior might be predominantly determined by the homogeneous artificial culture conditions and may suggest that under such externally set and controlled culture conditions the stochasticity is negligible, despite the known genetic variation observed in CHO cells or any other immortalized cell line that grows at high speed. This homogeneity is likely to be representative of other established cell lines, where the well-controlled homogeneous culture conditions entail a comparable transcriptome within the cells present. Nevertheless, we cannot rule out the possibility that the limited quantity of samples in this study did not fully capture the diversity in the cell population. A potential direction and validation will be to compare homogeneities among producer cell lines or between hosts and producers: further work is certainly required to disentangle these complexities.
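
As an illustration of the cell cycle correction mentioned in the Discussion (a linear model used to regress out the effect of the annotated phases), the following sketch handles one gene at a time. It is not the authors' code, and the function name and toy values are hypothetical. It relies on a standard property of least squares: when the only covariate is a categorical phase label, the fitted value for each cell is the mean of its phase, so the residual is the expression minus that phase mean. Adding the overall mean back is a design choice made here to keep values on the original scale.

```python
from collections import defaultdict

def regress_out_phase(expr, phases):
    """Remove the phase effect from one gene's per-cell log-expression.

    expr: per-cell log-expression values of a single gene.
    phases: per-cell cell cycle labels (e.g. "G1", "S", "G2M").
    Returns residuals of expr ~ phase, shifted by the overall mean.
    """
    if len(expr) != len(phases):
        raise ValueError("expr and phases must have the same length")
    groups = defaultdict(list)
    for value, phase in zip(expr, phases):
        groups[phase].append(value)
    phase_mean = {p: sum(v) / len(v) for p, v in groups.items()}
    grand_mean = sum(expr) / len(expr)
    return [value - phase_mean[phase] + grand_mean
            for value, phase in zip(expr, phases)]

# Toy example: expression rises from G1 to G2M (values are made up).
phases = ["G1", "G1", "S", "S", "G2M", "G2M"]
gene = [1.0, 3.0, 4.0, 6.0, 8.0, 10.0]
corrected = regress_out_phase(gene, phases)
```

After correction, every phase has the same mean expression for this gene, while the within-phase spread between cells is preserved. Applying the same step to each gene before dimensionality reduction is one way to check whether cells distribute more uniformly once the phase effect is removed.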


Funding

This study was fully funded by the EuroStars project grant number E! 12355 supported by the Austrian Science Fund FFG, and the University of Natural Resources and Life Science.

CRediT authorship contribution statement

Giulia Borsi performed all the bioinformatics analyses, took the lead in writing the manuscript, contributed to samples preparation, and single-cell printing. Elly Sinkala performed the single-cell printing, wrote and edited the manuscript with support from Max Sauerland. Nicole Borth, Martina Baumann, Krishna Motheramgari and Heena Dhiman edited and reviewed the manuscript, provided critical feedback and helped shape the research. Nicole Borth and Julian Riba devised and supervised the study.

Declaration of Competing Interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Giulia Borsi reports financial support was provided by University of Natural Resources and Life Sciences Vienna. Elly Sinkala, Max Sauerland and Julian Riba are employees of Cytena.

Data Availability

As already specified in the manuscript (section "Raw data availability"), the single-cell RNA-seq data will be available at ENA and all the scripts will be available on GitHub.

Appendix A. Supporting information

Supplementary data associated with this article can be found in the online version at doi:10.1016/j.jbiotec.2023.01.006.

References

Alessio, E., Bonadio, R.S., Buson, L., Chemello, F., Cagnin, S., 2020. A single cell but many different transcripts: a journey into the world of long non-coding RNAs. Int. J. Mol. Sci. 21 (1), E302. https://doi.org/10.3390/ijms21010302.
Bajić, D., Poyatos, J.F., 2012. Balancing noise and plasticity in eukaryotic gene expression. BMC Genom. 13, 343. https://doi.org/10.1186/1471-2164-13-343.
Baker, S.C., Bauer, S.R., Beyer, R.P., Brenton, J.D., Bromley, B., Burrill, J., Causton, H., Conley, M.P., Elespuru, R., Fero, M., Foy, C., Fuscoe, J., Gao, X., Gerhold, D.L., Gilles, P., Goodsaid, F., Guo, X., Hackett, J., Hockett, R.D., External RNA Controls Consortium, 2005. The external RNA controls consortium: a progress report. Nat. Methods 2 (10), 731–734. https://doi.org/10.1038/nmeth1005-731.
Barnes, L.M., Bentley, C.M., Dickson, A.J., 2003. Stability of protein production from recombinant mammalian cells. Biotechnol. Bioeng. 81 (6), 631–639. https://doi.org/10.1002/bit.10517.
Battich, N., Stoeger, T., Pelkmans, L., 2015. Control of transcript variability in single mammalian cells. Cell 163 (7), 1596–1610. https://doi.org/10.1016/j.cell.2015.11.018.
Blake, W.J., Balázsi, G., Kohanski, M.A., Isaacs, F.J., Murphy, K.F., Kuang, Y., Cantor, C.R., Walt, D.R., Collins, J.J., 2006. Phenotypic consequences of promoter-mediated transcriptional noise. Mol. Cell 24 (6), 853–865. https://doi.org/10.1016/j.molcel.2006.11.003.
Buckley, P.T., Lee, M.T., Sul, J.-Y., Miyashiro, K.Y., Bell, T.J., Fisher, S.A., Kim, J., Eberwine, J., 2011. Cytoplasmic intron sequence-retaining transcripts can be dendritically targeted via ID element retrotransposons. Neuron 69 (5), 877–884. https://doi.org/10.1016/j.neuron.2011.02.028.
Clarke, C., Doolan, P., Barron, N., Meleady, P., O’Sullivan, F., Gammell, P., Melville, M., Leonard, M., Clynes, M., 2011. Large scale microarray profiling and coexpression network analysis of CHO cells identifies transcriptional modules associated with growth and productivity. J. Biotechnol. 155 (3), 350–359. https://doi.org/10.1016/j.jbiotec.2011.07.011.
Colak, D., Al-Harazi, O., Mustafa, O.M., Meng, F., Assiri, A.M., Dhar, D.K., Broering, D.C., 2020. RNA-Seq transcriptome profiling in three liver regeneration models in rats: Comparative analysis of partial hepatectomy, ALLPS, and PVL. Sci. Rep. 10 (1), 5213. https://doi.org/10.1038/s41598-020-61826-1.
Cooper, S.T., McNeil, P.L., 2015. Membrane repair: mechanisms and pathophysiology. Physiol. Rev. 95 (4), 1205–1240. https://doi.org/10.1152/physrev.00037.2014.
Cornelison, D.D., Wold, B.J., 1997. Single-cell analysis of regulatory gene expression in quiescent and activated mouse skeletal muscle satellite cells. Dev. Biol. 191 (2), 270–283. https://doi.org/10.1006/dbio.1997.8721.
Dabbeekeh, J.T.S., Faitar, S.L., Dufresne, C.P., Cowell, J.K., 2007. The EVI5 TBC domain provides the GTPase-activating protein motif for RAB11. Oncogene 26 (19), 2804–2808. https://doi.org/10.1038/sj.onc.1210081.
Dahodwala, H., Lee, K.H., 2019. The fickle CHO: a review of the causes, implications, and potential alleviation of the CHO cell line instability problem. Curr. Opin. Biotechnol. 60, 128–137. https://doi.org/10.1016/j.copbio.2019.01.011.
Devonshire, A.S., Elaswarapu, R., Foy, C.A., 2010. Evaluation of external RNA controls for the standardisation of gene expression biomarker measurements. BMC Genom. 11, 662. https://doi.org/10.1186/1471-2164-11-662.
Dhiman, H., Gerstl, M.P., Ruckerbauer, D., Hanscho, M., Himmelbauer, H., Clarke, C., Barron, N., Zanghellini, J., Borth, N., 2019. Genetic and epigenetic variation across genes involved in energy metabolism and mitochondria of Chinese hamster ovary cell lines. Biotechnol. J. 14 (7), 1800681. https://doi.org/10.1002/biot.201800681.
Doolan, P., Clarke, C., Kinsella, P., Breen, L., Meleady, P., Leonard, M., Zhang, L., Clynes, M., Aherne, S.T., Barron, N., 2013. Transcriptomic analysis of clonal growth rate variation during CHO cell line development. J. Biotechnol. 166 (3), 105–113. https://doi.org/10.1016/j.jbiotec.2013.04.014.
Dorner, A.J., Bole, D.G., Kaufman, R.J., 1987. The relationship of N-linked glycosylation and heavy chain-binding protein association with the secretion of glycoproteins. J. Cell Biol. 105 (6 Pt 1), 2665–2674. https://doi.org/10.1083/jcb.105.6.2665.
Draeger, A., Monastyrskaya, K., Babiychuk, E.B., 2011. Plasma membrane repair and cellular damage control: The annexin survival kit. Biochem. Pharmacol. 81 (6), 703–712. https://doi.org/10.1016/j.bcp.2010.12.027.
Ewels, P., Magnusson, M., Lundin, S., Käller, M., 2016. MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinforma. (Oxf., Engl.) 32 (19), 3047–3048. https://doi.org/10.1093/bioinformatics/btw354.
Fabbro, M., Zhou, B.-B., Takahashi, M., Sarcevic, B., Lal, P., Graham, M.E., Gabrielli, B.G., Robinson, P.J., Nigg, E.A., Ono, Y., Khanna, K.K., 2005. Cdk1/Erk2- and Plk1-dependent phosphorylation of a centrosome protein, Cep55, is required for its recruitment to midbody and cytokinesis. Dev. Cell 9 (4), 477–488. https://doi.org/10.1016/j.devcel.2005.09.003.
Gaiti, F., Chaligne, R., Gu, H., Brand, R.M., Kothen-Hill, S., Schulman, R.C., Grigorev, K., Risso, D., Kim, K.-T., Pastore, A., Huang, K.Y., Alonso, A., Sheridan, C., Omans, N.D., Biederstedt, E., Clement, K., Wang, L., Felsenfeld, J.A., Bhavsar, E.B., Aryee, M.J., 2019. Epigenetic evolution and lineage histories of chronic lymphocytic leukaemia. Nature 569 (7757), 576–580. https://doi.org/10.1038/s41586-019-1198-z.
Gallego Romero, I., Pai, A.A., Tung, J., Gilad, Y., 2014. RNA-seq: Impact of RNA degradation on transcript quantification. BMC Biol. 12 (1), 42. https://doi.org/10.1186/1741-7007-12-42.
Gils, A., Bertolotto, A., Mulleman, D., Bejan-Angoulvant, T., Declerck, P.J., 2017. Biopharmaceuticals: reference products and biosimilars to treat inflammatory diseases. Ther. Drug Monit. 39 (4), 308–315. https://doi.org/10.1097/FTD.0000000000000385.
Goldman, M.H., James, D.C., Ison, A.P., Bull, A.T., 1997. Monitoring proteolysis of recombinant human interferon-gamma during batch culture of Chinese hamster ovary cells. Cytotechnology 23 (1–3), 103–111. https://doi.org/10.1023/a:1007947130709.
Gramer, M.J., Goochee, C.F., 1993. Glycosidase activities in Chinese hamster ovary cell lysate and cell culture supernatant. Biotechnol. Prog. 9 (4), 366–373. https://doi.org/10.1021/bp00022a003.
Hansen, K., Kjalke, M., Rasmussen, P.B., Kongerslev, L., Ezban, M., 1997. Proteolytic cleavage of recombinant two-chain factor VIII during cell culture production is mediated by protease(s) from lysed cells. The use of pulse labelling directly in production medium. Cytotechnology 24 (3), 227–234. https://doi.org/10.1023/A:1007988713571.
Hernandez, I., Dhiman, H., Klanert, G., Jadhav, V., Auer, N., Hanscho, M., Baumann, M., Esteve-Codina, A., Dabad, M., Gómez, J., Alioto, T., Merkel, A., Raineri, E., Heath, S., Rico, D., Borth, N., 2019. Epigenetic regulation of gene expression in Chinese Hamster Ovary cells in response to the changing environment of a batch culture. Biotechnol. Bioeng. 116 (3), 677–692. https://doi.org/10.1002/bit.26891.
Hsu, H.-H., Araki, M., Mochizuki, M., Hori, Y., Murata, M., Kahar, P., Yoshida, T., Hasunuma, T., Kondo, A., 2017. A systematic approach to time-series metabolite profiling and RNA-seq analysis of chinese hamster ovary cell culture. Sci. Rep. 7, 43518. https://doi.org/10.1038/srep43518.
Kim, J., Eberwine, J., 2010. RNA: state memory and mediator of cellular phenotype. Trends Cell Biol. 20 (6), 311–318. https://doi.org/10.1016/j.tcb.2010.03.003.
Koerdt, S.N., Ashraf, A.P.K., Gerke, V., 2019. Annexins and plasma membrane repair. Curr. Top. Membr. 84, 43–65. https://doi.org/10.1016/bs.ctm.2019.07.006.
La Manno, G., Gyllborg, D., Codeluppi, S., Nishimura, K., Salto, C., Zeisel, A., Borm, L.E., Stott, S.R.W., Toledo, E.M., Villaescusa, J.C., Lönnerberg, P., Ryge, J., Barker, R.A., Arenas, E., Linnarsson, S., 2016. Molecular diversity of midbrain development in mouse, human, and stem cells. Cell 167 (2), 566–580.e19. https://doi.org/10.1016/j.cell.2016.09.027.
Lun, A.T.L., McCarthy, D.J., Marioni, J.C., 2016. A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor. F1000Research 5, 2122. https://doi.org/10.12688/f1000research.9501.2.
Lytal, N., Ran, D., An, L., 2020. Normalization methods on single-cell RNA-seq data: an empirical survey. Front. Genet. 11, 41. https://doi.org/10.3389/fgene.2020.00041.
Mahata, B., Zhang, X., Kolodziejczyk, A.A., Proserpio, V., Haim-Vilmovsky, L., Taylor, A.E., Hebenstreit, D., Dingler, F.A., Moignard, V., Göttgens, B., Arlt, W., McKenzie, A.N.J., Teichmann, S.A., 2014. Single-cell RNA sequencing reveals T helper cells synthesizing steroids de novo to contribute to immune homeostasis. Cell Rep. 7 (4), 1130–1142. https://doi.org/10.1016/j.celrep.2014.04.011.


Marques, S., Zeisel, A., Codeluppi, S., van Bruggen, D., Mendanha Falcão, A., Xiao, L., Li, H., Häring, M., Hochgerner, H., Romanov, R.A., Gyllborg, D., Muñoz Manchado, A., La Manno, G., Lönnerberg, P., Floriddia, E.M., Rezayee, F., Ernfors, P., Arenas, E., Hjerling-Leffler, J., Castelo-Branco, G., 2016. Oligodendrocyte heterogeneity in the mouse juvenile and adult central nervous system. Science 352 (6291), 1326–1329. https://doi.org/10.1126/science.aaf6463.
Martin, M., 2011. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet. J. 17 (1), 10–12. https://doi.org/10.14806/ej.17.1.200.
Miyashiro, K., Dichter, M., Eberwine, J., 1994. On the nature and differential distribution of mRNAs in hippocampal neurites: Implications for neuronal functioning. Proceedings of the National Academy of Sciences of the United States of America, 91 (23), 10800–10804. https://doi.org/10.1073/pnas.91.23.10800.
Morita, E., Sandrin, V., Chung, H.-Y., Morham, S.G., Gygi, S.P., Rodesch, C.K., Sundquist, W.I., 2007. Human ESCRT and ALIX proteins interact with proteins of the midbody and function in cytokinesis. EMBO J. 26 (19), 4215–4227. https://doi.org/10.1038/sj.emboj.7601850.
Munsky, B., Neuert, G., van Oudenaarden, A., 2012. Using gene expression noise to understand gene regulation. Science 336 (6078), 183–187. https://doi.org/10.1126/science.1216379.
Nagano, T., Lubling, Y., Várnai, C., Dudley, C., Leung, W., Baran, Y., Mendelson Cohen, N., Wingett, S., Fraser, P., Tanay, A., 2017. Cell-cycle dynamics of chromosomal organization at single-cell resolution. Nature 547 (7661), 61–67. https://doi.org/10.1038/nature23001.
Nowotschin, S., Setty, M., Kuo, Y.-Y., Liu, V., Garg, V., Sharma, R., Simon, C.S., Saiz, N., Gardner, R., Boutet, S.C., Church, D.M., Hoodless, P.A., Hadjantonakis, A.-K., Pe’er, D., 2019. The emergent landscape of the mouse gut endoderm at single-cell resolution. Nature 569 (7756), 361–367. https://doi.org/10.1038/s41586-019-1127-1.
Ogata, N., Nishimura, A., Matsuda, T., Kubota, M., Omasa, T., 2021. Single-cell transcriptome analyses reveal heterogeneity in suspension cultures and clonal markers of CHO-K1 cells. Biotechnol. Bioeng. 118 (2), 944–951. https://doi.org/10.1002/bit.27624.
Orford, K.W., Scadden, D.T., 2008. Deconstructing stem cell self-renewal: genetic insights into cell-cycle regulation. Nat. Rev. Genet. 9 (2), 115–128. https://doi.org/10.1038/nrg2269.
Parekh, S., Ziegenhain, C., Vieth, B., Enard, W., Hellmann, I., 2016. The impact of amplification on differential expression analyses by RNA-seq. Sci. Rep. 6, 25533. https://doi.org/10.1038/srep25533.
Patel, A.P., Tirosh, I., Trombetta, J.J., Shalek, A.K., Gillespie, S.M., Wakimoto, H., Cahill, D.P., Nahed, B.V., Curry, W.T., Martuza, R.L., Louis, D.N., Rozenblatt-Rosen, O., Suvà, M.L., Regev, A., Bernstein, B.E., 2014. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science 344 (6190), 1396–1401. https://doi.org/10.1126/science.1254257.
Picelli, S., Faridani, O.R., Björklund, A.K., Winberg, G., Sagasser, S., Sandberg, R., 2014. Full-length RNA-seq from single cells using Smart-seq2. Nat. Protoc. 9 (1), 171–181. https://doi.org/10.1038/nprot.2014.006.
Pilbrough, W., Munro, T.P., Gray, P., 2009. Intraclonal protein expression heterogeneity in recombinant CHO cells. PloS One 4 (12), e8432. https://doi.org/10.1371/journal.pone.0008432.
Pouwels, J., Kukkonen, A.M., Lan, W., Daum, J.R., Gorbsky, G.J., Stukenberg, T., Kallio, M.J., 2007. Shugoshin 1 plays a central role in kinetochore assembly and is required for kinetochore targeting of Plk1. Cell Cycle 6 (13), 1579–1585. https://doi.org/10.4161/cc.6.13.4442.
Ramsköld, D., Luo, S., Wang, Y.-C., Li, R., Deng, Q., Faridani, O.R., Daniels, G.A., Khrebtukova, I., Loring, J.F., Laurent, L.C., Schroth, G.P., Sandberg, R., 2012. Full-length mRNA-Seq from single-cell levels of RNA and individual circulating tumor cells. Nat. Biotechnol. 30 (8), 777–782. https://doi.org/10.1038/nbt.2282.
Raser, J.M., O’Shea, E.K., 2005. Noise in gene expression: Origins, consequences, and control. Science 309 (5743), 2010–2013. https://doi.org/10.1126/science.1105891.
Riba, J., Renz, N., Niemöller, C., Bleul, S., Pfeifer, D., Stosch, J.M., Metzeler, K.H., Hackanson, B., Lübbert, M., Duyster, J., Koltay, P., Zengerle, R., Claus, R., Zimmermann, S., Becker, H., 2016. Molecular genetic characterization of individual cancer cells isolated via single-cell printing. PloS One 11 (9), e0163455. https://doi.org/10.1371/journal.pone.0163455.
Scialdone, A., Tanaka, Y., Jawaid, W., Moignard, V., Wilson, N.K., Macaulay, I.C., Marioni, J.C., Göttgens, B., 2016. Resolving early mesoderm diversification through single-cell expression profiling. Nature 535 (7611), 289–293. https://doi.org/10.1038/nature18633.
See, P., Lum, J., Chen, J., Ginhoux, F., 2018. A single-cell sequencing guide for immunologists. Front. Immunol. 9, 2425. https://doi.org/10.3389/fimmu.2018.02425.
Shalek, A.K., Satija, R., Adiconis, X., Gertner, R.S., Gaublomme, J.T., Raychowdhury, R., Schwartz, S., Yosef, N., Malboeuf, C., Lu, D., Trombetta, J.J., Gennert, D., Gnirke, A., Goren, A., Hacohen, N., Levin, J.Z., Park, H., Regev, A., 2013. Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells. Nature 498 (7453), 236–240. https://doi.org/10.1038/nature12172.
Stuart, T., Butler, A., Hoffman, P., Hafemeister, C., Papalexi, E., Mauck, W.M., Hao, Y., Stoeckius, M., Smibert, P., Satija, R., 2019. Comprehensive Integration of Single-Cell Data. Cell 177 (7), 1888–1902.e21. https://doi.org/10.1016/j.cell.2019.05.031.
Sudhagar, A., Kumar, G., El-Matbouli, M., 2018. Transcriptome analysis based on RNA-Seq in understanding pathogenic mechanisms of diseases and the immune system of fish: a comprehensive Review. Int. J. Mol. Sci. 19 (1), E245. https://doi.org/10.3390/ijms19010245.
Tanay, A., Regev, A., 2017. Scaling single-cell genomics from phenomenology to mechanism. Nature 541 (7637), 331–338. https://doi.org/10.1038/nature21350.
Tirosh, I., Izar, B., Prakadan, S.M., Wadsworth, M.H., Treacy, D., Trombetta, J.J., Rotem, A., Rodman, C., Lian, C., Murphy, G., Fallahi-Sichani, M., Dutton-Regester, K., Lin, J.-R., Cohen, O., Shah, P., Lu, D., Genshaft, A.S., Hughes, T.K., Ziegler, C.G.K., Garraway, L.A., 2016. Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq. Science 352 (6282), 189–196. https://doi.org/10.1126/science.aad0501.
Treutlein, B., Brownfield, D.G., Wu, A.R., Neff, N.F., Mantalas, G.L., Espinoza, F.H., Desai, T.J., Krasnow, M.A., Quake, S.R., 2014. Reconstructing lineage hierarchies of the distal lung epithelium using single-cell RNA-seq. Nature 509 (7500), 371–375. https://doi.org/10.1038/nature13173.
Tzani, I., Herrmann, N., Carillo, S., Spargo, C.A., Hagan, R., Barron, N., Bones, J., Shannon Dillmore, W., Clarke, C., 2021. Tracing production instability in a clonally derived CHO cell line using single-cell transcriptomics. Biotechnol. Bioeng. 118 (5), 2016–2030. https://doi.org/10.1002/bit.27715.
Vieth, B., Parekh, S., Ziegenhain, C., Enard, W., Hellmann, I., 2019. A systematic evaluation of single cell RNA-seq analysis pipelines. Nat. Commun. 10 (1), 4667. https://doi.org/10.1038/s41467-019-12266-7.
Walker, N.I., Harmon, B.V., Gobé, G.C., Kerr, J.F., 1988. Patterns of cell death. Methods Achiev. Exp. Pathol. 13, 18–54.
Wattenberg, M., Viégas, F., Johnson, I., 2016. How to Use t-SNE effectively. Distill 1 (10), e2. https://doi.org/10.23915/distill.00002.
Weinguny, M., Klanert, G., Eisenhut, P., Lee, I., Timp, W., Borth, N., 2021. Subcloning induces changes in the DNA-methylation pattern of outgrowing Chinese hamster ovary cell colonies. Biotechnol. J. 16 (6), e2000350. https://doi.org/10.1002/biot.202000350.
Zhang, Y., Chen, X., Gueydan, C., Han, J., 2018. Plasma membrane changes during programmed cell deaths. Cell Res. 28 (1), 9–21. https://doi.org/10.1038/cr.2017.133.
Ziegenhain, C., Vieth, B., Parekh, S., Reinius, B., Guillaumet-Adkins, A., Smets, M., Leonhardt, H., Heyn, H., Hellmann, I., Enard, W., 2017. Comparative analysis of single-cell RNA sequencing methods. Mol. Cell 65 (4), 631–643.e4. https://doi.org/10.1016/j.molcel.2017.01.023.
