Abstract
Sepsis is a life-threatening condition characterized by an exaggerated immune response to pathogens, leading to organ damage and high mortality rates in the intensive care unit. Although deep learning has achieved impressive performance on prediction and classification tasks in medicine, it requires large amounts of data and lacks explainability, which hinder its application to sepsis diagnosis. We introduce a deep learning framework, called scCaT, which blends the capsulating architecture with Transformer to develop a sepsis diagnostic model using single-cell RNA sequencing data and transfers it to bulk RNA data. The capsulating architecture effectively groups genes into capsules based on biological functions, which provides explainability in encoding gene expressions. The Transformer serves as a decoder to classify sepsis patients and controls. Our model achieves high accuracy with an AUROC of 0.93 on the single-cell test set and an average AUROC of 0.98 on seven bulk RNA cohorts. Additionally, the capsules can recognize different cell types and distinguish sepsis from control samples based on their biological pathways. This study presents a novel approach for learning gene modules and transferring the model to other data types, offering potential benefits in diagnosing rare diseases with limited subjects.
Author summary
Deep learning models used in disease diagnosis usually suffer from insufficient training data and a lack of explainability, especially in rare diseases. These shortcomings hinder their application to sepsis diagnosis. Here we propose a diagnostic framework named scCaT, which transfers knowledge learned from single-cell RNA-seq to diseases with insufficient bulk data. The framework uses a capsulating architecture to group genes into capsules and provide explainability to the deep learning model for sepsis diagnosis. ScCaT achieves robust and outstanding performance for sepsis diagnosis in both scRNA-seq and bulk RNA datasets. This architecture offers a potential approach for explainable diagnosis of rare diseases with limited subjects.
Citation: Zheng X, Meng D, Chen D, Wong W-K, To K-H, Zhu L, et al. (2024) scCaT: An explainable capsulating architecture for sepsis diagnosis transferring from single-cell RNA sequencing. PLoS Comput Biol 20(10): e1012083. https://doi.org/10.1371/journal.pcbi.1012083
Editor: Wei Li, Children’s National Hospital, George Washington University, UNITED STATES OF AMERICA
Received: April 16, 2024; Accepted: October 7, 2024; Published: October 21, 2024
Copyright: © 2024 Zheng et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The source code of scCaT has been uploaded to https://github.com/Kimxbzheng/CaT. The data cohorts used in this study can be found on the Broad Institute Single Cell Portal (https://singlecell.broadinstitute.org/single_cell), with portal ID SCP548, and in the Gene Expression Omnibus (GEO) database under the accession IDs GSE185263, GSE95233, GSE26440, GSE57065, GSE28750, GSE8121, GSE9692, GSE13904, GSE26378, and GSE4607. The preprocessed data, the single-cell pretrained model, and the transferred model can be accessed on Zenodo (https://doi.org/10.5281/zenodo.13131665).
Funding: This work was supported in part by National Natural Science Foundation of China (32370711 and 32300554) received by LC and XZ, Shenzhen Medical Research Fund (A2303033) received by LC and Shenzhen Science and Technology Program (JCYJ20220530152409020) received by LC. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Sepsis is a severe condition with the highest mortality rate in the intensive care unit (ICU), caused by an overreaction of the immune system to pathogens which can damage multiple organs [1]. However, diagnosing sepsis is challenging as its symptoms can also be caused by other disorders, and there are no biomarkers approved by the U.S. Food and Drug Administration (FDA) for its precision diagnosis. Current diagnosis relies on a series of tests, including blood tests, urine tests, X-rays and computed tomography (CT) scans, which is time-consuming and may miss the optimal window for intervention. Therefore, an effective molecular diagnostic model is critical to assist in sepsis detection.
The rapid development of high-throughput sequencing technologies and deep learning techniques has led to numerous computational methods for disease detection and diagnosis [2–6]. Deep learning-based diagnostic models, such as those proposed by Bilal et al. [7], Ethan et al. [8], and Kam et al. [9], have utilized convolutional neural networks and long short-term memory networks for sepsis detection. However, these models rely on vital signs and signals like electrocardiogram (ECG) and biochemistry laboratory values measured by different clinical tests, which could be time-consuming. On the other hand, biomarkers based on gene expression, such as sNIP [10], SeptiCyte [11], and FAIM3/PLAC8 [12], only require a blood test. Nevertheless, they were developed solely on microarray gene expression data and are not generalizable across platforms.
Although deep learning models provide opportunities for molecular diagnostics of sepsis via analysis of host transcriptome data, we face two major challenges. Firstly, training these models requires a substantial amount of data, typically on the order of tens of thousands of samples. In clinical settings, however, it is difficult to acquire such a large dataset due to factors such as privacy concerns, resource-intensive processes, time constraints, and limited participant availability, particularly for rare diseases or complex medical conditions. Secondly, deep learning models inherently lack explainability, often functioning as "black boxes" with little insight into the underlying biological mechanisms driving their predictions. This lack of interpretability poses challenges for clinicians and researchers in understanding the decision-making process of the models.
To overcome these challenges, we proposed a deep learning framework called scCaT that fused capsulating architecture and Transformer to develop a detection model for sepsis diagnosis using single-cell RNA sequencing (scRNA-seq) data. ScRNA-seq provides high-resolution gene expression data at individual cell level, but the model trained solely on scRNA-seq data is not readily adapted to bulk RNA measurements, which are commonly used in clinical diagnostics. To address this, transfer learning was applied to transfer the model to bulk RNA cohorts. Specifically, the model trained on the scRNA-seq dataset was utilized as a pre-trained model and was fine-tuned on microarray and bulk RNA-seq data.
Moreover, scCaT utilized the capsulating architecture as an effective gene expression encoder to improve explainability. We functionally characterized the capsulating architecture, also referred to as the capsule network, which groups genes with similar biological functions into capsules by dynamic routing, because genes tend to work together as modules during biological processes. Subsequently, the self-attention mechanism known as Transformer was employed as a classifier to identify whether cells came from sepsis patients or controls.
ScCaT achieved high accuracy with an area under the receiver operating characteristic curve (AUROC) of 0.93 on the single-cell test set and an average AUROC of 0.99 on six independent microarray cohorts. Moreover, the results demonstrated that the capsule network learned to recognize different cell types and distinguished sepsis in each cell type based on the biological functions grouped in 20 capsules. The 20 capsules were found to be enriched in immune-related functions in the inflammation associated with sepsis.
By combining the strengths of the capsulating architecture, Transformer, and transfer learning, scCaT addresses the problems of data availability, adaptability to other data types, and the need for applying explainability in deep learning models for disease diagnosis. Our framework shows promising potential for improving the accuracy and interpretability of sepsis detection, and it paves the way for future advancements in the diagnosis and the prediction of outcomes for diseases and medical conditions with limited data.
Material and methods
Study design
We collected single-cell RNA sequencing (scRNA-seq) data for septic patients and normal controls from the Broad Institute Single Cell Portal (https://singlecell.broadinstitute.org/single_cell), with portal ID: SCP548 (subject PBMCs). The dataset contains gene expression data for 126,351 cells from peripheral blood mononuclear cells of 29 septic patients and 36 corresponding controls across three cohorts. The septic patients include subjects with urinary-tract infection and mild or transient organ dysfunction (Int-URO), clear or persistent organ dysfunction (Urosepsis, URO), and sepsis in hospital wards (Bac-SEP) and the medical intensive care unit (ICU-SEP). The control samples included subjects with urinary-tract infection but no organ dysfunction (Leuk-UTI), subjects admitted to the medical intensive care unit without sepsis (ICU-NoSEP), and uninfected healthy controls. The scRNA-seq was performed on the Chromium 10x3’ v2 profiling chemistry (10X Genomics) and was normalized with log2 transcripts per kilobase million (TPM). We filtered out cells that profiled fewer than 20% of genes and genes that were recorded in fewer than 20% of cells.
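The filtering step can be illustrated with a minimal NumPy sketch. It is not the authors' preprocessing code: it assumes a dense cells × genes matrix, treats a gene as "profiled" in a cell when its value is non-zero, and filters cells before genes. The function name and the filter order are assumptions made for illustration.

```python
import numpy as np

def filter_cells_and_genes(expr, min_frac=0.2):
    """Drop sparse cells, then sparse genes, from a cells x genes matrix.

    Assumptions (not stated in the text): a non-zero entry counts as
    "profiled", and cells are filtered before genes.
    """
    detected = expr > 0
    # Keep cells in which at least min_frac of all genes are detected.
    keep_cells = detected.mean(axis=1) >= min_frac
    detected = detected[keep_cells]
    # Keep genes detected in at least min_frac of the remaining cells.
    keep_genes = detected.mean(axis=0) >= min_frac
    return expr[keep_cells][:, keep_genes], keep_cells, keep_genes
```

Filtering genes first, or applying both thresholds to the unfiltered matrix, would give slightly different results, so the order should be checked against the released code before reuse.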
We also collected ten bulk cohorts (nine microarray and one RNA-seq) that profiled the blood of septic patients and normal controls from the Gene Expression Omnibus (GEO) database. GSE185263 was profiled by RNA-seq performed on Illumina Hi-Seq and normalized with log2 TPM. The other nine cohorts were profiled by microarray performed on the platform Affymetrix Human Genome U133 Plus 2.0 Array (AffyU133p2) and normalized by the Robust Multi-array Average (RMA). A detailed description of each dataset is shown in Table 1. We then intersected the genes profiled in scRNA-seq (SCP548) with those in the RNA-seq and microarray cohorts, resulting in expression data for 2,869 genes (S1 Table) used for downstream analysis.
ScCaT
The proposed deep learning framework, scCaT, incorporated a capsulating architecture (the capsule network) and a self-attention model (the Transformer). The capsule network served as an encoder that encoded the expression of effective genes into capsules, which can recognize different cells in terms of cell type and status. The Transformer was applied to estimate the probability of sepsis by considering the global connectivity between capsules.
Capsule network.
The capsule network is a capsulating neural network architecture proposed for computer vision recognition [13]. In this study, we adopted the capsule network as a gene expression encoder because its architecture groups genes into capsules through dynamic routing. The capsule network was intended to group genes with similar biological functions into capsules that would represent immune-related properties in sepsis.
The capsule network architecture for gene expression in this study contained two layers. The first layer was a fully connected layer that converted the gene expression $X = (x_1, x_2, \ldots, x_n)$ into eight primary capsules $u_1, u_2, \ldots, u_8$ by using different weights $W_1, W_2, \ldots, W_8$, where $u_i = W_i X$. The second layer was the capsule layer that included 20 capsules $v_1, v_2, \ldots, v_{20}$, where each capsule was a vector with 16 dimensions calculated by dynamic routing. The primary capsules $u_1, u_2, \ldots, u_8$ were mapped to higher-level vectors $\hat{u}_{j|i} = w_{ij} u_i$, where $w_{ij}$ is the weight matrix. Dynamic routing introduced a coupling coefficient $c_{ij}$ to concentrate on the important information within the primary capsules of genes without discarding other features, as max pooling does. The vector $s_j$ was then composed of the $\hat{u}_{j|i}$ weighted by the coupling coefficients, which is $s_j = \sum_i c_{ij} \hat{u}_{j|i}$. The capsule $v_j$ was obtained by squashing, which was

$$v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \cdot \frac{s_j}{\|s_j\|} \tag{1}$$

The first component of Eq (1), $\|s_j\|^2 / (1 + \|s_j\|^2)$, scaled the result to be within 0 and 1. The second component, $s_j / \|s_j\|$, normalized the result and preserved the direction of the vector $s_j$. The coupling coefficient $c_{ij}$ included a coefficient $b_{ij}$ which was updated as follows:

$$b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i} \cdot v_j \tag{2}$$

The coefficients in dynamic routing were updated by iteration in the following steps:

$$c_{ij} = \frac{\exp(b_{ij})}{\sum_k \exp(b_{ik})} \tag{3}$$

$$s_j = \sum_i c_{ij} \hat{u}_{j|i} \tag{4}$$

$$v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \cdot \frac{s_j}{\|s_j\|} \tag{5}$$

The dynamic routing was iterated three times and the resulting $v_j$ was the final capsule output. The network weight $w_{ij}$ was obtained by backward propagation. The capsule network was applied as an encoder that represents different biological pathways through learning gene expression for septic classification.
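The routing procedure can be sketched in NumPy following the routing-by-agreement scheme of the capsule network paper [13]. This is an illustration under stated assumptions rather than the scCaT implementation: the prediction vectors `u_hat` are random placeholders standing in for the learned mapping from primary capsules, the softmax is taken over the output capsules as in [13], and the function names are hypothetical.

```python
import numpy as np

def squash(s, eps=1e-9):
    """Shrink each row of s to length < 1 while keeping its direction."""
    sq_norm = np.sum(s * s, axis=-1, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, n_iter=3):
    """Route prediction vectors u_hat[i, j, :] (primary i -> capsule j)."""
    n_primary, n_caps, _ = u_hat.shape
    b = np.zeros((n_primary, n_caps))              # routing logits b_ij
    for _ in range(n_iter):
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)       # coupling c_ij, softmax over j
        s = np.einsum("ij,ijd->jd", c, u_hat)      # weighted sum per capsule
        v = squash(s)                              # capsule outputs v_j
        b = b + np.einsum("ijd,jd->ij", u_hat, v)  # agreement update of b_ij
    return v, c

# Placeholder input: 8 primary capsules, 20 capsules, 16 dimensions each.
rng = np.random.default_rng(0)
u_hat = rng.normal(size=(8, 20, 16))
v, c = dynamic_routing(u_hat, n_iter=3)
```

Each row of `c` sums to one, so a primary capsule distributes its contribution across the 20 capsules, and capsules whose output agrees with a prediction vector receive a larger share in the next iteration.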
Transformer.
Transformer is a network architecture widely used in artificial intelligence, including natural language processing, computer vision, and content generation [14]. Through self-attention techniques, it can capture context by tracking relationships between distant elements in a sequence, making it well-suited for learning connectivity between different gene expression modules represented by capsules.
In this study, we used a multi-head attention mechanism as a decoder to decode the capsule output from the capsule encoder, denoted as

$$C = (v_1, v_2, \ldots, v_{20}) \tag{6}$$

Each capsule output was transformed into a query matrix (Q), a key matrix (K), and a value matrix (V) by linear matrix transformations, i.e. $Q = C W^Q$, $K = C W^K$, and $V = C W^V$. The attention of the capsule output was then computed as follows:

$$\mathrm{attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V \tag{7}$$

where $d_k$ represented the dimension of the capsule, which was set to 16 in this study.
The output of a single self-attention head was denoted as Zi = attention(Qi, Ki, Vi). We used multiple self-attention heads Z1, Z2,…,Zn, and concatenated them to Z′. Finally, we applied a dense layer with 150 neurons and a sigmoid activation function to obtain the probability of sepsis occurrence based on the concatenated attention output.
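A minimal NumPy sketch of this decoder is given below. Several details are assumptions, because the text does not specify them: the number of heads (four here), the ReLU activation of the 150-unit dense layer, and a final single sigmoid unit that turns the 150 hidden values into one probability. All weights are random placeholders and the function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_head(caps, w_q, w_k, w_v):
    """Scaled dot-product self-attention over the capsule outputs."""
    q, k, v = caps @ w_q, caps @ w_k, caps @ w_v
    d_k = k.shape[-1]
    weights = softmax(q @ k.T / np.sqrt(d_k))   # (n_caps, n_caps)
    return weights @ v

def sepsis_probability(caps, heads, w_dense, b_dense, w_out, b_out):
    """Concatenate the heads, apply a dense layer, then a sigmoid output."""
    z = np.concatenate([attention_head(caps, *h) for h in heads], axis=-1)
    hidden = np.maximum(0.0, z.reshape(-1) @ w_dense + b_dense)  # assumed ReLU
    logit = hidden @ w_out + b_out              # assumed single output unit
    return 1.0 / (1.0 + np.exp(-logit))

rng = np.random.default_rng(0)
n_caps, dim, n_heads, n_hidden = 20, 16, 4, 150   # n_heads is an assumption
caps = rng.normal(size=(n_caps, dim))             # placeholder capsule outputs
heads = [tuple(rng.normal(scale=0.1, size=(dim, dim)) for _ in range(3))
         for _ in range(n_heads)]
w_dense = rng.normal(scale=0.05, size=(n_caps * dim * n_heads, n_hidden))
w_out = rng.normal(scale=0.05, size=n_hidden)
p = sepsis_probability(caps, heads, w_dense, np.zeros(n_hidden), w_out, 0.0)
```

Because attention is computed across the 20 capsules rather than across genes, the attention weights form a 20 × 20 matrix that can be read as capsule-to-capsule connectivity.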
Pre-Training on single-cell RNA-seq
The cells collected from 29 septic patients and 36 corresponding controls were normalized and integrated. They were then randomly divided into three sets: the training set (80% of the cells), the validation set (10% of the cells) to tune the neural network, and the test set (10% of the cells) to evaluate the performance of scCaT.
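The random 80/10/10 division can be sketched as follows. This is an illustration only: the seed, the function name, and the rounding of set sizes are assumptions, and the split is made over cell indices as described in the text.

```python
import random

def split_indices(n_cells, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle cell indices and cut them into train/validation/test sets."""
    idx = list(range(n_cells))
    random.Random(seed).shuffle(idx)
    n_train = int(fractions[0] * n_cells)
    n_val = int(fractions[1] * n_cells)
    # The test set takes whatever remains after train and validation.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```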
To optimize the output probability, we applied the binary cross-entropy loss function:

$$L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p(y_i) + (1 - y_i) \log\bigl(1 - p(y_i)\bigr) \right] \tag{8}$$

where $y_i$ is the label of cell $i$, $N$ is the number of cells, and $p(y_i)$ is the predicted probability obtained using scCaT. We performed backpropagation to update the network parameters using Adam optimization based on the loss function. The network was trained for 50 epochs with a batch size of 32 and early stopping.
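As a small worked example, binary cross-entropy is the mean negative log-likelihood of the labels under the predicted probabilities. The sketch below uses plain Python; the clipping constant `eps` is an assumption added to avoid taking the logarithm of zero.

```python
import math

def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross-entropy for 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)   # keep log() finite
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)
```

An uninformative prediction of 0.5 for every cell gives a loss of log 2, and the loss falls toward zero as the predicted probabilities approach the true labels.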
After training, the network model was tested on the test set, which consisted of 4,265 cells, based on the area under the ROC curve (AUC). The AUC shows the trade-off between the true positive rate (TPR) and the false positive rate (FPR), which were calculated as follows:

$$\mathrm{TPR} = \frac{TP}{TP + FN} \tag{9}$$

$$\mathrm{FPR} = \frac{FP}{FP + TN} \tag{10}$$

where TP is the number of true positively classified samples, FN is the number of false negatively classified samples, FP is the number of false positively classified samples, and TN is the number of true negatively classified samples.
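To make the relation between TPR, FPR, and the AUC concrete, the sketch below sweeps a decision threshold over the predicted scores and integrates the resulting ROC curve with the trapezoidal rule. It is a plain-Python illustration with hypothetical function names, not the estimator used for the reported results.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs, one per distinct score threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        # Samples scoring at or above the threshold are called positive.
        tp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auroc(labels, scores):
    """Area under the ROC curve by the trapezoidal rule."""
    pts = roc_points(labels, scores)
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
```

For labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, one of the two negatives outranks one of the two positives, and `auroc` returns 0.75.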
We compared the performance of scCaT to that of existing biomarkers, including FAIM3/PLAC8, SeptiCyte, and sNIP, and to traditional machine learning methods, including decision tree, random forest, naïve Bayes, K-nearest neighbors, and quadratic discriminant analysis, on the single-cell RNA-seq test set.
Transferring to bulk RNA cohorts
Our final model aimed to assist diagnosis in the clinic, so we attempted to deploy it on a different data type using transfer learning. We utilized the scCaT trained on single-cell RNA-seq as a pretrained model and fine-tuned it on bulk RNA-seq and microarray data. By following the same procedure, our model could be transferred to other types of data commonly used in the clinic.
We collected nine microarray cohorts and one bulk RNA-seq cohort for transfer and evaluation (Table 1). The three largest microarray cohorts, GSE95233, GSE26440, and GSE57065, were integrated to fine-tune the pretrained network. We initialized the last dense layer and froze all the previous layers, including the capsule and self-attention layers. In this way, we first trained the last layer on the three largest cohorts and then trained the entire network so that it focused more on microarray data. Following the same procedure, we fine-tuned the pretrained network on 30% of the samples in the bulk RNA-seq cohort, GSE185263, which contained 103 sepsis patients and 14 healthy controls.
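The two-stage schedule (train a fresh last layer with the earlier layers frozen, then train the whole network) can be illustrated with a toy NumPy model. The one-layer tanh encoder below is only a stand-in for the pretrained capsule and self-attention layers, the data are synthetic, and the learning rate and step count are arbitrary assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fine_tune(x, y, enc_w, head_w, train_encoder, lr=0.1, steps=200):
    """Gradient descent on BCE; the encoder is updated only if train_encoder."""
    enc_w, head_w = enc_w.copy(), head_w.copy()
    for _ in range(steps):
        h = np.tanh(x @ enc_w)               # features from the encoder
        p = sigmoid(h @ head_w)              # last-layer prediction
        err = (p - y) / len(y)               # dLoss/dlogit for mean BCE
        grad_head = h.T @ err
        if train_encoder:
            # Back-propagate through tanh into the encoder weights.
            grad_h = np.outer(err, head_w) * (1.0 - h ** 2)
            enc_w -= lr * (x.T @ grad_h)
        head_w -= lr * grad_head
    return enc_w, head_w

rng = np.random.default_rng(0)
x = rng.normal(size=(60, 5))                      # synthetic "bulk" samples
y = (x[:, 0] > 0).astype(float)                   # synthetic labels
pretrained = rng.normal(scale=0.5, size=(5, 4))   # stand-in pretrained encoder
new_head = np.zeros(4)                            # freshly initialized last layer

# Stage 1: encoder frozen, only the last layer is trained.
enc1, head1 = fine_tune(x, y, pretrained, new_head, train_encoder=False)
# Stage 2: the entire network is trained.
enc2, head2 = fine_tune(x, y, enc1, head1, train_encoder=True)
```

Training the new layer first keeps the large initial gradients of a randomly initialized head from disturbing the pretrained weights before the head has adapted to the new data type.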
We evaluated the performance of the fine-tuned network on six independent cohorts and 70% of the samples in the bulk RNA-seq cohort, GSE185263, using AUROC and compared it to existing biomarkers, including FAIM3/PLAC8, SeptiCyte, and sNIP, as well as traditional machine learning methods such as decision tree, random forest, naïve Bayes, K-nearest neighbors, and quadratic discriminant analysis. We also conducted a rotated test, where we transferred the model to one cohort and tested it on the other cohorts.
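The rotated test amounts to a loop over ordered cohort pairs. In the sketch below, `fine_tune` and `evaluate` are hypothetical callables supplied by the user (for example, a fine-tuning routine and an AUROC scorer); they are placeholders and not functions from the scCaT code base.

```python
def rotated_test(cohorts, fine_tune, evaluate):
    """Fine-tune on each cohort in turn and score on every other cohort.

    cohorts   : dict mapping cohort name -> data
    fine_tune : callable(data) -> model             (hypothetical)
    evaluate  : callable(model, data) -> score      (hypothetical)
    """
    results = {}
    for train_name, train_data in cohorts.items():
        model = fine_tune(train_data)
        for test_name, test_data in cohorts.items():
            if test_name != train_name:
                results[(train_name, test_name)] = evaluate(model, test_data)
    return results
```

With nine cohorts this yields 72 train/test pairs, which can be arranged as the off-diagonal cells of a 9 × 9 heatmap.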
Notably, we evaluated the performance in terms of the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) with precrec [15], an R package that produces accurate estimates [16].
Annotation of network model
To investigate the biological functions learned by scCaT from gene expression, we extracted the capsule outputs and performed visualization using uniform manifold approximation and projection (UMAP) with cell type annotation.
To resolve the function of the capsule network, we visualized and extracted the genes captured in the primary capsule layer. We extracted the genes’ weights for each of the eight primary capsules and displayed the most important genes in a heatmap. Next, we compared the genes with higher importance, whose weights were larger than 0.06 in absolute value, across the eight primary capsules. To gain insight into the biological pathways associated with each capsule, we performed enrichment analysis of the important genes using Gene Ontology (GO).
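Selecting the high-importance genes and comparing them across primary capsules can be sketched with plain Python sets. The weight layout (one row of gene weights per primary capsule) and the function names are assumptions; the 0.06 threshold on the absolute weight is taken from the text.

```python
from itertools import combinations

def important_genes(weights, gene_names, threshold=0.06):
    """Genes whose absolute weight exceeds the threshold, per primary capsule.

    weights[k][g] is assumed to hold the weight of gene g in capsule k.
    """
    return [
        {name for name, w in zip(gene_names, row) if abs(w) > threshold}
        for row in weights
    ]

def pairwise_overlap(gene_sets):
    """Genes shared by each pair of primary capsules."""
    return {
        (a, b): gene_sets[a] & gene_sets[b]
        for a, b in combinations(range(len(gene_sets)), 2)
    }
```

The per-capsule sets are the natural input for an upset graph and for the enrichment analysis.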
To uncover the pathways of the capsules, we conducted an activation test on the capsule outputs and retrieved the genes participating in each capsule. Specifically, we systematically inhibited some of the gene inputs to determine which genes made a large positive or negative contribution to each capsule. For instance, the outputs of capsule 8 and capsule 19 changed significantly when particular genes were activated. We then utilized Gene Ontology (GO) to investigate the biological pathways associated with the activated genes, which enabled us to infer the pathways learned by the capsule network for each of these capsules. Subsequently, we constructed a capsule-pathway network to better understand the relationships between the biological functions of the 20 capsules.
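One simple way to realize such an activation test is single-gene ablation: silence one gene at a time and record how much the capsule's output changes. The sketch below is an assumption-laden illustration, not the authors' procedure: `encode` is a hypothetical callable that maps an expression vector to a capsules × dimensions matrix, the capsule's vector length is used as its activation, and genes are silenced one by one rather than in patterns.

```python
import numpy as np

def gene_contributions(encode, x, capsule, baseline=0.0):
    """Change in one capsule's length when each gene is silenced in turn.

    encode  : callable(expression vector) -> (n_capsules, dim) array (hypothetical)
    x       : 1-D float array of gene expression for one sample
    capsule : index of the capsule to probe
    """
    ref = np.linalg.norm(encode(x)[capsule])
    deltas = np.empty(x.shape[0])
    for g in range(x.shape[0]):
        x_off = x.copy()
        x_off[g] = baseline                  # inhibit gene g
        deltas[g] = ref - np.linalg.norm(encode(x_off)[capsule])
    return deltas
```

Genes with large positive values activate the capsule, genes with large negative values suppress it, and the top-ranked genes can then be passed to GO enrichment.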
Results
Overall framework
scCaT is a novel framework that combines the capsulating architecture with Transformer and transfer learning from single-cell RNA sequencing (scRNA-seq) data to bulk RNA data. We collected cells from patients with sepsis and cells from control individuals (Fig 1A). Then, we used 80% of the cells as training samples to train the capsulating architecture with Transformer (Fig 1B), while the remaining cells were used for tuning hyperparameters and single-cell validation. The capsulating architecture is also referred to as the capsule network, in which the genes assigned to capsules were updated by dynamic routing (Fig 1C). The model trained on scRNA-seq was treated as a pretrained model, and we then transferred it to bulk RNA data, specifically microarray and bulk RNA-seq data. The neural network was then fine-tuned on the three largest cohorts out of nine microarray profiles and a small proportion of a bulk RNA-seq profile, while the remaining six cohorts and the rest of the bulk RNA-seq profile were used for validation and comparison to other biomarkers and traditional machine learning methods (Fig 1D).
A. Single-cell gene expression of peripheral blood mononuclear cells collected from sepsis patients and normal controls. B. Deep neural network architecture of scCaT. scCaT was constructed by blending the capsule network and Transformer, and then it was trained using the gene expression of cells. C. Dynamic routing procedures of scCaT. D. Transfer learning. scCaT was transferred to subjects with bulk RNA data by fine-tuning. It was evaluated and compared on independent cohorts.
Performance evaluation on single-cell RNA sequencing data
In order to assess the performance of scCaT, we conducted a test on single-cell RNA sequencing data’s test set, which accounted for 10% of the SCP548 dataset. We compared scCaT with existing biomarkers, such as FAIM3/PLAC8 [12], SeptiCyte [11], and sNIP [10], as well as traditional machine learning models like nearest neighbors, decision tree, random forest, naïve bayes, and quadratic discriminant analysis.
scCaT achieved an AUROC of 0.93 and an AUPRC of 0.93, which is higher than FAIM3/PLAC8 (AUROC = 0.49 and AUPRC = 0.44), SeptiCyte (AUROC = 0.60 and AUPRC = 0.53), and sNIP (AUROC = 0.51 and AUPRC = 0.46) (Fig 2A and 2B). The nearest neighbors, decision tree, random forest, naïve Bayes, and quadratic discriminant analysis models achieved AUROC scores of 0.61, 0.71, 0.87, 0.68, and 0.77, and AUPRC scores of 0.53, 0.69, 0.71, 0.58, and 0.64, respectively. The superior classification capability of scCaT can mainly be attributed to its higher complexity and parameter count, which suit scRNA-seq data containing different cell types (as explained below).
A-B. ROC and PRC demonstrating the performance of scCaT, existing biomarkers, and traditional machine learning methods, for sepsis diagnosis from single-cell data. C-D. AUROC and AUPRC scores demonstrating the performance of scCaT and the other methods for sepsis diagnosis from microarray data. E-F. Heatmap showing the AUROC and AUPRC scores of scCaT transferred on one cohort and tested on the others.
Performance evaluation on bulk RNA data
To transfer scCaT to another data type, we took the model pre-trained on scRNA-seq data and fine-tuned it on the new data type. Specifically, we fine-tuned the model on the three largest microarray cohorts and evaluated its performance on six independent validation cohorts. We also fine-tuned the model on a small proportion of bulk RNA-seq samples and evaluated it on the remaining 70% of the samples in the bulk RNA-seq cohort.
We compared scCaT’s performance to that of existing biomarkers (FAIM3/PLAC8, SeptiCyte, and sNIP) and traditional machine learning methods (nearest neighbors, decision tree, random forest, naïve Bayes, and quadratic discriminant analysis) on these six microarray cohorts and one bulk RNA-seq cohort. ScCaT outperformed all other methods with an average AUROC of 0.986 and AUPRC of 0.994 (Fig 2C and 2D). Although FAIM3/PLAC8 achieved comparable performance to our model on average, it performed poorly on GSE4607 and GSE13904 (AUROC of 0.886 and 0.821, respectively) compared to our model (AUROC of 0.984 and 0.971, respectively).
Notably, FAIM3/PLAC8 and sNIP were discovered from microarray data, which may explain their comparable performance to our model on microarray cohorts. However, these biomarkers lack the robustness to be transferred to other data types for clinical use. In contrast, our framework allows for fine-tuning of the model, making it applicable in a clinical setting with only a small amount of data of the clinical type.
To assess the generalization performance of the model, we conducted a rotated test on the nine microarray cohorts, where the model was fine-tuned on one cohort and tested on another one (Fig 2E and 2F). The results demonstrate that scCaT achieved an AUROC and AUPRC above 0.90 in most cases, with the worst case having an AUROC and AUPRC of 0.82. These findings indicate that our framework has good generalization performance.
Cell types learned by capsule network
The capsule network played an important role in the model by grouping associated genes into capsules and providing an encoder for the whole model. We visualized the capsule outputs using UMAP and found that the capsule network can learn cell types automatically (Fig 3). Compared to the raw input single-cell data (Fig 3A), the capsule network transformed the genes into capsules and clustered the cells into distinct groups (Fig 3B). Interestingly, within each cluster, the cells from sepsis and control samples were separated. However, the cell proportion did not show much difference between sepsis and controls (S1 Fig).
The raw input data (A) and the capsule outputs (B) annotated by samples collected from sepsis patients and controls. The capsule outputs clustered cells into several groups and separated sepsis and controls in each group. By annotating the raw input data (C & E) and the capsule outputs (D & F) with cell types and cell states, we found that the groups clustered by the capsule outputs corresponded to different cell types and cell states.
To further investigate the capabilities of the capsule network, we annotated the cells by their cell types and states. We found that the capsule outputs clustered different cell types and cell states more clearly than the raw input (Fig 3C–3F), indicating that the capsule network can learn cell type information. The capsule network can group B cells, T cells, and natural killer cells, which are differentiated from lymphoid progenitors, into the same cluster (Fig 3D). Moreover, the capsule network was able to separate two dendritic cell states (DS1 and DS2 in Fig 3F).
Our observations suggested that the differences across cell types were greater than the differences between cells from sepsis and controls. Theoretically, the capsule network first learned how to distinguish different cell types and then classified cases and controls within each cell type. Besides that, we found that each dimension of the capsules can be used to represent an aspect of the capsule outputs (S2 and S3 Figs). Although the capsule network learnt cell type information, scCaT demonstrated better performance than using cell proportion in diagnosis (S2 Table).
Functional characterization of primary capsules
To reveal the genes participating in primary capsules, we extracted and visualized their weights for each gene (Fig 4A). We identified the genes with higher importance and analyzed their intersection between primary capsules using an upset graph (Fig 4B). The results showed that only a few genes were shared among two or three primary capsules, indicating that each primary capsule captured distinct gene information from different aspects.
A. Heatmap visualization of the weights of the eight primary capsules. B. Upset graph showing the intersection of genes included in the eight primary capsules. C-J. Pathway analysis of the genes included in each of the eight primary capsules. The enriched biological pathways are different from the other primary capsules. Gray node represents genes included in each capsule and yellow node represents enriched pathway.
We further explored the biological pathways associated with the genes in primary capsules. Except for primary capsule 4, the genes extracted by the primary capsules were mainly involved in neutrophil degranulation and activation, which are critical events in sepsis pathogenesis and can cause tissue damage [17] (Fig 4C–4J). Moreover, each primary capsule was also associated with specific pathways related to inflammation in sepsis. For instance, primary capsules 5 and 7 were involved in NF-kappaB transcription factor activity, a central mediator of pro-inflammatory gene induction and functions [18] (Fig 4G). Primary capsule 2 was involved in neuron death and the regulation of the production of interleukin-12, a proinflammatory cytokine with immunoregulatory functions [19] (Fig 4D). Other primary capsules were also involved in pathways related to the inflammation in sepsis, such as primary capsule 1 for T-cell activation (Fig 4C), primary capsule 3 for neuron death and interferon-gamma [20] (Fig 4E), primary capsule 4 for the apoptotic signaling pathway (Fig 4F), primary capsule 6 for leukocyte cell-cell adhesion [21] (Fig 4H), and primary capsule 7 for the reactive oxygen species metabolic process [22] (Fig 4I).
Capsule-pathway network analysis
After the primary capsule layer, genes were further grouped into several capsules based on their biological functions by dynamic routing. To identify the genes in each capsule, we conducted activation analysis in the capsule network. Specifically, we selectively activated or shut down certain gene inputs and identified the genes that contributed most to each capsule through enumeration (Fig 5A and 5B). These important genes were learned by the capsule network and helped classify sepsis and control samples.
A-B. By activating specific patterns of genes as input, the capsule was activated and the specific genes adopted in each capsule can be identified. C-D. The biological pathways enriched in capsules 8 and 19. E. Capsule-pathway network. Blue nodes represent capsules and yellow nodes represent pathways.
Next, we identified the capsule pathways, i.e., the functions enriched in each capsule, based on Gene Ontology (GO) (S4 Fig). For instance, the pathways of capsule 8 included neutrophil activation and neutrophil degranulation, which are critical in inflammation, and the proliferation of lymphocytes, mononuclear cells, and leukocytes, which are hallmarks of the adaptive immune response to pathogens [23] (Fig 5C). Capsule 19 was involved in RNA splicing, regulation of viral process, cellular response to interferon-gamma, and antigen processing, indicating that it grouped genes related to the immune responses to viral infection (Fig 5D).
Finally, we constructed a capsule-pathway network of the 20 capsules. Only a few genes were shared between different capsules. Our results suggested that genes in each capsule were mainly enriched in several pathways, indicating that the capsules filtered genes based on different biological aspects. We also found that Capsules 1, 3, 6, 8, 10, and 15 participated in the activation and degranulation of neutrophils, which are critical in the inflammation seen in sepsis (Fig 5E). Furthermore, different capsules were enriched in specific biological functions, suggesting that they can learn about cells and sepsis by considering different functions from different perspectives.
For example, Capsule 9 in the capsule-pathway network focuses on a series of bacterial infection-related functions that differ from those of other capsules, including response to molecule of bacterial origin, response to lipopolysaccharide, regulation of interleukin-1 beta production, and osteoclast differentiation (Fig 5E). Lipopolysaccharide is produced by Gram-negative bacteria and can activate innate immune responses to the molecule of bacterial origin [24]. The bacteria-induced inflammation is associated with overproduction of cytokines including interleukin-1, which can amplify osteoclast differentiation [25]. We mapped the genes in each capsule to proteins and found that the connectivity within capsules in the protein-protein interaction network from STRING [26] was greater than the overall connectivity including cross-capsule connections (S5 Fig).
Discussion
We developed scCaT, a deep learning framework that combines capsule network and Transformer to create an explainable model for sepsis detection based on gene expression. We identified the genes involved in each capsule and discovered that scCaT’s capsulating architecture groups genes with similar biological pathways into 20 capsules, which can differentiate between various cell types and distinguish sepsis from control samples in each cell type.
The capsules in this study were acquired using dynamic routing instead of backward propagation. Dynamic routing involved updating the coefficient cij of each primary capsule in a way that each capsule output would concentrate more on one or several primary capsules. This was achieved through the update of bij using Eq (2). If the capsule output was similar to one of the primary capsules, the bij would assign greater weight to that primary capsule through iterations. The number of iterations determined the strength of this weight. In our study, we utilized three times of iterations since an excessive number of iterations could eliminate dissimilar primary capsules and render the process redundant.
It is vital for researchers to gain insight into the specific genes, pathways, or biological processes that contribute to pathogenesis, enabling further exploration and refinement of diagnostic criteria [39]. Activation tests on the capsules show that the genes adopted by different capsules have only a few intersections, indicating that the capsules extracted features from different aspects (Fig 4). Each capsule possesses specific biological functions related to sepsis and identifies sepsis based on these functions. Gene ontology analysis shows that each capsule is enriched in particular functions related to different processes in inflammation (Fig 5).
In the capsule-pathway network, individual capsules, such as Capsule 9, appear to focus on a specific series of pathways regulated in sepsis inflammation (Fig 5E). Moreover, some capsules focused on biological pathways such as the degranulation and activation of neutrophils, indicating that the capsule network has learned critical events in sepsis pathogenesis that can cause tissue damage. Furthermore, different capsules were enriched in specific biological functions, suggesting that they learn about cells and sepsis from different functional perspectives.
The model was pretrained on single-cell RNA-seq data and then transferred to bulk RNA-seq and microarray cohorts using transfer learning. Following the same procedure, scCaT can also be adapted to the data types used in clinical practice, such as RT-PCR. We also tried data augmentation and domain-specific normalization. We synthesized new data from the fine-tuning data as an augmentation for transfer learning, but the augmented data did not achieve performance as high as the original data, probably because augmentation introduced extra noise [27] (S6A Fig and S3 Table). Adaptive instance normalization is a domain-specific normalization that can align the distributions of different content features [28] (S6B Fig). Adding adaptive instance normalization achieved slightly better performance (0.02 in AUROC and 0.01 in AUPRC) on only one cohort, and slightly worse or nearly identical performance on the other six cohorts. Transfer learning can transfer knowledge from single cells to bulk RNA without further preprocessing or retraining when applied to new patients, whereas domain-specific normalization introduces additional preprocessing steps before clinical use. Transfer learning has two advantages: (1) deploying the model across data measured by different biological techniques, and (2) improving performance by leveraging large datasets to pretrain the model. This study provides a framework for transferring deep learning models to clinical settings and diagnosing rare diseases with insufficient data for training.
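For reference, adaptive instance normalization as defined by Huang and Belongie [28] rescales an input x so that its mean and standard deviation match those of a reference y: AdaIN(x, y) = σ(y) (x − μ(x)) / σ(x) + μ(y). The sketch below applies this formula to one expression vector. How the statistics were computed for the bulk cohorts is not specified here, so treating one sample as x and a single-cell reference profile as y is an assumption, and the values are hypothetical.

```python
from statistics import fmean, pstdev

def adain(x, y, eps=1e-8):
    """Rescale x so that its mean and standard deviation match those of y."""
    mu_x, sd_x = fmean(x), pstdev(x)
    mu_y, sd_y = fmean(y), pstdev(y)
    # eps guards against division by zero for a constant input vector
    return [sd_y * (xi - mu_x) / (sd_x + eps) + mu_y for xi in x]

# Hypothetical expression values for one bulk sample and one reference profile.
bulk_sample = [2.0, 4.0, 6.0, 8.0]
single_cell_ref = [0.0, 1.0, 1.0, 2.0]

aligned = adain(bulk_sample, single_cell_ref)
print(round(fmean(aligned), 4), round(pstdev(aligned), 4))
```

The aligned vector keeps the rank order of the bulk sample while taking on the location and scale of the reference, which is the extra preprocessing step that would have to run on every new patient sample.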
During construction of the network architecture, hyperparameters were determined by 5-fold cross-validation on the training set and evaluated on the validation set to improve the performance of scCaT. We examined the number of routing iterations in dynamic routing and found that it had little effect on performance, although three iterations performed slightly better (S7A Fig). In addition, the sigmoid function was superior to the ReLU and Tanh functions as the activation function in each neuron (S7B Fig). We also performed 5-fold cross-validation for the number of capsules and neurons. The number of capsules had little impact in the 5-fold cross-validation, but 20 capsules achieved the best performance on the testing set (S7C and S7D Fig). Performance improved as the number of neurons in the dense layer increased in 5-fold cross-validation but became slightly worse beyond 150 neurons (S7E and S7F Fig). We therefore struck a balance and selected 150 neurons.
During pretraining, splitting the dataset by cells may raise concerns. We also considered splitting the data by individuals, but the results were worse, as expected. One reason for splitting by cells is that a proportion of the individuals in this cohort contribute only dozens of cells, which would create a large data imbalance and bias the model toward specific cell types. Furthermore, the scRNA-seq cohort was used to pretrain the model, which was then transferred to individuals. Pretraining a better cell-level model benefits the final performance of transfer learning for individual diagnosis. Therefore, we split the data by cells.
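The two splitting strategies can be contrasted with a short sketch. The donor identifiers, cell counts, and 20% test fraction are hypothetical and chosen only to mimic a cohort in which some individuals contribute very few cells. The function names are illustrative and do not come from the scCaT code.

```python
import random

def split_by_cells(cells, test_frac=0.2, seed=0):
    """Shuffle cells and split them regardless of which individual they come from."""
    shuffled = cells[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]

def split_by_individuals(cells, test_frac=0.2, seed=0):
    """Hold out whole individuals so no donor contributes cells to both sets."""
    donors = sorted({donor for donor, _ in cells})
    random.Random(seed).shuffle(donors)
    n_test = max(1, int(len(donors) * test_frac))
    held_out = set(donors[:n_test])
    train = [c for c in cells if c[0] not in held_out]
    test = [c for c in cells if c[0] in held_out]
    return train, test

# Hypothetical cohort: (donor_id, cell_id) pairs with uneven cells per donor.
cell_counts = {"d1": 500, "d2": 300, "d3": 30, "d4": 20, "d5": 150}
cells = [(d, f"{d}_cell{i}") for d, n in cell_counts.items() for i in range(n)]

train_c, test_c = split_by_cells(cells)
train_i, test_i = split_by_individuals(cells)
print(len(train_c), len(test_c))  # fixed sizes, donors mixed across both sets
print(len(train_i), len(test_i))  # sizes depend on which donor is held out
```

With a cell-level split the test set always contains 20% of the cells, whereas holding out one individual yields a test set of anywhere from 20 to 500 cells in this toy cohort, illustrating the imbalance described above.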
Although scCaT achieves exceptional performance, there is room for improvement. As omics data in sepsis research are accumulating rapidly [40], we plan to enhance scCaT in future work by integrating diverse data sources, including genomics, proteomics, and metabolomics. Furthermore, we will continue to collect clinical data and sepsis samples to fine-tune the model and, after clinical validation, attempt to deploy it in clinical applications.
The study demonstrated a method for learning gene modules with a capsulating architecture and transferring the model to other data types. This framework could be useful in diagnosing rare diseases where obtaining a large number of subjects is challenging.
Supporting information
S2 Table. Diagnostic performance of cell proportion and scCaT.
https://doi.org/10.1371/journal.pcbi.1012083.s002
(XLSX)
S1 Fig. The cell proportion of patients and controls in single-cell sepsis cohort.
https://doi.org/10.1371/journal.pcbi.1012083.s004
(TIF)
S2 Fig. The output of the dimensions for all the capsules annotated by phenotype.
https://doi.org/10.1371/journal.pcbi.1012083.s005
(TIF)
S3 Fig. The output of the dimensions for all the capsules annotated by cell types.
https://doi.org/10.1371/journal.pcbi.1012083.s006
(TIF)
S4 Fig. The biological pathways (Y axis) in which the twenty capsules (No. 1 to No. 20 on the X axis) are enriched, based on Gene Ontology (GO).
https://doi.org/10.1371/journal.pcbi.1012083.s007
(TIF)
S5 Fig. Protein-protein interaction within capsules. The red dashed line refers to the overall protein-protein interaction.
https://doi.org/10.1371/journal.pcbi.1012083.s008
(TIF)
S6 Fig. The data augmentation and normalization before transferring from single-cell to bulk RNAs.
A. The performance of using data augmentation. B. The performance of using domain-specific normalization called adaptive instance normalization.
https://doi.org/10.1371/journal.pcbi.1012083.s009
(TIF)
S7 Fig. The parameter tuning.
A. The performance of different numbers of routing iterations. B. The performance of different activation functions. C-D. The performance of different numbers of capsules. E-F. The performance of different numbers of neurons in the dense layer.
https://doi.org/10.1371/journal.pcbi.1012083.s010
(TIF)
References
- 1. Vincent JL, Opal SM, Marshall JC, Tracey KJ. Sepsis definitions: time for change. Lancet. 2013;381(9868):774–5. pmid:23472921; PubMed Central PMCID: PMC4535310.
- 2. Li Q, Zheng X, Xie J, Wang R, Li M, Wong M-H, et al. bvnGPS: a generalizable diagnostic model for acute bacterial and viral infection using integrative host transcriptomics and pretrained neural networks. Bioinformatics. 2023. pmid:36857587
- 3. Wu Q, Zheng X, Leung KS, Wong MH, Tsui SK, Cheng L. meGPS: a multi-omics signature for hepatocellular carcinoma detection integrating methylome and transcriptome data. Bioinformatics. 2022. Epub 2022/06/09. pmid:35674358.
- 4. Li H, Zheng X, Gao J, Leung K-S, Wong M-H, Yang S, et al. Whole transcriptome analysis reveals non-coding RNA’s competing endogenous gene pairs as novel form of motifs in serous ovarian cancer. Computers in Biology and Medicine. 2022;148:105881. pmid:35940161
- 5. Zheng X, Leung KS, Wong MH, Cheng L. Long non-coding RNA pairs to assist in diagnosing sepsis. BMC Genomics. 2021;22(1):275. Epub 2021/04/18. pmid:33863291; PubMed Central PMCID: PMC8050902.
- 6. Wang R, Zheng X, Wang J, Wan S, Song F, Wong MH, et al. Improving bulk RNA-seq classification by transferring gene signature from single cells in acute myeloid leukemia. Brief Bioinform. 2022;23(2). Epub 2022/02/10. pmid:35136933.
- 7. Al-Mualemi BY, Lu L. A Deep Learning-Based Sepsis Estimation Scheme. IEEE Access. 2021;9:5442–52. WOS:000608200200001.
- 8. Strickler EAT, Thomas J, Thomas JP, Benjamin B, Shamsuddin R. Exploring a global interpretation mechanism for deep learning networks when predicting sepsis. Scientific Reports. 2023;13(1):3067. pmid:36810645
- 9. Kam HJ, Kim HY. Learning representations for the early detection of sepsis with deep neural networks. Comput Biol Med. 2017;89:248–55. Epub 20170819. pmid:28843829.
- 10. Scicluna BP, Wiewel MA, van Vught LA, Hoogendijk AJ, Klarenbeek AM, Franitza M, et al. Molecular Biomarker to Assist in Diagnosing Abdominal Sepsis upon ICU Admission. American Journal of Respiratory and Critical Care Medicine. 2018;197(8):1070–3. WOS:000430039900019. pmid:28972859
- 11. McHugh L, Seldon TA, Brandon RA, Kirk JT, Rapisarda A, Sutherland AJ, et al. A Molecular Host Response Assay to Discriminate Between Sepsis and Infection-Negative Systemic Inflammation in Critically Ill Patients: Discovery and Validation in Independent Cohorts. PLoS Med. 2015;12(12). ARTN e1001916 WOS:000368451100006. pmid:26645559
- 12. Scicluna BP, Klein Klouwenberg PM, van Vught LA, Wiewel MA, Ong DS, Zwinderman AH, et al. A molecular biomarker to diagnose community-acquired pneumonia on intensive care unit admission. Am J Respir Crit Care Med. 2015;192(7):826–35. pmid:26121490.
- 13. Sabour S, Frosst N, Hinton GE. Dynamic routing between capsules. Advances in neural information processing systems. 2017;30.
- 14. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. Advances in neural information processing systems. 2017;30.
- 15. Saito T, Rehmsmeier M. Precrec: fast and accurate precision–recall and ROC curve calculations in R. Bioinformatics. 2017;33(1):145–7. pmid:27591081
- 16. Chen W, Miao C, Zhang Z, Fung CS-H, Wang R, Chen Y, et al. Commonly used software tools produce conflicting and overly-optimistic AUPRC values. Genome Biology. 2024;25(1):118. pmid:38741205
- 17. Lacy P. Mechanisms of degranulation in neutrophils. Allergy Asthma Clin Immunol. 2006;2(3):98–108. Epub 20060915. pmid:20525154; PubMed Central PMCID: PMC2876182.
- 18. Sun SC. Non-canonical NF-kappaB signaling pathway. Cell Res. 2011;21(1):71–85. Epub 20101221. pmid:21173796; PubMed Central PMCID: PMC3193406.
- 19. Trinchieri G. Interleukin-12—a Proinflammatory Cytokine with Immunoregulatory Functions That Bridge Innate Resistance and Antigen-Specific Adaptive Immunity. Annu Rev Immunol. 1995;13:251–76. WOS:A1995QV29000010. pmid:7612223
- 20. Ivashkiv LB. IFNgamma: signalling, epigenetics and roles in immunity, metabolism, disease and cancer immunotherapy. Nat Rev Immunol. 2018;18(9):545–58. pmid:29921905; PubMed Central PMCID: PMC6340644.
- 21. Wahl SM, Feldman GM, McCarthy JB. Regulation of leukocyte adhesion and signaling in inflammation and disease. J Leukoc Biol. 1996;59(6):789–96. pmid:8691062.
- 22. Forrester SJ, Kikuchi DS, Hernandes MS, Xu Q, Griendling KK. Reactive Oxygen Species in Metabolic and Inflammatory Signaling. Circ Res. 2018;122(6):877–902. pmid:29700084; PubMed Central PMCID: PMC5926825.
- 23. Heinzel S, Marchingo JM, Horton MB, Hodgkin PD. The regulation of lymphocyte activation and proliferation. Curr Opin Immunol. 2018;51:32–8. Epub 20180203. pmid:29414529.
- 24. Bryant CE, Spring DR, Gangloff M, Gay NJ. The molecular basis of the host response to lipopolysaccharide. Nat Rev Microbiol. 2010;8(1):8–14. pmid:19946286.
- 25. Redlich K, Smolen JS. Inflammatory bone loss: pathogenesis and therapeutic intervention. Nat Rev Drug Discov. 2012;11(3):234–50. Epub 20120301. pmid:22378270.
- 26. Szklarczyk D, Kirsch R, Koutrouli M, Nastou K, Mehryary F, Hachilif R, et al. The STRING database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic acids research. 2023;51(D1):D638–D46. pmid:36370105
- 27. Park DS, Chan W, Zhang Y, Chiu C-C, Zoph B, Cubuk ED, et al. SpecAugment: A simple data augmentation method for automatic speech recognition. arXiv preprint arXiv:1904.08779. 2019.
- 28. Huang X, Belongie S, editors. Arbitrary style transfer in real-time with adaptive instance normalization. Proceedings of the IEEE international conference on computer vision; 2017.
- 29. Reyes M, Filbin MR, Bhattacharyya RP, Billman K, Eisenhaure T, Hung DT, et al. An immune-cell signature of bacterial sepsis. Nat Med. 2020;26(3):333–40. Epub 20200217. pmid:32066974; PubMed Central PMCID: PMC7235950.
- 30. Baghela A, Pena OM, Lee AH, Baquir B, Falsafi R, An A, et al. Predicting sepsis severity at first clinical presentation: The role of endotypes and mechanistic signatures. EBioMedicine. 2022;75:103776. Epub 20220110. pmid:35027333; PubMed Central PMCID: PMC8808161.
- 31. Venet F, Schilling J, Cazalis MA, Demaret J, Poujol F, Girardot T, et al. Modulation of LILRB2 protein and mRNA expressions in septic shock patients and after ex vivo lipopolysaccharide stimulation. Hum Immunol. 2017;78(5–6):441–50. pmid:28341250.
- 32. Wong HR, Cvijanovich NZ, Allen GL, Thomas NJ, Freishtat RJ, Anas N, et al. Corticosteroids are associated with repression of adaptive immunity gene programs in pediatric septic shock. American journal of respiratory and critical care medicine. 2014;189(8):940–6. pmid:24650276.
- 33. Cazalis MA, Lepape A, Venet F, Frager F, Mougin B, Vallin H, et al. Early and dynamic changes in gene expression in septic shock patients: a genome-wide approach. Intensive Care Med Exp. 2014;2(1):20. pmid:26215705; PubMed Central PMCID: PMC4512996.
- 34. Sutherland A, Thomas M, Brandon RA, Brandon RB, Lipman J, Tang B, et al. Development and validation of a novel molecular biomarker diagnostic test for the early detection of sepsis. Crit Care. 2011;15(3):R149. pmid:21682927; PubMed Central PMCID: PMC3219023.
- 35. Shanley TP, Cvijanovich N, Lin R, Allen GL, Thomas NJ, Doctor A, et al. Genome-level longitudinal expression of signaling pathways and gene networks in pediatric septic shock. Mol Med. 2007;13(9–10):495–508. pmid:17932561; PubMed Central PMCID: PMC2014731.
- 36. Cvijanovich N, Shanley TP, Lin R, Allen GL, Thomas NJ, Checchia P, et al. Validating the genomic signature of pediatric septic shock. Physiol Genomics. 2008;34(1):127–34. pmid:18460642; PubMed Central PMCID: PMC2440641.
- 37. Wong HR, Cvijanovich N, Allen GL, Lin R, Anas N, Meyer K, et al. Genomic expression profiling across the pediatric systemic inflammatory response syndrome, sepsis, and septic shock spectrum. Crit Care Med. 2009;37(5):1558–66. pmid:19325468; PubMed Central PMCID: PMC2747356.
- 38. Wong HR, Cvijanovich N, Wheeler DS, Bigham MT, Monaco M, Odoms K, et al. Interleukin-8 as a stratification tool for interventional trials involving pediatric septic shock. American journal of respiratory and critical care medicine. 2008;178(3):276–82. Epub 05/29. pmid:18511707.
- 39. Jin N, Cheng L, Geng Q. Multiomics on Mental Stress-Induced Myocardial Ischemia: A Narrative Review. Heart and Mind. 2024;8(1):15–20.
- 40. Li X, Bai Y, Tian C, Yang F, Fan W, Zhang K, Ma Q. Effects of metaraminol and norepinephrine on hemodynamics and kidney function in a miniature pig model of septic shock. Journal of Translational Internal Medicine. 2024;12(3):253–62.