Abstract
Graph neural networks (GNNs) process graph-structured data with neural networks and have proven successful in various graph processing tasks. Graph pooling operators have emerged as crucial components that bridge the gap between node representation learning and diverse graph-level tasks by transforming node representations into graph representations. Given the rapid growth and widespread adoption of graph pooling, this review summarizes the existing graph pooling operators for GNNs and their representative applications in omics. Specifically, we first present a comprehensive taxonomy of existing graph pooling algorithms, expanding the categorization of both global and hierarchical pooling operators and, for the first time, reviewing the inverse operation of graph pooling, named unpooling. Next, we describe the general evaluation framework for graph pooling operators, encompassing three fundamental aspects: experimental setup, ablation analysis, and model interpretation. We also discuss open issues that significantly influence the design of graph pooling operators, including complexity, connectivity, adaptability, additional losses, and attention mechanisms. Finally, we summarize bioinformatics applications of graph pooling operators in omics, including graphs of gene interactions, medical images, and protein structures for drug discovery and disease diagnosis. Furthermore, we showcase the impact of graph pooling operators on research in specific real-world domains, with a focus on prediction performance and model interpretability. This review provides methodological insights into machine-learning-based graph modeling and related omics research, as well as an ongoing resource gathering related papers and code in a dedicated GitHub repository (https://github.com/Hou-WJ/Graph-Pooling-Operators-and-Bioinformatics-Applications).
1 Introduction
A graph, consisting of nodes and edges (or links), is a data structure for modeling complex real-world systems. Graphs represent a wide range of relations between physical or conceptual entities, such as communication networks and social networks. Graph-structured biological data, such as gene regulatory networks (GRNs), protein–protein interaction (PPI) networks, and brain connectivity networks, are growing rapidly in the biomedical and bioinformatics domains, ranging from molecular structures to medical imaging systems. Graph Neural Networks (GNNs) have been introduced for geometric deep learning to abstract meaningful representations from graph-structured data (Bronstein et al. 2017; Wu et al. 2021b). The Graph Convolutional Network (GCN) is an early representative GNN that propagates node information along edges and aggregates it through a message passing mechanism, for example in graph node classification (Ye et al. 2022). Theoretically, neighbor aggregation in GNNs can be viewed as an aggregation function over nodes in the graph, and a GNN with sufficient representational capacity should be able to differentiate distinct topologies (Xu et al. 2019). While individual nodes can easily aggregate the features of their neighbors, this localized summation does not capture the broader structural nuances of the graph: sum aggregation alone cannot distinguish between graphs with the same node features but different topologies. Consequently, this prompts the need for advanced methods capable of learning graph representations that encompass both node attributes and the graph's inherent structural complexity.
Graph pooling is an essential component of GNNs for graph-level representations. The goal of graph pooling is to learn a graph representation that captures the topology, node features, and other relational characteristics of the graph, which can be used as input to downstream machine learning (ML) tasks. Typically, there are two types of graph pooling: (1) global pooling or readout, which condenses the input graph into a single vector, and (2) hierarchical pooling, which condenses the input graph into a smaller graph. The two types work in different modules of GNNs: hierarchical pooling is employed in the feature extraction module, while global pooling connects the feature extraction module with downstream tasks. Despite substantial differences in output and purpose, both types of pooling can be described using a unified framework. Global pooling can be regarded as a special form of hierarchical pooling: it maps an arbitrary-sized original graph to a graph with only one node, and the node embedding of the pooled graph serves as the graph representation of the original graph.
Graph pooling, to a certain extent, is inspired by pooling operators in Convolutional Neural Network based (CNN-based) tasks in computer vision. In CNNs, a downsampling or typical pooling layer can be defined as \(pool\left(pixel\right)=P\left(\{CNN\left({pixel}^{\prime}\right):{pixel}^{\prime}\in \mathcal{N}\left(pixel\right)\}\right)\), where \(\mathcal{N}\left(pixel\right)\) is \(pixel\)’s neighborhood and \(P(\cdot )\) is a permutation-invariant function like an \({L}_{p}\)-norm (Bronstein et al. 2017). Pooling layers expand the CNN’s receptive field, enhance representations, and reduce sensitivity to input changes by transforming local data into abstracted high-level features (Akhtar and Ragavendran 2020). For early attempts to generalize CNN architectures to the graph domain, namely spectral CNNs or spectral-based GCNs, the geometric analogy of pooling is graph coarsening, in which only a fraction of the graph nodes are retained (Bronstein et al. 2017). For cross-domain spatial-based approaches, it is feasible to aggregate all local information into a single vector, but all spatial information is lost after such pooling. These can be viewed as early prototypes of hierarchical graph pooling and global graph pooling, respectively.
Practical tasks like graph classification and graph clustering drive the transition from global to hierarchical pooling. Graph pooling is essential for reducing dimensions, adapting to variable graph structures, hierarchically extracting crucial substructures, and embedding knowledge into representations (Bacciu et al. 2020; Cheung et al. 2020). By creating smaller graphs, graph pooling minimizes parameters, curbing overfitting, oversmoothing, and computational load (Wang et al. 2020c; Ye et al. 2022). Unlike coarsening techniques reliant on Laplacian matrices, graph pooling operators accommodate graphs with varying node counts. They facilitate graph alignment by collapsing graphs of different sizes into coarsened versions with a uniform number of supernodes, allowing the mapping of graph signals onto a standardized hypergraph structure. Hierarchical learning on key substructures explicitly extracts structural information, incorporates it into the graph representation, and reveals the model’s prediction preference for certain parts of the graph. This preference helps researchers understand how the model makes decisions and thereby uncover potential patterns present in the graphs (Adnan et al. 2020; Li et al. 2021b; Tang et al. 2022; Zhang et al. 2023b).
Numerous reviews have summarized graph neural networks and representation learning (Zhou et al. 2020a, 2022; Makarov et al. 2021; Zhang et al. 2022), yet few have delved into graph pooling, and those that have cover a limited set of methods (Liu et al. 2022a; Grattarola et al. 2022). Existing reviews on graph pooling offer taxonomies and mathematical overviews but tend to catalog methods, often missing recent advancements. They concentrate on a select few classical pooling techniques, leading to an incomplete picture of global pooling and of the variety in operator designs. Moreover, GNNs, along with graph pooling operators, have been widely and successfully applied to various real-world tasks, such as transportation systems (Rahmani et al. 2023), power systems (Liao et al. 2022), electronic design automation (Sánchez et al. 2023), and materials science (Gong and Yan 2021; Reiser et al. 2022), receiving significant attention and thorough reviews. However, applications on biological networks from omics data, a key topic in graph modeling, have not received adequate attention and organization.
High-throughput technologies have rapidly accumulated vast amounts of patient data, yet the biomedical knowledge extracted from these data remains insufficient. Computational bioinformatics is crucial for managing and utilizing such data, especially in precision medicine and cancer research, where integrating multi-omics data offers unprecedented opportunities for understanding complex diseases. Omics is a broad field in the biological sciences that characterizes and quantifies biological molecules to understand an organism’s structure, function, and dynamics (Kaur et al. 2021). Omics began with genomics, focusing on the whole genome rather than single genes or variants, and has since expanded to include various disciplines, each targeting different biomolecules or processes, such as the proteome, transcriptome, and metabolome. Data types have also evolved from traditional structured formats to non-structured, semi-structured, and heterogeneous architectures with diverse characteristics (Li et al. 2022a). Although Zhang et al. and Li et al. summarized the success of ML and deep neural networks on omics data, many graph-based methods remain unreviewed and unsystematized (Zhang et al. 2019b; Li et al. 2022a).
To close these gaps, this paper thoroughly reviews current global and hierarchical pooling operators and summarizes the applications of graph pooling operators to omics data as a notable example of their broad applicability to real-world domains. Specifically, the main contributions of our paper are as follows: (1) we propose a taxonomy for global pooling, extend the classification of hierarchical pooling, and provide reviews of hybrid pooling, edge-based pooling, and the inverse operation of graph pooling for the first time; (2) we discuss the evaluation framework for graph pooling operators and several open issues related to their design and application; (3) we summarize representative bioinformatics applications on omics data, demonstrating how graph pooling operators enhance predictive performance, provide model interpretability, and drive research advancements in specific practical domains.
This survey includes conference and journal publications on graph pooling operators and related omics applications indexed by the Web of Science (WoS) and published between April 2014 and March 2024. We also included several preprints (arXiv papers) that had not been peer-reviewed or formally published as of March 2024. This review is organized as follows: Sect. 2 briefly describes definitions of GNNs (Sect. 2.1) and related surveys on graph pooling and omics applications (Sect. 2.2), with the aim of explaining key concepts for the uninitiated reader. Section 3 details the taxonomy and computational flows of graph pooling (Sects. 3.1 and 3.2), the inverse operation of pooling (Sect. 3.3), the evaluation framework (Sect. 3.4.1), and open problems of graph pooling operators (Sects. 3.4.2–3.4.4). It seeks to update researchers on the latest developments in graph pooling and provide a roadmap for developing and assessing new operators. Section 4 analyzes representative applications in omics, including genomics (Sect. 4.1), radiomics (Sect. 4.2), and proteomics (Sect. 4.3), highlighting the necessary adaptations, advantages, and persistent challenges of graph pooling in practical contexts. This section also aims to help bioinformatics researchers choose appropriate pooling methods for similar scenarios. Section 5 concludes the survey and outlines prospective research directions in graph pooling.
2 Preliminaries
2.1 Definitions
Graph. Given a graph \(G=(V, E, {\varvec{X}}, {\varvec{A}})\) where \(V\) is the set of nodes with \(N=|V|\), \(E\) is the set of edges, \({\varvec{A}}\in {\mathbb{R}}^{N\times N}\) denotes the adjacency matrix and \({\varvec{X}}\in {\mathbb{R}}^{N\times d}\) denotes the node feature matrix in which each node has \(d\) features.
Graph convolution network. Given a GNN architecture with \(L\) layers of graph convolutions, the \(l\)-th layer computes the node representation \({{\varvec{h}}}_{v}^{l}\in {{\varvec{H}}}^{l}\) for a node \(v\in V\) by the neighborhood aggregation function (i.e., message passing (Gilmer et al. 2017)), with \({{\varvec{H}}}^{0}={\varvec{X}}\), as in Eq. (1):

\({{\varvec{h}}}_{v}^{l}=Update\left({{\varvec{h}}}_{v}^{l-1}, Aggregation\left(\left\{{{\varvec{h}}}_{u}^{l-1}:u\in \mathcal{N}\left(v\right)\right\}\right)\right)\)  (1)
where \(Update(\cdot , \cdot )\) is a learnable function with distinct weights at each layer for generating new node representations, \(Aggregation(\cdot )\) is a general learnable permutation-invariant function for receiving messages from the neighborhood, and \(\mathcal{N}(v)\) denotes the neighborhood of node \(v\).
Graph representation learning. The task of graph representation learning is to learn the latent features of all nodes \({\varvec{H}}=\{{{\varvec{h}}}_{1}, ... , {{\varvec{h}}}_{N}\}\) and to obtain the representation \({{\varvec{H}}}_{G}\) of the entire graph \(G\).
Graph classification. Given a set of labeled graphs \((\mathcal{G}, \mathcal{Y})=\{({G}_{1}({V}_{1}, {E}_{1}, {{\varvec{X}}}_{1}, {{\varvec{A}}}_{1}),{ y}_{1}), ({G}_{2}({V}_{2}, {E}_{2}, {{\varvec{X}}}_{2}, {{\varvec{A}}}_{2}), {y}_{2}), ...\}\) where \({y}_{i}\) is the label of \({G}_{i}\), the task of graph classification is to learn a mapping function \(\mathcal{F}:\mathcal{G}\to \mathcal{Y}\) that maps the set of graphs \(\mathcal{G}\) to the set of labels \(\mathcal{Y}\) to predict the discrete labels of unknown graphs.
Graph regression. The task of graph regression consists of approximating a function \({\mathcal{F}}_{R}:\mathcal{G}\to \mathcal{Y}\), where \(\mathcal{Y}\subseteq {\mathbb{R}}\) is the set of ground-truth values, and predicting the continuous properties of graphs.
Graph signal classification. A graph signal \({\varvec{X}}\in {\mathbb{R}}^{N\times d}\) is defined as the matrix containing the features of the nodes in the graph. Given a set of labeled node feature matrices \((\mathcal{X}, \mathcal{Y})=\{({{\varvec{X}}}_{1},{ y}_{1}), ({{\varvec{X}}}_{2}, {y}_{2}), ...\}\) where the node feature matrices constitute signals supported on the same graph \({G}_{sup}\), the task of graph signal classification is to learn a mapping function \({\mathcal{F}}_{sup}:({G}_{sup}, \mathcal{X})\to \mathcal{Y}\) that maps signals on \({G}_{sup}\) to the labels and predicts the labels of unknown signals.
Hierarchical pooling. A hierarchical pooling operator can be defined as a function \({\mathcal{F}}_{P}\) that maps a graph \(G=(V, E, {\varvec{X}}, {\varvec{A}})\) to a coarsened graph \({G}_{p}=({V}_{p}, {E}_{p}, {{\varvec{X}}}_{{\varvec{p}}}, {{\varvec{A}}}_{{\varvec{p}}})\), where generally \(|{V}_{p}|<|V|\) and \({{\varvec{X}}}_{{\varvec{p}}}\) and \({{\varvec{A}}}_{{\varvec{p}}}\) are transformed from \({\varvec{X}}\) and \({\varvec{A}}\) using matrix multiplication or indexing operations, as in Eq. (2):

\({G}_{p}=\left({V}_{p}, {E}_{p}, {{\varvec{X}}}_{{\varvec{p}}}, {{\varvec{A}}}_{{\varvec{p}}}\right)={\mathcal{F}}_{P}\left(G\left(V, E, {\varvec{X}}, {\varvec{A}}\right)\right)\)  (2)
Global pooling (Readout). A global pooling operator, also called a readout function, computes a graph representation vector \({{\varvec{h}}}_{G}^{l}\in {\mathbb{R}}^{{d}^{l}}\) for a graph \(G\) from its node representations \({{\varvec{H}}}^{l}\in {\mathbb{R}}^{{N}^{l}\times {d}^{l}}\) of the \(l\)-th layer, as in Eq. (3):

\({{\varvec{h}}}_{G}^{l}=Readout\left(\left\{{{\varvec{h}}}_{v}^{l}:v\in V\right\}\right)\)  (3)
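To make these two definitions concrete, the following NumPy sketch (an illustrative toy with a hypothetical 3-node graph and a hand-picked assignment matrix, not code from any surveyed method) coarsens a graph with a single-supernode assignment matrix and checks that the result coincides with a Sum readout, reflecting the view of global pooling as a special case of hierarchical pooling:

```python
# A minimal sketch: global pooling viewed as hierarchical pooling with a
# single supernode. The graph and assignment matrix are illustrative only.
import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]])        # adjacency matrix of a 3-node graph
X = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [3.0, 1.0]])       # node features, N=3, d=2

S = np.ones((3, 1))              # every node assigned to one supernode
X_p = S.T @ X                    # supernode features
A_p = S.T @ A @ S                # 1x1 "adjacency" (self-loop weight)

h_G = X.sum(axis=0)              # global Sum pooling
assert np.allclose(X_p[0], h_G)  # the two views coincide
```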
2.2 Related works
Even though several reviews already summarize graph neural networks or graph representation learning algorithms (Zhou et al. 2020a, 2022; Makarov et al. 2021; Zhang et al. 2022), only a few works have exclusively focused on graph pooling (Liu et al. 2022a; Grattarola et al. 2022). These reviews include taxonomies and mathematical descriptions of existing methods, as well as a summary of implementation and operational frameworks. Grattarola et al. elucidate graph pooling as a combination of three main operations: selection, reduction, and connection, so that all graph pooling operators can be unified under a common framework (Grattarola et al. 2022). They propose a taxonomy of pooling operators based on four properties: trainability, density of the supernodes, adaptability, and hierarchy. Similarly, Liu et al. divide pooling operators into two categories, namely flat pooling and hierarchical pooling, and propose universal and modularized frameworks for describing the processes of node clustering pooling and node drop pooling (Liu et al. 2022a). Due to the limited body of literature reviewed, earlier works mostly focused on hierarchical pooling and emphasized the commonalities among pooling operators, resulting in a less comprehensive overview of global pooling and of the diversity of pooling operator designs (Liu et al. 2022a; Grattarola et al. 2022). Yang et al. identified graph pooling as one of the four representative algorithms for graph-level learning and further explored the subcategories within global and hierarchical pooling (Yang et al. 2023a). In their taxonomy, global pooling encompasses numeric operations, attention-based, CNN-based, and global top-K methods, while hierarchical pooling is divided into three branches: clustering-based, hierarchical top-K, and tree-based.
Moreover, the application of graph neural networks in bioinformatics has gained significant attention. High-throughput omics data analysis for reconstructing biological networks is challenging, yet it enables the creation of varied networks including PPI networks, GRNs, and networks related to metabolism, the brain, and diseases (Sulaimany et al. 2018). Muzio et al. discuss the current domains in bioinformatics where GNNs are extensively applied, including proteomics, drug development and discovery, disease diagnosis, and metabolic networks and GRNs (Muzio et al. 2021). The rise of single-cell sequencing has accelerated the generation of omics datasets, enhancing insights into cellular diversity and function. Consequently, it has fostered numerous cell- and gene-centric graphs, highlighting GNNs as a key tool for single-cell analysis (Hetzel et al. 2021; Lazaros et al. 2024). Liu et al. reviewed and compared the performance of various GNN approaches for spatial clustering tasks on spatial transcriptomics (Liu et al. 2024). Zhang et al. surveyed deep learning’s role in genomics, transcriptomics, and proteomics, offering a streamlined guideline for resolving omics problems using deep learning (Zhang et al. 2019b). Li et al. explored the integration of artificial intelligence with an extensive spectrum of omics fields, including genomics, transcriptomics, proteomics, metabolomics, radiomics, as well as single-cell omics (Li et al. 2022a).
3 Graph pooling
Graph pooling operators, essential for downsampling in GNNs, transform a graph signal defined on the input graph into a matching graph signal defined on the coarsened graph, typically with fewer nodes. These methods fall into two categories: global pooling and hierarchical pooling (Zhou et al. 2020a, 2022; Liu et al. 2022a; Ye et al. 2022; Zhang et al. 2022). Global pooling condenses the graph into a representation vector, leveraging node attributes to enhance representational capacity, though often at the expense of structural information, particularly local structural detail (Bronstein et al. 2017; Xu et al. 2019; Murphy et al. 2019; Chen et al. 2021). It can be viewed as a special case of hierarchical pooling in which the entire graph is collapsed or aggregated into one node. Hierarchical pooling, on the other hand, maintains significant substructures and adjacencies, preserving important graph features while reducing complexity. Communities of nodes, as well as representative nodes or edges, can be identified as significant substructures and selected to build coarsened graphs (Li et al. 2020a; Tang et al. 2021; Yu et al. 2022). This section provides a comprehensive review of these operators, detailing their classifications, computational flows, and integration within GNN architectures. We discuss global pooling in Sect. 3.1, delve into the more intricate hierarchical pooling in Sect. 3.2, and explore unpooling, the inverse operation of pooling, in Sect. 3.3. In Sect. 3.4, we focus on several pioneering efforts in evaluating pooling operators, including benchmark datasets, recently available libraries, experimental and theoretical comparisons, and suggestions for evaluating operators. We also discuss several key considerations in their design and implementation, including computational complexity, network connectivity, adaptivity, additional loss functions, and the incorporation of attention mechanisms.
In this review, we propose general computational flows for the most prevalent pooling methods, as shown in Fig. 1, as well as a comprehensive categorization based on the core ideas. The frequently used methods for summarizing node features in global pooling can be categorized into the following groups, in order of increasing complexity: simple permutation-invariant functions, grouping and cascading, weighted summation based on attention, and learnable readout functions. Hierarchical pooling can be categorized into clustering pooling, selection pooling, edge pooling, and hybrid pooling based on the strategies used to retrieve relevant local structures. The term "hybrid pooling" refers to a group of pooling operators that use multiple techniques and consider the properties of various strategies.
Figure 2 depicts the core mechanisms of pooling. Global pooling (Fig. 2a) aims to convert the graph into representation vectors. Clustering pooling treats important local structures as connected subgraphs (i.e., node communities or node clusters), whereas selection pooling treats them as representative key nodes, and edge pooling focuses on edges rather than nodes. Clustering pooling (Fig. 2b) groups the nodes and aggregates the nodes in the same cluster into a supernode. Selection pooling (Fig. 2c) scores each node, retaining the top-ranked nodes and discarding the others. There are two kinds of edge pooling strategies: edge contraction (Fig. 2d) and edge deletion (Fig. 2e). The former chooses an edge and merges its connected vertices, whereas the latter chooses an edge and retains only its connected vertices, discarding other edges and nodes. Unpooling is the inverse operation of pooling, utilized for upsampling on nodes; it mainly restores the coarsened graph to its earlier, finer version. Graph unpooling (Fig. 2f) restores the original graph structure while recovering the representations of dropped nodes from the current node representations.
In general, graph pooling operators are applied to two levels of tasks: node-level tasks such as node classification and link prediction, and graph-level tasks in inductive learning. Aside from the most common graph-level task, graph classification, various GNNs can be used for graph regression, graph signal classification, graph generation, and graph reconstruction. Architectures of commonly used GNNs are summarized in Fig. 3. A GNN with a simple structure (Fig. 3a) consists of several consecutive GCN layers, a pooling layer, and some fully connected (FC) layers. FC layers with different activation functions can be considered a multilayer perceptron (MLP) for specialized tasks such as graph classification or graph regression. To reduce the scale of the graphs and extract features layer by layer, a hierarchical graph neural network (HGNN, Fig. 3b) comprises multiple graph pooling operators interspersed with GCN layers. To incorporate information from coarsened graphs of varying scales and generate a more robust representation, a variant of the HGNN shown in Fig. 3c incorporates jump connections from Jumping Knowledge Networks (JK nets) (Cangea et al. 2018; Xu et al. 2018). The readout results at each scale can be aggregated in multiple ways, including concatenation, addition, weighted summation, and parameterized methods (Chen et al. 2022c). Owing to its capacity to retain information from jump connections, as in a residual network, the HGNN with jump connections, called the JK-net-style hierarchical architecture, has become the dominant form of GNNs with hierarchical pooling for graph-level tasks (He et al. 2016). Another HGNN variant, shown in Fig. 3d, is known as parallel hierarchical pooling or multi-channel pooling. Unlike the previous architectures, each pooling operator here is conducted on the input graph or the updated graph after message passing, so the pooling operators are parallel to each other. Furthermore, each pooling operator focuses on different parts of the same graph structure, which makes the pooling multi-channel. In this architecture, pooling operators from different channels produce hypergraphs with distinct structures by scoring the same node differently or clustering the nodes differently (Roy et al. 2021; Xu et al. 2022). Additionally, the U-Net structure from the computer vision field has been adapted to GNNs in recent studies, as shown in Fig. 3e (Ronneberger et al. 2015). Graph unpooling is used in tandem with the pooling operators to build the descent and ascent paths of the U-Net. Such graph U-nets can handle both graph classification and node classification (Gao and Ji 2019, 2022).
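As an illustration of the JK-net-style hierarchical architecture in Fig. 3c, the sketch below stacks convolution and pooling twice and sums the per-scale readouts. It uses PyTorch Geometric; the choices of GCNConv, TopKPooling, and mean/max readouts, as well as the layer sizes, are illustrative assumptions rather than prescriptions from any single surveyed paper.

```python
# A minimal sketch of a JK-net-style hierarchical GNN (conv -> pool -> readout
# at each scale, jump connections realized by summing per-scale readouts).
import torch
import torch.nn.functional as F
from torch_geometric.nn import (GCNConv, TopKPooling,
                                global_mean_pool, global_max_pool)

class HierarchicalGNN(torch.nn.Module):
    def __init__(self, in_dim, hid_dim, n_classes, ratio=0.5):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hid_dim)
        self.pool1 = TopKPooling(hid_dim, ratio=ratio)
        self.conv2 = GCNConv(hid_dim, hid_dim)
        self.pool2 = TopKPooling(hid_dim, ratio=ratio)
        self.lin = torch.nn.Linear(2 * hid_dim, n_classes)

    def forward(self, x, edge_index, batch):
        xs = []                                   # per-scale readouts
        x = F.relu(self.conv1(x, edge_index))
        x, edge_index, _, batch, _, _ = self.pool1(x, edge_index, batch=batch)
        xs.append(torch.cat([global_mean_pool(x, batch),
                             global_max_pool(x, batch)], dim=-1))
        x = F.relu(self.conv2(x, edge_index))
        x, edge_index, _, batch, _, _ = self.pool2(x, edge_index, batch=batch)
        xs.append(torch.cat([global_mean_pool(x, batch),
                             global_max_pool(x, batch)], dim=-1))
        return self.lin(sum(xs))                  # jump connections: add readouts
```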
3.1 Global pooling
Global graph pooling operators are employed as readout functions to transform the graph into a single low-dimensional dense vector. In practical usage, global pooling generally has wider applications than hierarchical pooling. Based on complexity and representation capability, representative global pooling operators fall into four categories: simple functions, grouping/cascading, attention, and learnable functions, as in Tables S1–S4 (Supplementary File 1). Simple functions refer to simple permutation-invariant functions (Sect. 3.1.1), and grouping/cascading (Sect. 3.1.2) refers to a class of graph representation methods that group or cascade nodes. Attention refers to the weighted summation of node representations, with the weights typically given by attention coefficients (Sect. 3.1.3). Learnable functions (Sect. 3.1.4) are parametric approaches, particularly those utilizing neural networks.
3.1.1 Simple permutation-invariant functions
Simple permutation-invariant functions act on the features of the nodes in the graph while ignoring the connection relationships, namely the Sum, Mean, and Max functions. Sum and Mean were the first global pooling techniques implemented (Duvenaud et al. 2015; Atwood and Towsley 2016). In early convolutional neural networks on graphs, the graph representation is created by summing the representations of all nodes in the graph, \({{\varvec{h}}}_{G}={\sum }_{v\in V}{{\varvec{h}}}_{v}\), as shown in Eq. (4) (Duvenaud et al. 2015). The Mean function is essentially equivalent to summation and differs only by a multiplicative factor, \({{\varvec{h}}}_{G}=\frac{1}{N}{\sum }_{v\in V}{{\varvec{h}}}_{v}\), as shown in Eq. (5) (Sun et al. 2021; Pham et al. 2021; Bianchi et al. 2022b). Max pooling refers to a more sophisticated element-wise Max function (Simonovsky and Komodakis 2017; Gao et al. 2021a). According to the theoretical representational power on multisets, Sum has the strongest representational power, while Mean is better than Max (Xu et al. 2019). Thus, many GNNs adopt Sum pooling as the readout function in practice (Duvenaud et al. 2015; Li et al. 2018; Morris et al. 2019; Yang et al. 2021b; Bacciu et al. 2021). Theoretically, calculating the mean of node representations utilizes first-order statistics of the node representations. The second-order statistics of the node representation matrix can also be utilized, although certain adjustments are necessary to address problems such as large dimensionality (Wang and Ji 2023). Simple permutation-invariant readout functions have the advantages of being easy to implement, easy to understand, and computationally efficient. They guarantee permutation invariance by definition, which yields the same representation for isomorphic graphs and is robust to graph perturbations. On the other hand, they treat each node equally and fail to discern significant structures, so distinct graphs may be mapped to the same representation.
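As a concrete illustration of this ranking of expressive power, the snippet below (a minimal check with two hypothetical one-dimensional node-feature multisets) shows a pair of graphs whose Mean and Max readouts coincide while their Sum readouts differ:

```python
# Two node-feature multisets that Mean and Max cannot tell apart,
# illustrating the ranking Sum > Mean > Max on multisets.
import numpy as np

g1 = np.array([1.0, 2.0])             # node features of graph 1
g2 = np.array([1.0, 1.0, 2.0, 2.0])   # node features of graph 2

print(g1.sum(),  g2.sum())    # 3.0 vs 6.0 -> Sum distinguishes them
print(g1.mean(), g2.mean())   # 1.5 vs 1.5 -> Mean does not
print(g1.max(),  g2.max())    # 2.0 vs 2.0 -> Max does not
```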
3.1.2 Grouping and cascading
To obtain a more expressive representation, grouping and cascading improve graph representations by concatenating different representations. Methods in this category often start with the most expressive Sum function and cascade it with other functions (Bacciu et al. 2021; Gao et al. 2022a), although the range of values drifts due to the Sum function: representations produced by Sum frequently differ from the others by an order of magnitude. The concatenation of Mean and Max has been the most popular cascade approach and is often selected as the default readout option of GNNs (Cangea et al. 2018; Luzhnica et al. 2019; Lee et al. 2019; Zhang et al. 2019a, 2021b; Qin et al. 2020; Yu et al. 2021, 2022; Bi et al. 2021). Figure 3c depicts another cascading form: connecting different layers’ readout results. DropGNN runs a GNN independently multiple times, aggregates the node representations with a Mean operation before invoking the graph readout function, and aggregates the graph representations of each run with an auxiliary readout function trained with an auxiliary loss (Papp et al. 2021). Generally, cascading applies to any GNN with readout modules, whose representations are concatenated or added together. Equations (6) and (7) give generic formulas for function-level and layer-level cascading, where \(CONCAT(\cdot )\) and \(||\) denote concatenation operations:

\({{\varvec{h}}}_{G}=CONCAT\left({Readout}_{1}\left({\varvec{H}}\right), {Readout}_{2}\left({\varvec{H}}\right), \dots \right)\)  (6)

\({{\varvec{h}}}_{G}={{\varvec{h}}}_{G}^{1}\left|\right|{{\varvec{h}}}_{G}^{2}\left|\right|\dots \left|\right|{{\varvec{h}}}_{G}^{L}\)  (7)
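The sketch below gives one possible realization of both cascading forms in plain PyTorch; the specific choice of Sum/Mean/Max for the function-level variant and of Mean for the per-layer readout is an assumption for illustration, not a fixed recipe from the surveyed methods.

```python
# Function-level cascading (Eq. 6) and layer-level cascading (Eq. 7).
import torch

def function_level_readout(H):                   # H: [N, d]
    return torch.cat([H.sum(0), H.mean(0), H.max(0).values])    # [3d]

def layer_level_readout(H_per_layer):            # list of [N_l, d_l]
    return torch.cat([H.mean(0) for H in H_per_layer])          # [sum of d_l]
```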
Although cascading focuses on the most salient features of certain nodes, it cannot explicitly distinguish nodes with diverse statuses. Grouping-based global pooling operators follow a divide-and-conquer principle: nodes are divided into different groups before the group representations are aggregated and cascaded. DEMO-Net groups nodes by degree, allowing the readout scheme to learn a graph representation within a degree-specific Hilbert kernel space (Wu et al. 2019). Another strategy is to learn each node’s position in the graph structure, as well as its assigned community (Roy et al. 2021; Li and Wu 2021; Lee et al. 2021). Roy et al. develop a structure-aware pooling readout that generates pooled representations for individual communities and identifies different substructures by utilizing topological indicators such as degree, clustering coefficient, and betweenness centrality, among other metrics (Roy et al. 2021). To keep structural consistency for any input graph, SSRead (Structural Semantic Readout) predefines a specific number of structural positions and then maps the node representations to the position representations, i.e., each hidden vector is aligned with the semantically closest structural position (Lee et al. 2021). In other words, nodes are grouped according to their structural positions. Aggregating node representations within the same group can be considered a subproblem, to which other approaches, such as simple functions, attention, and other global pooling methods, can be applied (Su et al. 2021; Duan et al. 2022). This is comparable to hierarchical pooling methods based on clustering or node selection, with the distinction that graph representation vectors are created directly without intermediary graphs.
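In the spirit of degree-wise grouping, a grouping-based readout might look as follows; the degree binning, the mean aggregation within groups, and the zero vector for empty groups are simplifying assumptions for illustration, not DEMO-Net's actual kernel-space formulation.

```python
# A grouping-based readout: aggregate nodes within each degree group,
# then cascade (concatenate) the group representations.
import torch

def degree_grouped_readout(H, deg, bins=(1, 2, 4)):
    # H: [N, d] node representations; deg: [N] node degrees
    groups = torch.bucketize(deg.float(), torch.tensor(bins, dtype=torch.float))
    parts = []
    for g in range(len(bins) + 1):
        mask = groups == g
        # mean-aggregate each degree group; zeros if the group is empty
        parts.append(H[mask].mean(0) if mask.any() else torch.zeros(H.size(1)))
    return torch.cat(parts)                      # [(len(bins)+1) * d]
```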
3.1.3 Weighted summation based on attention
To address the issues that each node contributes equally to the output representation and that multiple graphs may map to the same representation, one popular solution is to replace the simple summation operation with a weighted sum of all node representations in the graph (Chen et al. 2019b; Aggarwal and Murty 2021). Furthermore, with the prevalence of attention mechanisms in deep neural networks, attention scores have been found well suited for variable weighting. In general, the readout function based on attention-weighted summation can be defined as in Eq. (8), \({{\varvec{h}}}_{G}={\sum }_{v\in V}{\alpha }_{v}\tau \left({{\varvec{h}}}_{v}\right)\), in which \({\alpha }_{v}\) is the attention coefficient of node \(v\) and \(\tau (\cdot )\) denotes a linear or nonlinear transformation (Gilmer et al. 2017; Fan et al. 2020; Itoh et al. 2022). In practice, either matrix multiplication (Chen et al. 2019b; Wang and Ji 2023) or a variety of neural networks such as long short-term memory networks (LSTM) (Vinyals et al. 2016), GCN (Meltzer et al. 2019), MLP (Gilmer et al. 2017; Chen et al. 2019b, a; Li et al. 2019; Fan et al. 2020; Baek et al. 2021; Itoh et al. 2022), and many others can build differentiable attention mechanisms for training and learning in an end-to-end fashion.
The key advantage of attention is its capability to quantitatively treat each feature differently, enabling the model to discern the information essential for classification and to amplify the contribution of this relevant information through attention coefficients. To avoid identical attention coefficients, the Frobenius norm is introduced as a penalty term on the attention coefficients in the Self-attentive Graph Embedding (SAGE) method (Li et al. 2019). Inspired by the Transformer architecture, the Graph Multiset Transformer (GMT) uses attention-based blocks to condense the graph into a few important nodes and considers their interactions (Vaswani et al. 2017; Baek et al. 2021). In this approach, the mapping of general nodes to important nodes is based on a multi-head attention block with key-value pairs, a self-attentive function evaluates the interaction of significant nodes, and the important nodes are then mapped back to represent the entire graph. Another advantage of attention pooling is its capability to address the challenge of learning a fixed-size graph representation for graphs with varying dimensions, all while preserving permutation invariance (Meltzer et al. 2019; Chen et al. 2019a). In the Dual Attention Graph Convolutional Networks (DAGCN), the self-attention pooling layer learns several graph representations in different spaces and returns a fixed-size matrix graph embedding. Each row is one representation learned by weighted summation in one space, while the number of spaces is a tunable hyperparameter (Chen et al. 2019a).
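A minimal sketch of such an attention-based readout is shown below, assuming a single linear gate for scoring and a linear layer for the transformation \(\tau (\cdot )\); real operators such as GMT or SAGE use considerably richer attention blocks.

```python
# Attention-based readout in the spirit of Eq. (8): an MLP-like gate scores
# each node, scores are normalized over the graph's nodes with softmax, and
# the graph vector is the weighted sum of transformed node features.
import torch

class AttentionReadout(torch.nn.Module):
    def __init__(self, d):
        super().__init__()
        self.gate = torch.nn.Linear(d, 1)   # produces one score per node
        self.tau = torch.nn.Linear(d, d)    # the transformation tau(.)

    def forward(self, H):                   # H: [N, d]
        alpha = torch.softmax(self.gate(H), dim=0)   # [N, 1] attention weights
        return (alpha * self.tau(H)).sum(dim=0)      # [d] graph representation
```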
3.1.4 Learnable readout functions
Attention-based weighted summation operators are a special subset of learnable readout functions (Wu et al. 2021a). All other approaches with trainable parameters, beyond learning a coefficient for each node, can be categorized as learnable readout functions. Hence, learnable graph pooling forms a group of global pooling with higher complexity, and it is difficult to express a general formula covering the variety of implementation strategies. SortPooling is the first learnable global pooling approach: it learns the graph representation by cropping the node set to a fixed size and feeding it into a CNN (Zhang et al. 2018). SortPooling considers the last layer’s output to be the nodes’ most refined continuous Weisfeiler-Lehman (WL) colors, sorts all nodes by these final colors, and drops the nodes with lower ranks. Beyond cropping the node representation matrix, other approaches use graph structure alignment to create uniformly sized graph representation matrices that align with the convolutional layers (Yuan and Ji 2020; Bai et al. 2021; Xu et al. 2022). Bai et al. present a node matching framework for transitively aligning the nodes of a family of graphs by gradually minimizing the inner-node-cluster sum of squares over all graph nodes; this framework maps graphs of arbitrary sizes into fixed-sized aligned node grid structures (Bai et al. 2021). Another learnable pooling is an LSTM over a global representation, in which the network concentrates on one portion of the graph at a time and gradually puts its representation into memory (Lee et al. 2018). DKEPool learns the distribution of node features in the graph through the Gaussian manifold in a non-linear distribution space (Chen et al. 2022b). Murphy et al. propose an idealized framework, named Relational Pooling (RP), whose representational power exceeds the WL isomorphism test by exhausting all permutations and then summing and averaging their representations (Murphy et al. 2019). Viewing readouts as learning over sets, Navarin et al. present a general formulation capable of encoding or approximating any continuous permutation-invariant function over sets, facilitating the mapping from the set of node representations to a fixed-size vector (Navarin et al. 2019). However, due to the extreme complexity of this framework, its direct implementation is usually impractical, hence RP and Local Relational Pooling (LRP) propose computationally tractable or targeted approximation approaches (Murphy et al. 2019; Chen et al. 2021). Buterez et al. investigated the potential of adaptive readouts provided by various neural networks on more than 40 datasets from different areas, and their experimental results demonstrate that constraints on permutation invariance can be relaxed in some specific tasks (Buterez et al. 2022).
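The core of the SortPooling idea can be sketched in a few lines. This is a simplified rendition under stated assumptions: the last feature channel stands in for the final WL color, zero-padding handles graphs smaller than the fixed size \(k\), and tie-breaking on earlier channels as well as the downstream 1-D CNN are omitted.

```python
# A sketch of SortPooling: sort nodes by the last feature channel, keep the
# top-k rows, and zero-pad small graphs to a fixed-size [k, d] matrix.
import torch

def sort_pooling(H, k):                       # H: [N, d]
    order = H[:, -1].argsort(descending=True) # rank nodes by last channel
    H = H[order][:k]                          # keep the k top-ranked nodes
    if H.size(0) < k:                         # pad small graphs with zeros
        H = torch.cat([H, H.new_zeros(k - H.size(0), H.size(1))])
    return H                                  # fixed-size input for a 1-D CNN
```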
3.2 Hierarchical pooling
Hierarchical graph pooling operators transform a graph into a coarsened graph with fewer nodes and edges. Hierarchical pooling is inspired by regular pooling in CNNs on grid structures: it serves a role similar to downsampling and can be considered a neural-network implementation of graph reduction, graph coarsening, and graph sparsification. Commonly, hierarchical pooling is used in conjunction with global pooling operators to read out the coarsened graph. We categorize hierarchical pooling operators into four groups based on the strategy used to construct the condensed graph: clustering pooling, selection pooling, edge pooling, and hybrid pooling. Their representative methods are listed in Tables S5–S8 (Supplementary File 1), and each type of pooling is discussed in detail in Sects. 3.2.1 to 3.2.4.
3.2.1 Clustering pooling
Based on the assumption that each node is a member of a potentially significant substructure, clustering pooling maps nodes of the input graph to nodes of the coarsened graph. The nodes in the coarsened graph can also be referred to as supernodes, and the coarsened graph as a hypergraph, since each node represents a substructure of the original graph; clustering pooling operators can therefore benefit from existing community detection and graph clustering algorithms. A real-life analogy can be found in molecular structures, where functional groups are considered communities of atoms, and molecules are assemblies of several functional groups. In addition to discovering the alignment between nodes and supernodes, clustering pooling requires learning the representations of supernodes and determining the links between supernodes. The last two can be summarized as learning the hypergraph, whose central task is node clustering. Formally, we describe a generic computational flow for clustering pooling in the following steps.
Step 1, Node Clustering: Given a potential hypergraph \({G}_{hyper}\) with node set \({V}_{hyper}\), we assume \({|V}_{hyper}| < |V|\) and define a surjection \({f}_{clus}: V\to {V}_{hyper}\), also known as the vertex mapping function or clustering function, in which each node in \(V\) corresponds to at least one node in \({V}_{hyper}\). Variation in the clustering function \({f}_{clus}\) results in significant differences among clustering pooling operators. Furthermore, permutation invariance of the clustering function is required for the permutation invariance of clustering pooling. An explicit or implicit cluster assignment matrix \({\varvec{S}}\in {\mathbb{R}}^{|V|\times |{V}_{hyper}|}\), with each row representing a node and each column representing a supernode, can be used to describe the outcome of node clustering. The pooling ratio \(r\) is defined as the ratio of the number of clusters to the number of nodes, i.e., \(r={|V}_{hyper}|/|V|\). The pooling ratio is a hyperparameter in both deterministic algorithms and parametric networks, since it decides the size of the hypergraph, which impacts the computational complexity of the algorithm and the quantity of retained information.
Step 2, Learning Hypergraph: Step 1 specifies which supernodes are included in the hypergraph, but it does not establish what features the supernodes have or how they are linked to one another. Hence, the hypergraph created in the previous step is refined with the two items below. Step 2.1, Learning the representations of supernodes: This step can be considered a local readout operation, which can use any readout function described in Sect. 3.1. One classical method is to use the cluster assignment matrix to conduct a weighted summation, as in Eq. (9), where \({{\varvec{Z}}}^{l+1}\) is the representation matrix of the supernodes before message passing:

\({{\varvec{Z}}}^{l+1}={{\varvec{S}}}^{T}{{\varvec{H}}}^{l}\)  (9)
Step 2.2, Learning the hyperedges: Intuitively, if two clusters are adjacent, at least one node in each cluster is adjacent to a node in the other cluster. Similarly, transforming the node-level adjacency matrix yields the supernode-level adjacency matrix. Hence, the hypergraph’s adjacency matrix \({{\varvec{A}}}^{l+1}\) can be calculated from the original graph’s adjacency matrix \({{\varvec{A}}}^{l}\) and the cluster assignment matrix \({\varvec{S}}\), as shown in Eq. (10):

\({{\varvec{A}}}^{l+1}={{\varvec{S}}}^{T}{{\varvec{A}}}^{l}{\varvec{S}}\)  (10)
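A toy NumPy example of Eqs. (9) and (10), assuming a hand-crafted hard assignment on a hypothetical 4-node path graph, makes the two steps explicit:

```python
# Steps 1-2 with a hard cluster assignment: S encodes the node-to-supernode
# mapping, Eq. (9) pools the features, Eq. (10) rewires the edges.
import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])          # a 4-node path graph
H = np.arange(8.0).reshape(4, 2)      # node representations, d=2
S = np.array([[1, 0],                 # nodes 0,1 -> supernode 0
              [1, 0],
              [0, 1],                 # nodes 2,3 -> supernode 1
              [0, 1]])

Z = S.T @ H                           # Eq. (9): supernode representations
A_p = S.T @ A @ S                     # Eq. (10): weighted hypergraph adjacency
# Off-diagonal entries of A_p count inter-cluster edges; diagonal entries
# count (twice) the intra-cluster edges and are often zeroed or normalized.
```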
In addition, pooling methods can also include task-specific procedures such as edge sparsification and cluster selection. Specifically, clustering pooling can be categorized into three types: graph clustering pooling, soft clustering pooling, and rule-based central node pooling.
3.2.1.1 Graph clustering pooling
Graph clustering pooling refers to pooling operators employing deterministic algorithms for graph clustering, community detection, and graph topology. In the early stages, researchers employed pre-existing graph clustering techniques without modification, including hierarchical clustering, spectral clustering, and others (Bruna et al. 2014; Henaff et al. 2015; Defferrard et al. 2016; Monti et al. 2017). Defferrard et al. introduced the Graclus algorithm to generalize the CNN framework from low-dimensional regular grids to high-dimensional irregular graphs; Graclus has proven highly successful in clustering a large number of diverse graphs (Dhillon et al. 2007; Defferrard et al. 2016). It has been widely used as a base algorithm for clustering pooling owing to its capability to compute successively coarser versions of input graphs (Fey et al. 2018; Levie et al. 2019; Bianchi et al. 2022a). However, these pooling methods were not designed jointly with contemporary neural network models, which limits their adaptability. EigenPooling is the first graph clustering pooling operator integrated with current GNN frameworks (Ma et al. 2019). It employs spectral clustering to obtain a controllable number of subgraphs while considering both local and global properties. HaarPooling is a spectral graph pooling operator that relies on compressive Haar transforms to filter out fine-detail information in the Haar wavelet domain, resulting in a sparse coarsened graph (Wang et al. 2020c; Zheng et al. 2023). More methods in spectral graph coarsening, graph reduction, and graph sparsification are reviewed elsewhere (Shuman et al. 2016; Loukas 2019; Bravo-Hermsdorff and Gunderson 2019). Tsitsulin et al. introduced DMoN (Deep Modularity Networks), an unsupervised module designed to optimize cluster assignments with an objective combining spectral modularity maximization and collapse regularization (Tsitsulin et al. 2023). WGDPool (Weighted Graph Dual Pooling) is a graph clustering pooling algorithm that provides a differentiable k-means clustering variant, utilizing a Softmin assignment based on node-to-centroid distances (Xiao et al. 2024).
In order to obtain interpretable clustering results, CommPOOL provides a community pooling mechanism that captures the inherent community structure of the graph in an interpretable way, using an unsupervised clustering method Partitioning Around Medoids (PAM) on the node latent feature vectors (Tang et al. 2021). Other graph clustering pooling operators use only graph topology to discover node clusters, and these methods are usually non-parametric and make it easy to interpret the clustering results. Luzhnica et al. calculated the nodes’ maximal cliques using a modified Bron-Kerbosch algorithm (Luzhnica et al. 2019). SEP measures the complexity of hierarchical graph structure using structural entropy and globally optimizes hierarchical cluster assignment by minimizing structural entropy (Wu et al. 2022). KPlexPool is based on the concepts of graph covers and k-plexes, enabling a more flexible definition of cliques and guaranteeing the complete coverage of cliques on nodes. This ensures the clustering function is a surjection (Bacciu et al. 2021). Bacciu et al. proposed a clustering pooling method based on the Maximal k-Independent Sets (k-MIS) graph theory, which is designed to detect nodes that maintain a minimum distance of k from each other in a graph (Bacciu et al. 2023). Graph Parsing Network (GPN) utilizes a bottom-up graph parsing algorithm, similar to grammar induction, inferring clusters from nodes and learning personalized pooling structure for each graph (Song et al. 2024).
3.2.1.2 Soft clustering pooling
Soft clustering pooling refers to a differentiable pooling operation with learnable parameters for computing the cluster assignment matrix. DiffPool is the first soft clustering pooling operator; it learns a cluster assignment matrix using GraphSAGE (Ying et al. 2018). The soft cluster assignment matrix \({\varvec{S}}\) is calculated as in Eq. (11), and the adjacency matrix and node representations of the hypergraph are calculated as in Eqs. (9 and 10):

\({\varvec{S}}=Softmax\left({GNN}_{pool}\left({{\varvec{A}}}^{l},{{\varvec{H}}}^{l}\right)\right)\)  (11)
where the \(Softmax(\cdot )\) function is applied in a row-wise fashion. It is important to note that, in the soft assignment matrix, the assignment coefficient between any node and a supernode lies between 0 and 1. This contrasts with the cluster assignments derived by deterministic algorithms, which usually contain only 0 or 1. The soft assignment matrix thus provides a more expressive representation, in which a node can be assigned to multiple clusters with varying probabilities, or contribute to forming multiple clusters to different degrees. DiffPool is widely adopted as the benchmark for differentiable pooling, and many follow-up works build on it, including replacing modules with more powerful GCNs (Bandyopadhyay et al. 2020), reducing parameters by merging the GCNs that learn representations and cluster assignments (Pham et al. 2021), and multi-channel pooling mechanisms (Zhou et al. 2020b; Liang et al. 2020). Ying et al. refined dense clustering pooling by incorporating persistent homology, simplifying the coarsened graphs (Ying et al. 2024). The key procedure entails resampling the adjacency matrix using Gumbel-softmax, applying persistence injection, and directing the training with a topological loss function.
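A minimal sketch of the soft clustering computation in Eqs. (9–11) is given below; a single linear layer stands in for the assignment GNN, which is a simplifying assumption (DiffPool itself uses GraphSAGE), and the adjacency matrices are kept dense for readability.

```python
# Soft clustering pooling in the style of DiffPool: a learnable module
# produces assignment logits, softmax yields S, and Eqs. (9-10) coarsen.
import torch

class SoftClusterPool(torch.nn.Module):
    def __init__(self, d, n_clusters):
        super().__init__()
        self.assign = torch.nn.Linear(d, n_clusters)  # stand-in for GNN_pool

    def forward(self, H, A):                  # H: [N, d], A: [N, N]
        S = torch.softmax(self.assign(H), dim=-1)     # Eq. (11), row-wise
        Z = S.T @ H                                   # Eq. (9)
        A_p = S.T @ A @ S                             # Eq. (10), dense
        return Z, A_p, S
```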
Furthermore, conditional random fields (CRFs) and Non-negative Matrix Factorization (NMF) can be applied in soft clustering pooling to capture node-cluster connections (Bacciu and Di Sotto 2019; Yuan and Ji 2020). Meanwhile, classical graph algorithms have also been integrated into GNN systems; for example, GNN implementations of spectral clustering and Graph Mapper-based soft clustering are competitive in terms of theoretical and practical performance (Maria Bianchi et al. 2020; Bodnar et al. 2021). Unlike the above methods, graph classification methods based on graph capsule networks also incorporate the concept of clustering without defining explicit clusters or cluster assignment matrices (Xinyi and Chen 2018; Yang et al. 2021a). In a graph capsule network, node capsules are connected to graph capsules. Each graph capsule represents a meaningful substructure or node feature space, and the graph capsules are linked to classification capsules to produce classification results. Dynamic routing, which plays the role of the cluster assignment matrix, is used to connect these capsules.
3.2.1.3 Rule-based central node pooling
The last category of clustering pooling is rule-based central node pooling, which generates clusters formed and filtered around central nodes with predefined rules. The differences among these pooling strategies center on two fundamental questions: how are the central nodes decided, and how are clusters formed and selected? One approach is to treat each node as a central node and create a cluster structure around it using specific criteria, often involving the node’s first-order or second-order neighborhood (Ranjan et al. 2020; Su et al. 2021; Yu et al. 2021, 2022; Li et al. 2022c). A popular approach is Adaptive Structure Aware Pooling (ASAP), which treats each node’s first-order neighborhood as a cluster, carries out local soft cluster assignment learning, and then coarsens the graph by treating clusters as new nodes (Ranjan et al. 2020). Another option is to select a predefined number of nodes as central nodes using heuristic methods or topological metrics, such as degrees or locally normalized neighboring signals (Noutahi et al. 2019; Sun et al. 2021).
To generate clusters, a parametric approach involves mapping the non-central nodes into the clusters represented by the central nodes and learning an assignment matrix (Noutahi et al. 2019; Su et al. 2021). On the other hand, non-parametric approaches often identify multi-hop neighborhoods of nodes (Ranjan et al. 2020; Yu et al. 2021, 2022; Li et al. 2022c) or subgraphs (Sun et al. 2021) as clusters depending on the nodes’ local connectivity. All approaches that do not limit the number of central nodes require cluster filtering based on cluster fitness scores. The fitness score can be calculated using Local Extrema Convolution (LEConv), a graph convolution method for obtaining local extremum information (Ranjan et al. 2020; Li et al. 2022c). Other approaches, such as scalar projection (Sun et al. 2021; Yu et al. 2021) and ego-networks’ closeness scores (Zhong et al. 2022), can be used to evaluate the role that clusters play in both structure and features. A few operators, driven by the notion that nodes with similar characteristics should belong to the same cluster and be merged, iteratively pick and collapse mergeable node pairs or node sets one by one, without the need for an explicit cluster assignment matrix (Hu et al. 2019; Xie et al. 2020).
3.2.2 Selection pooling
Intuitively, selection pooling reduces the size of the graph by removing some of the nodes. This graph sparsification strategy avoids dense coarsened graphs, which would burden the model in computational complexity and memory. The primary emphasis of a node selection pooling operator is determining which nodes to retain and which to discard, typically accomplished by developing an evaluation mechanism for scoring the nodes. The edges among the preserved nodes in the original graph are also maintained to establish the topological connections between the supernodes in the hypergraph created through the selection pooling approach. The following steps describe the generic computational procedure of selection pooling.
Step 1, Node Evaluation: Given the node representation matrix \({\varvec{H}}\) and the adjacency matrix \({\varvec{A}}\), the selection pooling operator maintains a node evaluation function \({f}_{sel}:({{\varvec{h}}}_{v},{\varvec{H}},{\varvec{A}})\to {s}_{v}\) that transfers each non-comparable node in the high-dimensional space to a fitness value \({s}_{v}\in {\mathbb{R}}\). The evaluation outcome over all nodes is often represented as a score vector; similar to the cluster assignment matrix \({\varvec{S}}\) in clustering pooling, the selection function can thus also be stated as \({f}_{sel}:({\varvec{H}},{\varvec{A}})\to {\varvec{s}}\) with \({\varvec{s}}\in {\mathbb{R}}^{N}\). The pooling ratio is a critical hyperparameter in selection pooling, as the removal of nodes inherently leads to information loss; it determines the extent to which essential node information is retained. According to the differences in the implementation of the fitness or importance evaluation function, the evaluation functions can be broadly categorized into non-parametric and parametric pooling.
Non-parametric selection pooling: One of the early pooling methods incorporating the concept of node selection was implemented within an extended CNN architecture designed for graph data; the authors extensively explored the potential of graph signal sampling methods to compute node sampling matrices, which filter both nodes and node features (Gama et al. 2019). Parameter-free approaches often employ deterministic or heuristic metrics, such as degree and degree centrality (Zhang et al. 2021b), subgraph centrality (Ma et al. 2020), the Manhattan distance between a node representation and the one reconstructed from its neighbors (Zhang et al. 2019a), neighborhood information gain (Gao et al. 2022a), correlation coefficients (Jiang et al. 2020), and the distance between a node and the cluster center precomputed by k-means (Wang et al. 2022).
Parametric selection pooling: The first parametric approach can be traced back to graph pooling (gPool), which uses a trainable projection vector \({\varvec{p}}\) to project all node features to 1D and applies k-Max pooling to select nodes (Gao and Ji 2019). The scalar projection of a node \(v\) with feature vector \({{\varvec{h}}}_{v}\) on \({\varvec{p}}\) is \({s}_{v}={{\varvec{h}}}_{v}{\varvec{p}}/||{\varvec{p}}||\), where \(||\cdot ||\) is the L2 norm (Cangea et al. 2018; Gao and Ji 2019, 2022; Qin et al. 2020; Bi et al. 2021; Gao et al. 2021a). To select coarse nodes, Ma et al. employed a GNN to weight all nodes in OTCoarsening (Ma and Chen 2021). Currently, many parametric approaches adapt attention mechanisms to compute importance scores (Gao and Ji 2019; Lee et al. 2019; Knyazev et al. 2019; Huang et al. 2019; Qin et al. 2020; Li et al. 2020a; Gao et al. 2020; Aggarwal and Murty 2021; Bi et al. 2021; Duan et al. 2022). Recently, several approaches have emerged that employ diverse methodologies to concurrently consider node feature information and structural information while addressing both local and global node features. Before weighted summation, Graph Self-adaptive Pooling (GSAPool) calculates scores on node features and topology using an MLP and a GNN (Zhang et al. 2020), while MSAPool (Multiple Strategy-based Attention Pooling) interprets the GCN outputs as reflecting a local perspective and the MLP outputs as capturing a global perspective (Xu et al. 2022). Before projection, UGPool executes a simple message passing step to gather local feature information and calculates node scores \({{\varvec{s}}}^{l}={\widehat{{\varvec{A}}}}^{l}{{\varvec{H}}}^{l}{{\varvec{p}}}^{l}/||{{\varvec{p}}}^{l}||\), where \({\widehat{{\varvec{A}}}}^{l}\) denotes the normalized adjacency matrix (Qin et al. 2020).
Non-parametric and parametric approaches are not mutually exclusive, and some pooling operators support both parametric and non-parametric modules (Nouranizadeh et al. 2021; Stanovic and Gaüzère 2022). KnnPool first projects the nodes using an MLP and a GCN, and then selects nodes by computing the distance between each node and the cluster center (Chen et al. 2022a). Topology-Aware Pooling (TAP) considers two voting processes: local voting and global voting (Gao et al. 2021a). Local voting is based on the average similarity between a node and its neighboring nodes, whereas global voting employs projection vectors. MVPool (Multi-view Graph Pooling) offers three views: a structure-specific view based on node degree centrality, a feature-specific view based on node features and an MLP, and a structure-and-feature-specific view based on a variant of PageRank (Zhang et al. 2021b). Gao et al. proposed a structure-aware kernel representation for evaluating node similarity from a graph topology view and implemented it as a parametric, learnable approach (Gao et al. 2021b).
Step 2, Node Selection: After obtaining the node information scores, the next step is to choose which nodes should be kept. Nodes are re-ordered by their fitness scores, and a subset of the top-ranked nodes is selected. Equation (12) represents this process, where \(idx\) denotes the indices of the selected nodes, and the pooling ratio \(r\) and the number of graph nodes \(N\) decide the number of selected nodes, which is usually greater than or equal to 1:

\(idx=top\_rank({\varvec{s}},\lceil rN\rceil )\) (12)
The Multidimensional score space with flIpscore and Dropscore operations (MID) focuses on the node selection module and addresses the neglect of node features and graph structure diversity by incorporating flipscore and dropscore operations (Liu et al. 2023). Specifically, the flipscore operation takes the absolute value of all elements in the multi-dimensional score matrix, while the dropscore operation randomly discards a certain proportion of nodes in the graph when selecting the top k nodes.
Step 3, Learning Hypergraph: In selection pooling, learning the hypergraph involves two aspects: the representation of the supernodes and the adjacency matrix, which can usually be accomplished by extracting rows and columns from the original representation matrix and the adjacency matrix. This process can be described by Eqs. (13 and 14), in which \({\varvec{H}}(idx,:)\) and \({{\varvec{A}}}^{l}(idx,idx)\) perform the row and/or column extraction. Concomitantly, the evaluation scores can act as gating coefficients, influencing the update of each supernode’s representation as shown in Eq. (15), where \(\odot \) denotes the element-wise broadcast operation by row. This reflects the idea that more important nodes should be kept more integrally, even among the selected supernodes.
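Continuing the sketch above, Step 3 amounts to index extraction plus a gated feature update; the following is a minimal illustration of Eqs. (13)-(15), assuming a tanh gate as in gPool (other operators use a sigmoid).

```python
import torch

def selection_pool(H, A, s, idx):
    """Learn the hypergraph as in Eqs. (13)-(15): extract the selected
    rows/columns and gate the kept features so that higher-scoring nodes
    are preserved more integrally."""
    H_pool = H[idx]                             # Eq. (13): H(idx, :)
    A_pool = A[idx][:, idx]                     # Eq. (14): A(idx, idx)
    gate = torch.tanh(s[idx]).unsqueeze(-1)     # gPool-style gate on scores
    return H_pool * gate, A_pool                # Eq. (15): row-wise broadcast

N, d = 6, 8
H, A, s = torch.randn(N, d), (torch.rand(N, N) > 0.5).float(), torch.randn(N)
idx = torch.topk(s, 3).indices                  # indices from Step 2
H_pool, A_pool = selection_pool(H, A, s, idx)   # 3-node coarsened graph
```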
Across these methods, the selection pooling operators follow a relatively fixed computational pattern, and the data passed between steps are almost identical. As a consequence, the procedures in each step can be abstracted and optimized by neural architecture search, as in Pooling Architecture Search (PAS) (Wei et al. 2021). Chen et al. proposed the graph self-correction (GSC) mechanism to reduce information loss during graph coarsening by compensating with information generated by feedback procedures, where the compensating information is calculated using complement graph fusion and coarsened graph back-projection (Chen et al. 2022c). There is also a subset of parametric methods that use mutual information to train auxiliary encoder modules: Coarsened Graph Infomax Pooling (CGIPool) maximizes the mutual information between the input and the coarsened graph (Pang et al. 2021b), and Vertex Infomax Pooling (VIPool) determines the importance of a node by estimating the mutual information between nodes and their neighborhoods (Li et al. 2020b). All of the approaches described above provide scoring vectors that can be used to visualize the node ordering. Node Decimation Pooling (NDP) is an intriguing node selection pooling operator that decimates nodes by eliminating one of the two sides of the MAXCUT partition and connects the remaining nodes using a link construction procedure (Bianchi et al. 2022b). Without ranking the fitness scores of individual nodes, NDP evenly divides all nodes into two clusters, where nodes from one cluster are retained while nodes from the other are decimated.
3.2.3 Edge pooling
Edge pooling is a class of pooling operations that perform graph coarsening by mining the edge features. Edge pooling can be divided into two categories based on the edge-operating strategy: edge contraction and edge deletion. The difference between these two edge pooling categories is comparable to that between clustering pooling and selection pooling.
3.2.3.1 Edge contraction
Edge contraction typically necessitates the following steps: identifying edges that can be contracted, performing edge contraction (merging nodes), and maintaining graph connectivity. EdgePool is the first edge-contraction-based edge pooling operator that can be integrated into existing GNN systems (Diehl et al. 2019; Diehl 2019). The score of the edge \({e}_{ij}\) between nodes \({v}_{i}\) and \({v}_{j}\) is learned from node features, as in:

\({s}_{ij}=\sigma ({\varvec{W}}[{{\varvec{h}}}_{{v}_{i}}\Vert {{\varvec{h}}}_{{v}_{j}}]+{\varvec{b}})\) (16)
where \({\varvec{W}}\) and \({\varvec{b}}\) are learnable parameters, and \(\sigma (\cdot )\) could be \(Tanh(\cdot )\) or \(Softmax(\cdot )\). All edges are ranked by their scores, and the highest-scoring edge not yet connected to a contracted node is selected sequentially. Since edge contraction can be viewed as a readout over the two endpoints, the merged node’s features are the score-gated sum of the two nodes’ features, \({{\varvec{h}}}_{{v}_{ij}}={s}_{i,j}({{\varvec{h}}}_{{v}_{i}}+{{\varvec{h}}}_{{v}_{j}})\) (Diehl et al. 2019; Yuan et al. 2020).
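The greedy contraction loop can be sketched as follows. This is a simplified illustration assuming a dense feature matrix and an explicit edge list, and it omits EdgePool's reconstruction of the coarsened adjacency matrix.

```python
import torch

def edgepool_contract(H, edges, scores):
    """Greedy EdgePool-style contraction: visit edges by descending score
    and merge an edge's endpoints if neither was contracted yet."""
    merged, feats = set(), []
    for e in torch.argsort(scores, descending=True):
        i, j = edges[e]
        if i.item() in merged or j.item() in merged:
            continue                             # each node joins one contraction
        merged |= {i.item(), j.item()}
        feats.append(scores[e] * (H[i] + H[j]))  # h_vij = s_ij (h_vi + h_vj)
    for v in range(H.size(0)):                   # carry over untouched nodes
        if v not in merged:
            feats.append(H[v])
    return torch.stack(feats)

H = torch.randn(4, 8)                            # 4 nodes, 8 features
edges = torch.tensor([[0, 1], [1, 2], [2, 3]])
scores = torch.tensor([0.9, 0.5, 0.7])
H_coarse = edgepool_contract(H, edges, scores)   # two merged supernodes remain
```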
3.2.3.2 Edge deletion
Equation (16) can also be used to score edges in edge deletion pooling (Galland and Lelarge 2021). The hyperedge pooling operator is a representative edge-deletion-based pooling operator, which propagates node projection scores through PageRank and averages the scores of the two endpoints as the edge’s evaluation score (Zhang et al. 2021c). Another evaluation viewpoint builds on the observation that the greater the difference between the endpoints, the more information they can obtain from each other, and hence the greater the importance of the edge, as shown in Eq. (17) (Gao et al. 2020). After obtaining the edge evaluation scores, the procedure of edge deletion pooling is similar to node selection pooling, except that the operation subject is switched from the node to the edge, as shown in Eqs. (18 and 19), where \(top\_rank(\cdot ,\cdot )\) returns the indices of selected edges and \(E(idx)\) means removing edges that are out of \(idx\) (Zhang et al. 2021c; Yu et al. 2022).
Dual Hypergraph Transformation (DHT) is a unique technique for edge pooling that transforms edges into hypergraph nodes (Jo et al. 2021). By applying clustering and dropping methods on the dual hypergraph, DHT generates both coarsened graphs and global edge representations. In addition, edge contraction and edge deletion are not mutually exclusive, and they can be described under a unified framework (Bravo-Hermsdorff and Gunderson 2019).
3.2.4 Hybrid pooling
Hybrid pooling is a hierarchical pooling framework that concurrently employs two or more pooling strategies from clustering pooling, selection pooling, and edge pooling. Many rule-based central node pooling operators can be classified as hybrid pooling methods, as they create the cluster structure by scoring and sorting central nodes or clusters to reduce the number of supernodes. Approaches characterized as sequential selection and clustering first select nodes as clustering centers and then cluster the remaining nodes (Noutahi et al. 2019). Interleaved clustering and selection describes methods that first form local clusters, then select among the clusters, and finally refine the cluster assignments along with the cluster readout (Ranjan et al. 2020; Su et al. 2021; Sun et al. 2021; Yu et al. 2021; Li et al. 2022c).
Another type, built on node selection pooling, is the sequential selection-clustering hybrid method that aggregates the information of discarded nodes into supernodes and then performs an intra-cluster readout. The primary motivation of these pooling operators is to address the problem that selection pooling usually discards a significant amount of information, including node attributes and topology, as it coarsens graphs by removing a large number of nodes. A natural option is to aggregate the node information from local neighbors into the supernode (Huang et al. 2019; Zhang et al. 2020; Qin et al. 2020; Li et al. 2020b, a; Bi et al. 2021). ProxPool decomposes the relearning of the supernode representation into three levels, resembling a method for constructing a cluster assignment matrix: neighborhoods within fixed hops, sparse neighborhoods retaining only closely related non-reserved nodes, and a sparse soft assignment matrix based on affinity that treats each supernode as the cluster’s central node (Gao et al. 2021b).
Selecting both edges and nodes has also been explored. LookHops computes the indices of important nodes and edges independently and simultaneously (Gao et al. 2020), whereas Hierarchical Triplet Attention Pooling (HTAP) only selects the important edges for the important nodes (Bi et al. 2021). Zhou et al. proposed cross-view graph pooling (Co-Pooling), which incorporates pooled representations learned from node and edge views and exchanges the cut proximity matrix and the indices of selected nodes as edge-node view interactions (Zhou and Yin 2023). Accurate Structure-Aware Graph Pooling (ASPool) combines the concepts of clustering pooling, selection pooling, and edge pooling to create coarsened graphs by removing edges to calibrate the graph structure, forming local clusters, scoring and selecting through a two-stage procedure, and merging the selection results (Yu et al. 2022). WGDPool integrates edge weights to enhance graph representations, learning separate node and edge embeddings that converge into a comprehensive graph representation (Xiao et al. 2024).
3.3 Graph unpooling
Graph unpooling serves as the inverse operation of graph pooling, much as deconvolution is the reverse of convolution and upsampling is the inverse of downsampling. Unpooling operators may go by other names, such as the Graph Refining Layer (Hu et al. 2019) or the Up-Sampling Layer (Zhang et al. 2021b). The graph unpooling operator converts the coarsened graph back to the fine graph, conducting the upsampling procedure. The resulting U-shaped network structure, which is generally bound to the unpooling operator, enables the network to handle graph-level and node-level tasks concurrently. The unpooling concept was first proposed in Graph U-Nets with its unpooling operator, gUnpool (Gao and Ji 2019, 2022). Generally, each unpooling operator has two steps: restoring node locations and restoring node representations.
To restore the graph to its original structure, the location of each node selected in the corresponding pooling layer needs to be recorded, and the nodes are repositioned using this information. This operation can be formalized as:

\({\varvec{U}}=distribute({0}_{{N}^{l+1}\times {d}^{l}},{{\varvec{U}}}^{l},idx)\) (20)
where \({\varvec{U}}\) is the restored graph representation matrix, \({0}_{{N}^{l+1}\times {d}^{l}}\) is the restored graph’s initial empty representation matrix, and \(distribute({0}_{{N}^{l+1}\times {d}^{l}},{{\varvec{U}}}^{l},idx)\) is the operation that distributes the row vectors of \({{\varvec{U}}}^{l}\) into the empty matrix based on the indices stored in \(idx\) (Gao and Ji 2019, 2022). As shown in Eq. (20), the indices of the selected nodes are stored for unpooling. Currently, the inverses of selection pooling operators are the most popular unpooling operators (Gao and Ji 2019, 2022; Li et al. 2020b; Zhang et al. 2021b; Chen et al. 2022a; Zou et al. 2022; Lu et al. 2022). Furthermore, if the node mapping from the original graph to the coarsened graph is saved during graph pooling, edge pooling can be directly restored through the inverse mapping, and the restored node features can be calculated through Eq. (21) (Diehl 2019; Yuan et al. 2020). For clustering pooling operators, the mapping relationship is preserved within the cluster assignment matrix \({\varvec{S}}\), which can be utilized to restore the graph structure. In addition, skip connections can be used to improve the node representations, as shown in Eq. (22) (Hu et al. 2019).
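As a concrete picture of the distribute operation in Eq. (20), the following sketch scatters the coarse representations back to their recorded positions; the names are illustrative.

```python
import torch

def gunpool(U_l, idx, n_nodes):
    """gUnpool-style distribute: place coarse node features back at their
    recorded positions and leave the remaining rows as zeros (Eq. 20)."""
    U = torch.zeros(n_nodes, U_l.size(1))   # empty restored representation
    U[idx] = U_l                            # scatter rows by stored indices
    return U

idx = torch.tensor([0, 2, 5])               # indices saved by the pooling layer
U_l = torch.randn(3, 4)                     # coarsened node representations
U = gunpool(U_l, idx, n_nodes=6)            # rows 1, 3, 4 stay zero
```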
The main distinction between different unpooling operators is how the node representation is restored. After initializing the restored graph’s node representation matrix, node representations can be interpolated using a graph convolution layer (Li et al. 2020b). The attention-based graph unpooling (attnUnpool) layer initializes added nodes with an attention operator that attends to their neighbors (Gao and Ji 2022). MeanUnPooling, inspired by bilinear interpolation in CNN models, restores node features by averaging the features of neighboring nodes without any hyperparameters (Lu et al. 2022). Zhong et al. perform the unpooling by creating a top-down message-passing mechanism that provides the restored nodes with meso/macro level knowledge (Zhong et al. 2022). The parameterized unpooling layer (UL), on the other hand, uses an MLP to produce probabilities that determine whether nodes and edges should be restored, as well as to construct features for the restored nodes and edges (Guo et al. 2023).
3.4 Evaluation frameworks and open problems
3.4.1 Evaluation of graph pooling
In this section, we will discuss some ground-breaking research on evaluating pooling operators, including whether and how pooling operators improve graph classification.
3.4.1.1 Benchmark datasets
Table 1 presents benchmark datasets for graph classification, notably the TUDataset with over 120 benchmarks for graph data learning, accessible at www.graphlearning.io (Morris et al. 2020). Popular bioinformatics datasets include D&D, NCI1, NCI109, and others, while social network and collaboration datasets like IMDB-BINARY and COLLAB are also frequently used (Yanardag and Vishwanathan 2015; Kersting et al. 2016; Morris et al. 2020). Accuracy is the primary evaluation metric for classification datasets in the TUDataset. The Open Graph Benchmark offers large-scale datasets across various domains, available at https://ogb.stanford.edu, supporting diverse ML tasks (Hu et al. 2020, 2021c). It includes datasets like Ogbg-molhiv (HIV), Ogbg-ppa (PPA), Ogbg-molbbbp (BBBP), Ogbg-moltox21 (Tox21), and Ogbg-moltoxcast (ToxCast), evaluated using ROC-AUC or accuracy (Gao et al. 2021b; Baek et al. 2021; Jo et al. 2021; Chen et al. 2022b). Additionally, Dwivedi et al. introduced a benchmark framework that encompasses a diverse range of mathematical and practical graphs for fair model comparisons (Dwivedi et al. 2023). Their framework is hosted at https://github.com/graphdeeplearning/benchmarking-gnns.
3.4.1.2 Libraries
With the growth of the field, several GNN-specific libraries have emerged, spanning languages like Python and Julia, and platforms such as PyTorch, TensorFlow, JAX, and Flux. Table 2 presents a summary of these libraries alongside their supported pooling operators. These libraries, initially designed for simpler node operations, are evolving through recent innovations to offer user-friendly interfaces for diverse GNN models, optimize sparse operations on GPUs, and facilitate scaling to expansive graphs and multi-GPU environments. They provide versatile APIs for hierarchical pooling and readout, with some offering an array of pooling options. Additionally, TUDataset is accessible via PyTorch Geometric (PyG), Deep Graph Library (DGL), and Spektral, while OGB is accessible via PyG and DGL.
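As an illustration, the TUDataset benchmarks can be loaded directly through PyG's dataset interface; the snippet below is a minimal sketch assuming a recent PyG installation, with the storage path chosen arbitrarily.

```python
from torch_geometric.datasets import TUDataset
from torch_geometric.loader import DataLoader

# Downloads NCI1 from www.graphlearning.io on first use; other names such
# as 'DD', 'NCI109', or 'IMDB-BINARY' follow the same dataset registry.
dataset = TUDataset(root='data/TUDataset', name='NCI1')
loader = DataLoader(dataset, batch_size=32, shuffle=True)
print(len(dataset), dataset.num_classes, dataset.num_node_features)
```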
3.4.1.3 Comparison and discussion
Cheung et al. conducted empirical evaluations of various graph pooling methods, including SortPooling (Zhang et al. 2018), DiffPool (Ying et al. 2018), gPool (Gao and Ji 2019), and SAGPool (Lee et al. 2019), within graph classification tasks using GCNs (Cheung et al. 2019). They found that DiffPool outperformed non-pooling networks and the other pooling operators, whose performance was unstable; gPool in particular performed poorly without the encoder structure. The optimal evaluation protocol for graph pooling, whether a uniform GNN architecture or one tailored to each pooling operator, remains contested (Bodnar et al. 2021). To promote experimental integrity, a standardized and reproducible experimental environment was suggested, which includes nested cross-validation (CV), publicly accessible data splits, and hyper-parameter tuning procedures (Errica et al. 2020). In a controlled experiment, the authors reevaluated five GNNs across nine datasets against structure-agnostic baselines that rely solely on node features and global readouts (Hamilton et al. 2017; Simonovsky and Komodakis 2017; Zhang et al. 2018; Ying et al. 2018; Xu et al. 2019). The findings indicate discrepancies between the actual performance of each model and previously reported results. Notably, graph pooling methods did not consistently outperform the structure-agnostic baselines, despite their purported advantages in leveraging graph structures. Bianchi et al. focused on how pooling operators influence the expressiveness of GNNs and proposed a universal criterion for measuring a pooling operator’s effectiveness based on its ability to retain graph information (Bianchi and Lachi 2024). Their experimental evaluation on graph classification benchmarks revealed that expressive pooling operators outperformed the rest, and that most sparse pooling methods were not only less effective, owing to their limited expressiveness, but also offered no significant speed advantage.
In addition to structure-agnostic baselines and reliable experimental settings, comparative experiments with randomized variants serve as another way to validate a model’s real effectiveness (Mesquita et al. 2020). Mesquita et al. examined the assumption that pooling must capture local information, conducting extensive experiments with randomized and complement variants to probe the need for locality-preserving representations (Mesquita et al. 2020). Grattarola et al. offered three evaluation criteria for pooling operators: preservation of (1) node attributes, (2) topological structure, and (3) information for downstream tasks (Grattarola et al. 2022). They applied these criteria using three experimental metrics to evaluate eight pooling methods, focusing on the reconstruction of point cloud coordinates, the structural similarity between the original and coarsened graphs, and classification performance on benchmark datasets (Monti et al. 2017; Cangea et al. 2018; Ying et al. 2018; Gao and Ji 2019; Lee et al. 2019; Noutahi et al. 2019; Bacciu and Di Sotto 2019; Maria Bianchi et al. 2020; Bianchi et al. 2022b; Grattarola et al. 2022). The findings reveal that trainable methods have an advantage in preserving structure and task-specific information. Furthermore, the authors note that trainable global pooling performs better, which is consistent with our ranking of the learning ability of the various types of readout functions. Zhou et al. randomly added and removed edges in a real dataset to test the robustness of existing methods to graph topology in graph classification (Zhou and Yin 2023). Surprisingly, such random edge perturbations did not significantly reduce graph classification accuracy, even when all edges were removed (Zhou and Yin 2023). These discoveries have encouraged researchers to conduct more ablation studies to validate the effectiveness of novel pooling operators.
The interpretability of pooling operators enhances our understanding of graph pooling. Until the explanation of graph pooling and the possible meanings of the captured structures are clearly defined, visualization of hierarchical clustering remains an attractive way to visibly demonstrate model findings (Ying et al. 2018; Noutahi et al. 2019; Maria Bianchi et al. 2020). CommPOOL generalizes hierarchical graph neural network interpretation to three questions (Tang et al. 2021): How can the hierarchical structures of a graph be captured in an interpretable manner? How can the graph representation be scaled down while preserving the structures using an interpretable process? What results from the pooling operation? To examine the community structure captured by CommPOOL, the authors employed random simulation graphs and protein data with node labels as community ground truth. The Normalized Mutual Information (NMI), Adjusted Mutual Information (AMI), and Adjusted Rand Index (ARI) between the model-predicted community labels and the ground truth are used to quantify a pooling operator’s capability to capture community structure (Maria Bianchi et al. 2020; Zhang et al. 2021b; Roy et al. 2021; Tang et al. 2021; Wang et al. 2022).
3.4.1.4 Evaluation framework
We categorized the evaluation of pooling methods into three tiers: (1) fair experimental settings with repeatable CV, data partitioning, and hyperparameter selection procedures; (2) comprehensive ablation studies and comparison experiments; and (3) heuristic or quantitative model interpretation analysis with theoretical insights. All methods were systematically reviewed and detailed in Table S10 (Supplementary File 1). In general, most methods include a detailed experimental setup. For developing an optimal evaluation procedure, the hyperparameter setup or hyperparameter search process is also detailed in the research settings. For the second criterion, comprehensive ablation studies and comparison experiments, it is essential to conduct parameter analyses and comparison experiments with model variations based on the candidate techniques. In addition to comparing against variants with specific modules removed, ablation studies may also consider comparisons with variants generated through randomization. Studies validating methods’ runtime and memory usage underscore that consistent time and space conditions provide an objective efficiency metric. More efficient feature encoding approaches mean that more information can be represented with the same number of neurons or parameters. Convergence behavior is another point of comparison for neural network models. Visualizations, case studies, model explanations, permutation invariance proofs, connections to existing methods, theoretical discourse, and other insightful discussions constitute the third evaluation tier. Popular forms of visualization include examples of selection or cluster structures, cluster visualization of graphs in low-dimensional space, and coloring diagrams of importance scores. Liu et al. and Xiao et al. use image segmentation as an interpretability study to demonstrate the interpretability of node assignment (Liu et al. 2022b; Xiao et al. 2024). Permutation invariance, essential in global pooling, is also demonstrated across various hierarchical pooling operators. Many methods include a time or space complexity analysis as a theoretical assessment of their effectiveness for sparsity or other properties.
3.4.2 Complexity and connectivity
The sparsity of the generated coarsened graph differs significantly between clustering pooling and selection pooling. Differentiable clustering pooling operators often describe the clustering results using a cluster assignment matrix. Because this matrix is typically dense, the adjacency matrix of the generated hypergraph will also be dense, regardless of whether the initial adjacency matrix is sparse. This dense structure imposes unsustainable computing and storage requirements as the size of the input graph rises, preventing the deployment of such pooling methods to large graphs and deeper networks. The updated adjacency matrix is produced in the node selection pooling operators by extracting the initial adjacency matrix, which effectively preserves the sparsity of the graph structure. However, this strategy misses the connectivity among the supernodes, which may result in isolated nodes that are not adjacent to any node in the hypergraph. These isolated nodes may exhibit local extremum-like effects during subsequent message propagation, weakening the validity of node evaluation. Consequently, specific modifications of the graph’s adjacency matrix are required to preserve the graph structure’s sparse properties and robust connectivity.
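The contrast can be made concrete with a toy computation in which random data stands in for learned assignments; note how the clustering route densifies the adjacency while plain index extraction preserves sparsity but can strand isolated supernodes.

```python
import torch

N, k = 1000, 100
A = (torch.rand(N, N) < 0.01).float()        # sparse-ish input adjacency

# Clustering pooling: a dense assignment S makes S^T A S dense as well,
# regardless of how sparse the input adjacency was.
S = torch.softmax(torch.randn(N, k), dim=-1)
A_cluster = S.T @ A @ S                      # dense k x k coarsened adjacency

# Selection pooling: extraction keeps sparsity, but supernodes whose
# neighbors were all discarded end up with no edges at all.
idx = torch.randperm(N)[:k]
A_select = A[idx][:, idx]
isolated = (A_select.sum(dim=1) == 0).sum()  # count of isolated supernodes
```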
Diversified sparsification strategies for hypergraph adjacency matrices have been proposed to achieve acceptable space overhead and complexity. One sparsification technique restricts the number or range of node-to-cluster assignment connections, i.e., local clustering assignment generates a sparse assignment matrix (Xie et al. 2020; Ranjan et al. 2020; Gao et al. 2021b; Hou et al. 2024). Local clustering, in general, means that only nodes within a multi-hop neighborhood can be assigned to the same cluster (Ranjan et al. 2020; Gao et al. 2021b; Yu et al. 2021; Li et al. 2022c). In contrast, in differentiable clustering pooling with no restrictions, any two nodes could be assigned to the same cluster. Meanwhile, the Sparsemax function is used as the normalization function to enable a sparse assignment (Noutahi et al. 2019; Gao et al. 2021b; Zhang et al. 2021b). Sparsemax is a normalization transformation that adaptively sets a threshold for input vectors and sets the elements below the threshold to zero after normalization (Martins and Astudillo 2016). To construct sparse attention mechanisms, the Sparsemax function can replace the Softmax function (Zhang et al. 2021b). As a workaround, Liu et al. use the Gumbel-Softmax to perform soft sampling in the node neighborhood, resulting in a lower edge density for the sampled adjacency matrix in Hierarchical Adaptive Pooling (HAP) (Liu et al. 2021). Liu et al. also use the Gumbel-Softmax to convert the soft bridge matrix (i.e., assignment matrix) into a hard assignment matrix in SMIP (Liu et al. 2022b).
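Because several of these operators rely on Sparsemax, a compact sketch of the transformation may be useful; this follows the published 1-D algorithm (Martins and Astudillo 2016), with names chosen for illustration.

```python
import torch

def sparsemax(z: torch.Tensor) -> torch.Tensor:
    """Sparsemax of a 1-D score vector: the Euclidean projection onto the
    probability simplex, which zeroes out low-scoring entries exactly."""
    z_sorted, _ = torch.sort(z, descending=True)
    k = torch.arange(1, z.numel() + 1, dtype=z.dtype)
    cumsum = torch.cumsum(z_sorted, dim=0)
    support = k * z_sorted > cumsum - 1          # entries kept in the support
    k_z = support.sum()                          # size of the support
    tau = (cumsum[k_z - 1] - 1) / k_z            # adaptive threshold
    return torch.clamp(z - tau, min=0.0)

p = sparsemax(torch.tensor([2.0, 1.5, 0.1, -1.0]))
# p = [0.75, 0.25, 0.0, 0.0]: sums to 1, with exact zeros unlike Softmax
```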
In selection pooling, the local neighbors of a node can also be utilized to maintain hypergraph connectivity and sparsity (Ma and Chen 2021). In general, if the neighborhoods of two nodes in the original graph have dense interconnections, the nodes tend to have larger edge weights in the coarsened graph (Sun et al. 2021; Gao et al. 2022a). The distance between a center node and its one-hop neighbors in the original graph can be employed to compute node distances in the coarsened graph, as shown in Eqs. (23 and 24) (Huang et al. 2019). Gao and Ji suggest using the \({2}^{nd}\) graph power to boost graph connectivity, an operation that builds links between nodes separated by no more than two hops (Gao and Ji 2019). Eqs. (25 and 26) describe this strategy, and Eq. (27) shows that UGPool uses a similar adjacency matrix update (Qin et al. 2020). Another option is to incorporate graph connectivity into the node evaluation measures, such as the node degree after normalization, and to prefer densely connected nodes with better connectivity (Gao et al. 2021a).
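A minimal sketch of the second-graph-power update follows, assuming a dense adjacency with self-loops added so that one-hop links survive the squaring.

```python
import torch

def graph_power_extract(A, idx):
    """Connectivity boost before extraction: nodes within two hops of each
    other in the original graph become adjacent in the coarsened graph."""
    A_hat = A + torch.eye(A.size(0))       # self-loops keep 1-hop links in A^2
    A2 = (A_hat @ A_hat > 0).float()       # binarized 2nd graph power
    return A2[idx][:, idx]                 # row/column extraction as before

A = (torch.rand(6, 6) > 0.6).float()
A = ((A + A.T) > 0).float()                # symmetrize the toy adjacency
idx = torch.tensor([0, 2, 4])
A_pool = graph_power_extract(A, idx)       # fewer isolated supernodes
```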
Although the abovementioned methods could enhance the graph’s connectivity, they may still result in a few isolated nodes. A more sophisticated method would be to relearn the connections for the coarsened graph and combine this with sparsification to guarantee that the learned adjacency matrix is sparse and well-connected (Zhang et al. 2019a, 2021b; Bianchi et al. 2022b). The structure learning mechanism refines the graph structure and eliminates undesired noise information by using a sparse attention mechanism on the original graph structure (Zhang et al. 2019a, 2021b). The structure learning mechanism consists of the following steps: constructing a single-layer neural network \({\varvec{a}}\) to transform the node representation, calculating the similarity score of two nodes using the attention mechanism \(AM({v}_{i},{v}_{j})\) as shown in Eq. (28), and normalization and sparsification using the \(sparsemax(\cdot )\) function. The adjacency matrix value for node pairs, \({A}_{ij}^{l}\), is integrated into the structure learning layer to give attention to larger similarity scores between directly connected nodes. It also attempts to learn potential pairwise relationships between unconnected nodes for which \(\lambda \) is a trade-off parameter. NDP uses Kron reduction to generate new Laplacian matrices and recover them into adjacency matrices for coarsened graph link construction, as well as a threshold to truncate the adjacency matrix for graph sparsification (Bianchi et al. 2022b).
3.4.3 Adaptivity
Another issue in hierarchical pooling is adaptivity, which relates to a pooling method’s capacity to handle input graphs of varying sizes. Graph classification differs from graph signal processing in that the graphs to be classified may exhibit diverse structures and node features. This necessitates the network’s capacity to handle different graphs in batches and transform them into fixed-size outputs. Specifically, the pooling operator maps the original graph to a fixed number of supernodes, thus achieving an alignment of the graph structure (Bai et al. 2021; Lee et al. 2021). Adaptivity is also linked to the network’s classification capacity, namely how spatial graph convolution with pooling operators generalizes to graph structures not observed in the training set, where the number of supernodes influences the adaptivity of the GNNs. When the number of supernodes is independent of the size of the input graph, the model extracts the same number of substructures for large and small graphs; when they are positively correlated, complex graphs can be represented by moderately complicated coarsened graphs. In implementations, the pooling ratio \(r\) is a crucial hyperparameter that reflects the correlation between the number of supernodes and the size of the original graph. In selection pooling, the pooling ratio is intuitively realized by the number \(k\) of selected nodes in the top-rank function; in clustering pooling methods, the number of supernodes usually equals the number of clusters.
Currently, the most common solution among pooling operators is to determine the number of supernodes by selecting a certain percentage of the maximum number of nodes in the graph data (Cangea et al. 2018; Ying et al. 2018; Gao and Ji 2019; Lee et al. 2019; Ma et al. 2019; Huang et al. 2019; Yuan and Ji 2020; Ranjan et al. 2020; Zhang et al. 2020, 2021b, c; Qin et al. 2020; Li et al. 2020b, a, 2022b; Bandyopadhyay et al. 2020; Gao et al. 2020, 2021b, 2022a, 2022b; Aggarwal and Murty 2021; Liu et al. 2021; Sun et al. 2021; Yang et al. 2021a; Bodnar et al. 2021; Pang et al. 2021b; Yu et al. 2021, 2022; Pham et al. 2021; Bi et al. 2021; Tang et al. 2021; Su et al. 2021; Wang et al. 2022; Xu et al. 2022; Duan et al. 2022; Zhou and Yin 2023). In earlier methods, the number of supernodes was constant for all graphs, might not be shared across multiple layers, and could correspond to varying pooling ratios (Ma et al. 2020; Maria Bianchi et al. 2020; Khasahmadi et al. 2020; Xie et al. 2020). Some pooling methods, such as edge contraction pooling (Diehl et al. 2019) and selection pooling (Bianchi et al. 2022b), adopt a fixed pooling ratio, generating a coarsened graph with 50% of the nodes at a time. Other hyperparameters that impact the adaptivity of graph pooling operators appear in certain topology- or heuristic-rule-based methods, such as the number of pooling layers in SEP (Wu et al. 2022) and the maximum number of missing links for nodes in a clique in KPlexPool (Bacciu et al. 2021).
Pooling operators with optimal adaptivity should determine the number of supernodes for each sample based on its unique structure and features, rather than artificially pre-selecting the number of supernodes for all samples. Inspired by graph reduction algorithms, the graph coarsening layer in the Hierarchical Graph Convolutional Network (H-GCN) merges nodes into structural equivalence groupings and structural similarity groupings until all nodes are marked, and subsequently constructs a cluster assignment matrix to elucidate the merging process (Hu et al. 2019). AdamGNN’s adaptive graph pooling (AGP) operator, in turn, merges low-diameter ego-networks adaptively and recursively to construct supernodes that contain these ego-networks (Zhong et al. 2022). Leveraging iterative adaptive community detection algorithms, such as the Louvain algorithm, is a decoupled strategy for fully adaptive graph pooling operators (Roy et al. 2021). To avoid the need for prior knowledge in node sampling, Sun et al. propose a novel reinforcement learning (RL) algorithm that adaptively updates the pooling ratio \(r\) (Sun et al. 2021). When adapted to deep learning, the maximal independent vertex set (MIVS) and maximum weight independent set (MWIS) algorithms allow node selection without a predetermined ratio (Nouranizadeh et al. 2021; Stanovic and Gaüzère 2022). A more adaptable and practical solution is to use a threshold to dynamically select the optimal number of supernodes for each sample. Noutahi et al. provide an alternative in Laplacian Pooling (LaPool), which dynamically selects nodes with stronger signal variation than their neighbors, offering the unique flexibility of defining the clustering dynamically when training graphs sequentially (Noutahi et al. 2019). More generally, when using pooling operators that pick nodes by evaluation values, it is advisable to assign a threshold \(\widetilde{s}\) such that only nodes with evaluation value \({s}_{v}>\widetilde{s}\) are preserved (Knyazev et al. 2019).
3.4.4 Additional loss
The classification loss function, such as the cross-entropy loss, is the most popular loss function for graph pooling operators. Nevertheless, when the training objective involves extra constraints or the model faces convergence challenges, incorporating additional loss functions becomes essential. DiffPool (Ying et al. 2018) defines the link prediction objective and entropy regularization as auxiliary loss functions, as shown in:

\({L}_{LP}={\Vert {{\varvec{A}}}^{l}-{{\varvec{S}}}^{l}{({{\varvec{S}}}^{l})}^{T}\Vert }_{F},\quad {L}_{E}=\frac{1}{N}\sum_{i=1}^{N}EF({{\varvec{S}}}_{i})\)
where \({\Vert \cdot \Vert }_{F}\) denotes the Frobenius norm and \(EF(\cdot )\) denotes the entropy function. The link prediction objective follows the intuition that nearby nodes should be pooled together. The entropy regularization highlights another important characteristic of pooling GNNs: ensuring that each node’s cluster assignment closely resembles a one-hot vector, thus clearly defining each cluster or subgraph membership. Other differentiable pooling operators that share similar architectures also incorporate these objectives to aid training (Pham et al. 2021; Gao et al. 2021a). To boost the training process, AttPool attaches an MLP to each pooling layer, which takes the graph embedding as input and predicts the graph labels (Huang et al. 2019). The losses and predictions at different levels are summed to obtain the total classification loss and the final prediction. Su et al. also designed a pooling information loss that keeps the node representation distributions before and after pooling as consistent as possible (Su et al. 2021).
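A sketch of the two auxiliary terms defined above follows, with random tensors standing in for a learned assignment matrix; the unnormalized Frobenius form matches the equation, though implementations sometimes rescale it by the number of adjacency entries.

```python
import torch

def diffpool_aux_losses(A, S):
    """DiffPool-style auxiliary objectives: the link prediction term
    encourages nearby nodes to share a cluster, and the entropy term pushes
    each row of the assignment S toward a one-hot vector."""
    link_loss = torch.norm(A - S @ S.T, p='fro')          # L_LP = ||A - S S^T||_F
    entropy = (-S * torch.log(S + 1e-15)).sum(dim=-1)     # EF(S_i) per node
    return link_loss, entropy.mean()

N, k = 6, 3
A = (torch.rand(N, N) > 0.5).float()
S = torch.softmax(torch.randn(N, k), dim=-1)              # soft cluster assignment
l_lp, l_e = diffpool_aux_losses(A, S)                     # add both to the task loss
```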
When the pooling operator introduces more strategies for extracting features, the additional loss functions become more diversified. Since StructPool learns cluster assignment relations via conditional random fields, finding the optimal assignment is equivalent to minimizing the Gibbs energy (Yuan and Ji 2020). Pooling methods based on graph capsule networks usually use a margin loss to calculate the classification loss and a reconstruction loss to constrain the capsule reconstruction to closely match the class-conditional distribution (Xinyi and Chen 2018; Yang et al. 2021a). Maximizing mutual information, typically via mutual information neural estimation, is another popular auxiliary training target (Li et al. 2020b; Bandyopadhyay et al. 2020; Sun et al. 2021; Pang et al. 2021b; Roy et al. 2021; Lee et al. 2021). Additional losses, including the matrix decomposition loss (Bacciu and Di Sotto 2019), the reconstruction loss (Zhong et al. 2022), the Kullback–Leibler (KL) loss (Knyazev et al. 2019; Khasahmadi et al. 2020; Tang et al. 2021), the deep clustering embedding loss (Bi et al. 2021), and the spectral clustering loss (Maria Bianchi et al. 2020), are employed as training objectives to enhance the model’s representational capacity. WGDPool utilizes a differentiable k-means clustering method and a multi-term parameterized loss function including cut, orthogonality, clustering, and reconstruction losses (Xiao et al. 2024).
3.4.5 Attention mechanisms
Attention, a quantitative measure for edges and nodes, has found extensive utility in GNNs and graph pooling operators (Vaswani et al. 2017; Knyazev et al. 2019). Existing pooling methods have integrated various attention mechanisms to meet diverse design requirements. GMT captures node interactions using a multi-headed attention mechanism with queries \(\mathcal{Q}\in {\mathbb{R}}^{ N\times {d}_{q}}\), keys \(\mathcal{K}\in {\mathbb{R}}^{ N\times {d}_{k}}\), and values \(\mathcal{V} \in {\mathbb{R}}^{ N\times {d}_{v}}\) as inputs and \(Att(\mathcal{Q},\mathcal{K},\mathcal{V})=w(\mathcal{Q}{\mathcal{K}}^{T})\mathcal{V}\), where \(w(\cdot )\) is an activation function (Vaswani et al. 2017; Baek et al. 2021). Gao and Ji use linear transformations to generate key and value matrices for the attention operator, and they also create a gate vector that regulates the information flow of node features (Gao and Ji 2022). MSAPool uses the multi-headed attention method to discover task-relevant parts of the input data and learn each node’s global significance after pooling (Xu et al. 2022). In Region and Relation based Pooling (R2POOL), the dot-product rule of self-attention is utilized to calculate the similarity of each query vector with each key vector, identifying bi-directional pairwise node similarities and relative significance at the graph scale (Aggarwal and Murty 2021).
In general, the attention weights between two nodes are usually generated as follows:

\({\alpha }_{ij}=\sigma ({{\varvec{a}}}^{T}[{\varvec{W}}{{\varvec{h}}}_{{v}_{i}}\Vert {\varvec{W}}{{\varvec{h}}}_{{v}_{j}}])\)
The attention scores between two nodes are used to evaluate their similarity or correlation. When only one node is taken into account, the attention weight of the node can be calculated as follows:

\({\alpha }_{i}=\sigma ({\varvec{W}}{{\varvec{h}}}_{{v}_{i}})\)
where \(\sigma (\cdot )\) could be the Sigmoid or LeakyReLU function (Xinyi and Chen 2018; Li et al. 2019; Bi et al. 2021). Many pooling methods implement such an attention mechanism by concatenating the representations of two nodes, or different representations of one node, and feeding the result into a feedforward neural network (Xinyi and Chen 2018; Fan et al. 2020; Ranjan et al. 2020; Liu et al. 2021; Sun et al. 2021; Zhang et al. 2021b; Yu et al. 2021, 2022; Bi et al. 2021; Itoh et al. 2022; Li et al. 2022c). A simplified version uses a projection vector to map the node representation directly to the attention score (Huang et al. 2019; Yuan and Ji 2020; Gao et al. 2020; Su et al. 2021; Lu et al. 2022; Wang and Ji 2023). Methods that compute attention scores solely from node representations may not fully exploit the local structural information of nodes, so some methods propose using GNNs to calculate node attention scores (Lee et al. 2019; Meltzer et al. 2019; Aggarwal and Murty 2021; Pang et al. 2021b; Duan et al. 2022). CGIPool computes a 1D attention score vector for each node in the input graph using parallel graph neural networks: \(Att({{\varvec{H}}}^{l},{{\varvec{A}}}^{l})=\sigma (GNN({{\varvec{H}}}^{l},{{\varvec{A}}}^{l}))\) (Pang et al. 2021b). For subgraph matching, H2MN (Hierarchical Hypergraph Matching Networks) calculates cross-graph attention coefficients among the hyperedges between graph pairs using cosine similarity scores (Zhang et al. 2021c). Similarly, LaPool learns the node-to-cluster assignment matrix via a soft-attention method measured by cosine similarity (Noutahi et al. 2019).
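The two recipes named above, pairwise concatenation fed to a feedforward network and direct projection of a single node's representation, can be sketched as follows; the layer names and dimensions are hypothetical.

```python
import torch
import torch.nn as nn

d = 8
H = torch.randn(5, d)                       # node representations
pair_scorer = nn.Linear(2 * d, 1)           # feedforward net over concatenated pairs
node_scorer = nn.Linear(d, 1, bias=False)   # projection-vector shortcut

# Pairwise attention: concatenate two node representations and score the pair.
alpha_01 = torch.sigmoid(pair_scorer(torch.cat([H[0], H[1]])))

# Single-node attention: map each representation directly to a scalar score.
alpha = torch.sigmoid(node_scorer(H)).squeeze(-1)   # one weight per node
```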
In addition to the weighted summation-based readout functions described in Sect. 3.1.3, the attention mechanism is also used to compute the cluster assignment between nodes and supernodes in clustering pooling operators. Based on Memory Augmented Neural Networks (MANNs), the Graph Memory Network (GMN) uses the clustering-friendly Student’s t-distribution to measure the normalized similarity for query-key pairs between nodes and clusters as the soft assignment probabilities (Khasahmadi et al. 2020). On the other hand, HAP employs an attention method similar to Graph Attention Networks (GATs), in which pre-generated global cluster content vectors and node content vectors are concatenated and fed into a nonlinear network to learn the Master-Orthogonal-Attention (MOA) scores (Veličković et al. 2018; Liu et al. 2021). There are two popular ways of computing the attention mechanism in selection pooling operators: linear mapping (\({\varvec{\mu}}\)) of individual node features, and GNN approaches based on nodes and their neighbors, as shown in Eqs. (34 and 35) (Knyazev et al. 2019). The former is the scalar projection and can be extended to an MLP with a non-linear function \(\sigma (\cdot )\), as in Eqs. (36 or 33) (Zhang et al. 2021b), or to multi-head attention with a mapping matrix \(\boldsymbol{\rm M}\) (Bi et al. 2021), whereas the latter is used by selection pooling operators including SAGPool (Lee et al. 2019), soft-mask GNN (SMG) (Yang et al. 2021b), and R2POOL (Aggarwal and Murty 2021). All of these approaches concentrate on a single node, and most of them adopt only one feature perspective. As a result, there is a trend toward combining increasingly sophisticated attention mechanisms, such as node-to-node attention (Duan et al. 2022) and key-value-query-based attention (Aggarwal and Murty 2021; Gao and Ji 2022).
In summary, the attention mechanism in graph pooling operators is used to compute the assignment matrix between nodes and clusters (Xinyi and Chen 2018; Noutahi et al. 2019; Huang et al. 2019; Khasahmadi et al. 2020; Ranjan et al. 2020; Su et al. 2021; Baek et al. 2021; Liu et al. 2021; Sun et al. 2021; Yu et al. 2021; Li et al. 2022c), the local or global importance scores used to select nodes (Lee et al. 2019; Huang et al. 2019; Gao et al. 2020; Aggarwal and Murty 2021; Pang et al. 2021b; Duan et al. 2022; Gao and Ji 2022), the similarity or correlation between nodes for learning the graph structure (Yuan and Ji 2020; Zhang et al. 2021b; Bi et al. 2021), or as a gating mechanism to control the integration of information from individual nodes (Meltzer et al. 2019; Fan et al. 2020; Baek et al. 2021; Yu et al. 2021, 2022; Zhang et al. 2021c; Xu et al. 2022; Itoh et al. 2022; Lu et al. 2022; Duan et al. 2022; Li et al. 2022c; Wang and Ji 2023). Yet the breadth of applications has led researchers to incorporate attention mechanisms almost by default when designing novel graph pooling operators, while overlooking their applicability and potential functions. Knyazev et al. showcased the considerable potential of learned attention in GNNs and graph pooling layers, provided that it closely approximates optimality. However, attaining this proximity to optimality can be challenging due to sensitivity to initialization (Knyazev et al. 2019). The observation that the attention effect can be negligible or even harmful under typical conditions suggests that the attention mechanism remains a significant open problem in graph pooling operators. Further developing our understanding of attention, and substantiating its effectiveness through rigorous, fair, and comparable experimental results, is imperative.
4 Applications in omics studies
Biological networks and molecular structures are two well-established graph modeling topics in bioinformatics data analysis (Zhang et al. 2021a). When modeling a molecular structure as a graph, atoms or chemical substructures are usually treated as nodes, while bonds are treated as edges. In biological network modeling, the modeled entities are usually utilized as nodes, and the edges connecting nodes indicate known associations between pairs of entities. Unlike molecular structures, which have innate graph structures from nature, biological networks require extracting entities and modeling the interactions between them. The entities represented by biological network nodes cover molecular compounds, biomolecules, cells, and tissues, and the data used for modeling graphs across various omics range from chemical structures to sequencing, expression, and medical images (Jin et al. 2021). GNNs can be used to formulate and aggregate entity relationships in the graph data structure, and the graph embedding obtained by pooling the learned representations of all nodes in the graph can be used as a robust low-dimensional feature that preserves topological relationships between entities in biological networks (Wang et al. 2020b, 2021a). The bioinformatics applications of graph pooling operators in omics covered in this survey can be divided into three categories: genomics (Sect. 4.1), radiomics and other medical imaging (Sect. 4.2), and proteomics (Sect. 4.3). We conducted a selective review of applications on these omics data, with the remainder in other omics, such as metabolomics or multi-omics, presented on GitHub.
4.1 Genomics
Data: The advent of high-throughput next-generation genomic technologies has catalyzed a surge in genomics data, propelling studies of DNA that encompass its structure, modification, and expression. Extensively studied biological networks involving genes, such as PPI, GRN, co-expression or correlation networks, disease networks, and other multi-omics networks, are often developed from cohort studies that include multiple patients (Sulaimany et al. 2018). To obtain graphs that represent individual states, an intuitive approach is to augment gene networks derived from a population with individual characteristics as graph signals (Ramirez et al. 2020, 2021; Chereda et al. 2021). Pfeifer et al. incorporated gene expression and DNA methylation data as node features within a shared PPI network (Pfeifer et al. 2022). Despite identical graph topologies for all patients, the variable node features distinctively represent each patient’s unique cancer molecular profile. A Single-Sample Network (SSN) is a biomolecular network tailored from individual data and a reference set to delineate a person’s unique disease condition, providing insights into the personal response to pathophysiological changes (Liu et al. 2016). LIONESS (Linear Interpolation to Obtain Network Estimates for Single Samples) represents an alternative method for constructing single-sample networks, in which the edge scores of an individual network are calculated by taking the difference between the edge scores of an aggregate network, which includes all samples, and those of a network reconstructed without the sample of interest (Kuijjer et al. 2019). Single-sample network methods yield gene networks with unique topologies tailored to each individual sample. Single-cell technologies have generated extensive omics data at the cell level, shifting the understanding of disease from the individual or tissue level towards the cell level and intensifying insights into heterogeneity between cells. Constructing sample-specific graph signals on background networks (Wang et al. 2021b), or single-sample networks by sample-specific network construction methods (Dai et al. 2019), is also applicable to single-cell data.
Tasks: The main goal of graph pooling in genomics networks is to improve classification, including stage, survival, grade and subtype classification. Additionally, such studies often employ model interpretation methods to identify cancer driver genes, network biomarkers or genes associated with diseases. In graph-level gene network analysis, the workflow includes building individual networks, crafting supervised GNN-based classification models, interpreting results, and assessing biological relevance. Given that single-cell data often encompasses multiple omics datasets, this involves tasks such as modality prediction, modality matching, and joint embedding (Wen et al. 2022). Graph-level single-cell network analysis involves both supervised tasks, like cell classification, and unsupervised tasks, such as cell clustering (Wang et al. 2021b; Hu et al. 2024).
Methods: Table 3 shows representative methods with graph pooling operators for processing biological networks from genomics. Ramirez et al. designed an edge pooling method that greedily selects pairs of nodes to merge through feature summation (Ramirez et al. 2020, 2021). Chereda et al. employed one-dimensional (1D) pooling in CNNs, such as max pooling or average pooling, to process graph embedding matrices and cluster multiple neighboring nodes into a supernode (Chereda et al. 2021). sigGCN employs a max pooling layer to cluster a specified number of nodes into a supernode (Wang et al. 2021b). Hou et al. developed a truncated differentiable clustering pooling operator that efficiently condenses the cluster assignment matrix, creating finite gene clusters without overlap (Hou et al. 2024). Hu et al. employed differentiable clustering pooling to first identify and then align tissue cellular neighborhoods (TCNs) across spatial maps, generating an embedding that maintains the integrity of the TCN partition information (Hu et al. 2024). Liang et al. leveraged SAGPool on pathway-based gene networks for cancer prognosis, utilizing interpretive algorithms to pinpoint survival-related pathways (Liang et al. 2022).
Discussion: The most frequently used graph pooling in genomics networks is clustering pooling within hierarchical pooling, followed by global pooling operators. Ramirez’s research pioneered a data-driven model utilizing GNNs for classifying cancer subtypes (Ramirez et al. 2020). The model was trained on 11,071 samples, 33 cancer types, and four distinct network types, achieving a cancer subtype prediction accuracy that surpasses or matches previously documented ML algorithms. Wang et al. evaluated sigGCN against four traditional ML methods over seven datasets; their findings revealed equivalence on smaller datasets, while sigGCN outperformed the traditional ML methods on larger, more complex datasets (Wang et al. 2021b). Ramirez et al. applied the same model to cancer subtype classification and survival prediction tasks, demonstrating the scalability of pooling operators across different datasets and tasks (Ramirez et al. 2020, 2021). Considering that neural network decisions require explanation prior to clinical consideration, nearly all genomics studies now incorporate interpretative analysis to provide both model insights and clinically relevant information (Chereda et al. 2021; Liang et al. 2022). Additionally, Hou et al. used different modules for clustering and classification, while Hu et al. employed similar architectures for multiple tasks, showcasing the potential of GNNs with pooling operators for multi-task learning (Hou et al. 2024; Hu et al. 2024). However, these studies rarely consider leveraging auxiliary information from related tasks, thus failing to enable mutual task enhancement. In genomics research, a crucial factor is the single-sample network construction method. Most current methods are applied to homogeneous graphs, with limited focus on heterogeneous data. Li et al. aimed to adapt clustering pooling for heterogeneous graphs, first dimensionally aligning heterogeneous nodes via a linear layer and then clustering multiple nodes into a supernode on homogeneous graphs (Li and Nabavi 2024). However, there is still a lack of extensive experimental evaluation of various pooling operators across multiple data types, graph construction methods, and tasks.
4.2 Radiomics
Data: Radiomics involves the rapid collection of extensive medical images from technologies like computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET) (Li et al. 2022a). Antonelli et al. introduced ‘omics imaging’ to describe the combination of biomedical imaging with omics data, analyzing both histological and radiomic images (Antonelli et al. 2019). Medical images are gold standards in medical practice for the diagnosis, classification, and prognosis of clinical diseases. Medical images offer a wealth of data on biological entities and their interconnections, making it possible to extract graphs using diverse approaches. Brain connectivity networks, for example, map the brain’s graph structure, while tissue images facilitate the creation of cellular or tissue entity graphs. By representing the brain as a functional connectivity graph, functional magnetic resonance imaging (fMRI) has enabled significant progress in comprehending the brain. In this immensely complex system, nodes are defined as voxels or brain regions of interest (ROIs), and edges are defined as the functional connectivity between those ROIs, usually calculated as pairwise correlations of fMRI time series (Li et al. 2021b). Medical images of other tissue regions, such as histopathology images based on cell or tissue structure patterns (Adnan et al. 2020; Martin-Gonzalez and Crispin-Ortuzar 2021; DI et al. 2022; Pati et al. 2022; Gao et al. 2022c) and CT and MRI of other lesion sites (Huang et al. 2021; Pang et al. 2021a), can also be constructed into graph structures. In such graphs, a node represents a cell or an image region, and the edges are defined by spatial distance. The rapid digitization of pathology slides into high-resolution whole-slide images (WSIs), enabled by whole-slide scanning systems, has ushered in a revolution in computer-aided diagnostics. Building WSI networks involves detecting entities at tissue or cellular levels, patch-based sampling, or dynamically forming networks from CNN-extracted higher-order features (DI et al. 2022; Pati et al. 2022; Gao et al. 2022c). Single-sample network methods like SSN and LIONESS are also suitable for building networks from WSIs’ high-order features, capturing the intricate interactions within (Kuijjer et al. 2019; Duroux et al. 2023).
Tasks: The main goal of using graph pooling in radiomics networks is to improve classification, with labels derived from clinical data. Additionally, graph pooling is expected to link the classification results with pathological annotations and clinical features, thereby offering pathological explanations and clinical insights for each sample. Understanding mental diseases such as schizophrenia and Alzheimer’s disease is an essential objective of applying GNNs to brain connectivity networks, hence one basic task of GNNs with graph pooling is disease diagnosis, i.e., separating controls and cases via brain networks (Li et al. 2020c, 2021b; Sebenius et al. 2021; Hu et al. 2021a; Gopinath et al. 2022; Song et al. 2022; Zhao et al. 2022). Another objective is to efficiently differentiate and explain brain states through model explanation, which furthers research into the brain’s operating mechanisms (Li et al. 2021b; Gopinath et al. 2022; Gao et al. 2022b; Zhang et al. 2023b). In general, the overall process of brain network analysis involves brain map construction, GNN-based classification, and possible subsequent analysis, such as ROI extraction and visualization (Li et al. 2021b). The advancement of medical imaging, in particular WSIs, has enabled the application of artificial intelligence to a variety of pathology tasks, encompassing tumor detection, tumor staging, and survival analysis (Adnan et al. 2020; Martin-Gonzalez and Crispin-Ortuzar 2021; DI et al. 2022; Pati et al. 2022; Gao et al. 2022c). The two-stage pipeline of GNNs on WSIs generally starts with building the network and extracting features, and then builds the GNN for classification or regression prediction. Diverging from supervised tasks, Zheng et al. and Özen et al. concentrate on unsupervised GNN-Hash models for graph encoding, establishing retrieval indexes of WSIs to enhance ROI retrieval for auxiliary diagnosis (Zheng et al. 2019; Özen et al. 2020). Wang et al. focus on weakly supervised learning, employing image-level labels instead of pixel-level annotations to aid pathologists with WSI Gleason grading (Wang et al. 2020a).
Methods: Table 4 shows representative methods with graph pooling operators for processing biological networks from medical images. The most frequently used medical image network classification method is selection pooling within hierarchical pooling, followed by attention-based global pooling operators. To investigate human brain states, Zhang et al. propose a domain-knowledge-informed self-attention graph pooling-based graph convolutional neural network (DH-SAGPool) that keeps significant nodes by calculating the score of each node in the graph as \({\varvec{s}}=\sigma ({\varvec{A}}{\varvec{H}}{\varvec{W}})\) (Zhang et al. 2023b). Tang et al. propose a Hierarchical Signed Graph Pooling (HGP) module consisting of four steps: (1) calculation of Information Scores (ISs); (2) selection of the top-K informative hubs; (3) feature aggregation; and (4) graph pooling, where the IS contains balanced and unbalanced components to measure information from balanced and unbalanced node sets (Tang et al. 2022). Focusing on the contribution of each node to the final prediction, all nodes in the GAT-LI model share a mapping vector that assigns weights to node representations, and existing methods such as GNNExplainer are integrated to understand the model and highlight important features (Ying et al. 2019; Hu et al. 2021a). Li et al. propose a maximum mean discrepancy (MMD) loss and a binary cross entropy (BCE) loss to accentuate the distinction, aiming to overcome the limitation of existing approaches in which the ranking scores of the rejected and remaining nodes may not be discernible (Li et al. 2020c). Moreover, a group-level consistency (GLC) loss is intended and commonly used to force the pooling operator to select similar significant nodes or ROIs for different input instances (Li et al. 2020c, 2021b; Gao et al. 2022b).
Various GNN architectures have shown considerable applicability in medical image network classification. Inspired by DiffPool, Gopinath et al. propose an end-to-end learnable pooling strategy for the subject-specific aggregation of cortical features (Gopinath et al. 2022). This model is structured hierarchically, with sequential Graph Convolution + Pooling (GC + P) blocks followed by two FC layers. In addition, UGPool adopts the jumping knowledge (JK) network architecture in graph classification experiments on brain connectivity networks (Qin et al. 2020). Notably, a GNN architecture intermediate between hierarchical and flat structures is often used for medical image classification; it usually consists of several GCN layers sequentially linked with a hierarchical pooling layer and a global pooling layer, as shown in Fig. 4. In this architecture, a single hierarchical pooling operator, typically a selection pooling operator, is used to identify important nodes. The importance coefficients of nodes reflect the preferences of the model and provide insights into the diverse roles played by the regions represented by different nodes during classification.
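A hedged sketch of this architecture, assuming PyTorch Geometric with SAGPooling as the selection operator, is given below; the layer sizes, the pooling ratio, and the mean readout are illustrative choices rather than those of any specific cited model.

```python
import torch
from torch_geometric.nn import GCNConv, SAGPooling, global_mean_pool

class HybridPoolNet(torch.nn.Module):
    """GCN layers -> one selection pooling layer -> global readout (cf. Fig. 4)."""
    def __init__(self, in_dim, hid_dim, n_classes, ratio=0.5):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hid_dim)
        self.conv2 = GCNConv(hid_dim, hid_dim)
        self.pool = SAGPooling(hid_dim, ratio=ratio)  # keeps the top-`ratio` nodes
        self.lin = torch.nn.Linear(hid_dim, n_classes)

    def forward(self, x, edge_index, batch):
        x = self.conv1(x, edge_index).relu()
        x = self.conv2(x, edge_index).relu()
        # the pooling layer scores nodes; `perm` indexes the retained nodes (e.g., ROIs)
        x, edge_index, _, batch, perm, score = self.pool(x, edge_index, batch=batch)
        g = global_mean_pool(x, batch)                # graph-level representation
        return self.lin(g), perm, score               # scores support interpretation
```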
Discussion: Graph pooling operators are important in medical image analysis at the following levels: global pooling operators are used to obtain a graph representation for classification (Adnan et al. 2020; Huang et al. 2021; Hu et al. 2021a; Di et al. 2022; Pati et al. 2022; Gao et al. 2022c); hierarchical pooling operators are used to extract features at different levels (Qin et al. 2020; Huang et al. 2021; Pang et al. 2021a; Tang et al. 2022; Gopinath et al. 2022; Song et al. 2022); and attention-based global pooling and selection pooling are used to evaluate the contribution of nodes to the classification in order to find important nodes (Li et al. 2020c, 2021b; Adnan et al. 2020; Martin-Gonzalez et al. 2021; Sebenius et al. 2021; Hu et al. 2021a; Tang et al. 2022; Song et al. 2022; Gao et al. 2022b; Zhao et al. 2022). CGC-Net employs DiffPool as its graph clustering module and visualizes the cell graphs after clustering (Zhou et al. 2019). The first advantage of graph pooling operators lies in their strong predictive capability across a variety of data and tasks. DH-SAGPool, for instance, has demonstrated classification performance surpassing existing methods in brain state classification experiments ranging from binary to seven-class tasks (Zhang et al. 2023b). The effect of pooling in brain connectivity network tasks is sensitive to hyperparameters: a pooling ratio below a certain threshold encourages the model to use fewer parameters, increases robustness to noise, and performs effective dimensionality reduction (Sebenius et al. 2021). However, an extremely low pooling ratio can result in severe information loss. Another advantage of pooling operators lies in providing interpretability to the model, thereby inspiring interpretations of specific biological or medical questions. The node selection pooling layers give BrainGNN intrinsic interpretability, enabling the discovery of significant brain regions that contribute informative features to the prediction task at different levels (Li et al. 2021b).
Although selection pooling is widely used to identify key nodes for intuitive model interpretation, it still lacks quantitative evaluation and theoretical support (Li et al. 2021b). Another challenge faced by graph pooling operators when attempting to understand specific medical or biological problems is their domain-agnostic nature: the explanations provided by computational models may not be equivalent to interpretations in the context of medicine or biology (Karim et al. 2023). Intuitively, the nodes or features that contribute most to the model’s classification may not include biologically relevant factors or exhibit a strong correlation with expected biological effects (Hu et al. 2021a). The correlation between these two types of explanations often lacks sufficient experimental or theoretical support. A trend is emerging to strengthen this correlation by incorporating extensive domain knowledge into data processing and model computation (Zhang et al. 2023b; Karim et al. 2023). Clustering-based hierarchical pooling is less frequently used in medical image analysis, although clustering pooling on voxels is crucial to constructing adaptive network topologies in SpineParseNet (Zhou et al. 2019; Pang et al. 2021a). On the other hand, BrainGNN reconciles node community partition and key node extraction by integrating structure-aware GCN and node selection graph pooling operators, offering insights for other applications (Li et al. 2021b).
4.3 Proteomics
Data and Tasks: Proteomics is the comprehensive analysis of the proteome, encompassing the structure, abundance, function, modifications, localization, and interactions of proteins (Li et al. 2022a). One of the most important tasks in computational drug discovery is protein–ligand binding affinity prediction. In the drug discovery field, ligands often refer to drug candidates, encompassing small molecules and biologics, which act as agonists or inhibitors in biological processes with the potential to treat diseases. Binding affinity, which quantifies the strength of the interaction between a protein and a ligand, is typically determined through rigorous and time-intensive experimental procedures. Proteins and ligands can be described as graphs based on their structures, with atoms typically serving as nodes and edges defined by the interactions between them. Owing to their multi-level structure, proteins can be represented as graphs at multiple levels; for instance, Nikolaienko et al. used receptor secondary structures as graph nodes for proteins (Nikolaienko et al. 2022). In general, ligands are small molecules; when the ligand is itself a protein, the computational task is comparable to PPI prediction (Réau et al. 2023; Huang et al. 2023).
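As a simple illustration of this graph construction, the sketch below connects node pairs within a distance cutoff; the 4.5 Å threshold and the purely geometric edges are illustrative assumptions, and real pipelines typically also include covalent bonds and richer edge features.

```python
import numpy as np

def build_contact_graph(coords, cutoff=4.5):
    """Connect node pairs (atoms or residues) lying within `cutoff` angstroms.

    coords: (n, 3) array of 3D coordinates. Returns an (n, n) adjacency matrix.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)                 # all pairwise distances
    return ((dist < cutoff) & (dist > 0)).astype(float)  # exclude self-loops
```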
Methods: Table 5 shows representative methods with graph pooling operators for molecular structures. The expansion of three-dimensional (3D) protein structure data and advancements in structural biology have facilitated the establishment of a conceptual framework for structure-based drug discovery (Senior et al. 2020; Jumper et al. 2021; Zhu et al. 2022). Li et al. propose the Structure-aware Interactive Graph Neural Network (SIGN), which learns from the constructed complex graph to predict protein–ligand binding affinity; it uses pairwise interactive pooling (PiPool) to calculate the interaction matrix between different types of atoms in proteins and their ligands, thereby leveraging long-range interactions (Li et al. 2021a). PSG-BAR (Protein Structure Graph-Binding Affinity Regression) computes cross-attention between protein node and ligand representations and employs attention-based global pooling to compute a virtual node representation for prediction (Pandey et al. 2022). Li et al. employ a linear layer to learn assignment weights for each node, serving as an attentive pooling layer (APL) to learn hierarchical structures and compress the linear graph of protein–ligand complexes (Li et al. 2023a). To perform graph readout, GraphSite uses Set2Set as a global pooling function to reduce a graph to a single node (Vinyals et al. 2016; Shi et al. 2022). To capture the distribution of each dimension of the node representations, graphDelta applies a fuzzy histogram technique, which predefines a set of bins along with associated membership functions and applies them to the node representations (Karlov et al. 2020). This yields a graph representation vector whose length equals the product of the node representation length and the number of bins. ProteinGCN incorporates both local and global pooling based on grouping and cascading, organizing nodes by the proteins’ inherent residue structures and effectively learning residue and decoy embeddings (Sanyal et al. 2020).
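A minimal sketch of the fuzzy histogram readout idea follows: each feature dimension is summarized by soft memberships over predefined bins, yielding a graph vector of length d × n_bins. The Gaussian membership functions and the bin width are assumptions for illustration; graphDelta’s exact membership functions may differ.

```python
import numpy as np

def fuzzy_histogram_readout(H, centers, width=0.5):
    """Fuzzy-histogram readout sketch: soft bin memberships per feature dimension.

    H: (n, d) node representations; centers: (n_bins,) predefined bin centers.
    Returns a graph vector of length d * n_bins.
    """
    # membership of every feature value in every bin, shape (n, d, n_bins)
    m = np.exp(-((H[..., None] - centers) ** 2) / (2 * width ** 2))
    return m.sum(axis=0).reshape(-1)  # aggregate over nodes and flatten
```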
GraphBAR uses a parallel GNN architecture with multi-channel feature extraction followed by fusion (Son and Kim 2021), whereas GraphSite extends the standard convolution-readout architecture with JK connections (Shi et al. 2022). Notably, as illustrated in Fig. 5, many models adopt a two-channel GNN architecture to process two molecular structure graphs in protein–ligand affinity or PPI prediction (Torng and Altman 2019; Shen et al. 2021; Jiang et al. 2021; Nikolaienko et al. 2022; Li et al. 2022b; Pandey et al. 2022; Réau et al. 2023; Xia et al. 2023; Huang et al. 2023). These models extract features and learn representations of proteins or ligands using two GNNs and then fuse them according to specific rules with a concatenation operation. DeepRank-GNN learns the representations of two interacting proteins using hierarchical pooling operators based on the Markov Cluster Algorithm (MCL) or the Louvain community detection algorithm in a two-branch hierarchical GNN; the representations are then flattened, merged, and fed into FC networks (Réau et al. 2023). The geometry-aware interactive graph neural network (GIANT) employs a decoupled cross pooling module to learn initial representations of protein and ligand molecules, and a fused global pooling module to merge their representations and capture the interaction between proteins and ligands (Li et al. 2023b). Both pooling modules are implemented with attention mechanisms and Gated Recurrent Units (GRUs).
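The shared two-channel pattern can be sketched as two independent encoders whose pooled outputs are concatenated before a prediction head. The snippet below is a generic sketch of that pattern, not a reimplementation of any single cited model; the GCNConv encoders, mean readouts, and layer sizes are assumptions.

```python
import torch
from torch_geometric.nn import GCNConv, global_mean_pool

class TwoChannelGNN(torch.nn.Module):
    """Two-branch pattern (cf. Fig. 5): separate encoders for protein and ligand
    graphs, pooled representations fused by concatenation before a prediction head."""
    def __init__(self, prot_dim, lig_dim, hid, out_dim=1):
        super().__init__()
        self.prot_conv = GCNConv(prot_dim, hid)
        self.lig_conv = GCNConv(lig_dim, hid)
        self.head = torch.nn.Sequential(
            torch.nn.Linear(2 * hid, hid), torch.nn.ReLU(),
            torch.nn.Linear(hid, out_dim))

    def forward(self, x_p, ei_p, b_p, x_l, ei_l, b_l):
        h_p = global_mean_pool(self.prot_conv(x_p, ei_p).relu(), b_p)  # protein readout
        h_l = global_mean_pool(self.lig_conv(x_l, ei_l).relu(), b_l)   # ligand readout
        return self.head(torch.cat([h_p, h_l], dim=-1))  # fuse and predict affinity
```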
Discussion: In general, simple readout functions and their combinations are preferred in current studies that consider molecular structures for protein–ligand structure prediction (Torng and Altman 2019; Cho et al. 2020; Son and Kim 2021; Shen et al. 2021; Gligorijević et al. 2021; Jiang et al. 2021; Lai and Xu 2022; Li et al. 2022b; Yang et al. 2023b; Xia et al. 2023; Huang et al. 2023). Global pooling operators can complement the local embeddings learned by GCNs by capturing global information. In SIGN, for instance, PiPool is trained in a semi-supervised manner to model global long-range interactions between atoms in proteins and ligands, thereby enhancing the effectiveness and generalizability of the model (Li et al. 2021a). Clustering pooling, on the other hand, is well suited to detecting clusters in proteins with closer 3D spatial proximity or stronger interactions, as it can incorporate distance into the clustering process (Réau et al. 2023). Jiang et al. reviewed the performance of ML-based models on PDBbind’s v2016 and v2013 core sets for protein–ligand interaction prediction (Jiang et al. 2021). Concurrently, Li et al. benchmarked SIGN against multiple ML-based methods, assessing their ability to capture spatial and long-range interactions (Li et al. 2021a). The findings revealed that non-spatial methods were the least effective, that classical ML methods showed limited generalization on external datasets, and that GNN-based approaches integrating both kinds of information performed best. GNNs with pooling exhibit a parameter count one to two orders of magnitude lower than that of 3D convolutional neural networks (Sanyal et al. 2020). Pooling operators also contribute to model interpretability in research on molecular structure graphs. PSG-BAR uses attention-based global pooling to identify surface residues critical for protein–ligand binding by ranking attention scores (Pandey et al. 2022). Jiang et al. investigated intermolecular interactions by computing and visualizing the similarity of protein–ligand complex embeddings obtained through pooling (Jiang et al. 2021). They further explained how atomic pairwise interactions in protein–ligand complexes, as reflected by the weights of atom-residue pairs within the pooling module, influenced the final prediction; this analysis validated the consistency between the computational model and expert knowledge. In related tasks involving protein structure graphs, the parameter sensitivity of pooling operators remains a significant factor. In Struct2GO, a pooling operator based on a self-attention mechanism is employed for protein function prediction (Jiao et al. 2023); however, both excessively high and excessively low pooling ratios fail to achieve optimal performance.
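As a brief illustration of the simple readout functions and their combinations noted at the start of this discussion, a common pattern concatenates sum, mean, and max readouts; the specific combination shown below is an illustrative assumption rather than the configuration of any cited model.

```python
import torch
from torch_geometric.nn import global_add_pool, global_mean_pool, global_max_pool

def combined_readout(x, batch):
    """Concatenate sum, mean, and max readouts into one graph representation."""
    return torch.cat([global_add_pool(x, batch),
                      global_mean_pool(x, batch),
                      global_max_pool(x, batch)], dim=-1)
```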
5 Conclusion
Graph neural networks, which process graph-structured data with neural networks, excel in various graph-related tasks. The GraphRec framework, for instance, succeeds in social recommendation by analyzing user-item graph interactions, surpassing baseline methods on real-world datasets (Fan et al. 2019). In bioinformatics, GNNs have proven instrumental for predicting associations between long non-coding RNAs (lncRNAs) and diseases, as well as inferring the relationships among lncRNAs, microRNAs (miRNAs), and diseases (Sheng et al. 2023a, b). Beyond node- and edge-level analysis, GNNs extend their utility to graph-level tasks, exemplified by their ability to leverage protein structural information for accurate functional prediction, thereby opening new avenues in computational biology (Sanders et al. 2023). Graph pooling is a key module that bridges node representation learning and specific graph-level tasks. This review presents a comprehensive survey of pooling operators in GNNs and their applications in omics from multiple perspectives. Global pooling and hierarchical pooling operators are classified and summarized, along with the details of prevalent hierarchical pooling methods, including clustering pooling, node selection pooling, edge pooling, and hybrid pooling. In addition, we discussed existing benchmark datasets and fair evaluation frameworks for graph pooling. Representative applications of graph pooling operators to graphs of molecular structures and medical images for drug discovery and disease diagnosis are also summarized. Through several examples of brain connectivity network analysis and protein–ligand affinity prediction, we demonstrated how graph pooling can benefit omics applications in terms of prediction performance and interpretability.
Despite significant progress in graph-level learning, unresolved challenges remain for graph pooling. Below, we discuss several prospective research directions to encourage continued investigation in this field.
5.1 Large-scale graphs and graph foundation models
Despite achieving state-of-the-art performance on many small benchmark datasets, graph pooling operators face significant challenges on large-scale datasets, where the requirements on expressive power, as well as on computational time and space, become more demanding. Emerging graph foundation models represent a frontier that integrates graph structure with large language models (Tang et al. 2023; Zhao et al. 2023; Tian et al. 2024). These models are expected to possess versatile graph reasoning capabilities, including understanding basic topological graph properties, reasoning over multi-hop neighborhoods, and capturing global properties and patterns (Zhang et al. 2023c). This aligns well with the advantages exhibited by graph pooling operators, raising the prospect of pooling as an essential component of future graph foundation models for handling large-scale graph-level tasks. In omics research, scGPT, a foundation model for single-cell biology, explores the applicability of foundation models to advancing cell biology and genetics research (Cui et al. 2024). Current findings indicate that scGPT effectively extracts key biological insights related to genes and cells, excelling in downstream applications such as multi-omics integration and gene network inference (Cui et al. 2024). Graph pooling operators are expected to enhance these downstream applications and be integrated into graph-based foundation models.
5.2 Expressive sparse graph pooling
Clustering pooling methods maintain the expressive power of message passing layers through dense cluster assignment matrices whose rows sum to one after suitable normalization (Ying et al. 2018; Bianchi and Lachi 2024). Conversely, sparse graph pooling operators that depend on node selection yield assignment matrices in which not all rows sum to one, regardless of how the scores are computed (Lee et al. 2019; Grattarola et al. 2022; Bianchi and Lachi 2024). Despite the existence of expressive sparse operators, a common limitation is the inability to directly specify the number of supernodes.
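The row-sum distinction can be seen directly in a toy example: a dense clustering assignment normalizes every row to sum to one, whereas a selection-based assignment leaves dropped nodes with all-zero rows. The numpy snippet below is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 2                                     # 5 nodes pooled into 2 supernodes

S_dense = np.exp(rng.normal(size=(n, k)))
S_dense /= S_dense.sum(axis=1, keepdims=True)   # softmax rows: every row sums to 1
print(S_dense.sum(axis=1))                      # -> all ones (dense clustering pooling)

S_sparse = np.zeros((n, k))
S_sparse[[0, 3], [0, 1]] = 1.0                  # only the selected nodes are assigned
print(S_sparse.sum(axis=1))                     # -> zeros for dropped nodes (sparse)
```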
5.3 Graph embedding for complex graphs
All graph pooling operators discussed in this paper are assumed to work on homogeneous graphs, whether weighted or attributed. However, real-world graphs are often complex, encompassing both heterogeneous graphs (with various types of nodes and edges) and hypergraphs (where multiple nodes are linked together via hyperedges). Most existing methods for heterogeneous graph embedding focus on node embeddings, with few addressing graph-level embeddings. Consequently, heterogeneous graph pooling emerges as an intriguing solution for graph classification on heterogeneous graphs, with the challenge of preserving heterogeneity and capturing the distinct features of various heterogeneous structures (Yang et al. 2022; Bing et al. 2023). Similarly, there is a need for solutions capable of learning hypergraph representations, employing permutation-invariant functions that aggregate node/hyperedge representations in a way that is meaningful for downstream tasks (Antelmi et al. 2023). Heterogeneous graphs and hypergraphs are effective tools for representing interactions and relationships in omics data: heterogeneous nodes represent multi-omics data, heterogeneous edges integrate various biological annotations, and hyperedges reveal high-order associations among biomolecules (Deng et al. 2024). Current omics research often has to simplify complex graphs into homogeneous ones owing to the lack of specialized graph pooling operators for heterogeneous graphs and hypergraphs.
5.4 Unsupervised tasks
Most graph pooling operators discussed in this paper are used for supervised tasks, with few exploring unsupervised tasks such as graph generation (Baek et al. 2021; Guo et al. 2023) and node clustering (Wang et al. 2022; Tsitsulin et al. 2023). Hierarchical GNN architectures, associated with hierarchical pooling, naturally fit hierarchical network structures and tasks such as hierarchical community detection (Su et al. 2022), suggesting a potential expansion of existing operators to more unsupervised tasks or the development of new operators for such tasks. In omics research, the abundance of unlabeled data and the difficulty of obtaining labels often mean that annotation lags behind data generation. Consequently, unsupervised tasks are likely to become a major research focus, or unsupervised data may be used to enhance supervised tasks. One approach is to use a general GNN architecture with different modules for different tasks such as classification and clustering (Hou et al. 2024; Hu et al. 2024).
5.5 Interpretability
In domain-specific scenarios, the interpretability of graph pooling operators, and of graph neural networks by extension, relies on two key aspects: (1) identifying correlations between model outputs and original input features, and (2) revealing domain-specific insights for the relevant features identified by the model. Explanation components built into operators, enabling self-explaining and domain-knowledge-driven interpretability, should thus become fruitful future research directions (Yang et al. 2023a; Karim et al. 2023; Wysocka et al. 2023). In omics research, one available domain-specific interpretability reference is bio-centric model interpretability. It grounds interpretability in a biomedical context through three aspects: architecture-centric interpretability, output-centric interpretability, and post-hoc evaluation of biological plausibility (Wysocka et al. 2023). These are assessed via four components: integration of different data modalities, schema-level model representation, integration of domain knowledge, and post-hoc explainability methods (Wysocka et al. 2023). Schema-level representation requires GNNs to understand graph representations and transformations and to communicate these transformations during post-hoc inference, where clustering pooling operators are essential. Post-hoc explainability methods require model architectures that mirror biological relationships, track information flow, and quantify the importance of a model's components, highlighting the importance of selection pooling operators.
We hope that this paper provides a useful framework for researchers interested in graph pooling. Although GNNs with pooling now play a key role in many biological tasks and produce outstanding results, the pooling operators employed are still confined to a few methods. The exploration of the potential of varied graph pooling operators in omics studies is ongoing.
Data availability
All data used in this work can be found in the corresponding repository (https://github.com/Hou-WJ/Graph-Pooling-Operators-and-Bioinformatics-Applications).
References
Adnan M, Kalra S, Tizhoosh HR (2020) Representation learning of histopathology images using graph neural networks. In: IEEE computer society conference on computer vision and pattern recognition workshops. IEEE Computer Society, pp 4254–4261
Aggarwal M, Murty MN (2021) Region and relations based multi attention network for graph classification. In: 2020 25th International conference on pattern recognition (ICPR). IEEE, pp 8101–8108
Akhtar N, Ragavendran U (2020) Interpretation of intelligence in CNN-pooling processes: a methodological survey. Neural Comput Appl 32:879–898. https://doi.org/10.1007/s00521-019-04296-5
Antelmi A, Cordasco G, Polato M et al (2023) A survey on hypergraph representation learning. ACM Comput Surv 56:1–38. https://doi.org/10.1145/3605776
Antonelli L, Guarracino MR, Maddalena L, Sangiovanni M (2019) Integrating imaging and omics data: a review. Biomed Signal Process Control 52:264–280
Atwood J, Towsley D (2016) Diffusion-convolutional neural networks. In: Lee D, Sugiyama M, Luxburg U et al (eds) Advances in neural information processing systems. Curran Associates, Inc.
Bacciu D, Errica F, Micheli A, Podda M (2020) A gentle introduction to deep learning for graphs. Neural Netw 129:203–221
Bacciu D, Conte A, Grossi R et al (2021) K-plex cover pooling for graph neural networks. Data Min Knowl Discov 35:2200–2220. https://doi.org/10.1007/s10618-021-00779-z
Bacciu D, Conte A, Landolfi F (2023) Generalizing downsampling from regular data to graphs. Proc AAAI Conf Artif Intell 37:6718–6727. https://doi.org/10.1609/aaai.v37i6.25824
Bacciu D, Di Sotto L (2019) A non-negative factorization approach to node pooling in graph convolutional neural networks. In: International conference of the Italian association for artificial intelligence. pp 294–306
Baek J, Kang M, Hwang SJ (2021) Accurate learning of graph representations with graph multiset pooling. In: International conference on learning representations
Bai L, Jiao Y, Cui L et al (2021) Learning graph convolutional networks based on quantum vertex information propagation. IEEE Trans Knowl Data Eng. https://doi.org/10.1109/TKDE.2021.3106804
Bandyopadhyay S, Aggarwal M, Murty MN (2020) Self-supervised hierarchical graph neural network for graph representation. In: 2020 IEEE international conference on big data (big data). IEEE, pp 603–608
Bi L, Sun X, Zhou F, Dong J (2021) Hierarchical Triplet Attention Pooling for Graph Classification. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI). IEEE, pp 624–631
Bianchi FM, Lachi V (2024) The expressive power of pooling in graph neural networks. Adv Neural Inform Process Syst 36:1
Bianchi FM, Gallicchio C, Micheli A (2022a) Pyramidal reservoir graph neural network. Neurocomputing 470:389–404. https://doi.org/10.1016/j.neucom.2021.04.131
Bianchi FM, Grattarola D, Livi L, Alippi C (2022b) Hierarchical representation learning in graph neural networks with node decimation pooling. IEEE Trans Neural Netw Learn Syst 33:2195–2207. https://doi.org/10.1109/TNNLS.2020.3044146
Bing R, Yuan G, Zhu M et al (2023) Heterogeneous graph neural networks analysis: a survey of techniques, evaluations and applications. Artif Intell Rev 56:8003–8042. https://doi.org/10.1007/s10462-022-10375-2
Bodnar C, Cangea C, Liò P (2021) Deep graph mapper: seeing graphs through the neural lens. Front Big Data 4:680535. https://doi.org/10.3389/fdata.2021.680535
Borgwardt KM, Ong CS, Schönauer S et al (2005) Protein function prediction via graph kernels. Bioinformatics 21:i47–i56. https://doi.org/10.1093/bioinformatics/bti1007
Bravo-Hermsdorff G, Gunderson LM (2019) A unifying framework for spectrum-preserving graph sparsification and coarsening. In: Advances in neural information processing systems
Bronstein MM, Bruna J, Lecun Y et al (2017) Geometric deep learning: going beyond Euclidean data. IEEE Signal Process Mag 34:18–42
Bruna J, Zaremba W, Szlam A, LeCun Y (2014) Spectral networks and locally connected networks on graphs. In: International conference on learning representations
Buterez D, Janet JP, Kiddle SJ et al (2022) Graph neural networks with adaptive readouts. In: Koyejo S, Mohamed S, Agarwal A et al (eds) Advances in neural information processing systems. Curran Associates, Inc., pp 19746–19758
Cangea C, Veličković P, Jovanović N, et al. (2018) Towards sparse hierarchical graph classifiers. arXiv preprint arXiv:181101287
Chen C, Li K, Wei W et al (2022a) Hierarchical graph neural networks for few-shot learning. IEEE Trans Circuits Syst Video Technol 32:240–252. https://doi.org/10.1109/TCSVT.2021.3058098
Chen F, Pan S, Jiang J, et al. (2019a) DAGCN: dual attention graph convolutional networks. In: 2019 International joint conference on neural networks (IJCNN). IEEE, pp 1–8
Chen T, Bian S, Sun Y (2019b) Are powerful graph neural nets necessary? A dissection on graph classification. arXiv preprint arXiv:190504579
Chen L, Chen Z, Bruna J (2021) Learning the relevant substructures for tasks on graph data. In: ICASSP 2021–2021 IEEE International conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 8528–8532
Chen K, Song J, Liu S et al (2022b) Distribution knowledge embedding for graph pooling. IEEE Trans Knowl Data Eng, pp 7898–7908. https://doi.org/10.1109/TKDE.2022.3208063
Chen Y, Bian Y, Zhang J, et al. (2022c) Diversified multiscale graph learning with graph self-correction. In: Cloninger A, Doster T, Emerson T, et al. (eds) Proceedings of topological, algebraic, and geometric learning workshops 2022. PMLR, pp 48–54
Chereda H, Bleckmann A, Menck K et al (2021) Explaining decisions of graph convolutional neural networks: patient-specific molecular subnetworks responsible for metastasis prediction in breast cancer. Genome Med. https://doi.org/10.1186/s13073-021-00845-7
Cheung M, Shi J, Wright O et al (2020) Graph signal processing and deep learning: convolution, pooling, and topology. IEEE Signal Process Mag 37:139–149. https://doi.org/10.1109/MSP.2020.3014594
Cheung M, Shi J, Jiang L, et al. (2019) Pooling in graph convolutional neural networks. In: 2019 53rd Asilomar conference on signals, systems, and computers. IEEE, pp 462–466
Cho H, Lee EK, Choi IS (2020) Layer-wise relevance propagation of InteractionNet explains protein–ligand interactions at the atom level. Sci Rep 10:21155. https://doi.org/10.1038/s41598-020-78169-6
Cui H, Wang C, Maan H et al (2024) scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nat Methods. https://doi.org/10.1038/s41592-024-02201-0
Dai H, Li L, Zeng T, Chen L (2019) Cell-specific network constructed by single-cell RNA sequencing data. Nucleic Acids Res 47:e62. https://doi.org/10.1093/nar/gkz172
Debnath AK, Lopez de Compadre RL, Debnath G et al (1991) Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds: correlation with molecular orbital energies and hydrophobicity. J Med Chem 34:786–797
Defferrard M, Bresson X, Vandergheynst P (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In: Advances in neural information processing systems
Deng C, Li H-D, Zhang L-S et al (2024) Identifying new cancer genes based on the integration of annotated gene sets via hypergraph neural networks. Bioinformatics 40:i511–i520. https://doi.org/10.1093/bioinformatics/btae257
Dhillon IS, Guan Y, Kulis B (2007) Weighted graph cuts without eigenvectors: a multilevel approach. IEEE Trans Pattern Anal Mach Intell 29:1944–1957. https://doi.org/10.1109/TPAMI.2007.1115
Di D, Zhang J, Lei F et al (2022) Big-hypergraph factorization neural network for survival prediction from whole slide image. IEEE Trans Image Process 31:1149–1160. https://doi.org/10.1109/TIP.2021.3139229
Diehl F, Brunner T, Le MT, Knoll A (2019) Towards graph pooling by edge contraction. In: ICML 2019 workshop on learning and reasoning with graph-structured data
Diehl F (2019) Edge contraction pooling for graph neural networks. arXiv preprint arXiv:190510990
Dobson PD, Doig AJ (2003) Distinguishing enzyme structures from non-enzymes without alignments. J Mol Biol 330:771–783. https://doi.org/10.1016/S0022-2836(03)00628-4
Duan Y, Wang J, Ma H, Sun Y (2022) Residual convolutional graph neural network with subgraph attention pooling. Tsinghua Sci Technol 27:653–663. https://doi.org/10.26599/TST.2021.9010058
Duroux D, Wohlfart C, Van Steen K et al (2023) Graph-based multi-modality integration for prediction of cancer subtype and severity. Sci Rep 13:19653. https://doi.org/10.1038/s41598-023-46392-6
Duvenaud D, Maclaurin D, Aguilera-Iparraguirre J et al. (2015) Convolutional networks on graphs for learning molecular fingerprints. In: Advances in neural information processing systems
Dwivedi VP, Joshi CK, Luu AT et al (2023) Benchmarking graph neural networks. J Mach Learn Res 24:1–48
Errica F, Podda M, Bacciu D, Micheli A (2020) A fair comparison of graph neural networks for graph classification. In: International conference on learning representations
Fan X, Gong M, Xie Y et al (2020) Structured self-attention architecture for graph-level representation learning. Pattern Recognit 100:107084. https://doi.org/10.1016/j.patcog.2019.107084
Fan W, Ma Y, Li Q, et al. (2019) Graph neural networks for social recommendation. In: The World Wide Web conference. pp 417–426
Ferludin O, Eigenwillig A, Blais M et al (2023) TF-GNN: graph neural networks in TensorFlow. CoRR abs/2207.03522
Fey M, Lenssen JE (2019) Fast graph representation learning with PyTorch geometric. In: ICLR workshop on representation learning on graphs and manifolds
Fey M, Lenssen JE, Weichert F, Müller H (2018) SplineCNN: fast geometric deep learning with continuous B-Spline Kernels. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). pp 869–877
Galland A, Lelarge M (2021) Graph pooling by edge cut. https://openreview.net/forum?id=om1guSP_ray
Gama F, Marques AG, Leus G, Ribeiro A (2019) Convolutional neural network architectures for signals supported on graphs. IEEE Trans Signal Process 67:1034–1049. https://doi.org/10.1109/TSP.2018.2887403
Gao H, Ji S (2022) Graph U-nets. IEEE Trans Pattern Anal Mach Intell 44:4948–4960. https://doi.org/10.1109/TPAMI.2021.3081010
Gao H, Liu Y, Ji S (2021a) Topology-aware graph pooling networks. IEEE Trans Pattern Anal Mach Intell 43:4512–4518. https://doi.org/10.1109/TPAMI.2021.3062794
Gao X, Dai W, Li C et al (2021b) Multiscale representation learning of graph data with node affinity. IEEE Trans Signal Inform Process Netw 7:30–44. https://doi.org/10.1109/TSIPN.2020.3044913
Gao X, Dai W, Li C et al (2022a) iPool—information-based pooling in hierarchical graph neural networks. IEEE Trans Neural Netw Learn Syst 33:5032–5044. https://doi.org/10.1109/TNNLS.2021.3067441
Gao Y, Tang Y, Zhang H et al (2022b) Sex differences of cerebellum and cerebrum: evidence from graph convolutional network. Interdiscip Sci 14:532–544. https://doi.org/10.1007/s12539-021-00498-5
Gao Z, Lu Z, Wang J et al (2022c) A convolutional neural network and graph convolutional network based framework for classification of breast histopathological images. IEEE J Biomed Health Inform 26:3163–3173. https://doi.org/10.1109/JBHI.2022.3153671
Gao H, Ji S (2019) Graph U-nets. In: Proceedings of the 36th international conference on machine learning. pp 2083--2092
Gao Z, Lin H, Li StanZ (2020) LookHops: light multi-order convolution and pooling for graph classification. arXiv preprint arXiv:201215741
Gilmer J, Schoenholz SS, Riley PF et al. (2017) Neural message passing for quantum chemistry. In: Proceedings of the 34th international conference on machine learning. pp 1263–1272
Gligorijević V, Renfrew PD, Kosciolek T et al (2021) Structure-based protein function prediction using graph convolutional networks. Nat Commun 12:3168. https://doi.org/10.1038/s41467-021-23303-9
Godwin J, Keck T, Battaglia P et al (2020) Jraph: a library for graph neural networks in JAX
Gong W, Yan Q (2021) Graph-based deep learning frameworks for molecules and solid-state materials. Comput Mater Sci 195:110332. https://doi.org/10.1016/j.commatsci.2021.110332
Gopinath K, Desrosiers C, Lombaert H (2022) Learnable pooling in graph convolutional networks for brain surface analysis. IEEE Trans Pattern Anal Mach Intell 44:864–876. https://doi.org/10.1109/TPAMI.2020.3028391
Grattarola D, Alippi C (2021) Graph neural networks in TensorFlow and Keras with Spektral [Application Notes]. IEEE Comput Intell Mag 16:99–106. https://doi.org/10.1109/MCI.2020.3039072
Grattarola D, Zambon D, Bianchi FM, Alippi C (2022) Understanding pooling in graph neural networks. IEEE Trans Neural Netw Learn Syst. https://doi.org/10.1109/tnnls.2022.3190922
Guo Y, Zou D, Lerman G (2023) An unpooling layer for graph generation. In: Ruiz F, Dy J, van de Meent J-W (eds) Proceedings of the 26th International conference on artificial intelligence and statistics. PMLR, pp 3179–3209
Hamilton WL, Ying R, Leskovec J (2017) Inductive representation learning on large graphs. In: Advances in neural information processing systems
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 770–778
Helma C, King RD, Kramer S, Srinivasan A (2001) The predictive toxicology challenge 2000–2001. Bioinformatics 17:107–108
Henaff M, Bruna J, LeCun Y (2015) Deep convolutional networks on graph-structured data. arXiv preprint arXiv:150605163
Hetzel L, Fischer DS, Günnemann S, Theis FJ (2021) Graph representation learning for single-cell biology. Curr Opin Syst Biol 28:100347
Hou W, Wang Y, Zhao Z et al (2024) Hierarchical graph neural network with subgraph perturbations for key gene cluster discovery in cancer staging. Complex Intell Syst 10:111–128. https://doi.org/10.1007/s40747-023-01068-6
Hu J, Cao L, Li T et al (2021a) GAT-LI: a graph attention network based learning and interpreting method for functional brain network classification. BMC Bioinform 22:379. https://doi.org/10.1186/s12859-021-04295-1
Hu Y, Rong J, Xu Y et al (2024) Unsupervised and supervised discovery of tissue cellular neighborhoods from cell phenotypes. Nat Methods 21:267–278. https://doi.org/10.1038/s41592-023-02124-2
Hu F, Zhu Y, Wu S, et al. (2019) Hierarchical graph convolutional networks for semi-supervised node classification. In: Proceedings of the 28th international joint conference on artificial intelligence. pp 4532–4539
Hu W, Fey M, Zitnik M et al. (2020) Open graph benchmark: datasets for machine learning on graphs. In: Advances in neural information processing systems. pp 22118–22133
Hu J, Qian S, Fang Q et al. (2021b) Efficient graph deep learning in TensorFlow with tf_geometric. In: Shen HT, Zhuang Y, Smith JR, et al. (eds) MM ‘21: ACM multimedia conference, Virtual Event, China, October 20–24, 2021. ACM, pp 3775–3778
Hu W, Fey M, Ren H et al. (2021c) OGB-LSC: a large-scale challenge for machine learning on graphs. arXiv preprint arXiv:210309430
Huang Y, Wuchty S, Zhou Y, Zhang Z (2023) SGPPI: structure-aware prediction of protein–protein interactions in rigorous conditions with graph convolutional network. Brief Bioinform. https://doi.org/10.1093/bib/bbad020
Huang J, Li Z, Li N et al. (2019) Attpool: Towards hierarchical feature representation in graph convolutional networks via attention mechanism. In: Proceedings of the IEEE international conference on computer vision. Institute of Electrical and Electronics Engineers Inc., pp 6479–6488
Huang H, Cai M, Lin L et al. (2021) Graph-based pyramid global context reasoning with a saliency-aware projection for COVID-19 lung infections segmentation. In: ICASSP, IEEE international conference on acoustics, speech and signal processing—proceedings. Institute of Electrical and Electronics Engineers Inc., pp 1050–1054
Itoh TD, Kubo T, Ikeda K (2022) Multi-level attention pooling for graph neural networks: unifying graph representations with multiple localities. Neural Netw 145:356–373. https://doi.org/10.1016/j.neunet.2021.11.001
Jiang D, Hsieh CY, Wu Z et al (2021) InteractionGraphNet: a novel and efficient deep graph representation learning framework for accurate protein-ligand interaction predictions. J Med Chem 64:18209–18232. https://doi.org/10.1021/acs.jmedchem.1c01830
Jiang J, Lei F, Dai Q, Li Z (2020) Graph pooling in graph neural networks with node feature correlation. In: Proceedings of the 3rd international conference on data science and information technology. Association for Computing Machinery, pp 105–110
Jiao P, Wang B, Wang X et al (2023) Struct2GO: protein function prediction based on graph pooling algorithm and AlphaFold2 structure information. Bioinformatics. https://doi.org/10.1093/bioinformatics/btad637
Jin S, Zeng X, Xia F et al (2021) Application of deep learning methods in biological networks. Brief Bioinform 22:1902–1917
Jo J, Baek J, Lee S et al. (2021) Edge representation learning with hypergraphs. In: Advances in neural information processing systems
Jumper J, Evans R, Pritzel A et al (2021) Highly accurate protein structure prediction with AlphaFold. Nature 596:583–589. https://doi.org/10.1038/s41586-021-03819-2
Karim MR, Islam T, Shajalal M et al (2023) Explainable AI for bioinformatics: methods, tools and applications. Brief Bioinform. https://doi.org/10.1093/bib/bbad236
Karlov DS, Sosnin S, Fedorov MV, Popov P (2020) graphDelta: MPNN scoring function for the affinity prediction of protein-ligand complexes. ACS Omega 5:5150–5159. https://doi.org/10.1021/acsomega.9b04162
Kaur P, Singh A, Chana I (2021) Computational techniques and tools for omics data analysis: state-of-the-art, challenges, and future directions. Arch Computat Methods Eng 28:4595–4631. https://doi.org/10.1007/s11831-021-09547-0
Kazius J, McGuire R, Bursi R (2005) Derivation and validation of toxicophores for mutagenicity prediction. J Med Chem 48:312–320
Kersting K, Kriege NM, Morris C et al. (2016) Benchmark data sets for graph kernels. http://graphkernels.cs.tu-dortmund.de
Khasahmadi AH, Hassani K, Moradi P et al. (2020) Memory-based graph networks. In: International conference on learning representations
Knyazev B, Taylor GW, Amer MR (2019) Understanding attention and generalization in graph neural networks. In: Advances in neural information processing systems
Kriege N, Mutzel P (2012) Subgraph matching kernels for attributed graphs. In: Proceedings of the 29th international coference on international conference on machine learning. pp 291–298
Kuijjer ML, Tung MG, Yuan GC et al (2019) Estimating sample-specific regulatory networks. iScience 14:226–240. https://doi.org/10.1016/j.isci.2019.03.021
Lai B, Xu J (2022) Accurate protein function prediction via graph attention networks with predicted structure information. Brief Bioinform. https://doi.org/10.1093/bib/bbab502
Lazaros K, Koumadorakis DE, Vlamos P, Vrahatis AG (2024) Graph neural network approaches for single-cell data: a recent overview. Neural Comput Appl. https://doi.org/10.1007/s00521-024-09662-6
Lee JB, Rossi R, Kong X (2018) Graph classification using structural attention. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining. ACM, New York, NY, USA, pp 1666–1674
Lee J, Lee I, Kang J (2019) Self-attention graph pooling. In: Proceedings of the 36th international conference on machine learning. pp 3734–3743
Lee D, Kim S, Lee S et al. (2021) Learnable structural semantic readout for graph classification. In: 2021 IEEE International conference on data mining (ICDM). IEEE, pp 1180–1185
Levie R, Monti F, Bresson X, Bronstein MM (2019) CayleyNets: graph convolutional neural networks with complex rational spectral filters. IEEE Trans Signal Process 67:97–109. https://doi.org/10.1109/TSP.2018.2879624
Li B, Nabavi S (2024) A multimodal graph neural network framework for cancer molecular subtype classification. BMC Bioinform 25:27. https://doi.org/10.1186/s12859-023-05622-4
Li X, Wu H (2021) Toward graph classification on structure property using adaptive motif based on graph convolutional network. J Supercomput 77:8767–8786. https://doi.org/10.1007/s11227-021-03628-4
Li R, Wang S, Zhu F, Huang J (2018) Adaptive graph convolutional neural networks. Proc AAAI Conf Artif Intell. https://doi.org/10.1609/aaai.v32i1.11691
Li X, Zhou Y, Dvornek N et al (2021b) BrainGNN: interpretable brain graph neural network for fMRI analysis. Med Image Anal 74:102233. https://doi.org/10.1016/j.media.2021.102233
Li R, Li L, Xu Y, Yang J (2022a) Machine learning meets omics: applications and perspectives. Brief Bioinform 23:bbab460
Li XS, Liu X, Lu L et al (2022b) Multiphysical graph neural network (MP-GNN) for COVID-19 drug design. Brief Bioinform 23:bbac231. https://doi.org/10.1093/bib/bbac231
Li ZP, Su HL, Zhu XB et al (2022c) Hierarchical graph pooling with self-adaptive cluster aggregation. IEEE Trans Cogn Dev Syst 14:1198–1207. https://doi.org/10.1109/TCDS.2021.3100883
Li M, Cao Y, Liu X, Ji H (2023a) Structure-aware graph attention diffusion network for protein–ligand binding affinity prediction. IEEE Trans Neural Netw Learn Syst. https://doi.org/10.1109/TNNLS.2023.3314928
Li S, Zhou J, Xu T et al (2023b) GIANT: protein-ligand binding affinity prediction via geometry-aware interactive graph neural network. IEEE Trans Knowl Data Eng. https://doi.org/10.1109/TKDE.2023.3314502
Li J, Meng H, Rong Y et al. (2019) Semi-supervised graph classification: a hierarchical graph perspective. In: The web conference 2019—proceedings of the World Wide Web Conference, WWW 2019. Association for Computing Machinery, Inc, pp 972–982
Li J, Ma Y, Wang Y et al. (2020a) Graph pooling with representativeness. In: Proceedings—IEEE international conference on data mining, ICDM. Institute of Electrical and Electronics Engineers Inc., pp 302–311
Li M, Chen S, Zhang Y, Tsang IW (2020b) Graph cross networks with vertex infomax pooling. In: Advances in neural information processing systems
Li X, Zhou Y, Dvornek NC et al. (2020c) Pooling regularized graph neural network for fMRI biomarker analysis. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics). Springer Science and Business Media Deutschland GmbH, pp 625–635
Li S, Zhou J, Xu T et al. (2021a) Structure-aware interactive graph neural networks for the prediction of protein-ligand binding affinity. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining. Association for Computing Machinery, pp 975–985
Liang B, Gong H, Lu L, Xu J (2022) Risk stratification and pathway analysis based on graph neural network and interpretable algorithm. BMC Bioinform 23:394. https://doi.org/10.1186/s12859-022-04950-1
Liang Y, Zhang Y, Gao D, Xu Q (2020) MxPool: multiplex pooling for hierarchical graph representation learning. arXiv preprint arXiv:200406846
Liao W, Bak-Jensen B, Pillai JR et al (2022) A review of graph neural networks and their applications in power systems. J Modern Power Syst Clean Energy 10:345–360. https://doi.org/10.35833/MPCE.2021.000058
Liu X, Wang Y, Ji H et al (2016) Personalized characterization of diseases using sample-specific networks. Nucleic Acids Res 44:e164. https://doi.org/10.1093/nar/gkw772
Liu N, Jian S, Li D et al (2021) Hierarchical adaptive pooling by capturing high-order dependency for graph representation learning. IEEE Trans Knowl Data Eng. https://doi.org/10.1109/TKDE.2021.3133646
Liu C, Zhan Y, Yu B et al (2023) On exploring node-feature and graph-structure diversities for node drop graph pooling. Neural Netw 167:559–571. https://doi.org/10.1016/j.neunet.2023.08.046
Liu T, Fang ZY, Zhang Z et al (2024) A comprehensive overview of graph neural network-based approaches to clustering for spatial transcriptomics. Comput Struct Biotechnol J 23:106–128
Liu C, Zhan Y, Li C et al. (2022a) Graph pooling for graph neural networks: progress, challenges, and opportunities. arXiv preprint arXiv:220407321
Liu N, Jian S, Li D, Xu H (2022b) Unsupervised hierarchical graph pooling via substructure-sensitive mutual information maximization. In: Proceedings of the 31st ACM international conference on information & knowledge management. Association for Computing Machinery, New York, NY, USA, pp 1299–1308
Loukas A (2019) Graph reduction with spectral and cut guarantees. J Mach Learn Res 20:1–42
Lu M, Xiao Z, Li H et al (2022) Feature pyramid-based graph convolutional neural network for graph classification. J Syst Architect 128:102562. https://doi.org/10.1016/j.sysarc.2022.102562
Lucibello C (2021) GraphNeuralNetworks.jl: a geometric deep learning library for the Julia programming language
Luzhnica E, Day B, Lio P (2019) Clique pooling for graph classification. arXiv preprint arXiv:190400374
Ma T, Chen J (2021) Unsupervised learning of graph hierarchical abstractions with differentiable coarsening and optimal transport. Proc AAAI Conf Artif Intell 35:8856–8864. https://doi.org/10.1609/aaai.v35i10.17072
Ma Y, Wang S, Aggarwal CC, Tang J (2019) Graph convolutional networks with EigenPooling. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining. ACM, New York, NY, USA, pp 723–731
Ma Z, Xuan J, Wang YG et al. (2020) Path integral based convolution and pooling for graph neural networks. In: Advances in neural information processing systems. pp 16421–16433
Makarov I, Kiselev D, Nikitinsky N, Subelj L (2021) Survey on graph embeddings and their applications to machine learning problems on graphs. PeerJ Comput Sci 7:1–62. https://doi.org/10.7717/peerj-cs.357
Bianchi FM, Grattarola D, Alippi C (2020) Spectral clustering with graph neural networks for graph pooling. In: Proceedings of the 37th international conference on machine learning. pp 874–883
Martins AFT, Astudillo RF (2016) From softmax to sparsemax: a sparse model of attention and multi-label classification. In: Balcan MF, Weinberger KQ (eds) International conference on machine learning. JMLR, pp 1614–1623
Meltzer P, Mallea MDG, Bentley PJ (2019) PiNet: a permutation invariant graph neural network for graph classification. arXiv preprint arXiv:190503046
Mesquita D, Souza AH, Kaski S (2020) Rethinking pooling in graph neural networks. In: Advances in neural information processing systems. pp 2220–2231
Monti F, Boscaini D, Masci J et al. (2017) Geometric deep learning on graphs and manifolds using mixture model CNNs. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 5425–5434
Morris C, Ritzert M, Fey M et al. (2019) Weisfeiler and Leman go neural: higher-order graph neural networks. In: Proceedings of the AAAI conference on artificial intelligence. pp 4602–4609
Morris C, Kriege NM, Bause F et al. (2020) TUDataset: a collection of benchmark datasets for learning with graphs. In: ICML 2020 workshop on graph representation learning and beyond (GRL+ 2020)
Murphy RL, Srinivasan B, Rao V, Ribeiro B (2019) Relational pooling for graph representations. In: Proceedings of the 36th international conference on machine learning. pp 4663–4673
Muzio G, O’Bray L, Borgwardt K (2021) Biological network analysis with deep learning. Brief Bioinform 22:1515–1530
Navarin N, Tran D Van, Sperduti A (2019) Universal readout for graph convolutional neural networks. In: 2019 International joint conference on neural networks (IJCNN). IEEE, pp 1–7
Nikolaienko T, Gurbych O, Druchok M (2022) Complex machine learning model needs complex testing: examining predictability of molecular binding affinity by a graph neural network. J Comput Chem 43:728–739. https://doi.org/10.1002/jcc.26831
Nouranizadeh A, Matinkia M, Rahmati M, Safabakhsh R (2021) Maximum entropy weighted independent set pooling for graph neural networks. arXiv preprint arXiv:210701410
Noutahi E, Beaini D, Horwood J et al. (2019) Towards interpretable sparse graph representation learning with Laplacian pooling. arXiv preprint arXiv:190511577
Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical image segmentation. In: Navab N, Hornegger J, Wells WM, Frangi AF (eds) Medical image computing and computer-assisted intervention—MICCAI 2015. Springer, Cham, pp 234–241
Orsini F, Frasconi P, De Raedt L (2015) Graph invariant kernels. In: Proceedings of the twenty-fourth international joint conference on artificial intelligence (IJCAI). pp 3756–3762
Özen Y, Aksoy S, Kösemehmetoğlu K et al. (2020) Self-supervised learning with graph neural networks for region of interest retrieval in histopathology. In: Proceedings—international conference on pattern recognition. Institute of Electrical and Electronics Engineers Inc., pp 6329–6334
Pandey M, Radaeva M, Mslati H et al (2022) Ligand binding prediction using protein structure graphs and residual graph attention networks. Molecules 27:5114. https://doi.org/10.3390/molecules27165114
Pang S, Pang C, Zhao L et al (2021a) SpineParseNet: spine parsing for volumetric MR image by a two-stage segmentation framework with semantic image representation. IEEE Trans Med Imaging 40:262–273. https://doi.org/10.1109/TMI.2020.3025087
Pang Y, Zhao Y, Li D (2021b) Graph pooling via coarsened graph infomax. In: SIGIR 2021—proceedings of the 44th international ACM SIGIR conference on research and development in information retrieval. Association for Computing Machinery, Inc, pp 2177–2181
Papp PA, Martinkus K, Faber L, Wattenhofer R (2021) DropGNN: random dropouts increase the expressiveness of graph neural networks. In: Ranzato M, Beygelzimer A, Dauphin Y, et al. (eds) Advances in neural information processing systems. Curran Associates, Inc., pp 21997–22009
Pati P, Jaume G, Foncubierta-Rodríguez A et al (2022) Hierarchical graph representations in digital pathology. Med Image Anal 75:102264. https://doi.org/10.1016/j.media.2021.102264
Martin-Gonzalez P, Crispin-Ortuzar M, Markowetz F (2021) Predictive modelling of highly multiplexed tumour tissue images by graph neural networks. In: Reyes M, Henriques Abreu P, Cardoso J et al (eds) Interpretability of machine intelligence in medical image computing, and topological data analysis and its applications for medical data. Springer, Cham, pp 98–107
Pfeifer B, Saranti A, Holzinger A (2022) GNN-SubNet: disease subnetwork detection with explainable graph neural networks. Bioinformatics. https://doi.org/10.1093/bioinformatics/btac478
Qin J, Liu L, Shen H, Hu D (2020) Uniform pooling for graph networks. Appl Sci 10:6287. https://doi.org/10.3390/app10186287
Rahmani S, Baghbani A, Bouguila N, Patterson Z (2023) Graph neural networks for intelligent transportation systems: a survey. IEEE Trans Intell Transp Syst 24:8846–8885. https://doi.org/10.1109/TITS.2023.3257759
Ramirez R, Chiu YC, Hererra A et al (2020) Classification of cancer types using graph convolutional neural networks. Front Phys. https://doi.org/10.3389/fphy.2020.00203
Ramirez R, Chiu YC, Zhang SY et al (2021) Prediction and interpretation of cancer survival using graph convolution neural networks. Methods 192:120–130. https://doi.org/10.1016/j.ymeth.2021.01.004
Ranjan E, Sanyal S, Talukdar P (2020) ASAP: adaptive structure aware pooling for learning hierarchical graph representations. In: Proceedings of the AAAI conference on artificial intelligence. pp 5470–5477
Réau M, Renaud N, Xue LC, Bonvin AMJJ (2023) DeepRank-GNN: a graph neural network framework to learn patterns in protein-protein interfaces. Bioinformatics. https://doi.org/10.1093/bioinformatics/btac759
Reiser P, Neubert M, Eberhard A et al (2022) Graph neural networks for materials science and chemistry. Commun Mater 3:93. https://doi.org/10.1038/s43246-022-00315-6
Riesen K, Bunke H (2008) IAM graph database repository for graph based pattern recognition and machine learning. In: Structural, syntactic, and statistical pattern recognition: Joint IAPR International workshop, SSPR & SPR 2008, Orlando, USA, December 4–6, 2008. Proceedings. Springer, pp 287–297
Roy KK, Roy A, Mahbubur Rahman AKM et al. (2021) Structure-aware hierarchical graph pooling using information bottleneck. In: 2021 International joint conference on neural networks (IJCNN). IEEE, pp 1–8
Sánchez D, Servadei L, Kiprit GN et al (2023) A comprehensive survey on electronic design automation and graph neural networks: theory and applications. ACM Trans Des Autom Electron Syst. https://doi.org/10.1145/3543853
Sanders C, Roth A, Liebig T (2023) Curvature-based pooling within graph neural networks. arXiv preprint arXiv:230816516
Sanyal S, Anishchenko I, Dagar A et al (2020) ProteinGCN: protein model quality assessment using graph convolutional networks. BioRxiv. https://doi.org/10.1101/2020.04.06.028266
Schomburg I, Chang A, Ebeling C et al (2004) BRENDA, the enzyme database: updates and major new developments. Nucleic Acids Res 32:D431–D433. https://doi.org/10.1093/nar/gkh081
Sebenius I, Campbell A, Morgan SE et al. (2021) Multimodal graph coarsening for interpretable, MRI-based brain graph neural network. In: IEEE international workshop on machine learning for signal processing, MLSP. IEEE Computer Society
Senior AW, Evans R, Jumper J et al (2020) Improved protein structure prediction using potentials from deep learning. Nature 577:706–710. https://doi.org/10.1038/s41586-019-1923-7
Shen H, Zhang Y, Zheng C et al (2021) A cascade graph convolutional network for predicting protein–ligand binding affinity. Int J Mol Sci 22:4023. https://doi.org/10.3390/ijms22084023
Sheng N, Huang L, Lu Y et al (2023a) Data resources and computational methods for lncRNA-disease association prediction. Comput Biol Med 153:106527. https://doi.org/10.1016/j.compbiomed.2022.106527
Sheng N, Wang Y, Huang L et al (2023b) Multi-task prediction-based graph contrastive learning for inferring the relationship among lncRNAs, miRNAs and diseases. Brief Bioinform. https://doi.org/10.1093/bib/bbad276
Shervashidze N, Schweitzer P, van Leeuwen EJ et al (2011) Weisfeiler–Lehman graph kernels. J Mach Learn Res 12:2539–2561
Shi W, Singha M, Pu L et al (2022) GraphSite: ligand binding site classification with deep graph learning. Biomolecules 12:1053. https://doi.org/10.3390/biom12081053
Shuman DI, Faraji MJ, Vandergheynst P (2016) A multiscale pyramid transform for graph signals. IEEE Trans Signal Process 64:2119–2134. https://doi.org/10.1109/TSP.2015.2512529
Simonovsky M, Komodakis N (2017) Dynamic edge-conditioned filters in convolutional neural networks on graphs. In: Proceedings—30th IEEE conference on computer vision and pattern recognition, CVPR 2017. Institute of Electrical and Electronics Engineers Inc., pp 29–38
Son J, Kim D (2021) Development of a graph convolutional neural network model for efficient prediction of protein-ligand binding affinities. PLoS ONE 16:e0249404. https://doi.org/10.1371/journal.pone.0249404
Song X, Zhou F, Frangi AF et al (2022) Multi-center and multi-channel pooling GCN for early AD diagnosis based on dual-modality fused brain network. IEEE Trans Med Imaging. https://doi.org/10.1109/TMI.2022.3187141
Song Y, Huang S, Wang X et al. (2024) Graph parsing networks. In: The twelfth international conference on learning representations
Stanovic S, Gaüzère B, Brun L (2022) Maximal independent vertex set applied to graph pooling. In: Krzyzak A, Suen CY, Torsello A, Nobile N (eds) Structural, syntactic, and statistical pattern recognition. Springer, Cham, pp 11–21
Su Z, Hu Z, Li Y (2021) Hierarchical graph representation learning with local capsule pooling. ACM multimedia Asia. ACM, New York, pp 1–7
Su X, Xue S, Liu F et al (2022) A comprehensive survey on community detection with deep learning. IEEE Trans Neural Netw Learn Syst. https://doi.org/10.1109/TNNLS.2021.3137396
Sulaimany S, Khansari M, Masoudi-Nejad A et al (2018) Link prediction potentials for biological networks. Int J Data Min Bioinform 20:161–184
Sun Q, Li J, Peng H et al (2021) SUGAR: subgraph neural network with reinforcement pooling and self-supervised mutual information mechanism. In: Proceedings of the web conference 2021. ACM, New York, NY, USA, pp 2081–2091
Tang H, Ma G, He L et al (2021) CommPOOL: an interpretable graph pooling framework for hierarchical graph representation learning. Neural Netw 143:669–677. https://doi.org/10.1016/j.neunet.2021.07.028
Tang H, Ma G, Guo L et al (2022) Contrastive brain network learning via hierarchical signed graph pooling model. IEEE Trans Neural Netw Learn Syst. https://doi.org/10.1109/TNNLS.2022.3220220
Tang J, Yang Y, Wei W et al (2023) GraphGPT: graph instruction tuning for large language models
Tian Y, Song H, Wang Z et al (2024) Graph neural prompting with large language models. Proc AAAI Conf Artif Intell 38:19080–19088. https://doi.org/10.1609/aaai.v38i17.29875
Torng W, Altman RB (2019) Graph convolutional neural networks for predicting drug-target interactions. J Chem Inform Model 59:4131–4149. https://doi.org/10.1021/acs.jcim.9b00628
Tsitsulin A, Palowitch J, Perozzi B, Müller E (2023) Graph clustering with graph neural networks. J Mach Learn Res 24:1–21
Van PH, Thanh DH, Moore P (2021) Hierarchical pooling in graph neural networks to enhance classification performance in large datasets. Sensors 21:6070. https://doi.org/10.3390/s21186070
Vaswani A, Shazeer N, Parmar N et al (2017) Attention is all you need. In: Advances in neural information processing systems
Veličković P, Cucurull G, Casanova A et al (2018) Graph attention networks. In: International conference on learning representations
Vinyals O, Bengio S, Kudlur M (2016) Order matters: sequence to sequence for sets. In: International conference on learning representations
Wale N, Watson IA, Karypis G (2008) Comparison of descriptor spaces for chemical compound retrieval and classification. Knowl Inform Syst 14:347–375. https://doi.org/10.1007/s10115-007-0103-5
Wang Z, Ji S (2023) Second-order pooling for graph neural networks. IEEE Trans Pattern Anal Mach Intell 45:6870–6880. https://doi.org/10.1109/TPAMI.2020.2999032
Wang J, Ma A, Ma Q et al (2020b) Inductive inference of gene regulatory network using supervised and semi-supervised graph neural networks. Comput Struct Biotechnol J 18:3335–3343. https://doi.org/10.1016/j.csbj.2020.10.022
Wang J, Ma A, Chang Y et al (2021a) scGNN is a novel graph neural network framework for single-cell RNA-Seq analyses. Nat Commun 12:1882. https://doi.org/10.1038/s41467-021-22197-x
Wang T, Bai J, Nabavi S (2021b) Single-cell classification using graph convolutional networks. BMC Bioinform. https://doi.org/10.1186/s12859-021-04278-2
Wang Y, Chang D, Fu Z, Zhao Y (2022) Seeing all from a few: nodes selection using graph pooling for graph clustering. IEEE Trans Neural Netw Learn Syst. https://doi.org/10.1109/TNNLS.2022.3210370
Wang M, Zheng D, Ye Z et al (2019) Deep graph library: a graph-centric, highly-performant package for graph neural networks. arXiv preprint arXiv:190901315
Wang J, Chen RJ, Lu MY et al (2020a) Weakly supervised prostate TMA classification via graph convolutional networks. In: 2020 IEEE 17th international symposium on biomedical imaging (ISBI). pp 239–243
Wang YG, Li M, Ma Z et al (2020c) Haar graph pooling. In: Proceedings of the 37th international conference on machine learning. pp 9952–9962
Wei L, Zhao H, Yao Q, He Z (2021) Pooling architecture search for graph classification. In: Proceedings of the 30th ACM international conference on information & knowledge management. ACM, New York, NY, USA, pp 2091–2100
Wen H, Ding J, Jin W et al (2022) Graph neural networks for multimodal single-cell data integration. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining. Association for Computing Machinery, pp 4153–4163
Wu Z, Ramsundar B, Feinberg EN et al (2018) MoleculeNet: a benchmark for molecular machine learning. Chem Sci 9:513–530
Wu Z, Pan S, Chen F et al (2021b) A comprehensive survey on graph neural networks. IEEE Trans Neural Netw Learn Syst 32:4–24. https://doi.org/10.1109/TNNLS.2020.2978386
Wu J, He J, Xu J (2019) DEMO-Net: degree-specific graph neural networks for node and graph classification. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining. Association for Computing Machinery, pp 406–415
Wu Z, Jain P, Wright M et al (2021a) Representing long-range context for graph neural networks with global attention. In: Ranzato M, Beygelzimer A, Dauphin Y et al (eds) Advances in neural information processing systems. Curran Associates, Inc., pp 13266–13279
Wu J, Chen X, Xu K, Li S (2022) Structural entropy guided graph hierarchical pooling. In: Chaudhuri K, Jegelka S, Song L et al (eds) Proceedings of the 39th international conference on machine learning. PMLR, pp 24017–24030
Wysocka M, Wysocki O, Zufferey M et al (2023) A systematic review of biologically-informed deep learning models for cancer: fundamental trends for encoding and interpreting oncology data. BMC Bioinform. https://doi.org/10.1186/s12859-023-05262-8
Xia C, Feng S-H, Xia Y et al (2023) Leveraging scaffold information to predict protein–ligand binding affinity with an empirical graph neural network. Brief Bioinform. https://doi.org/10.1093/bib/bbac603
Xiao Z, Chen H, Xiao L et al (2024) WGDPool: a broad scope extraction for weighted graph data. Expert Syst Appl 249:123678. https://doi.org/10.1016/j.eswa.2024.123678
Xie Y, Yao C, Gong M et al (2020) Graph convolutional networks with multi-level coarsening for graph classification. Knowl Based Syst 194:105578. https://doi.org/10.1016/j.knosys.2020.105578
Xinyi Z, Chen L (2018) Capsule graph neural network. In: International conference on learning representations
Xu Y, Wang J, Guang M et al (2022) Multistructure graph classification method with attention-based pooling. IEEE Trans Comput Soc Syst. https://doi.org/10.1109/TCSS.2022.3169219
Xu K, Li C, Tian Y et al (2018) Representation learning on graphs with jumping knowledge networks. In: Proceedings of the 35th international conference on machine learning. pp 5453–5462
Xu K, Hu W, Leskovec J, Jegelka S (2019) How powerful are graph neural networks? In: International conference on learning representations
Yanardag P, Vishwanathan SVN (2015) Deep graph kernels. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining. Association for Computing Machinery, New York, NY, USA, pp 1365–1374
Yang J, Zhao P, Rong Y et al (2021a) Hierarchical graph capsule network. Proc AAAI Conf Artif Intell 35:10603–10611. https://doi.org/10.1609/aaai.v35i12.17268
Yang C, Xiao Y, Zhang Y et al (2022) Heterogeneous network representation learning: a unified framework with survey and benchmark. IEEE Trans Knowl Data Eng 34:4854–4873. https://doi.org/10.1109/TKDE.2020.3045924
Yang Z, Zhong W, Lv Q et al (2023b) Geometric interaction graph neural network for predicting protein-ligand binding affinities from 3D structures (GIGN). J Phys Chem Lett. https://doi.org/10.1021/acs.jpclett.2c03906
Yang M, Shen Y, Qi H, Yin B (2021b) Soft-mask: adaptive substructure extractions for graph neural networks. In: Proceedings of the web conference 2021. ACM, New York, NY, USA, pp 2058–2068
Yang Z, Zhang G, Wu J et al (2023a) A comprehensive survey of graph-level learning. arXiv preprint arXiv:230105860. https://doi.org/10.48550/arXiv.2301.05860
Ye Z, Kumar YJ, Sing GO et al (2022) A comprehensive survey of graph neural networks for knowledge graphs. IEEE Access 10:75729–75741. https://doi.org/10.1109/ACCESS.2022.3191784
Ying Z, You J, Morris C et al (2018) Hierarchical graph representation learning with differentiable pooling. In: Advances in neural information processing systems
Ying Z, Bourgeois D, You J et al (2019) GNNExplainer: generating explanations for graph neural networks. In: Wallach H, Larochelle H, Beygelzimer A et al (eds) Advances in neural information processing systems. Curran Associates, Inc.
Ying C, Zhao X, Yu T (2024) Boosting graph pooling with persistent homology. arXiv preprint arXiv:240216346
Yu H, Yuan J, Yao Y, Wang C (2022) Not all edges are peers: accurate structure-aware graph pooling networks. Neural Netw 156:58–66. https://doi.org/10.1016/j.neunet.2022.09.004
Yu H, Yuan J, Cheng H et al (2021) GSAPool: gated structure aware pooling for graph representation learning. In: 2021 International joint conference on neural networks (IJCNN). IEEE, pp 1–8
Yuan H, Ji S (2020) StructPool: structured graph pooling via conditional random fields. In: International conference on learning representations
Yuan YJ, Lai YK, Yang J et al (2020) Mesh variational autoencoders with edge contraction pooling. In: IEEE computer society conference on computer vision and pattern recognition workshops. IEEE Computer Society, pp 1105–1112
Zhang Z, Zhao Y, Liao X et al (2019b) Deep learning in omics: a survey and guideline. Brief Funct Genomics 18:41–57. https://doi.org/10.1093/bfgp/ely030
Zhang XM, Liang L, Liu L, Tang MJ (2021a) Graph neural networks and their current applications in bioinformatics. Front Genet 12:690049. https://doi.org/10.3389/fgene.2021.690049
Zhang Z, Cui P, Zhu W (2022) Deep learning on graphs: a survey. IEEE Trans Knowl Data Eng 34:249–270. https://doi.org/10.1109/TKDE.2020.2981333
Zhang P, Xia C, Shen H-B (2023a) High-accuracy protein model quality assessment using attention graph neural networks. Brief Bioinform. https://doi.org/10.1093/bib/bbac614
Zhang S, Wang J, Yu S et al (2023b) An explainable deep learning framework for characterizing and interpreting human brain states. Med Image Anal 83:102665. https://doi.org/10.1016/j.media.2022.102665
Zhang M, Cui Z, Neumann M, Chen Y (2018) An end-to-end deep learning architecture for graph classification. In: Proceedings of the AAAI conference on artificial intelligence
Zhang Z, Bu J, Ester M et al (2019a) Hierarchical graph pooling with structure learning. arXiv preprint arXiv:191105954
Zhang L, Wang X, Li H et al (2020) Structure-feature based graph self-adaptive pooling. In: Proceedings of the web conference 2020. ACM, New York, NY, USA, pp 3098–3104
Zhang Z, Bu J, Ester M et al (2021b) Hierarchical multi-view graph pooling with structure learning. IEEE Trans Knowl Data Eng 545–559. https://doi.org/10.1109/TKDE.2021.3090664
Zhang Z, Bu J, Ester M et al (2021c) H2MN: graph similarity learning with hierarchical hypergraph matching networks. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining. Association for Computing Machinery, pp 2274–2284
Zhang Z, Li H, Zhang Z et al (2023c) Graph meets LLMs: towards large graph models
Zhao F, Li N, Pan H et al (2022) Multi-view feature enhancement based on self-attention mechanism graph convolutional network for autism spectrum disorder diagnosis. Front Hum Neurosci 16:918969. https://doi.org/10.3389/fnhum.2022.918969
Zhao Y, Yang F, Fang Y et al (2020) Predicting lymph node metastasis using histopathological images based on multiple instance learning with deep graph convolution. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR). pp 4837–4846
Zhao Q, Ren W, Li T et al (2023) GraphGPT: graph learning with generative pre-trained transformers
Zheng X, Zhou B, Li M et al (2023) MathNet: Haar-like wavelet multiresolution analysis for graph representation learning. Knowl Based Syst 273:110609. https://doi.org/10.1016/j.knosys.2023.110609
Zheng Y, Jiang B, Shi J et al (2019) Encoding histopathological WSIs using GNN for scalable diagnostically relevant regions retrieval. In: Shen D, Liu T, Peters TM et al (eds) Medical image computing and computer assisted intervention—MICCAI 2019. Springer, Cham, pp 550–558
Zhong Z, Li C-T, Pang J (2022) Multi-grained semantics-aware graph neural networks. IEEE Trans Knowl Data Eng 7251–7262. https://doi.org/10.1109/TKDE.2022.3195004
Zhou J, Cui G, Hu S et al (2020a) Graph neural networks: a review of methods and applications. AI Open 1:57–81
Zhou Y, Zheng H, Huang X et al (2022) Graph neural networks: taxonomy, advances, and trends. ACM Trans Intell Syst Technol 13:1–54. https://doi.org/10.1145/3495161
Zhou Y, Graham S, Koohbanani NA et al (2019) CGC-Net: cell graph convolutional network for grading of colorectal cancer histology images. In: Proceedings of the IEEE/CVF international conference on computer vision workshops
Zhou K, Song Q, Huang X et al (2020b) Multi-channel graph neural networks. In: Proceedings of the 29th international joint conference on artificial intelligence, IJCAI 2020. pp 1352–1358
Zhou X, Yin J, Tsang IW (2023) Edge but not least: cross-view graph pooling. In: Machine learning and knowledge discovery in databases. pp 344–359
Zhu J, Wang J, Han W, Xu D (2022) Neural relational inference to learn long-range allosteric interactions in proteins from molecular dynamics simulations. Nat Commun 13:1661. https://doi.org/10.1038/s41467-022-29331-3
Zou X, Li K, Chen C (2022) Multilevel attention based U-shape graph neural network for point clouds learning. IEEE Trans Industr Inform 18:448–456. https://doi.org/10.1109/TII.2020.3046627
Funding
This study was funded by the National Natural Science Foundation of China (62072212), the Development Project of Jilin Province of China (20220508125RC, 20230201065GX), and the Jilin Provincial Key Laboratory of Big Data Intelligent Cognition (20210504003GH).
Author information
Contributions
Y.W. and W.H. wrote the main manuscript text, and W.H. prepared the figures and tables. N.S. and Z.Z. contributed to the concept and design of the article and provided the relevant information and ideas. J.L. contributed to the literature collection and organization. L.H., Y.W. and J.W. provided supervision and resources and reviewed the manuscript. W.H. and J.W. revised and edited the manuscript for submission.
Ethics declarations
Competing interests
The authors have no competing interests to declare that are relevant to the content of this article.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Wang, Y., Hou, W., Sheng, N. et al. Graph pooling in graph neural networks: methods and their applications in omics studies. Artif Intell Rev 57, 294 (2024). https://doi.org/10.1007/s10462-024-10918-9