Abstract
Graph neural networks (GNNs) have attracted increasing attention in recent years.
However, most existing state-of-the-art graph learning methods only focus on node features
and largely ignore the edge features that contain rich information about graphs in modern
applications. In this paper, we propose a novel model to incorporate Node and Edge
features in graph Neural Networks (NENN) based on a hierarchical dual-level attention
mechanism. NENN consists of a node-level attention layer and an edge-level attention layer. The
two types of layers of NENN are alternately stacked to learn and aggregate embeddings for
nodes and edges. Specifically, the node-level attention layer aims to learn the importance
of the node based neighbors and edge based neighbors for each node, while the edge-level
attention layer is able to learn the importance of the node based neighbors and edge based
neighbors for each edge. Leveraging the proposed NENN, the node and edge embeddings
can be mutually reinforced. Extensive experiments on academic citation and molecular
networks have verified the effectiveness of our proposed graph embedding model.
Keywords: Graph Neural Network, Attention Mechanism, Graph Convolutional Network
1. Introduction
Convolutional Neural Networks (CNNs) have become very useful and successful techniques
for processing various data with a regular grid-like structure Krizhevsky et al. (2012); Simonyan
and Zisserman (2014); Redmon et al. (2016); He et al. (2018); Jégou et al. (2017). However,
non-Euclidean graphs containing various kinds of nodes and edges are ubiquitous in the real
world, such as social networks, bioprotein networks and citation networks, and cannot
be easily represented due to their complex and irregular structure.
In recent years, there has been growing interest in graph representation learning methods. Representation
learning maps high-dimensional features from a graph to low-dimensional
vectors, so that many downstream problems can be easily solved, including node classification Ribeiro et al. (2017); Jacob et al. (2014); Donnat et al. (2018), link prediction Berg
et al. (2017); Zhang and Chen (2018); Hasanzadeh et al. (2019) and graph classification Ying
et al. (2018); Murphy et al. (2019); Lee et al. (2019).
Inspired by spectral graph theory, Bruna et al. (2013) generalizes CNNs to the graph
domain based on the eigendecomposition of the graph Laplacian matrix. In order to reduce the
overhead of the decomposition, ChebNet Defferrard et al. (2016) is proposed to approximate
convolution kernel with Chebyshev polynomials. As a pilot work, Kipf and Welling (2017)
proposes a graph convolutional network (GCN) as a first-order approximation of ChebNet,
which greatly simplifies the convolution filters by limiting the receptive field to the 1-
hop neighbors for each node. Finally, the GCN model is successfully applied to semi-
supervised node classification and achieves state-of-the-art performance. The basic idea
behind GCN is to map a high-dimensional node representation to a low-dimensional vector
by transforming, propagating, aggregating and updating node features across edges in a
graph. Nevertheless, the GCN model is essentially a spectral approach restricted to transductive
learning tasks. As a result, GCN cannot handle large and dynamic graphs effectively. To
address the limitations of GCN, GraphSAGE Hamilton et al. (2017) extends GCN from a
transductive approach to an inductive one using a spatial-based method to train embeddings
for previously unseen nodes. GraphSAGE samples a restricted neighborhood and learns how to
aggregate node features rather than training fixed node embeddings. In addition, GraphSAGE
also proposes a mini-batch training algorithm, which solves the problem that GCN cannot
be applied to large graphs. Graph Attention Network (GAT) Veličković et al. (2018), a
novel attention-based graph neural network, trains weight coefficients associated with
the neighbors of each node to learn node embeddings. It has demonstrated its effectiveness in
graph embedding and shown superiority over previous methods.
Despite the success of existing graph neural networks, there remain two major challenges.
On the one hand, almost all previous works only leverage node features
and completely ignore the edge features, which are likely to contain important
information. For example, in molecule networks, a node represents an atom while an edge
represents a bond connecting two atoms. A bond usually has some simple edge features
(e.g., bond type, atom pair type, bond order, conjugated, ring status, aromaticity), which
are closely related to atom features. On the other hand, how to measure the importance of
neighboring nodes as well as their connecting edges is not fully considered.
In order to address the aforementioned challenges, we propose a novel graph neural
network, named NENN, which incorporates node and edge features based on a dual-level
attention mechanism, including node-level and edge-level attentions. Specifically, we aim
to learn the importance of node based neighbors and edge based neighbors and aggregate
embeddings for each node in the node-level attention layer. Similarly, the embedding of
each edge is generated in the edge-level attention layer.
We conduct extensive experiments on node classification, graph classification and graph
regression to verify the effectiveness of the proposed NENN. For node classification, we use
the benchmark citation network datasets: Cora, Citeseer Sen et al. (2008) and Pubmed Galileo
Mark Namata and Huang (2012). For graph classification and graph regression, we demon-
strate the proposed NENN is able to effectively generate node and edge embeddings by
incorporating node and edge features on multiple molecular datasets: Tox21 Wu et al.
(2018), HIV Wu et al. (2018), Freesolv Mobley and Guthrie (2014), and Lipophilicity Wu
et al. (2018). The results show that the proposed NENN outperforms relevant baselines by
a significant margin.
In a nutshell, our main contributions are summarized as follows:
• We propose a novel graph neural network (NENN) that incorporates both node and
edge features, which can learn node embeddings as well as edge embeddings simulta-
neously.
• To the best of our knowledge, this is the first attempt to take the influences of neigh-
bors for nodes and edges into consideration based on a hierarchical dual-level attention
mechanism, including node-level and edge-level attentions.
• We transform the roles of nodes and edges and extend them to the neighbor set of
each other, which strengthens the connection between edges and nodes.
• Various graph-related tasks, including graph classification, graph regression, and node
classification, are used to verify the scalability and generality of NENN.
2. Related Work
Real-world data usually appears in the form of non-Euclidean graphs. One of the most
severe challenges in graph representations is to efficiently exploit the node and topology
information. At present, graph representation learning methods can be roughly divided
into three categories: matrix factorization, random walks and graph neural networks.
Factorization-based approaches. The key idea of matrix factorization is that the
relation matrix (e.g. adjacency matrix and Laplace matrix) is decomposed to yield the low-
dimensional representations. For example, Grarep Cao et al. (2015) reduces the dimension
of the relation matrix via SVD to obtain the k-step network vertex representations
for weighted graphs. HOPE Ou et al. (2016) preserves high-order proximities and
captures the asymmetric transitivity for directed graphs. However, these methods cannot
efficiently process large-scale graphs since matrix factorization incurs a huge computational
overhead.
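As a rough illustration of this family of methods (a generic sketch, not Grarep's or HOPE's exact procedure), a truncated SVD of a relation matrix already yields low-dimensional node embeddings:

# Illustrative sketch of factorization-based embedding: a truncated SVD of a
# relation matrix (here simply the adjacency matrix) gives k-dimensional node
# representations. Names and the scaling choice are ours, not a specific method's.
import numpy as np

def svd_embedding(A, k):
    """A: (n, n) relation matrix; returns (n, k) node embeddings."""
    U, S, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] * np.sqrt(S[:k])   # scale singular vectors by sqrt of singular values

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
emb = svd_embedding(A, k=2)            # 2-dimensional embedding for each node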
Random walk. Just as its name implies, a random walk on a graph starts at a
node and recursively moves to a randomly selected neighbor until a length threshold is reached. Deep-
Walk Perozzi et al. (2014) is the first attempt to learn latent representations leveraging
truncated random walks. node2vec Grover and Leskovec (2016) further designs a biased
random walk to efficiently explore diverse neighborhoods based on DFS and BFS strate-
gies. By recursively compressing the input graph into smaller but structurally similar graphs,
HARP Chen et al. (2018) captures the global topological information about the input
graph and generates representations on these smaller graphs during a random walk. Nevertheless,
random walk methods are not the most successful methods so far. These shallow
embedding methods use parameters and functions that are not shared across nodes, which makes it impossible
to embed the nodes of a large-scale graph into a low-dimensional space. Besides, they can only
learn embeddings for fixed graphs in the transductive setting and, consequently, do
not naturally generalize to unseen nodes.
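A minimal sketch of such a truncated random walk is shown below (illustrative only; DeepWalk and node2vec use more elaborate sampling strategies, and the resulting walks are then fed to a skip-gram style model):

# Sketch of a truncated random walk: start at a node and repeatedly move to a
# uniformly random neighbor until the walk length threshold is reached.
import random

def random_walk(adj, start, walk_length):
    """adj: dict mapping node -> list of neighbors; returns the list of visited nodes."""
    walk = [start]
    for _ in range(walk_length - 1):
        nbrs = adj.get(walk[-1], [])
        if not nbrs:                  # dead end: stop early
            break
        walk.append(random.choice(nbrs))
    return walk

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(random_walk(adj, start=0, walk_length=5))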
Graph neural networks. In order to address the limitations of the previous meth-
ods, in recent years, graph neural networks have been proposed to learn node embeddings. Graph
neural networks have many branches so far, but we mainly focus on the methods based on
graph convolution. Inspired by the successes of CNNs in image recognition, GCN Kipf
and Welling (2017) stacks graph convolutional layers to aggregate local information from
neighbors and encodes nodes into vectors. GraphSAGE Hamilton et al. (2017) is a novel
inductive approach that generates latent embeddings for unseen nodes. Attention mech-
anisms have been widely applied to many tasks in deep learning. GAT Veličković et al.
(2018) is introduced to learn importance coefficients between a node and its neighbors rather
than treating the neighborhood information equally. However, the above graph neural net-
works not only ignore the essential edge features but also fail to differentiate the influences
of their connecting edges.
Edge-related Work. To address the above issues, several models have been proposed. Message
passing neural network (MPNN) Justin Gilmer (2017), a generalized model consisting of
a message passing phase and a readout phase, is proposed to predict
molecular properties. Although MPNN adds edge information to the message passing phase,
its passing mechanism cannot recognize the correlation between nodes and edges. In contrast,
the node based neighbors and the edge based neighbors defined in the two types of layers
of NENN can integrate the adjacency relationship between edges and nodes, which
captures the local structural information of the graph. RGCN Schlichtkrull et al. (2018)
uses a simple forward-pass rule:
$$h_i^{(l+1)} = \sigma\left(\sum_{r \in R} \sum_{j \in N_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)}\right) \qquad (1)$$
where $W_r^{(l)}$ represents the weight matrix of relation $r \in R$ and $c_{i,r}$ is a normalization
constant. This simple aggregation is difficult to compute and performs poorly. EGNN Gong and Cheng (2019) introduces an attention
mechanism to explore edge features. In EGNN, except for the original edge features used
in the first layer, the attention coefficients between two nodes in the $l$-th layer are used as
the edge features in the $(l+1)$-th layer, which leads to a great loss of edge information.
Consequently, EGNN inherently reinforces the embedding of nodes with edge features, while
edge embeddings are not enhanced by node embeddings. As a result, it is not suitable
for edge embedding learning and link prediction. Compared with EGNN, NENN adopts
a hierarchical dual-level attention mechanism where the roles of edges and nodes are
alternated, keeping the edge features as vectors rather than one-dimensional attention
coefficients. In addition, NENN considers the relationship between edges and nodes and the
graph structure fairly and comprehensively, which enables NENN to learn node, edge and
even graph representations. CensNet Jiang et al. (2019) embeds both nodes and edges into
a latent feature space by using the line graph of the original undirected graph. However,
CensNet uses an approximated spectral graph convolution in its layer-wise propagation, which
prevents it from processing large graphs and directed graphs. In contrast, the
basic idea behind the proposed NENN lies in the spatial domain, which enables various
kinds of graphs to be processed. Although both CensNet and NENN adopt alternate convolution,
NENN and CensNet use two completely different ideas. CensNet uses the original graph and
its line graph Harary and Norman (1960) to perform alternate convolution, while NENN
extends the neighboring nodes to the neighbors of an edge and the neighboring edges to
the neighbors of a node. In addition, NENN also adopts the attention mechanism to learn
Figure 1: The overall framework of the proposed NENN for node embedding generation.
The dual-level attention layers (i.e., node-level attention layer and edge-level attention
layer) of NENN are alternately stacked to learn node and edge embeddings. (Panels, left
to right: Input Graph, Node-level Attention Layer, Edge-level Attention Layer, repeated
for more layers, Output Graph.)
an importance coefficient from neighbors for each node or edge, which enables NENN to
learn more efficient embeddings.
Definition 1 (Node Based Neighbors) Given a graph G(V, E), the node based neigh-
bors Ni of node i are defined as the set of nodes which connect with node i. In the same
way, the node set Nj connected by edge j represents the node based neighbors of edge j.
In particular, the node based neighbors of node i include node i itself.
Example 1 As shown in Figure 1, nodes are represented by circles while edges are rep-
resented by squares. In node-level attention layer, the node based neighbors N1 of the red
node whose feature vector is x1 denote the nodes {1, 2, 3, 5, 6, 7}, which consist of its one-hop
neighboring nodes and node 1 itself.
Definition 2 (Edge Based Neighbors) Given a graph G(V, E), the edge based neighbors
Ei of node i are defined as the set of edges connecting with node i. Similarly, the edge based
neighbors Ej of edge j are defined as the set of edges connecting with edge j. In particular, the
edge based neighbors of edge j include edge j itself.
Example 2 As shown in Figure 1, in edge-level attention layer, the edge based neighbors
of the brown edge (i.e., e6 ) denote the neighboring edges {1, 2, 5, 6, 7}. In addition, the node
based neighbors of the brown edge (i.e., e6 ) denote the one-hop neighboring node set {1, 6}.
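To illustrate Definitions 1 and 2, the following is a small Python sketch (function and variable names are ours, purely illustrative) that derives all four neighbor sets from an edge list; on a toy graph such as the one in Figure 1 it reproduces neighborhoods of the kind listed in Examples 1 and 2.

# Sketch of Definitions 1-2: node based and edge based neighbors built from an
# edge list. Edges are indexed by position; an edge's node based neighbors are its
# endpoints, and two edges are neighbors when they share an endpoint.
from collections import defaultdict

def build_neighbors(num_nodes, edges):
    node_nbrs_of_node = {i: {i} for i in range(num_nodes)}   # Def. 1: includes the node itself
    edge_nbrs_of_node = defaultdict(set)
    node_nbrs_of_edge = {}
    edge_nbrs_of_edge = defaultdict(set)
    for e, (u, v) in enumerate(edges):
        node_nbrs_of_node[u].add(v); node_nbrs_of_node[v].add(u)
        edge_nbrs_of_node[u].add(e); edge_nbrs_of_node[v].add(e)
        node_nbrs_of_edge[e] = {u, v}                         # Def. 2: endpoints of edge e
    for u, incident in edge_nbrs_of_node.items():
        for e in incident:
            edge_nbrs_of_edge[e] |= incident                  # edges sharing node u, incl. e itself
    return node_nbrs_of_node, edge_nbrs_of_node, node_nbrs_of_edge, edge_nbrs_of_edge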
Figure 1 shows the overall process of the proposed NENN for node embedding generation.
The proposed NENN consists of two types of attention layers, the node-level attention
layer and the edge-level attention layer. The dual-level attention layers are alternately stacked
to learn node and edge embeddings. In the node-level attention layer, we aim to learn
the node based neighbors' importance $\alpha^n_{ij}$ and the edge based neighbors' importance $\alpha^e_{ik}$ for
each node. In the edge-level attention layer, we aim to learn the node based neighbors'
importance $\beta^n_{ij}$ and the edge based neighbors' importance $\beta^e_{ik}$ for each edge. With the learned
importance coefficients, we can aggregate and update node and edge embeddings in order.
We first apply weight matrices $W_n$ and $W_e$ that linearly transform the input node and edge features
into high-level features. The importance coefficient $e^n_{ij}$ means how important node j is to node i,
while $e^e_{ik}$ represents the influence of edge k on node i.
Then the structure information is integrated into the proposed NENN via masked atten-
tion, which means the embedding of node i depends only on neighboring nodes j or edges k.
Next, the importance coefficient of node j to node i is normalized via the softmax function:
$$\alpha^n_{ij} = \mathrm{softmax}_j(e^n_{ij}) = \frac{\exp\left(\sigma\left(a_n^T \left[ W_n x_i \,\|\, W_n x_j \right]\right)\right)}{\sum_{k \in N_i} \exp\left(\sigma\left(a_n^T \left[ W_n x_i \,\|\, W_n x_k \right]\right)\right)} \qquad (4)$$
where $\|$ is the concatenation operation, $a_n \in \mathbb{R}^{2d_v^{l+1}}$ is the parameter vector of a single-layer
feed-forward network, and $\sigma$ denotes the activation function (e.g. LeakyReLU).
Then, the importance coefficient $\alpha^e_{ik}$ of node i and edge k can be derived as:
$$\alpha^e_{ik} = \mathrm{softmax}_k(e^e_{ik}) = \frac{\exp\left(\sigma\left(a_e^T \left[ W_n x_i \,\|\, W_e e_k \right]\right)\right)}{\sum_{j \in E_i} \exp\left(\sigma\left(a_e^T \left[ W_n x_i \,\|\, W_e e_j \right]\right)\right)} \qquad (5)$$
where $a_e \in \mathbb{R}^{d_v^{l+1} + d_e^{l+1}}$ is the parameter vector of a single-layer feed-forward network.
After learning the importance $\alpha^n_{ij}$ and $\alpha^e_{ik}$, the embedding $x_{N_i}$ of node i's node based
neighbors can be aggregated with the corresponding importance coefficients:
$$x_{N_i} = \sigma\left(W_n \cdot \mathrm{MEAN}\left(\{\alpha^n_{ij} x_j, \forall j \in N_i\}\right)\right) \qquad (6)$$
where MEAN is a mean aggregator. Then, the embedding $x_{E_i}$ of node i's edge based
neighbors can be aggregated as follows:
$$x_{E_i} = \sigma\left(W_e \cdot \mathrm{MEAN}\left(\{\alpha^e_{ik} e_k, \forall k \in E_i\}\right)\right) \qquad (7)$$
Finally, with the edge based neighbors' embedding $x_{E_i}$ and node based neighbors' embedding
$x_{N_i}$, the embedding of node i in the (l+1)-th layer can be combined:

where $q_e \in \mathbb{R}^{2d_e^{l+1}}$ is an attention vector. The importance coefficient $\beta^n_{ij}$ of node j

where $q_n \in \mathbb{R}^{d_v^{l+1} + d_e^{l+1}}$ is an attention vector.
Then, edge i's intermediate edge based neighbors embedding $e_{E_i}$ and node based neighbors
embedding $e_{N_i}$ can be generated by a mean aggregator:
$$e_{E_i} = \sigma\left(W_e \cdot \mathrm{MEAN}\left(\{\beta^e_{ik} e_k, \forall k \in E_i\}\right)\right) \qquad (11)$$
$$e_{N_i} = \sigma\left(W_n \cdot \mathrm{MEAN}\left(\{\beta^n_{ij} x_j, \forall j \in N_i\}\right)\right) \qquad (12)$$
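To make the node-level computation concrete, the following is a minimal NumPy sketch of one node-level attention layer following Eqs. (4)-(7) and the concatenation used in the algorithm below; it assumes LeakyReLU as the activation $\sigma$, treats $W_n$, $W_e$, $a_n$, $a_e$ simply as given arrays, and all function and variable names are ours rather than the paper's implementation.

# Sketch of a node-level attention layer (Eqs. 4-7); node_nbrs/edge_nbrs follow Defs. 1-2.
import numpy as np

def leaky_relu(z, slope=0.2):
    return np.where(z > 0, z, slope * z)

def node_level_layer(x, e, node_nbrs, edge_nbrs, W_n, W_e, a_n, a_e):
    """x: node features (n, d_v); e: edge features (m, d_e); returns updated node embeddings."""
    new_x = []
    for i in range(x.shape[0]):
        N_i, E_i = sorted(node_nbrs[i]), sorted(edge_nbrs[i])
        # Eq. (4): attention over node based neighbors
        s_n = np.array([a_n @ np.concatenate([W_n @ x[i], W_n @ x[j]]) for j in N_i])
        alpha_n = np.exp(leaky_relu(s_n)); alpha_n /= alpha_n.sum()
        # Eq. (5): attention over edge based neighbors
        s_e = np.array([a_e @ np.concatenate([W_n @ x[i], W_e @ e[k]]) for k in E_i])
        alpha_e = np.exp(leaky_relu(s_e)); alpha_e /= alpha_e.sum()
        # Eqs. (6)-(7): weighted mean aggregation of the two neighbor sets
        x_N = leaky_relu(W_n @ np.mean([a * x[j] for a, j in zip(alpha_n, N_i)], axis=0))
        x_E = leaky_relu(W_e @ np.mean([a * e[k] for a, k in zip(alpha_e, E_i)], axis=0))
        # Concatenate, as in x_i^{(l+1)} <- CONCAT(x_{N_i}, x_{E_i})
        new_x.append(np.concatenate([x_N, x_E]))
    return np.stack(new_x)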
x_i^{(0)} ← x_i, ∀i ∈ V'^{(0)}
for l = 0 ... L do
    find the node based neighbors N_i and edge based neighbors E_i
    if layer l is a node-level attention layer or l = L then
        for each node i ∈ V'^{(l)} do
            calculate the importance coefficients α^n_{ij}(l) and α^e_{ik}(l)
            calculate the embeddings of node based neighbors x_{N_i}^{(l)} and edge based neighbors x_{E_i}^{(l)}
            x_i^{(l+1)} ← CONCAT(x_{N_i}^{(l)}, x_{E_i}^{(l)})
        end
    end
    if layer l is an edge-level attention layer then
        for each edge i ∈ E'^{(l)} do
            calculate the importance coefficients β^n_{ij}(l) and β^e_{ik}(l)
            calculate the embeddings of node based neighbors e_{N_i}^{(l)} and edge based neighbors e_{E_i}^{(l)}
            e_i^{(l+1)} ← CONCAT(e_{N_i}^{(l)}, e_{E_i}^{(l)})
        end
    end
end
x_i^{(L)} ← CONCAT(x_{N_i}^{(L-1)}, x_{E_i}^{(L-1)})
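As a rough sketch of the alternating stacking in the algorithm above (not the authors' implementation), the forward pass could be organized as follows. It reuses leaky_relu and node_level_layer from the earlier sketch and assumes an edge-level layer that mirrors Eqs. (4)-(7) with the roles of nodes and edges swapped (the corresponding equations are not reproduced above); all names and the per-layer parameter handling are illustrative.

import numpy as np

def edge_level_layer(x, e, node_nbrs_of_edge, edge_nbrs_of_edge, W_n, W_e, q_n, q_e):
    """Assumed mirror of the node-level layer: updates edge embeddings (Eqs. 11-12)."""
    new_e = []
    for i in range(e.shape[0]):
        N_i, E_i = sorted(node_nbrs_of_edge[i]), sorted(edge_nbrs_of_edge[i])
        s_n = np.array([q_n @ np.concatenate([W_e @ e[i], W_n @ x[j]]) for j in N_i])
        beta_n = np.exp(leaky_relu(s_n)); beta_n /= beta_n.sum()
        s_e = np.array([q_e @ np.concatenate([W_e @ e[i], W_e @ e[k]]) for k in E_i])
        beta_e = np.exp(leaky_relu(s_e)); beta_e /= beta_e.sum()
        e_N = leaky_relu(W_n @ np.mean([b * x[j] for b, j in zip(beta_n, N_i)], axis=0))  # Eq. (12)
        e_E = leaky_relu(W_e @ np.mean([b * e[k] for b, k in zip(beta_e, E_i)], axis=0))  # Eq. (11)
        new_e.append(np.concatenate([e_N, e_E]))
    return np.stack(new_e)

def nenn_forward(x, e, nbrs, layers):
    """nbrs: (node_nbrs_of_node, edge_nbrs_of_node, node_nbrs_of_edge, edge_nbrs_of_edge);
    layers: list of ('node', params) / ('edge', params) tuples, alternately stacked,
    each carrying its own weight matrices and attention vectors."""
    nn_n, en_n, nn_e, en_e = nbrs
    for kind, params in layers:
        if kind == 'node':
            x = node_level_layer(x, e, nn_n, en_n, *params)   # update node embeddings
        else:
            e = edge_level_layer(x, e, nn_e, en_e, *params)   # update edge embeddings
    return x, e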
Table 2: Dataset statistics of molecular networks for graph classification and regression.
4. Experiments
We evaluate the proposed NENN on three benchmark tasks: (i) semi-supervised node classi-
fication on citation networks Cora, Citeseer and Pubmed; (ii) multi-task graph classification
on multiple molecular datasets Tox21 and HIV; (iii) graph regression on molecular datasets
Lipophilicity and Freesolv.
Tox21 and HIV. The Tox21 Wu et al. (2018) dataset contains qualitative toxicity measurements
on 12 biological targets, including AR, AhR, AR-LBD, etc. The HIV Wu et al. (2018) dataset originates from the
Drug Therapeutics Program AIDS Antiviral Screen, which measures the ability to inhibit HIV
replication for 41,127 compounds. The two datasets are used for activity prediction (i.e.
binary graph classification) that labels compounds as either “active” or “inactive”.
Lipophilicity and Freesolv. Lipophilicity Wu et al. (2018) is a public dataset measuring
lipophilicity, a property that affects both membrane permeability and solubility; it provides experimental
results of the octanol/water distribution coefficient (logD at pH 7.4) for 4,200 compounds.
Also, the Free Solvation Database (Freesolv) Mobley and Guthrie (2014) is a common
dataset providing experimental and calculated results of hydration free energies for 642
small molecules in water. According to the characteristics of the two datasets, we conduct
extensive graph regression experiments to predict solvation energies or solubility.
4.2. Baselines
We compare with some state-of-the-art baselines to verify the effectiveness of our node-
level attention and edge-level attention of the proposed NENN. In addition, we also use
four variants of NENN to verify the effect of the convolution or attention mechanism on the
edge-level and node-level layers.
• GCN Kipf and Welling (2017): A transductive graph convolutional network which
aggregates local information from neighbors and encodes nodes into low-dimensional
vectors.
• GraphSAGE Hamilton et al. (2017): A novel inductive approach that generates
latent embeddings for unseen nodes. Instead of learning a fixed representation for
each node, GraphSAGE learns a function that aggregates features from a node's sampled
neighbors.
• GAT Veličković et al. (2018): A graph convolutional network based on an attention mech-
anism. GAT learns an importance coefficient for each neighbor of a node instead of
ignoring the differences between neighbors as GCN does.
• MPNN Justin Gilmer (2017): A message passing neural network based on message
passing and readout. MPNN is proposed to predict molecular properties.
• CensNet Jiang et al. (2019): A spectral domain based graph convolutional network.
CensNet learns the node and edge feature embeddings simultaneously based on orig-
inal graph and its line graph.
• EGNN Gong and Cheng (2019): A graph neural network based on an attention mech-
anism. EGNN explores edge features, but in later layers the edge feature vectors
are converted into attention coefficients, which leads to the loss of edge information.
Figure 3: AUC in validation set for HIV networks (left) and RMSE in validation set for
Lipophilicity networks (right). (Curves are shown over training epochs for Weave, GCN,
CensNet, EGNN, NENN-NCEC, NENN-NCEA, NENN-NAEC and NENN-NAEA.)
The training, validation and test sets are split with a ratio of 8:1:1. We run our models
5 times and report the mean performance for each experiment.
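As an illustration of this protocol (not the authors' actual script), a minimal sketch of the 8:1:1 split and 5-run averaging could look like the following; split_indices and train_and_evaluate are hypothetical names.

# Sketch of the evaluation protocol: an 8:1:1 random split and mean score over 5 runs.
import numpy as np

def split_indices(n, seed, ratios=(0.8, 0.1, 0.1)):
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_train, n_val = int(ratios[0] * n), int(ratios[1] * n)
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]

def evaluate_mean(dataset_size, train_and_evaluate, runs=5):
    scores = []
    for seed in range(runs):
        train_idx, val_idx, test_idx = split_indices(dataset_size, seed)
        scores.append(train_and_evaluate(train_idx, val_idx, test_idx))
    return float(np.mean(scores))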
more in graph classification than in node classification. This also proves the validity of
NENN, which integrates higher-dimensional edge features into the representation learning
process.
5. Conclusion
In this paper, we introduce an efficient embedding architecture, named NENN, which incor-
porates node and edge features to enhance the node and edge embeddings across neural network
layers. The proposed NENN alternately stacks node-level and edge-level attention layers
to learn the importance of node based neighbors and edge based neighbors.
Leveraging the proposed NENN, the node and edge embeddings can be mutually reinforced.
Extensive experiments on semi-supervised node classification, graph classification and graph
regression demonstrate the effectiveness of NENN.
References
Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig
Citro, Greg S Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. Tensor-
flow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint
arXiv:1603.04467, 2016.
Jehad Ali, Rehanullah Khan, Nasir Ahmad, and Imran Maqsood. Random forests and
decision trees. International Journal of Computer Science Issues (IJCSI), 9(5):272, 2012.
Rianne van den Berg, Thomas N Kipf, and Max Welling. Graph convolutional matrix
completion. arXiv preprint arXiv:1706.02263, 2017.
Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral networks and
locally connected networks on graphs. arXiv preprint arXiv:1312.6203, 2013.
Shaosheng Cao, Wei Lu, and Qiongkai Xu. Grarep: Learning graph representations with
global structural information. In Proceedings of the 24th ACM international on conference
on information and knowledge management, pages 891–900. ACM, 2015.
Tianfeng Chai and Roland R Draxler. Root mean square error (rmse) or mean absolute
error (mae)?–arguments against avoiding rmse in the literature. Geoscientific model
development, 7(3):1247–1250, 2014.
Haochen Chen, Bryan Perozzi, Yifan Hu, and Steven Skiena. Harp: Hierarchical representa-
tion learning for networks. In Thirty-Second AAAI Conference on Artificial Intelligence,
2018.
Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural net-
works on graphs with fast localized spectral filtering. In Advances in neural information
processing systems, pages 3844–3852, 2016.
Claire Donnat, Marinka Zitnik, David Hallac, and Jure Leskovec. Learning structural node
embeddings via diffusion wavelets. In Proceedings of the 24th ACM SIGKDD International
Conference on Knowledge Discovery & Data Mining, pages 1320–1329, 2018.
Galileo Mark Namata, Ben London, Lise Getoor, and Bert Huang. Query-driven active
surveying for collective classification. 2012.
Liyu Gong and Qiang Cheng. Exploiting edge features for graph neural networks. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages
9211–9219, 2019.
Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In
Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery
and data mining, pages 855–864. ACM, 2016.
William L. Hamilton, Rex Ying, and Jure Leskovec. Inductive representation learning on
large graphs. In NIPS, 2017.
James A Hanley and Barbara J McNeil. The meaning and use of the area under a receiver
operating characteristic (roc) curve. Radiology, 1982.
Frank Harary and Robert Z Norman. Some properties of line digraphs. Rendiconti del
Circolo Matematico di Palermo, 9(2):161–168, 1960.
Anfeng He, Chong Luo, Xinmei Tian, and Wenjun Zeng. A twofold siamese network for
real-time object tracking. In Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition, pages 4834–4843, 2018.
Yann Jacob, Ludovic Denoyer, and Patrick Gallinari. Learning latent representations of
nodes for classifying in heterogeneous social networks. In Proceedings of the 7th ACM
international conference on Web search and data mining, pages 373–382, 2014.
Simon Jégou, Michal Drozdzal, David Vazquez, Adriana Romero, and Yoshua Bengio. The
one hundred layers tiramisu: Fully convolutional densenets for semantic segmentation.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Workshops, pages 11–19, 2017.
Xiaodong Jiang, Pengsheng Ji, and Sheng Li. Censnet: convolution with edge-node switch-
ing in graph neural networks. In Proceedings of the 28th International Joint Conference
on Artificial Intelligence, pages 2656–2662. AAAI Press, 2019.
Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl.
Neural message passing for quantum chemistry. In International Conference on Machine
Learning (ICML), 2017.
Steven Kearnes, Kevin McCloskey, Marc Berndl, Vijay Pande, and Patrick Riley. Molecular
graph convolutions: moving beyond fingerprints. Journal of computer-aided molecular
design, 30(8):595–608, 2016.
Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv
preprint arXiv:1412.6980, 2014.
Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional
networks. In International Conference on Learning Representations (ICLR), 2017.
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep
convolutional neural networks. In Advances in neural information processing systems,
pages 1097–1105, 2012.
Junhyun Lee, Inyeop Lee, and Jaewoo Kang. Self-attention graph pooling. arXiv preprint
arXiv:1904.08082, 2019.
Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of
machine learning research, 9(Nov):2579–2605, 2008.
David L Mobley and J Peter Guthrie. Freesolv: a database of experimental and calculated
hydration free energies, with input files. Journal of computer-aided molecular design, 28
(7):711–720, 2014.
Ryan L Murphy, Balasubramaniam Srinivasan, Vinayak Rao, and Bruno Ribeiro. Relational
pooling for graph representations. arXiv preprint arXiv:1903.02541, 2019.
Mingdong Ou, Peng Cui, Jian Pei, Ziwei Zhang, and Wenwu Zhu. Asymmetric transitivity
preserving graph embedding. In Proceedings of the 22nd ACM SIGKDD international
conference on Knowledge discovery and data mining, pages 1105–1114. ACM, 2016.
Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. Deepwalk: Online learning of social
representations. In Proceedings of the 20th ACM SIGKDD international conference on
Knowledge discovery and data mining, pages 701–710. ACM, 2014.
Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once:
Unified, real-time object detection. In Proceedings of the IEEE conference on computer
vision and pattern recognition, pages 779–788, 2016.
Michael Schlichtkrull, Thomas N. Kipf, Peter Bloem, Rianne van den Berg, and Max
Welling. Modeling relational data with graph convolutional networks. In European Se-
mantic Web Conference, 2018.
Prithviraj Sen, Galileo Namata, Mustafa Bilgic, Lise Getoor, Brian Galligher, and Tina
Eliassi-Rad. Collective classification in network data. AI magazine, 29(3):93–93, 2008.
Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale
image recognition. arXiv preprint arXiv:1409.1556, 2014.
Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and
Yoshua Bengio. Graph Attention Networks. International Conference on Learning Rep-
resentations, 2018. URL https://openreview.net/forum?id=rJXMpikCZ. accepted as
poster.
Zhenqin Wu, Bharath Ramsundar, Evan N Feinberg, Joseph Gomes, Caleb Geniesse,
Aneesh S Pappu, Karl Leswing, and Vijay Pande. Moleculenet: a benchmark for molec-
ular machine learning. Chemical science, 9(2):513–530, 2018.
Zhitao Ying, Jiaxuan You, Christopher Morris, Xiang Ren, Will Hamilton, and Jure
Leskovec. Hierarchical graph representation learning with differentiable pooling. In Ad-
vances in neural information processing systems, pages 4800–4810, 2018.
Muhan Zhang and Yixin Chen. Link prediction based on graph neural networks. In Advances
in Neural Information Processing Systems, pages 5165–5175, 2018.