Understanding Auditory Evoked Brain Signal Via Physics-Informed Embedding Network With Multi-Task Transformer

Wanli Ma¹, Xuegang Tang², Jin Gu³, Ying Wang⁴, and Yuling Xia¹
arXiv:2406.02014v1 [q-bio.NC] 4 Jun 2024
¹ School of Mathematics, Southwest Jiaotong University, Sichuan, Chengdu, China
[email protected] [email protected]
² School of Computer, SWJTU-Leeds Joint School, Sichuan, Chengdu, China
[email protected]
³ School of Computing and Artificial Intelligence, Southwest Jiaotong University, Sichuan, Chengdu, China
[email protected]
⁴ Chengdu University of Technology Oxford Brookes College, Sichuan, Chengdu, China
[email protected]
Abstract. In the fields of brain-computer interaction and cognitive neuroscience, effective decoding of auditory signals from task-based functional magnetic resonance imaging (fMRI) is key to understanding how the brain processes complex auditory information. Although existing methods have enhanced decoding capabilities, limitations remain in information utilization and model representation. To overcome these challenges, we propose an innovative multi-task learning model, Physics-informed Embedding Network with Multi-Task Transformer (PEMT-Net), which enhances decoding performance through physics-informed embedding and deep learning techniques. PEMT-Net consists of two principal components: feature augmentation and classification. For feature augmentation, we create neural embedding graphs via node embedding, using random walks to simulate the physical diffusion of neural information. This method captures both local and non-local information overflow, and we further propose a position encoding based on relative physical coordinates. In the classification segment, we propose adaptive embedding fusion to maximally capture linear and non-linear characteristics. Furthermore, we propose an innovative parameter-sharing mechanism to optimize the retention and learning of extracted features. Experiments on a specific dataset demonstrate PEMT-Net's strong performance in multi-task auditory signal decoding, surpassing existing methods and offering new insights into the brain's mechanisms for processing complex auditory information.
Keywords: Physics-informed · Embedding representation · Auditory information · Multi-task · Transformer.
2 Wanli Ma et al.
1 Introduction
Brain-computer interaction (BCI), which captures subtle changes in brain activity and transforms these neural signals into executable computational instructions, shows great potential for deepening our understanding of brain mechanisms. Although this field is still in its infancy, it has already shown promising applications in clinical medicine, biomedicine, and other fields. Current research has focused on visual and spatial cognition [9] or motor imagery, whereas auditory-focused research remains scarce and is much needed. Task-state functional magnetic resonance imaging (ts-fMRI), which is characterized by high spatial resolution and non-invasiveness, is pivotal in cognitive neuroscience. Recently, functional connectivity (FC) analysis has become one of the most commonly used methods for describing brain functioning [11], and previous studies have shown that functional connectivity is a potential brain marker for predicting cognitive and behavioral traits [12,13,14]. However, the complexity of brain networks poses a great challenge to traditional machine learning models such as support vector machines and logistic regression; moreover, the reliance of these algorithms on manual feature extraction limits their ability to learn intrinsic data features, which in turn affects the model's generalization ability [17].
Data augmentation is a key technique for overcoming data scarcity and improving model robustness and training efficiency [29], and thus training effectiveness [28]. Fahimi et al. proposed a data augmentation framework based on a deep convolutional generative adversarial network, which achieved better results [30]. However, traditional data augmentation techniques may introduce artificial variations that do not necessarily match the natural pattern of neural activity [31]. In addition, they may not effectively capture the intricate linear and nonlinear relationships between different brain regions [32]. Deep learning methods have become cutting-edge approaches for analyzing fMRI datasets, successfully extracting meaningful information from complex connectivity patterns in neuroimaging studies of cognitive abilities [19,20]. Gao et al. proposed a group quadratic graph convolutional network that improves the ability of individual neurons to represent complex data [33]. However, such methods do not fully capture and utilize linear and nonlinear features. In addition, the complexity of brain signals and the intricate topology of brain networks pose a great challenge to model generalization [35]. Models trained on isolated tasks often have difficulty adapting to new, unseen conditions because they cannot cover all neural connections and interaction patterns.
To address the aforementioned challenges, we propose a novel model that
innovatively leverages physical information to enhance embedding representa-
tions, fostering deep information sharing across tasks and thereby improving the
model’s generalizability. Our main contributions are as follows:
(1) We propose PEMT-Net, a multi-task neural network model that leverages physical information embedding and simulates the neural diffusion process with random walks, capturing extensive neural interactions and introducing a novel method for constructing neural embedding maps from graph features.
(2) We introduce a physical location encoding strategy that delineates the spatial relationships of brain regions based on their correlation, enabling precise physical positioning within the neural network's structure.
(3) By integrating adaptive embedding fusion and fine-grained multi-task parameter sharing, PEMT-Net captures both linear and non-linear features, substantially improving model generalizability across diverse cognitive tasks.
2 Methods
2.1 Overview Methods
In this study, we propose a physics-informed embedding network with multi-task transformer (PEMT-Net), an innovative model for accurately decoding auditory signals. As shown in Fig. 1, it consists of two main parts: (1) feature augmentation and (2) classification. In Part 1, a neural diffusion process is used to distill the neural embedding representation of each region of interest (ROI) from the physical dimension; this representation is then fused with degree centrality and with the location coding computed by the Fruchterman-Reingold algorithm to form rich spatial feature information. In Part 2, adaptive embedding fusion is used to enhance the feature representation, and a soft parameter-sharing mechanism is introduced to support efficient multi-task learning by placing a parameter transformation layer in front of each Transformer encoder layer.
2.2 Problem definition
To express our work mathematically, let $G = \{G^1, G^2, \ldots, G^T\}$, where $G^j = (V^j, E^j, W^j)$. A subgraph $G^j$ represents a subject's brain region connectivity at a fine granularity. Every region of interest (ROI) is considered as a node $u^j \in V^j$, and every edge among them is represented as $e(u^j, v^j) \in E^j$, with a weight $w_{u^j v^j} \in W^j$.
2.3 PEMT-Net Model
Fig. 1. Flowchart of our PEMT-Net model. Panel titles: graph construction and neural embeddings with information diffusion capture (A), position encoding (B), adaptive feature fusion (C), and the multi-task Transformer module with evaluations by accuracy, precision, recall, and F1-score (D).

Neural Diffusion Process. We apply the core idea of node2vec [1]. As shown in Fig. 1 A, the inputs of our first layer are bags of nodes and their weighted edges, and the model is trained on each bag separately to capture the neural diffusion inside the brain. For every node $u^j \in V^j$, let $N_s^j(u^j) \subset V^j$ be a neighborhood set of node $u^j$, which is formed by applying a neighborhood sampling strategy denoted as $S$. The goal of this feature learning framework is to learn a mapping function $f^j : V^j \to \mathbb{R}^d$. This is achieved by maximizing the co-occurrence probability of nodes in a sequence of nodes using the Skip-gram model, and the objective function is defined as:
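$$\max_{f^j} \sum_{v^j \in V^j} \log \Pr\left(N_s^j(v^j) \mid f^j(v^j)\right) \tag{1}$$
where the conditional probability is defined by the softmax function:
$$\Pr\left(u^j \mid v^j\right) = \frac{\exp\left(f^j(u^j) \cdot f^j(v^j)\right)}{\sum_{n \in V^j} \exp\left(f^j(n) \cdot f^j(v^j)\right)} \tag{2}$$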
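As a rough illustration of Eq. (2), the sketch below evaluates the softmax conditional probability for a toy embedding table and sums log-probabilities over sampled neighborhoods. It assumes, as in node2vec, that neighbors are conditionally independent given the source node; the paper does not spell this step out. The embedding values, the toy neighborhoods, and the helper names `cond_prob` and `neighborhood_log_likelihood` are hypothetical.

```python
import math


def cond_prob(emb, u, v):
    """Softmax probability Pr(u | v) over all nodes, in the form of Eq. (2)."""
    scores = {n: sum(a * b for a, b in zip(emb[n], emb[v])) for n in emb}
    top = max(scores.values())  # subtract the max for numerical stability
    z = sum(math.exp(s - top) for s in scores.values())
    return math.exp(scores[u] - top) / z


def neighborhood_log_likelihood(emb, neighborhoods):
    """Sum of log Pr(u | v) over every source node v and each sampled neighbor u.

    Assumption: neighbors are conditionally independent given v
    (node2vec's factorization).
    """
    return sum(
        math.log(cond_prob(emb, u, v))
        for v, neighbors in neighborhoods.items()
        for u in neighbors
    )


# Hypothetical 2-dimensional embeddings for three ROIs.
emb = {"roi_a": [1.0, 0.0], "roi_b": [0.8, 0.6], "roi_c": [-1.0, 0.0]}
neighborhoods = {"roi_a": ["roi_b"], "roi_b": ["roi_a"], "roi_c": ["roi_b"]}

total = sum(cond_prob(emb, u, "roi_a") for u in emb)  # probabilities over all u
objective = neighborhood_log_likelihood(emb, neighborhoods)
```

In practice the full softmax over all nodes is expensive, which is why node2vec approximates it with negative sampling; the exact form is kept here only because the toy graph is tiny.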
Our algorithm prominently features a weighted random walk, which leverages the graph's edge weights to produce a node sequence. At every step, the next node is chosen by considering the current node's neighbors and the edge weights, so as to maintain a balance between exploration and exploitation [36]. This balance is regulated by the parameters $p$ and $q$. Assume that the walker is located at node $u^j$ at time $t$; the probability of moving to node $v^j$ next is defined as
$$P\left(u^j, v^j\right) = \frac{\alpha_{pq}(u^j, v^j) \cdot w_{u^j v^j}}{\sum_{n \in N_s^j(u^j)} \alpha_{pq}(u^j, n) \cdot w_{u^j n}} \tag{3}$$

where $\alpha_{pq}(u^j, v^j)$ is a preference weight controlled by $p$ and $q$ that balances the propensity to return to previous nodes against the propensity to explore new nodes, and is defined as
$$\alpha_{pq}(u^j, v^j) = \begin{cases} \frac{1}{p} & \text{if } v^j = \text{prev}, \\ 1 & \text{if } v^j \in U_s(v^j), \\ \frac{1}{q} & \text{if } v^j \notin U_s(v^j),\ v^j \in U_s^{(2)}(v^j). \end{cases} \tag{4}$$
Finally, the node embedding obtained by the above method is denoted as $X \in \mathbb{R}^{Num(V) \times d}$.
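The weighted, biased walk of Eqs. (3) and (4) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it reads the three cases of Eq. (4) in the node2vec sense (return to the previous node, move to a neighbor of the previous node, or move further away), which is an assumption, and the toy graph, the values of `p` and `q`, and all function names are hypothetical.

```python
import random


def alpha(prev, nxt, adj, p, q):
    """node2vec-style search bias for stepping to `nxt` after visiting `prev`."""
    if prev is None:
        return 1.0      # first step: no previous node, so no bias
    if nxt == prev:
        return 1.0 / p  # return to the previous node
    if nxt in adj[prev]:
        return 1.0      # candidate is also a neighbor of the previous node
    return 1.0 / q      # candidate lies further from the previous node


def transition_probs(prev, cur, adj, p, q):
    """Normalized probabilities of moving from `cur` to each neighbor (Eq. 3)."""
    scores = {n: alpha(prev, n, adj, p, q) * w for n, w in adj[cur].items()}
    total = sum(scores.values())
    return {n: s / total for n, s in scores.items()}


def biased_walk(start, adj, length, p, q, rng):
    """Generate one node sequence of `length` nodes starting at `start`."""
    walk, prev = [start], None
    while len(walk) < length:
        cur = walk[-1]
        probs = transition_probs(prev, cur, adj, p, q)
        if not probs:   # isolated node: stop early
            break
        nodes = list(probs)
        nxt = rng.choices(nodes, weights=[probs[n] for n in nodes], k=1)[0]
        prev = cur
        walk.append(nxt)
    return walk


# Toy weighted graph: adj[u][v] is the edge weight w_uv.
adj = {
    "A": {"B": 0.8, "C": 0.2},
    "B": {"A": 0.8, "C": 0.5, "D": 0.3},
    "C": {"A": 0.2, "B": 0.5},
    "D": {"B": 0.3},
}
probs = transition_probs("A", "B", adj, p=2.0, q=0.5)
walk = biased_walk("A", adj, length=10, p=2.0, q=0.5, rng=random.Random(0))
```

With `p=2.0` and `q=0.5`, the walker at `B` (having come from `A`) favors the unexplored node `D` even though its edge weight is the smallest, which shows how `q < 1` encourages exploration.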
Physical Position Encoding. As described in Fig. 1 B, we define two types of location coding, based on the degree centrality of the nodes and on the node coordinates computed by the FR algorithm [2], respectively. Let $PE_u = [D_u, X_u, Y_u]$, where $D_u$ is the degree centrality of node $u \in V$, calculated by
$$D_u = \frac{Num(N_s(u))}{Num(V) - 1} \tag{5}$$
and $X_u, Y_u$ are the coordinates computed by the FR algorithm.
In our module, the FR algorithm takes the edge weights into account. We define the repulsive force in the usual way:
$$F_{rep}(u, v) = -\frac{k^2}{\|pos(u) - pos(v)\|} \tag{6}$$
where $pos(u)$ and $pos(v)$ denote the positions of nodes $u, v \in V$, but define the attractive force with the edge weight:
$$F_{attr}(u, v) = \frac{\|pos(u) - pos(v)\|^2}{k} \cdot w_{uv} \tag{7}$$
which means that the higher the weight, the stronger the attraction, so the nodes connected by that edge are pulled closer together. The procedure is summarized in Algorithm 1.
At the end of this part, the original embedding is directly concatenated with the physical location encoding, which is denoted as:
$$X' = X \,\|\, \{PE_u\}_{u \in V} \tag{8}$$
where $\|$ denotes concatenation.
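The position encoding of Eqs. (5) to (8) can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the initial temperature, the linear cooling schedule, the random initial positions, and the placeholder embeddings are assumptions, and the forces are applied as magnitudes with the direction written out explicitly (repulsion pushes a pair apart, attraction pulls edge endpoints together).

```python
import math
import random


def degree_centrality(adj):
    """D_u = Num(N_s(u)) / (Num(V) - 1), as in Eq. (5)."""
    n = len(adj)
    return {u: len(adj[u]) / (n - 1) for u in adj}


def weighted_fr_layout(adj, iterations=50, area=1.0, seed=0):
    """Force-directed 2-D layout with weight-scaled attraction (cf. Eqs. 6-7)."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    k = math.sqrt(area / len(nodes))
    pos = {v: [rng.random(), rng.random()] for v in nodes}
    t = 0.1              # initial temperature (assumed value)
    dt = t / iterations  # linear cooling (assumed schedule)
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        for u in nodes:  # repulsion: push u away from every other node
            for v in nodes:
                if u == v:
                    continue
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                force = k * k / dist
                disp[u][0] += dx / dist * force
                disp[u][1] += dy / dist * force
        for u in nodes:  # attraction: pull edge endpoints together, scaled by w_uv
            for v, w in adj[u].items():
                if not u < v:  # visit each undirected edge once
                    continue
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                force = dist * dist / k * w
                disp[u][0] -= dx / dist * force
                disp[u][1] -= dy / dist * force
                disp[v][0] += dx / dist * force
                disp[v][1] += dy / dist * force
        for v in nodes:  # move each node, capped by the current temperature
            norm = max(math.hypot(disp[v][0], disp[v][1]), 1e-9)
            step = min(norm, t)
            pos[v][0] += disp[v][0] / norm * step
            pos[v][1] += disp[v][1] / norm * step
        t -= dt
    return pos


def position_encoding(adj):
    """PE_u = [D_u, X_u, Y_u] for every node u."""
    deg = degree_centrality(adj)
    pos = weighted_fr_layout(adj)
    return {u: [deg[u], pos[u][0], pos[u][1]] for u in adj}


adj = {
    "A": {"B": 0.8, "C": 0.2},
    "B": {"A": 0.8, "C": 0.5, "D": 0.3},
    "C": {"A": 0.2, "B": 0.5},
    "D": {"B": 0.3},
}
X = {u: [0.0] * 4 for u in adj}           # placeholder 4-dim node embeddings
pe = position_encoding(adj)
X_prime = {u: X[u] + pe[u] for u in adj}  # concatenation as in Eq. (8)
```

Each row of `X_prime` has `d + 3` entries, matching the feature dimension used by the fusion stage.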
Algorithm 1 Fruchterman-Reingold Algorithm
Input: weight matrix $\{W_{uv}\}^j$
1: Initialization: $\{pos(v)\}$ for all $v \in V$, $k = \sqrt{area / |V|}$, maxIterations, $n \leftarrow 1$
2: repeat
3:   $n \leftarrow n + 1$
4:   Repulsion Step: for each pair of nodes $(u, v)$, $u \neq v$:
       $\Delta pos(u)^{n+1} \mathrel{+}= \frac{pos(u)^n - pos(v)^n}{\|pos(u)^n - pos(v)^n\|} \times F_{rep}(u, v)$
5:   Attraction Step: for each edge $(u, v) \in E$:
       $\Delta pos(u)^{n+1} \mathrel{-}= \frac{pos(u)^n - pos(v)^n}{\|pos(u)^n - pos(v)^n\|} \times F_{attr}(u, v)$
       $\Delta pos(v)^{n+1} \mathrel{+}= \frac{pos(u)^n - pos(v)^n}{\|pos(v)^n - pos(u)^n\|} \times F_{attr}(u, v)$
6:   Position Update: for each node $v \in V$:
       $pos(v)^{n+1} \mathrel{+}= \frac{\Delta pos(v)^{n+1}}{\|\Delta pos(v)^{n+1}\|} \times \min(\|\Delta pos(v)^n\|, t)$
7:   Cooling: $t \mathrel{-}= \Delta t$
8: until $n$ = maxIterations
Output: optimized graph layout $\{pos(v)\}$ for all $v \in V$

Adaptive Embedding Fusion. In this section, we propose a multi-round adaptive embedding fusion method to enhance feature representation, as shown in Fig. 1 C. Consider the graph given above, $G = \{G^j\}_{j \in T}$, with weight matrix $W \in M_N$ and feature matrix $X' \in M_{N, d+3}$. Over $k$ rounds of feature propagation, the feature update of a node can be expressed as:
$$X'^{(k+1)} = W \times (X'^{(k)} \odot W') + X'^{(k)} \tag{9}$$
where $X'^{(k+1)}$ is the feature matrix after round $k+1$ of the update, $\odot$ denotes element-wise multiplication, and $W'$ is the weight matrix calculated from the node degrees to realize adaptive embedding fusion.
The feature after $k$ rounds of propagation enhancement is $S \in M_{N, k+1, d}$.
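A minimal sketch of the propagation rule in Eq. (9) follows. The paper does not give the exact shape of $W'$, so this sketch assumes it is one degree-based scale per node, broadcast across that node's features; it also assumes that $S$ is formed by stacking the initial features with the output of every round. The function names and the toy matrices are hypothetical.

```python
def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def fusion_round(W, X, degree_scale):
    """One round of Eq. (9): W x (X ⊙ W') + X, with W' as per-node scales."""
    scaled = [[x * s for x in row] for row, s in zip(X, degree_scale)]
    propagated = matmul(W, scaled)
    return [[p + x for p, x in zip(p_row, x_row)] for p_row, x_row in zip(propagated, X)]


def adaptive_fusion(W, X, degree_scale, rounds):
    """Stack the initial features and each round's output per node.

    Returns S with shape N x (rounds + 1) x feature_dim.
    """
    feats = [X]
    for _ in range(rounds):
        feats.append(fusion_round(W, feats[-1], degree_scale))
    return [[feats[r][i] for r in range(rounds + 1)] for i in range(len(X))]


W = [[0.0, 1.0], [1.0, 0.0]]   # toy weight (connectivity) matrix
X = [[1.0, 0.0], [0.0, 2.0]]   # toy node features
degree_scale = [1.0, 0.5]      # hypothetical degree-based scales (W')
S = adaptive_fusion(W, X, degree_scale, rounds=2)
```

The residual term `+ X` keeps each node's own features in every round, so later rounds add neighborhood context without overwriting the original embedding.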
Multi-Task Transformer. In this section, we propose a transformer model that can handle multiple tasks, as shown in Fig. 1 D. The core component of the transformer is the self-attention mechanism [3]. With $S$ as the input matrix, the output of single-head attention is calculated as:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_K}}\right) V \tag{10}$$
where $Q = SH^Q$, $K = SH^K$, $V = SH^V$, and $H^Q \in \mathbb{R}^{d \times d_Q}$, $H^K \in \mathbb{R}^{d \times d_K}$, $H^V \in \mathbb{R}^{d \times d_V}$ are projection matrices. Similarly, for multi-head attention,
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(head_1, \ldots, head_n) H^O \tag{11}$$
where
$$head_i = \mathrm{Attention}(QH_i^Q, KH_i^K, VH_i^V) \tag{12}$$
The self-attention mechanism in each encoder layer is followed by a feed-forward neural network that further processes the output of the self-attention layer. This network usually contains two linear transformations and a nonlinear activation function, in the following form:
$$\mathrm{FFN}(x) = \max(0, xW_1 + b_1) W_2 + b_2 \tag{13}$$
where $W_1, W_2$ are weight matrices and $b_1, b_2$ are bias terms.
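For concreteness, Eqs. (10) and (13) can be written out in plain Python as below. This is only a single-head sketch on toy matrices; the identity projections and the input values are placeholders chosen so the result is easy to check by hand, and the helper names are hypothetical.

```python
import math


def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    top = max(row)
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention, Eq. (10). Returns (output, weights)."""
    d_k = len(K[0])
    K_T = [list(col) for col in zip(*K)]
    scores = matmul(Q, K_T)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V), weights


def ffn(x, W1, b1, W2, b2):
    """Position-wise feed-forward network with ReLU, Eq. (13)."""
    hidden = [[max(0.0, h + b) for h, b in zip(row, b1)] for row in matmul(x, W1)]
    return [[o + b for o, b in zip(row, b2)] for row in matmul(hidden, W2)]


S = [[1.0, 0.0], [0.0, 1.0]]   # toy input: two tokens, two features
H = [[1.0, 0.0], [0.0, 1.0]]   # identity stands in for H^Q, H^K, H^V
Q, K, V = matmul(S, H), matmul(S, H), matmul(S, H)
out, weights = attention(Q, K, V)
y = ffn([[1.0, -1.0]], H, [0.0, 0.0], H, [0.5, 0.5])
```

Multi-head attention (Eqs. 11 and 12) repeats `attention` with a separate set of projections per head, concatenates the outputs along the feature axis, and multiplies by $H^O$.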

In addition, to adapt to multi-task learning and share parameters, we include a parameter transformation layer in front of each encoder layer, which allows different parts of the model to share the same base parameters in a transformed form (soft parameter sharing). Let the parameters used in the encoder be $\Theta$. The parameter transformation layer in front of each encoder layer can be represented by a linear transformation with a weight matrix $W_i$ and a bias vector $b_i$, where $i$ denotes the index of the encoder layer. Thus, the parameter transformation can be represented as:
$$\Theta_i' = W_i \Theta + b_i \tag{14}$$
where $\Theta_i'$ is the transformed parameter, which is used in the $i$-th encoder layer. In this way, although the parameters used in each encoder layer are transformed, they all derive from the same base parameters.
Furthermore, inside each encoder layer, $Q$, $K$ and $V$ in the self-attention mechanism are calculated using the transformed parameters:
$$Q_i = S_i W_i^Q \Theta_i' \tag{15}$$
$$K_i = S_i W_i^K \Theta_i' \tag{16}$$
$$V_i = S_i W_i^V \Theta_i' \tag{17}$$
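The soft parameter sharing of Eqs. (14) and (15) can be illustrated with a small sketch. The shapes are assumptions, since the paper does not state them: here $\Theta$ and each $W_i$ are square matrices of the same size, and the bias vector $b_i$ is broadcast across rows. All values and function names are hypothetical.

```python
def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transform_params(theta, W_i, b_i):
    """Theta_i' = W_i Theta + b_i (Eq. 14); b_i is broadcast across rows (assumption)."""
    return [[v + b for v, b in zip(row, b_i)] for row in matmul(W_i, theta)]


def project(S_i, W_proj, theta_i):
    """Q_i = S_i W_i^Q Theta_i' (Eq. 15); K_i and V_i follow the same form."""
    return matmul(matmul(S_i, W_proj), theta_i)


theta = [[1.0, 0.0], [0.0, 1.0]]  # shared base parameters
layers = [
    {"W": [[2.0, 0.0], [0.0, 2.0]], "b": [0.0, 0.0]},  # transform for encoder layer 1
    {"W": [[1.0, 1.0], [0.0, 1.0]], "b": [0.5, 0.5]},  # transform for encoder layer 2
]
theta_per_layer = [transform_params(theta, layer["W"], layer["b"]) for layer in layers]

S = [[1.0, 2.0]]                  # one token with two features
W_Q = [[1.0, 0.0], [0.0, 1.0]]    # identity stands in for W_i^Q
Q_1 = project(S, W_Q, theta_per_layer[0])
Q_2 = project(S, W_Q, theta_per_layer[1])
```

Both layers read the same `theta`, so a gradient update to the base parameters affects every layer, while each layer's own `W` and `b` let it specialize.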
To process the output of the decoding layer, we apply the attention-based readout layer of NAGphormer [7], which is shown in Fig. 1 C. Up to this point, the physics-informed embedding information extracted from the brain regions has been fully learned by our model.
3 Experiments
3.1 Data and Pre-processing
The proposed model was evaluated on an fMRI dataset that we collected from healthy subjects. In the fMRI experiment, subjects were asked to imagine and listen to four categories of auditory information (Human, Animal, Machine, Nature), yielding eight categories of auditory neural activity, which were classified to test the performance of our model. For fMRI data pre-processing, the first five volumes of each run were discarded [34]. All images were realigned to remove movement artifacts, then coregistered and normalized to MNI space using the T1 images, with the voxel size resampled to 3×3×3 mm. The normalized images were smoothed with a 6-mm Gaussian kernel. We then extracted time series to form the functional connectivity.
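The last step, turning ROI time series into a weighted graph, might look like the sketch below. The paper does not state which connectivity measure was used; Pearson correlation is assumed here because it is a common choice for functional connectivity. The ROI names, the time series values, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def functional_connectivity(series):
    """Weighted adjacency: fc[u][v] is the correlation between ROIs u and v."""
    rois = list(series)
    return {u: {v: pearson(series[u], series[v]) for v in rois if v != u} for u in rois}


# Hypothetical ROI-averaged time series (four time points each).
series = {
    "roi_a": [1.0, 2.0, 3.0, 4.0],
    "roi_b": [2.0, 4.0, 6.0, 8.0],
    "roi_c": [4.0, 3.0, 2.0, 1.0],
}
fc = functional_connectivity(series)
```

Because correlations can be negative, a pipeline that feeds these weights into a random walk would typically take absolute values or threshold them first; that choice is also not specified in the paper.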
3.2 Implementation Details

During the initial random-walk exploration phase, we set the step size of each walk to 10, covering a total distance of 100. This process yields embeddings with a dimensionality of 256. In the subsequent phase, the model operates with a batch size of 32 and undergoes 7 iterations of adaptive feature fusion. The Transformer architecture incorporates 8 attention heads, and its hidden layer has a dimensionality of 512. Moreover, the model has 2 encoder layers (with parameter transformation layers). Training is conducted over 200 epochs, with the data split as follows: 70% for training, 15% for validation, and the remaining 15% for testing. The entire training process is executed on a single M1 GPU. We used accuracy, precision (Pre), F1 score (F1), and recall (Recall) to evaluate the model, and plotted the confusion matrix to visualize the classification results more intuitively.
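The evaluation metrics named above can be computed as in the following sketch. The paper does not say how the per-class scores are averaged across the eight categories; macro averaging is assumed here. The toy labels and the function name are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 (averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        predicted = sum(p == c for p in y_pred)  # samples predicted as class c
        actual = sum(t == c for t in y_true)     # samples that truly are class c
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy example with three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
metrics = classification_metrics(y_true, y_pred)
```

With evenly distributed classes, as described for this dataset, macro and weighted averages coincide for recall, so the choice mainly affects precision and F1.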
3.3 Results of Fine-Grained Classification
Fig. 2. (A) Distribution of the original features after dimensionality reduction by the t-SNE method (original embedded features); (B) distribution of the high-dimensional embeddings encoded with physical locations after dimensionality reduction by the t-SNE method (updated embedded features); (C) error bar graphs of each metric for each method.
According to Fig. 2 A and B, the original features exhibit a mixed distribution, with nodes of different classes chaotically clustered together. In contrast, nodes constrained by physical space and enriched with captured information overflow exhibit a regular distribution: nodes within the same category generally cluster together, with almost no clustering between nodes of different categories. Therefore, by incorporating node embeddings of brain neural information that simulate the physical process of information diffusion, our model is capable of capturing both local and non-local overflow effects, which benefits the subsequent classification part of the model.
In this experiment, we performed a fine-grained categorization (8 categories) to validate our model, with a total of 4 tasks evenly distributed across the categories. To benchmark our method, we introduce several baselines: GraphSAGE, DeepWalk, and Node2vec. To verify the feasibility of our idea, we also conducted the following ablation experiments: Multi-Transformer (without physics-informed embedding), PET-Net (without parameter sharing), and our proposed model.
Table 1. Comparison of different methods
Methods            Accuracy (%)   Precision (%)  Recall (%)     F1 Score (%)
GraphSAGE          67.48 ± 0.53   67.56 ± 0.54   67.52 ± 0.53   67.47 ± 0.53
DeepWalk           71.69 ± 0.43   71.68 ± 0.30   71.70 ± 0.53   71.63 ± 0.37
Node2vec           79.31 ± 0.22   79.10 ± 0.19   79.31 ± 0.55   79.23 ± 0.38
Multi-Transformer  82.52 ± 0.59   85.12 ± 0.52   83.28 ± 0.45   82.74 ± 0.27
PET-Net            86.74 ± 0.64   89.07 ± 0.55   86.64 ± 0.49   86.17 ± 0.29
PEMT-Net           95.41 ± 0.38   95.32 ± 0.42   95.28 ± 0.36   95.26 ± 0.23
Table 1 and Fig. 2 C show all the experimental results in this paper, where the upper half of the table is the comparison with the baselines and the lower half is the ablation study. Our proposed method achieves the best results on every metric. Relative to the baselines, the accuracy improvement ranges from 16.1% to 27.93%. From the ablation experiments, we can see that compared to the Multi-Transformer model, our accuracy is improved by 13.51%, which indicates that our physically enhanced embedding representation is highly effective; compared to the PET-Net model, our accuracy is ahead by 9.09%, which indicates that the deep learning module successfully captures deep linear and nonlinear features and preserves them in multi-task learning, improving the generalization of the model.
4 Conclusion
In this paper, we present a novel model for learning auditory signals in the brain. The model combines physical modeling and deep learning. Experiments demonstrate that the node representation can be significantly enhanced by modeling the diffusion process of neural signals in the brain and by deriving the physical location encoding from the interrelationships among brain regions, and that decoding performance can be improved by a parameter transformation layer that adequately learns inter-task activity and shares the parameter representation. In the future, we will simulate the diffusion process with improved random walks and further optimize the physical location encoding to improve the generalization of the model so that it can be applied in other fields.
5 Acknowledgement
This work is supported in part by the Foundation of XXXX.
References
1. Grover, A., Leskovec, J.: node2vec: Scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. pp. 855–864 (2016)
2. Fruchterman, T.M., Reingold, E.M.: Graph drawing by force-directed placement. Software: Practice and experience 21(11), 1129–1164 (1991)
3. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need. Advances in neural information processing systems 30 (2017)
4. Xiong, R., Yang, Y., He, D., Zheng, K., Zheng, S., Xing, C., Zhang, H., Lan, Y., Wang, L., Liu, T.: On layer normalization in the transformer architecture. In: International Conference on Machine Learning. pp. 10524–10533. PMLR (2020)
5. Willander, J., Baraldi, S.: Development of a new clarity of auditory imagery scale. Behavior research methods 42, 785–790 (2010)
6. Zeidman, P., Mullally, S.L., Maguire, E.A.: Constructing, perceiving, and maintaining scenes: hippocampal activity and connectivity. Cerebral Cortex 25(10), 3836–3855 (2015)
7. Chen, J., Gao, K., Li, G., He, K.: NAGphormer: A tokenized graph transformer for node classification in large graphs. In: The Eleventh International Conference on Learning Representations (2022)
8. Brusini, L., Stival, F., Setti, F., Menegatti, E., Menegaz, G., Storti, S.F.: A systematic review on motor-imagery brain-connectivity-based computer interfaces. IEEE Transactions on Human-Machine Systems 51(6), 725–733 (2021)
9. Noordzij, M.L., Zuidhoek, S., Postma, A.: The influence of visual experience on visual and spatial imagery. Perception 36(1), 101–112 (2007)
10. Gu, J., Zhang, H., Liu, B., Li, X., Wang, P., Wang, B.: An investigation of the neural association between auditory imagery and perception of complex sounds. Brain Structure and Function 224, 2925–2937 (2019)
11. Ernst, M., Torrisi, S., Balderston, N., Grillon, C., Hale, E.A.: fMRI functional connectivity applied to adolescent neurodevelopment. Annual review of clinical psychology 11, 361–377 (2015)
12. Du, Y., Fu, Z., Calhoun, V.D.: Classification and prediction of brain disorders using functional connectivity: promising but challenging. Frontiers in neuroscience 12, 525 (2018)
13. Guo, X., Yao, D., Cao, Q., Liu, L., Zhao, Q., Li, H., Huang, F., Wang, Y., Qian, Q., Wang, Y., et al.: Shared and distinct resting functional connectivity in children and adults with attention-deficit/hyperactivity disorder. Translational psychiatry 10(1), 65 (2020)
14. Wang, J., Xiao, L., Hu, W., Qu, G., Wilson, T.W., Stephen, J.M., Calhoun, V.D., Wang, Y.P.: Functional network estimation using multigraph learning with application to brain maturation study. Human brain mapping 42(9), 2880–2892 (2021)
15. Orlichenko, A., Qu, G., Zhang, G., Patel, B., Wilson, T.W., Stephen, J.M., Calhoun, V.D., Wang, Y.P.: Latent similarity identifies important functional connections for phenotype prediction. IEEE Transactions on Biomedical Engineering (2022)
16. Qu, G., Hu, W., Xiao, L., Wang, J., Bai, Y., Patel, B., Zhang, K., Wang, Y.P.: Brain functional connectivity analysis via graphical deep learning. IEEE Transactions on Biomedical Engineering 69(5), 1696–1706 (2021)
17. Madsen, K.H., Krohne, L.G., Cai, X.l., Wang, Y., Chan, R.C.: Perspectives on machine learning for classification of schizotypy using fMRI data. Schizophrenia Bulletin 44(suppl 2), S480–S490 (2018)
18. Yin, W., Li, L., Wu, F.X.: Deep learning for brain disorder diagnosis based on fMRI images. Neurocomputing 469, 332–345 (2022)
19. Li, X., Zhou, Y., Dvornek, N., Zhang, M., Gao, S., Zhuang, J., Scheinost, D., Staib, L.H., Ventola, P., Duncan, J.S.: BrainGNN: Interpretable brain graph neural network for fMRI analysis. Medical Image Analysis 74, 102233 (2021)
20. Qu, G., Orlichenko, A., Wang, J., Zhang, G., Xiao, L., Zhang, K., Wilson, T.W., Stephen, J.M., Calhoun, V.D., Wang, Y.P.: Interpretable cognitive ability prediction: A comprehensive gated graph transformer framework for analyzing functional brain networks. IEEE Transactions on Medical Imaging (2023)
21. Lee, D., Park, B., Jang, C., Park, H.J.: Decoding brain states using functional magnetic resonance imaging. Biomedical Engineering Letters 1, 82–88 (2011)
22. Li, J., Zhu, J., Zhang, B.: Discriminative deep random walk for network classification. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 1004–1013 (2016)
23. Sun, J., Ding, Y., Zhao, K., Xu, H., Zhang, Y., Gao, B.: Predicting Alzheimer's disease based on network topological latent representations. Journal of Medical Imaging and Health Informatics 10(3), 667–671 (2020)
24. Wu, Z., Jain, P., Wright, M., Mirhoseini, A., Gonzalez, J.E., Stoica, I.: Representing long-range context for graph neural networks with global attention. Advances in Neural Information Processing Systems 34, 13266–13279 (2021)
25. Kreuzer, D., Beaini, D., Hamilton, W., Létourneau, V., Tossou, P.: Rethinking graph transformers with spectral attention. Advances in Neural Information Processing Systems 34, 21618–21629 (2021)
26. Dwivedi, V.P., Bresson, X.: A generalization of transformer networks to graphs. arXiv preprint arXiv:2012.09699 (2020)
27. Xu, Z., Bai, Y., Zhao, R., Hu, H., Ni, G., Ming, D.: Decoding selective auditory attention with EEG using a transformer model. Methods 204, 410–417 (2022)
28. Shorten, C., Khoshgoftaar, T.: A survey on image data augmentation for deep learning. Journal of Big Data 6 (2019). https://doi.org/10.1186/s40537-019-0197-0
29. Abdelaziz Ismael, S.A., Mohammed, A., Hefny, H.: An enhanced deep learning approach for brain cancer MRI images classification using residual networks. Artificial Intelligence in Medicine 102, 101779 (2020). https://doi.org/10.1016/j.artmed.2019.101779
30. Fahimi, F., Dosen, S., Ang, K.K., Mrachacz-Kersting, N., Guan, C.: Generative adversarial networks-based data augmentation for brain–computer interface. IEEE Transactions on Neural Networks and Learning Systems 32(9), 4039–4051 (2021). https://doi.org/10.1109/TNNLS.2020.3016666
31. He, C., Liu, J., Zhu, Y., Du, W.: Data augmentation for deep neural networks model in EEG classification task: a review. Frontiers in Human Neuroscience 15, 765525 (2021)
32. Shen, R., Bubeck, S., Gunasekar, S.: Data augmentation as feature manipulation. In: International conference on machine learning. pp. 19773–19808. PMLR (2022)
33. Gao, Z., Shi, J., Wang, J.: GQ-GCN: Group quadratic graph convolutional network for classification of histopathological images. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part VIII 24. pp. 121–131. Springer (2021)
34. Zeidman, P., Mullally, S.L., Maguire, E.A.: Constructing, perceiving, and maintaining scenes: hippocampal activity and connectivity. Cerebral Cortex 25(10), 3836–3855 (2015)
35. Gao, Z., Dang, W., Wang, X., Hong, X., Hou, L., Ma, K., Perc, M.: Complex networks and deep learning for EEG signal analysis. Cognitive Neurodynamics 15, 369–388 (2021)
36. Xia, Y., McCracken, T., Liu, T., Chen, P., Metcalf, A., Fan, C.: Understanding the disparities of PM2.5 air pollution in urban areas via deep support vector regression. Environmental Science & Technology
