Understanding Auditory Evoked Brain Signal Via Physics-Informed Embedding Network With Multi-Task Transformer
Wanli Ma¹, Xuegang Tang², Jin Gu³, Ying Wang⁴, and Yuling Xia¹
arXiv:2406.02014v1 [q-bio.NC] 4 Jun 2024
¹ School of Mathematics, Southwest Jiaotong University, Sichuan, Chengdu, China
² School of Computer, SWJTU-Leeds Joint School, Sichuan, Chengdu, China
³ School of Computing and Artificial Intelligence, Southwest Jiaotong University,
Sichuan, Chengdu, China
⁴ Chengdu University of Technology Oxford Brookes College, Sichuan, Chengdu,
China
1 Introduction
Brain-computer interfaces (BCIs), which detect subtle changes in brain activity
and translate these neural signals into executable computational instructions,
show great potential for deepening our understanding of brain mechanisms.
Although the field is still in its infancy, it has already produced promising
applications in clinical medicine, biomedicine, and other fields. Current
research has focused mainly on visual and spatial cognition [9] or motor imagery,
whereas auditory processing has received comparatively little attention.
Task-state functional magnetic resonance imaging (ts-fMRI), which offers high
spatial resolution and is non-invasive, is pivotal in cognitive neuroscience.
Functional connectivity (FC) analysis has recently become one of the most
commonly used methods for describing brain function [11], and previous studies
have shown that FC is a potential brain marker for predicting cognitive and
behavioral traits [12,13,14]. However, the complexity of brain networks poses a
great challenge to traditional machine learning models such as support vector
machines and logistic regression. Moreover, these algorithms rely on manual
feature extraction, which limits their ability to learn intrinsic data features
and in turn reduces the model's generalization ability [17].
Data augmentation is a key technique for overcoming data scarcity and for
improving model robustness and training efficiency [29] as well as training
effectiveness [28]. Fahimi et al. proposed a data augmentation framework based
on deep convolutional generative adversarial networks that achieved improved
results [30]. However, traditional data augmentation techniques may introduce
artificial variations that do not match the natural patterns of neural activity
[31]. They may also fail to capture the complex linear and nonlinear
relationships between different brain regions [32]. Deep learning methods have
become cutting-edge approaches for analyzing fMRI datasets, successfully
extracting meaningful information from complex connectivity patterns in
neuroimaging studies of cognitive abilities [19,20]. Gao et al. proposed a group
quadratic graph convolutional network that improves the ability of individual
neurons to represent complex data [33]. However, such methods do not fully
capture and exploit both linear and nonlinear features. In addition, the
complexity of brain signals and the intricate topology of brain networks pose a
great challenge to model generalization [35]. Models trained on isolated tasks
often struggle to adapt to new, unseen conditions because they cannot cover all
neural connections and interaction patterns.
To address these challenges, we propose a novel model that leverages physical
information to enhance embedding representations and fosters deep information
sharing across tasks, thereby improving the model's generalizability. Our main
contributions are as follows:
(1) We propose PEMT-Net, a multi-task neural network that leverages
physics-informed embeddings. It simulates the neural diffusion process with
random walks to capture extensive neural interactions, and it introduces a novel
method for constructing neural embedding maps from graph features.
(2) We introduce a physical location encoding strategy that delineates the
spatial relationships of brain regions based on their correlations, assigning
each region a precise position within the network structure.
(3) By integrating adaptive embedding fusion with fine-grained multi-task
parameter sharing, PEMT-Net captures both linear and nonlinear features,
substantially improving generalizability across diverse cognitive tasks.
2 Methods
To formalize the problem, let $\mathcal{G} = \{G_1, G_2, \dots, G_T\}$, where
$G_j = (V^j, E^j, W^j)$. Each subgraph $G_j$ represents one subject's
brain-region connectivity at fine granularity. Every region of interest (ROI) is
treated as a node $u^j \in V^j$, every edge between ROIs is denoted
$e(u^j, v^j) \in E^j$, and its weight is $w_{u^j v^j} \in W^j$.
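To make this notation concrete, the sketch below stores one subject's graph $G_j$ as a node list, an undirected edge list, and a weight map. The class `BrainGraph`, the function `graph_from_connectivity`, and the rule that an edge exists when the absolute connectivity exceeds a threshold are hypothetical choices for illustration; the text does not state how edges are selected.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class BrainGraph:
    """Hypothetical container for one subject's graph G_j = (V^j, E^j, W^j)."""
    nodes: list
    edges: list
    weights: dict = field(default_factory=dict)


def graph_from_connectivity(fc: np.ndarray, threshold: float = 0.0) -> BrainGraph:
    """Build an undirected weighted graph from a symmetric ROI x ROI matrix.

    Assumption (not from the paper): keep edge (u, v) when |fc[u, v]| > threshold.
    """
    n = fc.shape[0]
    edges, weights = [], {}
    for u in range(n):
        for v in range(u + 1, n):  # upper triangle: each undirected edge once, no self-loops
            if abs(fc[u, v]) > threshold:
                edges.append((u, v))
                weights[(u, v)] = float(fc[u, v])
    return BrainGraph(nodes=list(range(n)), edges=edges, weights=weights)
```

Under this sketch, the dataset $\mathcal{G}$ is simply a list holding one such graph per subject.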
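[Fig. 1: overview of PEMT-Net (image not available). Legible labels: per-task physics-informed embeddings $PE$ (Task 1, ...); an adaptive fusion layer applied $K$ times; $N$ encoder layers with multi-head self-attention, an attention layer, and feed-forward blocks; a parameter-transform neural network; an output projection producing prediction labels; evaluation by accuracy, precision, recall, and F1-score.]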
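which is defined as

$$
\alpha_{pq}(u^j, v^j) =
\begin{cases}
\frac{1}{p}, & \text{if } v^j = \mathrm{prev}, \\
1, & \text{if } v^j \in U_s(\mathrm{prev}), \\
\frac{1}{q}, & \text{if } v^j \notin U_s(\mathrm{prev}),\ v^j \in U_s(u^j).
\end{cases}
\qquad (4)
$$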
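The bias above is the return/in-out search bias of node2vec [1]. A minimal sketch of a weighted second-order random walk that uses it, assuming (as in node2vec) that the transition probability is proportional to $\alpha_{pq}$ times the edge weight and that all edge weights are positive; `alpha_pq`, `biased_walk`, and the adjacency-dictionary layout are illustrative names, not the authors' code:

```python
import random


def alpha_pq(prev, candidate, neighbors, p, q):
    """node2vec search bias for moving to `candidate`; `prev` is the node visited before the current one."""
    if candidate == prev:
        return 1.0 / p   # step back to the previous node
    if candidate in neighbors[prev]:
        return 1.0       # candidate is also adjacent to the previous node
    return 1.0 / q       # candidate moves away from the previous node


def biased_walk(neighbors, weights, start, length, p=1.0, q=1.0, seed=0):
    """Hypothetical second-order walk: P(next) proportional to alpha_pq * edge weight.

    neighbors: dict node -> set of adjacent nodes
    weights:   dict frozenset({u, v}) -> positive edge weight (assumed)
    """
    rng = random.Random(seed)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        cands = sorted(neighbors[cur])
        if not cands:
            break  # dead end: stop the walk early
        w = [weights[frozenset((cur, c))] for c in cands]
        if len(walk) > 1:  # from the second step on, apply the return/in-out bias
            w = [alpha_pq(walk[-2], c, neighbors, p, q) * wi for c, wi in zip(cands, w)]
        walk.append(rng.choices(cands, weights=w, k=1)[0])
    return walk
```

Small $p$ encourages the walk to backtrack (local exploration), while small $q$ pushes it outward, which is how such walks can mimic local versus far-reaching diffusion.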
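$$
F_{rep}(u, v) = -\frac{k^2}{\lVert pos(u) - pos(v) \rVert} \qquad (6)
$$

where $pos(u)$ and $pos(v)$ denote the positions of nodes $u, v \in V$, while
the attractive force is defined with the edge weight:

$$
F_{attr}(u, v) = \frac{\lVert pos(u) - pos(v) \rVert^2}{k} \cdot w_{uv} \qquad (7)
$$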
which means that the larger the edge weight, the stronger the attraction, and
the closer together the nodes connected by that edge are pulled. The algorithm
is demonstrated below:
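The sketch below is not the authors' algorithm listing. It is a minimal NumPy version of a Fruchterman–Reingold layout [2] whose attraction along each edge is scaled by the edge weight as in (7). The 2 × 2 layout box, the default $k = \sqrt{\text{area}/n}$, the cooling schedule, and the name `weighted_fr_layout` are assumptions, and edge weights are assumed positive.

```python
import numpy as np


def weighted_fr_layout(n, edges, weights, iters=200, k=None, seed=0):
    """Hypothetical weighted Fruchterman-Reingold layout in 2-D.

    Repulsion between every pair has magnitude k^2 / d (Eq. 6);
    attraction along each edge has magnitude d^2 / k * w_uv (Eq. 7).
    """
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-1.0, 1.0, size=(n, 2))
    k = k if k is not None else np.sqrt(4.0 / n)  # sqrt(area / n) for a 2 x 2 box
    temp = 0.1                                    # maximum step size, cooled every iteration
    for _ in range(iters):
        delta = pos[:, None, :] - pos[None, :, :]  # delta[u, v] = pos[u] - pos[v]
        dist = np.linalg.norm(delta, axis=-1)
        np.fill_diagonal(dist, 1.0)                # avoid dividing by zero on self-pairs
        dist = np.maximum(dist, 1e-9)
        rep = k ** 2 / dist
        np.fill_diagonal(rep, 0.0)                 # no self-repulsion
        disp = (delta / dist[..., None] * rep[..., None]).sum(axis=1)  # push u away from all v
        for (u, v), w in zip(edges, weights):
            d_vec = pos[u] - pos[v]
            d = max(np.linalg.norm(d_vec), 1e-9)
            f = d_vec / d * (d ** 2 / k) * w       # heavier edges pull harder
            disp[u] -= f
            disp[v] += f
        length = np.maximum(np.linalg.norm(disp, axis=1, keepdims=True), 1e-9)
        pos += disp / length * np.minimum(length, temp)  # cap each node's move at temp
        temp *= 0.98
    return pos
```

At equilibrium for an isolated pair, $k^2/d = d^2 w / k$, so $d = k / w^{1/3}$: strongly connected ROIs end up close together, which is the property the physical location encoding relies on.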
Finally, the original embedding is aggregated directly with the physical
location encoding, which is denoted as:
where $X'^{(k+1)}$ is the feature matrix after the $k$-th round of updates,
$\odot$ denotes element-wise multiplication, and $W'$ is a weight matrix computed
from node degrees to realize adaptive embedding fusion.
The feature after the $k$-th round of propagation enhancement is
$S \in M_{N,\,k+1,\,d}$.
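For intuition about the shape of $S$, the sketch below builds an $N \times (K+1) \times d$ tensor by repeated neighborhood propagation, in the style of NAGphormer's hop tokens [7]. It assumes symmetric normalization with self-loops, $\hat{A} = D^{-1/2}(A + I)D^{-1/2}$, and it does not model the degree-based weight $W'$ or the element-wise fusion; `hop_tokens` is a hypothetical name.

```python
import numpy as np


def hop_tokens(adj: np.ndarray, x: np.ndarray, num_hops: int) -> np.ndarray:
    """Stack propagated features into S of shape (N, K + 1, d).

    Assumption: A_hat = D^{-1/2} (A + I) D^{-1/2}; the degree-based fusion
    weight W' from the text is not modelled in this sketch.
    """
    a = adj + np.eye(adj.shape[0])               # add self-loops so every degree is >= 1
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    a_hat = a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    tokens = [x]                                  # hop 0: the node's own features
    for _ in range(num_hops):
        tokens.append(a_hat @ tokens[-1])         # hop k+1 from hop k
    return np.stack(tokens, axis=1)
```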
where

$$
\mathrm{head}_i = \mathrm{Attention}(Q H_i^{Q},\, K H_i^{K},\, V H_i^{V}) \qquad (12)
$$
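A NumPy sketch of (12) using scaled dot-product attention. Concatenating the heads and projecting them with an output matrix $W^O$ follows the standard Transformer [3] and is an assumption about the surrounding equation; all function names are illustrative.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d_k)) @ v


def multi_head(q, k, v, hq, hk, hv, w_o):
    """head_i = Attention(Q H_i^Q, K H_i^K, V H_i^V); heads are concatenated, then projected by W^O."""
    heads = [attention(q @ hq[i], k @ hk[i], v @ hv[i]) for i in range(len(hq))]
    return np.concatenate(heads, axis=-1) @ w_o
```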
In each encoder layer, the self-attention mechanism is followed by a feed-forward
neural network that further processes the output of the self-attention layer.
This network typically contains two linear transformations and a nonlinear
activation function, in the following form:
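In the standard Transformer [3], this network is $\mathrm{FFN}(x) = \max(0, xW_1 + b_1)W_2 + b_2$. A minimal sketch, assuming the same ReLU form is used here:

```python
import numpy as np


def feed_forward(x, w1, b1, w2, b2):
    """Position-wise FFN: max(0, x W1 + b1) W2 + b2 (ReLU form assumed, as in [3])."""
    hidden = np.maximum(0.0, x @ w1 + b1)  # first linear map followed by ReLU
    return hidden @ w2 + b2                # second linear map back to the model dimension
```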
$$
\Theta_i' = W_i \Theta + b_i \qquad (14)
$$
where $\Theta_i'$ is the transformed parameter used in the $i$-th encoder layer.
In this way, although each encoder layer uses its own transformed parameters,
they are all derived from the same base parameter $\Theta$.
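A minimal sketch of the parameter sharing in (14): one base matrix $\Theta$ is stored once, and each layer $i$ derives its own $\Theta_i'$ through a layer-specific $W_i$ and $b_i$. The class name, the square $d \times d$ shapes, and the near-identity initialization of $W_i$ are illustrative assumptions.

```python
import numpy as np


class SharedParamTransform:
    """Hypothetical sketch of Eq. (14): Theta'_i = W_i Theta + b_i from one shared Theta."""

    def __init__(self, d: int, num_layers: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.theta = rng.standard_normal((d, d)) / np.sqrt(d)  # shared base parameter
        # Layer-specific transforms start near identity so every layer begins close to Theta.
        self.w = [np.eye(d) + 0.01 * rng.standard_normal((d, d)) for _ in range(num_layers)]
        self.b = [np.zeros((d, d)) for _ in range(num_layers)]

    def layer_params(self, i: int) -> np.ndarray:
        """Return Theta'_i for encoder layer i."""
        return self.w[i] @ self.theta + self.b[i]
```

Because every $\Theta_i'$ is a function of the same $\Theta$, gradients from all layers (and tasks) update one shared base, which is the intended source of cross-task sharing.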
Furthermore, within each encoder layer, the matrices $Q$, $K$, and $V$ of the
self-attention mechanism are computed from the transformed parameters:
To process the output of the decoding layer, we apply the attention-based
readout layer from NAGphormer [7], shown in Fig. 1 C. At this point, the
physics-informed embeddings extracted from the brain regions have been fully
incorporated into the model.
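NAGphormer's readout [7] scores each hop token $Z_k$ against the node's own token $Z_0$, normalizes the scores with a softmax over hops, and adds the weighted sum of hop tokens to $Z_0$. A sketch under the assumption that the scoring layer is a single vector $w_a$ applied to the concatenation $[Z_0 \Vert Z_k]$; `readout` is an illustrative name:

```python
import numpy as np


def readout(tokens: np.ndarray, w_a: np.ndarray) -> np.ndarray:
    """Attention-based readout over hop tokens, in the style of NAGphormer.

    tokens: (N, K+1, d); tokens[:, 0] is the node's own token, tokens[:, 1:] the hop tokens.
    w_a:    (2d,) scoring vector applied to [Z_0 || Z_k] (assumed form).
    Returns Z_out = Z_0 + sum_k alpha_k Z_k, with alpha a softmax over k.
    """
    z0, zk = tokens[:, 0, :], tokens[:, 1:, :]
    pair = np.concatenate([np.broadcast_to(z0[:, None, :], zk.shape), zk], axis=-1)  # (N, K, 2d)
    scores = pair @ w_a                                   # (N, K)
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    return z0 + (alpha[..., None] * zk).sum(axis=1)
```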
3 Experiments
3.1 Data and Pre-processing
The proposed model was evaluated on an fMRI dataset we collected from healthy
subjects. In the fMRI experiment, subjects were asked to imagine and to listen to
four categories of auditory information (Human, Animal, Machine, Nature),
yielding eight categories of auditory neural activity (two conditions × four
sound categories) to be classified. For fMRI preprocessing, the first five
volumes of each run were discarded [34]. All images were realigned to correct
for head movement, coregistered with the T1 images, and normalized to MNI space
using them, with voxels resampled to 3 × 3 × 3 mm. The normalized images were
smoothed with a 6-mm Gaussian kernel. Finally, we extracted time series and
computed functional connectivity from them.
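A sketch of the final functional-connectivity step, assuming one (time points × ROIs) array per run and Pearson correlation as the connectivity measure; the text does not name the measure, and `functional_connectivity` is a hypothetical name.

```python
import numpy as np


def functional_connectivity(ts: np.ndarray) -> np.ndarray:
    """ROI x ROI Pearson correlation matrix from a (time points x ROIs) array.

    Pearson correlation is an assumption; the paper does not state the FC measure.
    """
    fc = np.corrcoef(ts, rowvar=False)  # rowvar=False: columns are ROIs, rows are time points
    np.fill_diagonal(fc, 0.0)           # drop self-connections
    return fc
```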
Fig. 2. A: distribution of the original features after dimensionality reduction
with t-SNE; B: distribution of the high-dimensional embeddings encoded with
physical locations after dimensionality reduction with t-SNE; C: error-bar plots
of each metric for each method.
information diffusion, capable of capturing both local and non-local spillover
effects, thereby benefiting the subsequent classification stage of the model.
In this experiment, we validated the model on a fine-grained, eight-category
classification, with a total of four tasks evenly distributed across the
categories. To benchmark our method, we compare it with three baselines:
GraphSAGE, DeepWalk, and node2vec. To verify the contribution of each component,
we ran the following ablations: Multi-Transformer (without physics-informed
embedding), PET-Net (without parameter sharing), and the full proposed model.
Table 1 and Fig. 2 C report all experimental results; the upper half of Table 1
compares our method with the baselines and the lower half reports the ablations.
Our proposed method achieves the best performance. Relative to the baselines,
accuracy improves by 16.1% to 27.93%. In the ablations, accuracy improves by
13.51% over the Multi-Transformer model, indicating that the physics-enhanced
embedding representation is effective, and by 9.09% over PET-Net, indicating that
parameter sharing captures deep linear and nonlinear features, preserves them
across tasks in multi-task learning, and improves the model's generalization.
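The four metrics shown in Fig. 1 and Fig. 2 C (accuracy, precision, recall, F1-score) can be computed for the eight-category task as sketched below. Macro averaging over classes is an assumption, since the text does not state the averaging scheme, and `classification_metrics` is a hypothetical name.

```python
import numpy as np


def classification_metrics(y_true, y_pred, num_classes: int) -> dict:
    """Accuracy plus macro-averaged precision, recall and F1 (macro averaging assumed)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    prec, rec, f1 = [], [], []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp else 0.0  # define 0/0 as 0 for absent classes
        r = tp / (tp + fn) if tp + fn else 0.0
        prec.append(p)
        rec.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    return {
        "accuracy": float(np.mean(y_true == y_pred)),
        "precision": float(np.mean(prec)),
        "recall": float(np.mean(rec)),
        "f1": float(np.mean(f1)),
    }
```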
4 Conclusion
In this paper, we present a novel model for learning auditory signals in the
brain. The model combines physical modeling with deep learning. Experiments show
that node representations can be substantially enhanced by modeling the
diffusion of neural signals in the brain and by deriving a physical location
encoding from the interrelationships between regions, and that decoding
performance improves with a parameter transformation layer that learns inter-task
activity and shares the parameter representation across tasks. In future work,
we will simulate the diffusion process with improved random walks and further
optimize the physical location encoding to improve the model's generalization, so
that it can be applied in other fields.
10 Wanli Ma et al.
5 Acknowledgement
This work is supported in part by the Foundation of XXXX.
References
1. Grover, A., Leskovec, J.: node2vec: Scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 855–864 (2016)
2. Fruchterman, T.M., Reingold, E.M.: Graph drawing by force-directed placement. Software: Practice and Experience 21(11), 1129–1164 (1991)
3. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need. Advances in Neural Information Processing Systems 30 (2017)
4. Xiong, R., Yang, Y., He, D., Zheng, K., Zheng, S., Xing, C., Zhang, H., Lan, Y., Wang, L., Liu, T.: On layer normalization in the transformer architecture. In: International Conference on Machine Learning. pp. 10524–10533. PMLR (2020)
5. Willander, J., Baraldi, S.: Development of a new clarity of auditory imagery scale. Behavior Research Methods 42, 785–790 (2010)
6. Zeidman, P., Mullally, S.L., Maguire, E.A.: Constructing, perceiving, and maintaining scenes: hippocampal activity and connectivity. Cerebral Cortex 25(10), 3836–3855 (2015)
7. Chen, J., Gao, K., Li, G., He, K.: NAGphormer: A tokenized graph transformer for node classification in large graphs. In: The Eleventh International Conference on Learning Representations (2023)
8. Brusini, L., Stival, F., Setti, F., Menegatti, E., Menegaz, G., Storti, S.F.: A systematic review on motor-imagery brain-connectivity-based computer interfaces. IEEE Transactions on Human-Machine Systems 51(6), 725–733 (2021)
9. Noordzij, M.L., Zuidhoek, S., Postma, A.: The influence of visual experience on visual and spatial imagery. Perception 36(1), 101–112 (2007)
10. Gu, J., Zhang, H., Liu, B., Li, X., Wang, P., Wang, B.: An investigation of the neural association between auditory imagery and perception of complex sounds. Brain Structure and Function 224, 2925–2937 (2019)
11. Ernst, M., Torrisi, S., Balderston, N., Grillon, C., Hale, E.A.: fMRI functional connectivity applied to adolescent neurodevelopment. Annual Review of Clinical Psychology 11, 361–377 (2015)
12. Du, Y., Fu, Z., Calhoun, V.D.: Classification and prediction of brain disorders using functional connectivity: promising but challenging. Frontiers in Neuroscience 12, 525 (2018)
13. Guo, X., Yao, D., Cao, Q., Liu, L., Zhao, Q., Li, H., Huang, F., Wang, Y., Qian, Q., Wang, Y., et al.: Shared and distinct resting functional connectivity in children and adults with attention-deficit/hyperactivity disorder. Translational Psychiatry 10(1), 65 (2020)
14. Wang, J., Xiao, L., Hu, W., Qu, G., Wilson, T.W., Stephen, J.M., Calhoun, V.D., Wang, Y.P.: Functional network estimation using multigraph learning with application to brain maturation study. Human Brain Mapping 42(9), 2880–2892 (2021)
15. Orlichenko, A., Qu, G., Zhang, G., Patel, B., Wilson, T.W., Stephen, J.M., Calhoun, V.D., Wang, Y.P.: Latent similarity identifies important functional connections for phenotype prediction. IEEE Transactions on Biomedical Engineering (2022)
16. Qu, G., Hu, W., Xiao, L., Wang, J., Bai, Y., Patel, B., Zhang, K., Wang, Y.P.: Brain functional connectivity analysis via graphical deep learning. IEEE Transactions on Biomedical Engineering 69(5), 1696–1706 (2021)
17. Madsen, K.H., Krohne, L.G., Cai, X.l., Wang, Y., Chan, R.C.: Perspectives on machine learning for classification of schizotypy using fMRI data. Schizophrenia Bulletin 44(suppl 2), S480–S490 (2018)
18. Yin, W., Li, L., Wu, F.X.: Deep learning for brain disorder diagnosis based on fMRI images. Neurocomputing 469, 332–345 (2022)
19. Li, X., Zhou, Y., Dvornek, N., Zhang, M., Gao, S., Zhuang, J., Scheinost, D., Staib, L.H., Ventola, P., Duncan, J.S.: BrainGNN: Interpretable brain graph neural network for fMRI analysis. Medical Image Analysis 74, 102233 (2021)
20. Qu, G., Orlichenko, A., Wang, J., Zhang, G., Xiao, L., Zhang, K., Wilson, T.W., Stephen, J.M., Calhoun, V.D., Wang, Y.P.: Interpretable cognitive ability prediction: A comprehensive gated graph transformer framework for analyzing functional brain networks. IEEE Transactions on Medical Imaging (2023)
21. Lee, D., Park, B., Jang, C., Park, H.J.: Decoding brain states using functional magnetic resonance imaging. Biomedical Engineering Letters 1, 82–88 (2011)
22. Li, J., Zhu, J., Zhang, B.: Discriminative deep random walk for network classification. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 1004–1013 (2016)
23. Sun, J., Ding, Y., Zhao, K., Xu, H., Zhang, Y., Gao, B.: Predicting Alzheimer's disease based on network topological latent representations. Journal of Medical Imaging and Health Informatics 10(3), 667–671 (2020)
24. Wu, Z., Jain, P., Wright, M., Mirhoseini, A., Gonzalez, J.E., Stoica, I.: Representing long-range context for graph neural networks with global attention. Advances in Neural Information Processing Systems 34, 13266–13279 (2021)
25. Kreuzer, D., Beaini, D., Hamilton, W., Létourneau, V., Tossou, P.: Rethinking graph transformers with spectral attention. Advances in Neural Information Processing Systems 34, 21618–21629 (2021)
26. Dwivedi, V.P., Bresson, X.: A generalization of transformer networks to graphs. arXiv preprint arXiv:2012.09699 (2020)
27. Xu, Z., Bai, Y., Zhao, R., Hu, H., Ni, G., Ming, D.: Decoding selective auditory attention with EEG using a transformer model. Methods 204, 410–417 (2022)
28. Shorten, C., Khoshgoftaar, T.: A survey on image data augmentation for deep learning. Journal of Big Data 6 (2019). https://doi.org/10.1186/s40537-019-0197-0
29. Abdelaziz Ismael, S.A., Mohammed, A., Hefny, H.: An enhanced deep learning approach for brain cancer MRI images classification using residual networks. Artificial Intelligence in Medicine 102, 101779 (2020). https://doi.org/10.1016/j.artmed.2019.101779, https://www.sciencedirect.com/science/article/pii/S0933365719306177
30. Fahimi, F., Dosen, S., Ang, K.K., Mrachacz-Kersting, N., Guan, C.: Generative adversarial networks-based data augmentation for brain–computer interface. IEEE Transactions on Neural Networks and Learning Systems 32(9), 4039–4051 (2021). https://doi.org/10.1109/TNNLS.2020.3016666
31. He, C., Liu, J., Zhu, Y., Du, W.: Data augmentation for deep neural networks model in EEG classification task: a review. Frontiers in Human Neuroscience 15, 765525 (2021)
32. Shen, R., Bubeck, S., Gunasekar, S.: Data augmentation as feature manipulation. In: International Conference on Machine Learning. pp. 19773–19808. PMLR (2022)
33. Gao, Z., Shi, J., Wang, J.: GQ-GCN: Group quadratic graph convolutional network for classification of histopathological images. In: Medical Image Computing and Computer Assisted Intervention – MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part VIII 24. pp. 121–131. Springer (2021)
34. Zeidman, P., Mullally, S.L., Maguire, E.A.: Constructing, perceiving, and maintaining scenes: hippocampal activity and connectivity. Cerebral Cortex 25(10), 3836–3855 (2015)
35. Gao, Z., Dang, W., Wang, X., Hong, X., Hou, L., Ma, K., Perc, M.: Complex networks and deep learning for EEG signal analysis. Cognitive Neurodynamics 15, 369–388 (2021)
36. Xia, Y., McCracken, T., Liu, T., Chen, P., Metcalf, A., Fan, C.: Understanding the disparities of PM2.5 air pollution in urban areas via deep support vector regression. Environmental Science & Technology