Inducing Inductive Bias in Vision Transformer for EEG Classification
ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024 • ieeexplore.ieee.org
Human brain signals are highly complex and dynamic in nature. Electroencephalogram (EEG) devices capture some of this complexity, in both space and time, at a limited resolution. Recently, transformer-based models have been explored in various applications with different modalities of data. In this work, we introduce a transformer-based model for the classification of EEG signals, inspired by the recent success of the Vision Transformer (ViT) in image classification. Driven by the distinctive characteristics of EEG data, we design a module that enables us to (1) extract the spatio-temporal tokens inherent in EEG signals and (2) integrate additional non-linearities to capture intricate, non-linear patterns in those signals. To that end, we introduce a new lightweight architectural component that combines our proposed attention model with convolution. This convolutional tokenization module forms the basis of our vision backbone, referred to as the Brain Signal Vision Transformer (BSVT). The architecture accounts for the spatial and temporal structure of EEG data, yielding token embeddings that effectively fuse spatial and temporal information. Moreover, while transformer-based models typically perform well when provided with large datasets, here we show that combining the inherent inductive bias of Convolutional Neural Networks (CNNs) with the transformer enables efficient training from scratch on relatively small datasets, with as few as 0.75M parameters. On the publicly available EEG dataset from Temple University Hospital (TUH Abnormal), our model achieves results comparable to or better than those of a counterpart ViT model with a patchify stem. The implementation is available at https://github.com/IamRabin/BSVT.git
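The abstract does not spell out the BSVT architecture, so the following is only a minimal PyTorch sketch of the idea it describes: a convolutional tokenization stem that turns raw EEG into spatio-temporal tokens, which are then fed to a small transformer encoder. The class names (ConvTokenizer, BSVTSketch), all kernel shapes, layer sizes, and the electrode/sampling-rate figures are illustrative assumptions, not the authors' published configuration; consult the linked repository for the actual model.

```python
# Hedged sketch of a BSVT-style model: convolutional tokenization in place of
# a ViT patchify stem, followed by a standard transformer encoder. All
# hyperparameters below are assumptions for illustration only.
import torch
import torch.nn as nn


class ConvTokenizer(nn.Module):
    """Maps a raw EEG window (batch, electrodes, time) to a sequence of
    spatio-temporal token embeddings via strided convolutions."""

    def __init__(self, n_electrodes: int = 21, embed_dim: int = 64):
        super().__init__()
        self.stem = nn.Sequential(
            # Temporal convolution: mixes nearby time samples per electrode.
            nn.Conv2d(1, embed_dim // 2, kernel_size=(1, 25),
                      stride=(1, 4), padding=(0, 12)),
            nn.GELU(),  # extra non-linearity, as motivated in the abstract
            # Spatial convolution: mixes information across all electrodes.
            nn.Conv2d(embed_dim // 2, embed_dim, kernel_size=(n_electrodes, 1)),
            nn.GELU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.unsqueeze(1)                    # (B, 1, E, T): add image-channel dim
        x = self.stem(x)                      # (B, D, 1, T')
        return x.flatten(2).transpose(1, 2)   # (B, T', D): one token per time step


class BSVTSketch(nn.Module):
    """Convolutional tokenizer + transformer encoder + linear classifier."""

    def __init__(self, n_electrodes=21, embed_dim=64, depth=4,
                 n_heads=4, n_classes=2):
        super().__init__()
        self.tokenizer = ConvTokenizer(n_electrodes, embed_dim)
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=n_heads, dim_feedforward=4 * embed_dim,
            batch_first=True, norm_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = self.tokenizer(x)            # (B, T', D)
        tokens = self.encoder(tokens)         # global self-attention over tokens
        return self.head(tokens.mean(dim=1))  # mean-pool tokens, then classify


if __name__ == "__main__":
    # E.g. a 21-electrode, 10-second window at 250 Hz (shapes are assumptions).
    model = BSVTSketch(n_electrodes=21, n_classes=2)
    logits = model(torch.randn(8, 21, 2500))
    print(logits.shape)  # torch.Size([8, 2])
```

Replacing the patchify stem with strided temporal and spatial convolutions is what injects the CNN-style inductive bias: nearby time samples and electrodes are mixed locally before any global self-attention. In this illustrative configuration the model stays well under 1M parameters, in the same spirit as the 0.75M-parameter model reported in the abstract.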