Multi-View Spectrogram Transformer for Respiratory Sound Classification

W He, Y Yan, J Ren, R Bai… - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
Deep neural networks have been applied to audio spectrograms for respiratory sound classification. Existing models often treat the spectrogram as a synthetic image while overlooking its physical characteristics. In this paper, a Multi-View Spectrogram Transformer (MVST) is proposed to embed different views of time-frequency characteristics into the vision transformer. Specifically, the proposed MVST splits the mel-spectrogram into different-sized patches, representing the multi-view acoustic elements of a respiratory sound. The patches and positional embeddings are fed into transformer encoders to extract the attentional information among patches through a self-attention mechanism. Finally, a gated fusion scheme is designed to automatically weigh the multi-view features to highlight the best one in a specific scenario. Experimental results on the ICBHI dataset demonstrate that the MVST significantly outperforms state-of-the-art methods for classifying respiratory sounds. The code is available at: https://github.com/wentaoheunnc/MVST.
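The abstract describes a multi-branch architecture: the mel-spectrogram is patchified at several patch sizes, each view is processed by a transformer encoder with positional embeddings and self-attention, and a gated fusion step weighs the resulting view features before classification. The sketch below illustrates that idea only; it is not the authors' implementation (see the linked GitHub repository for that), and the patch sizes, embedding dimension, encoder depth, input spectrogram size, softmax gating layer, and four-class output head are all illustrative assumptions.

```python
# Minimal PyTorch sketch of a multi-view spectrogram transformer with gated fusion.
# All hyperparameters below are assumptions for illustration, not the paper's settings.
import torch
import torch.nn as nn


class ViewBranch(nn.Module):
    """One view: patchify the mel-spectrogram with a given patch size,
    add learned positional embeddings, and run a transformer encoder."""

    def __init__(self, patch_size, in_chans=1, embed_dim=192, depth=4,
                 n_heads=3, img_size=(128, 256)):
        super().__init__()
        # Convolution with stride == kernel size acts as the patch-embedding projection.
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)
        n_patches = (img_size[0] // patch_size[0]) * (img_size[1] // patch_size[1])
        self.pos_embed = nn.Parameter(torch.zeros(1, n_patches, embed_dim))
        layer = nn.TransformerEncoderLayer(embed_dim, n_heads,
                                           dim_feedforward=4 * embed_dim,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, x):                                    # x: (B, 1, n_mels, time)
        tokens = self.proj(x).flatten(2).transpose(1, 2)     # (B, N, D)
        tokens = tokens + self.pos_embed
        return self.encoder(tokens).mean(dim=1)              # mean-pooled view feature (B, D)


class MultiViewClassifier(nn.Module):
    """Gated fusion over several view branches, followed by a linear classifier."""

    def __init__(self, patch_sizes=((16, 16), (8, 32), (32, 8)),
                 embed_dim=192, n_classes=4):
        super().__init__()
        self.branches = nn.ModuleList(ViewBranch(ps, embed_dim=embed_dim)
                                      for ps in patch_sizes)
        # Gating network: softmax weights over the views, computed from all view features.
        self.gate = nn.Linear(embed_dim * len(patch_sizes), len(patch_sizes))
        self.head = nn.Linear(embed_dim, n_classes)

    def forward(self, x):
        feats = torch.stack([b(x) for b in self.branches], dim=1)       # (B, V, D)
        weights = torch.softmax(self.gate(feats.flatten(1)), dim=-1)    # (B, V)
        fused = (weights.unsqueeze(-1) * feats).sum(dim=1)              # weighted sum (B, D)
        return self.head(fused)


if __name__ == "__main__":
    model = MultiViewClassifier()
    mel = torch.randn(2, 1, 128, 256)   # dummy batch of mel-spectrograms
    print(model(mel).shape)             # torch.Size([2, 4])
```

The differently shaped patches stand in for different time-frequency trade-offs (e.g. wide-in-time vs. wide-in-frequency patches), and the softmax gate lets the model emphasize whichever view is most informative for a given input.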