Deep Neural Networks and End-to-End Learning for Audio Compression

Rim, Daniela N.; Jang, Inseon; Choi, Heeyoul

arXiv:2105.11681 (cs)

[Submitted on 25 May 2021 (v1), last revised 13 Jul 2021 (this version, v2)]

Abstract: Recent achievements in end-to-end deep learning have encouraged the exploration of tasks dealing with highly structured data using unified deep network models. Building such models for compressing audio signals has been challenging, since compression requires discrete representations that are not easy to train with end-to-end backpropagation. In this paper, we present an end-to-end deep learning approach that uses recurrent neural networks (RNNs) within the training strategy of variational autoencoders (VAEs), with a binary representation of the latent space. We apply a reparametrization trick for the Bernoulli distribution to the discrete representations, which allows smooth backpropagation. In addition, our approach allows the separation of the encoder and decoder, which is necessary for compression tasks. To the best of our knowledge, this is the first end-to-end learning approach for a single audio compression model with RNNs, and our model achieves a signal-to-distortion ratio (SDR) of 20.54.

Subjects:	Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2105.11681 [cs.LG]
	(or arXiv:2105.11681v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2105.11681

Submission history

From: Daniela Noemi Rim
[v1] Tue, 25 May 2021 05:36:30 UTC (2,347 KB)
[v2] Tue, 13 Jul 2021 10:30:42 UTC (2,347 KB)
