License: CC BY 4.0
arXiv:2403.00977v1 [cs.SD] 01 Mar 2024
Scaling Up Adaptive Filter Optimizers
Jonah Casebeer, Nicholas J. Bryan, Paris Smaragdis
J. Casebeer is with the Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA (e-mail: jonah.casebeer@ieee.org).
N. J. Bryan is with Adobe Research, San Francisco, CA, 94103 USA (e-mail: njb@ieee.org)
P. Smaragdis is with the Department of Computer Science and Department, University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA (e-mail: paris@illinois.edu)
Abstract
We introduce a new online adaptive filtering method called supervised multi-step adaptive filters (SMS-AF). Our method uses neural networks to control or optimize linear multi-delay or multi-channel frequency-domain filters and can flexibly scale up performance at the cost of increased compute, a property rarely addressed in the AF literature but critical for many applications. To do so, we extend recent work with a set of improvements including feature pruning, a supervised loss, and multiple optimization steps per time-frame. These improvements work in a cohesive manner to unlock scaling. Furthermore, we show how our method relates to Kalman filtering and meta-adaptive filtering, making it seamlessly applicable to a diverse set of AF tasks. We evaluate our method on acoustic echo cancellation (AEC) and multi-channel speech enhancement tasks and compare against several baselines on standard synthetic and real-world datasets. Results show that our method's performance scales with inference cost and model capacity, yields multi-dB performance gains for both tasks, and is real-time capable on a single CPU core.
Index Terms:
adaptive filtering, supervised adaptive filtering, acoustic echo cancellation, beamforming, learning to learn
I Introduction
Adaptive filters (AF) play an indispensable role in a wide array of signal processing applications such as acoustic echo cancellation, equalization, and interference suppression.
AFs are parameterized by time-varying filter weights and require an update or optimization rule to control them over time.
Improving the performance of AFs continues to pose an intricate challenge, requiring a nuanced approach to optimizer design.
Consequently, AF algorithm designers have relied on mathematical insights to create tailored optimizers, starting from the foundational development of the least mean squares algorithm (LMS) [1] to the Kalman filter [2, 3, 4, 5].
Figure 1: Acoustic echo cancellation performance vs. model size, optimization steps per time-frame (opt. steps), and supervision levels. Bubble size shows real-time-factor (RTF) where smaller is faster, inner-shape shows opt. steps, and the vertical dotted line separates unsupervised (left) and supervised (right) approaches. SMS-AF, in bold on the far right, demonstrates robust scaling performance in terms of parameters and RTF.
In contrast, we have witnessed countless remarkable deep learning algorithm advancements in other domains through the principle of “scaling” [6, 7, 8, 9]. The scaling approach involves improving an existing method by deploying additional computational resources. Scaling methodologies are particularly enticing, as they tap into the increasing computational capabilities of modern smart devices, minimizing the need for labor-intensive manual tuning and intervention.
In the context of neural networks for online low-latency AFs that can benefit from scaling, we find two general approaches: 1) model-based methods that integrate deep neural networks (DNNs) into existing AF frameworks to update optimizer statistics [10, 11, 12], step-size [13, 14], or other quantities [15] and 2) model-free strategies that do not rely on an existing AF strategy, learn AF optimizers using meta-learning in an end-to-end fashion [16, 17, 18, 19], and yield state-of-the-art (SOTA) results [15, 19].
We focus on the latter, given their recent success, but note that scaling such approaches has either been limited to high-latency regimes [18] or only marginally improved results [15].
We propose a new online AF method called supervised multi-step AF (SMS-AF). Our method integrates a series of algorithm improvements on top of recent meta-learning methods [17, 18] that together enable scaling performance by increasing model capacity and/or inference cost as shown in Fig. 1.
We evaluate our approach on the tasks of acoustic echo cancellation (AEC) and generalized sidelobe canceller (GSC) speech enhancement and compare to recent SOTA approaches. Results show that our scaling behavior translates to substantial performance gains in all metrics across tasks and datasets and delivers breakthrough AEC performance.
Our contributions include:
1) A new general purpose AF method that allows us to reliably improve performance by simply using more computation,
2) Design insights for customizing our proposed method for the task of AEC and GSC,
3) Empirical exploration of AF optimizer scaling showing our approach scales vs. model size and optimizer step count, and
4) Insights as to how our approach generalizes Kalman filtering.
II Background
II-A Adaptive Filters
An AF is an optimization procedure that seeks to adapt filter parameters to fit an objective over time. AFs typically input a mixture , adjust a time-varying linear filter with parameters to remove noise via knowledge from a reference signal , produce estimate , and output an error signal . We focus on multi-delay and/or multi-channel frequency-domain filters (MDF) for low-latency processing.
The filter parameters are updated across time by minimizing a loss, , resulting in a per frame update rule,
(1)
can also be written as the output of an optimizer with input , and parameters .
II-B Adaptive Filter Optimizers
The AF optimizer, is key and can vary in levels of sophistication. In the simple case, the optimizer can be a hand-derived algorithm such as LMS. In this case, each parameter in is updated independently, so accepts the gradient with respect to the loss via , and is only parameterized by the step-size , resulting in simply scaling the gradient.
Most AFs operate via the following steps [4]:
(2)
(3)
(4)
where (2) applies the filter, (3) updates the optimizer, and (4) updates the filter parameters for the next frame. Optimizers typically use filter output , produced via , creating a feedback loop.
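As a hedged illustration of steps (2)-(4), the sketch below runs a single-block frequency-domain filter with one complex weight per bin, using a hand-derived NLMS rule as a stand-in for the optimizer. The function names `apply_filter`, `nlms_update`, and `run_af`, the step size `mu`, and the toy signals are assumptions introduced here for illustration; this is not the implementation used in the paper.

```python
def apply_filter(w, u):
    """(2) Filter the reference: one complex weight per frequency bin."""
    return [wk * uk for wk, uk in zip(w, u)]


def nlms_update(u, e, mu=0.5, eps=1e-8):
    """(3) Optimizer step: normalized gradient scaled by the step size mu."""
    return [mu * uk.conjugate() * ek / (abs(uk) ** 2 + eps)
            for uk, ek in zip(u, e)]


def run_af(frames_u, frames_d, n_bins):
    """Run the filter/optimizer feedback loop over a sequence of frames."""
    w = [0j] * n_bins
    errors = []
    for u, d in zip(frames_u, frames_d):
        y = apply_filter(w, u)                      # (2) filter output
        e = [dk - yk for dk, yk in zip(d, y)]       # error fed back to the optimizer
        delta = nlms_update(u, e)                   # (3) optimizer update
        w = [wk + gk for wk, gk in zip(w, delta)]   # (4) parameters for the next frame
        errors.append(e)
    return w, errors


# Toy system identification: the unknown system is a fixed per-bin gain.
true_w = [0.5 + 0.5j, -1.0 + 0j, 0.25j]
frames_u = [[1 + 0j, 0.5j, 2 - 1j]] * 50
frames_d = [[tw * uk for tw, uk in zip(true_w, u)] for u in frames_u]
w_hat, errors = run_af(frames_u, frames_d, n_bins=3)
```

In this toy setting the estimated weights `w_hat` approach `true_w` because the error computed from the filter output is fed back into every update, which is the feedback loop described above.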
The Kalman filter (KF) extends the above via distinct “predict” and “update” steps. In the KF predict step, (2)-(4) are run as normal. In the KF update step, however, the filter output is reprocessed after (4) using the latest data:
(5)
II-C Learned Optimizers
Historically, AFs are hand-derived, given a loss and filter. In contrast, neural-AF optimizers can be trained via meta-learning (Meta-AF) [20, 16, 17]. Meta-AFs are trained to control MDF filters via recurrent neural networks (RNNs) with parameters that are trained to maximize AF performance on a large dataset via an unsupervised (or self-supervised) meta-objective and backpropagation through time (BPTT).
A common meta-loss is,
(6)
where , is the truncation length, is the hop size, and concatenates.
Two important extensions to Meta-AF include 1) higher-order Meta-AF (HO-Meta-AF) [18], which introduces learnable coupling modules to model groups of filter parameters, reduce complexity, and improve performance for high-latency single-block frequency-domain filters and 2) low-complexity neural Kalman filtering (NKF) [15], which extends Meta-AF with a KF, a supervised loss, and different training setup. We regard Meta-AF as SOTA for unsupervised AFs (see Table IV [17]) and NKF as SOTA for supervised AFs [15].
III Scaling Up Learned Optimizers
As the foundation of our SMS-AF method, we combine Meta-AFs [17] with a higher-order optimizer [18] with per-frequency inputs , and then extend it with three task-agnostic improvements and one task-specific change. Our training and inference methods are summarized in Alg. 1.
III-A Scaling Up Feature Quality: Feature Pruning
Our first insight is to use only three features to control filter adaptation: knowledge of the filter input, final filter output, and filter state.
Compared to past work [17] that uses,
(7)
where is a gradient w.r.t. loss , we use
(8)
Pruning reduces complexity by lowering input dimension and memory requirements for the optimizer,
while eliminating inference-time gradients, as shown in line of Alg. 1.
III-B Scaling Up Supervision: Supervised Loss
Our second insight is to use a high-quality supervised loss, instead of an unsupervised loss.
Previous methods have explored supervised losses such as frame-wise independent supervised losses for echoes [15] or oracle filter parameters [11]. These methods treat frequency bins, adjacent frames, and other channels as distinct optimization entities and have not scaled [15]. As such, we compute our supervised loss in the time-domain after all AF operations have been performed. This strategy is similar to (6), but with supervision. Our supervision is non-causal; the loss at depends on updates from , enabling the optimizer to learn anticipatory updates.
For AEC,
(9)
where is the true echo. For GSC, we use scale-invariant signal-to-distortion ratio (SI-SDR).
Better loss functions exclusively impact the training phase, without contributing to test-time complexity, making this change cost-free for inference. This corresponds to line of Alg. 1.
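To make the two training objectives concrete, the sketch below gives one possible time-domain implementation of a log-MSE loss (as used on the echo for AEC) and a negated SI-SDR loss (as used for GSC), with SI-SDR following the definition in [26]. The function names, the `eps` constants, and the toy signals are assumptions made for illustration; the exact form of the paper's loss in (9) is not reproduced here.

```python
import math


def log_mse(estimate, target, eps=1e-10):
    """Log of the mean squared error between two time-domain signals."""
    mse = sum((a - b) ** 2 for a, b in zip(estimate, target)) / len(target)
    return math.log(mse + eps)


def si_sdr(estimate, target, eps=1e-10):
    """Scale-invariant SDR in dB, following the definition in [26]."""
    dot = sum(a * b for a, b in zip(estimate, target))
    energy = sum(b * b for b in target)
    alpha = dot / (energy + eps)                  # optimal scaling of the target
    projection = [alpha * b for b in target]
    noise = [a - p for a, p in zip(estimate, projection)]
    num = sum(p * p for p in projection)
    den = sum(n * n for n in noise)
    return 10.0 * math.log10((num + eps) / (den + eps))


def si_sdr_loss(estimate, target):
    """Negated SI-SDR, so that minimizing the loss maximizes quality."""
    return -si_sdr(estimate, target)


# Toy signals (assumed): a sinusoid, a noisy copy, and a rescaled noisy copy.
target = [math.sin(0.1 * n) for n in range(200)]
noisy = [t + 0.1 * math.cos(0.37 * n) for n, t in enumerate(target)]
scaled = [2.0 * x for x in noisy]
```

Both functions operate on whole time-domain signals rather than on individual frequency bins or frames, matching the idea of computing the loss after all AF operations have been performed. Rescaling the estimate, as in `scaled`, leaves `si_sdr` essentially unchanged.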
III-C Scaling Up Feedback: Multi-Step Optimization
Our third insight is to leverage the iterative nature of optimizers by executing multiple optimization steps per time frame. By doing so, we offer our optimizers a more powerful feedback mechanism and use the most current parameters for the filter output.
Specifically, we run our optimizer update via
(2)-(4), (2)-(5), or looping over (2)-(5) multiple times. The first option follows Meta-AF, the second option follows a typical KF, and the third extends the KF. We denote the number of (2)-(4) iterations via .
Incorporating multi-step optimization in Alg. 1 involves three changes. First, initializing each frame’s filter and optimizer state with results from the last frame (lines ). Second, iteratively progressing through steps within a frame (line ), while updating filter parameters/outputs, and optimizer state (lines ). Last, running a final filter forward pass using the latest parameters (line ).
We find this approach to be a compelling alternative to increasing the dimension of the optimizer, . Notably, it avoids increasing the parameter count, and it linearly scales complexity, in stark contrast to the quadratic complexity effects associated with .
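The multi-step scheme can be sketched as follows, again with an NLMS step standing in for the learned optimizer (a learned optimizer would also carry recurrent state across steps and frames, which is omitted here). The names `filter_step`, `optimizer_step`, `process_frame`, and `n_steps`, as well as the toy frame, are assumptions for illustration only.

```python
def filter_step(w, u, d):
    """Apply the filter (2) and return the output and error for this frame."""
    y = [wk * uk for wk, uk in zip(w, u)]
    e = [dk - yk for dk, yk in zip(d, y)]
    return y, e


def optimizer_step(w, u, e, mu=0.5, eps=1e-8):
    """Stand-in for (3)-(4): an NLMS step; a learned optimizer would go here."""
    return [wk + mu * uk.conjugate() * ek / (abs(uk) ** 2 + eps)
            for wk, uk, ek in zip(w, u, e)]


def process_frame(w, u, d, n_steps):
    """Run n_steps optimization steps on one frame, then a final forward pass."""
    for _ in range(n_steps):
        _, e = filter_step(w, u, d)     # feedback from the current parameters
        w = optimizer_step(w, u, e)     # update parameters within the frame
    y, e = filter_step(w, u, d)         # (5) output from the latest parameters
    return w, y, e


def error_power(e):
    """Total residual power across bins."""
    return sum(abs(ek) ** 2 for ek in e)


# One toy frame (assumed): reference u and a mixture d from fixed per-bin gains.
u = [1 + 0j, 0.5j, 2 - 1j]
d = [(0.5 + 0.5j) * u[0], -1.0 * u[1], 0.25j * u[2]]
w0 = [0j, 0j, 0j]
_, _, e_one = process_frame(w0, u, d, n_steps=1)
_, _, e_three = process_frame(w0, u, d, n_steps=3)
```

With `n_steps=1` the routine resembles the (2)-(5) option, and larger values loop the update several times before the final forward pass. In this toy frame the residual power after three steps is lower than after one, while the cost grows linearly with `n_steps` and no parameters are added.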
When using overlap-save, we noticed artifacts due to rapid filter adaptation. So, we applied a straightforward solution: overlap-add with a synthesis window, but no analysis window.
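A minimal sketch of overlap-add with a synthesis window and no analysis window is given below. It assumes a periodic Hann window and a hop of half the frame length, for which shifted windows sum to one; the function names and the frame and hop sizes are illustrative assumptions rather than the configuration used in our experiments.

```python
import math


def hann(n):
    """Periodic Hann window, which sums to one at 50% overlap."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def overlap_add(frames, hop):
    """Window each output frame (synthesis window only) and overlap-add."""
    n = len(frames[0])
    window = hann(n)
    out = [0.0] * (hop * (len(frames) - 1) + n)
    for idx, frame in enumerate(frames):
        start = idx * hop
        for i, value in enumerate(frame):
            out[start + i] += window[i] * value
    return out


# Six constant frames of length 8 with a hop of 4 samples.
frames = [[1.0] * 8 for _ in range(6)]
signal = overlap_add(frames, hop=4)
```

Because each output sample is a windowed cross-fade between two consecutive frames, abrupt changes in the filter between frames are smoothed rather than appearing as discontinuities at frame boundaries.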
III-E Perspectives
Our modifications are a notable departure from the Meta-AF methodology, but still aim to learn a neural optimizer for AFs end-to-end.
First, by pruning input features and introducing a supervised loss, we eliminate the need for explicit meta-learning, leading to a more streamlined BPTT training process.
Second, we replace the past unsupervised loss with a new, strong supervised signal and loss, helping us scale up.
Third, we leverage a multi-step optimization scheme. This creates a generalization of the Kalman filter, where all parameters are entirely learned, while retaining explicit predict and update steps. This also effectively deepens our optimizer networks by sharing parameters across layers in a depth-wise manner.
IV Experimental Design
IV-A Experiments
The goal of our experiments is to benchmark SMS-AF and demonstrate how it scales. To do so, we perform an initial within-method ablation, then study scaling on AEC and GSC tasks and vary 1) optimizer model sizes with small (S), medium (M), and large (L) models, 2) an unsupervised (U) or supervised (S) loss, and 3) the number of predict (P) and update (U) steps per frame. Each experiment is labeled with an identifier (e.g., SSPU) indicating the size, supervision, and number of PU steps. We label baselines when applicable.
IV-B AEC Configuration
Our AEC signal model is , where stands for noise, and is speech. The goal is to recover the speech given the far-end , and mixture . This involves fitting a filter to mimic . We use a linear MDF filter with blocks, each of size , a hop of , and construct the output using overlap-add with a Hann window.
Our baselines are NLMS, KF [22], Meta-AF [17], HO-Meta-AF [18], and Neural-Kalman Filter [15]. We also test several HO-Meta-AF model sizes as well as multi-step NLMS, KF, and HO-Meta-AF. For training, we use the synthetic fold of the Microsoft AEC Challenge [23].
Each scene has double-talk, near-end noise and loud-speaker nonlinearities. We also evaluate on the real, crowd-sourced, blind test-set [23].
We use echo return loss enhancement (ERLE) [24] to measure echo reduction. To describe perceptual quality, we use AEC-MOS, a reference-free model that predicts a 5 point score [23]. On real data, we prefix with an R, use ERLE in single-talk and AEC-MOS in double-talk. To quantify complexity, we use mega FLOP (MFLOP) counts, single core real-time-factor (RTF) equal to processing over elapsed time, and model size.
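For reference, the sketch below shows one common way to compute ERLE and a real-time factor. It assumes ERLE is the ratio of mixture power to residual power in dB and that RTF is processing time divided by the duration of the processed audio; the function names, the 16 kHz sample rate, and the toy canceller are assumptions for illustration.

```python
import math
import time


def erle_db(mixture, error, eps=1e-10):
    """ERLE in dB: ratio of microphone power to residual power."""
    p_d = sum(x * x for x in mixture) / len(mixture)
    p_e = sum(x * x for x in error) / len(error)
    return 10.0 * math.log10((p_d + eps) / (p_e + eps))


def real_time_factor(process_fn, signal, sample_rate):
    """Processing time divided by the duration of the processed audio."""
    start = time.perf_counter()
    process_fn(signal)
    elapsed = time.perf_counter() - start
    return elapsed / (len(signal) / sample_rate)


# Toy single-talk example (assumed): the canceller leaves 10% of the amplitude.
mixture = [math.sin(0.05 * n) for n in range(16000)]
residual = [0.1 * x for x in mixture]
erle = erle_db(mixture, residual)       # close to 20 dB for a 10x amplitude drop
rtf = real_time_factor(lambda s: [0.1 * x for x in s], mixture, sample_rate=16000)
```

An RTF below one means the system processes audio faster than it arrives, which is the sense in which smaller is faster in Fig. 1.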
IV-C GSC Configuration
For GSC, we use a single-block frequency-domain GSC beamformer.
The signal model at each of the microphones is , and is the impulse response from source to mic . The goal is to recover the clean speech given the input signal . This requires fitting a filter to remove the effects of noise, . We assume access to a steering vector
and compare against NLMS, recursive least squares (RLS), and Meta-AF. We test multiple HO-Meta-AF model sizes as well as multi-step NLMS, RLS, and HO-Meta-AF. We use the CHiME-3 [25] dataset.
For overall quality, we compute scale-invariant signal-to-distortion ratio (SI-SDR) [26], and contrast
signal-to-interference ratio (SIR), and signal-to-artifact ratio (SAR) [27]. For perceptual quality, we use Short-Time Objective Intelligibility (STOI) [28].
IV-D Optimizer Configuration
For AEC and GSC, we use higher-order Meta-AF optimizers with banded coupling, and a group size of [18].
This amounts to a Conv1D layer, two GRU layers, and a transposed Conv1D layer.
To train, we use Adam with a batch of , a learning rate of , and randomize the truncation length with a maximum of . We apply gradient clipping and reduce the learning rate by half if the validation performance does not improve for epochs, and stop training after epochs with no improvement. We use log-MSE loss on the echo for AEC, and SI-SDR loss on the clean speech for GSC. All models are trained on one GPU. Note, our S, M, and L model sizes correspond to hidden state sizes of , , and with parameter counts of about K, K, and K.
We perform an initial within-method ablation on the task of AEC to understand our modifications.
First, we compare an MUP variant trained with the full feature vs. pruned feature set. The full feature set achieves an ERLE of dB (not shown), while the pruned set scores dB, over a dB gain.
We expand on the pruned model and add supervision, resulting in an ERLE improvement to dB, a gain of nearly dB.
We then use multiple steps per frame. Extending the supervised model with an update-step increases ERLE to dB, and doubling the iterations reaches dB.
To mitigate clicking artifacts, we then add our modified OLA scheme. This change reduces the ERLE by dB, but removes severe clicking artifacts.
Combined, this yields a dB ERLE gain.
V-B AEC Scaling Ablation and Benchmarking
Next, we explore scaling in AEC as shown in Fig. 1 and Table I. We attempt to scale up our baselines and then do so with our proposed model.
When scaling model size, we notice that scaling the unsupervised model from S to L (SUP to LUP) results in a peak gain of dB, to dB ERLE.
In contrast, scaling the supervised model from S to L (SSP to LSP) yields larger gains, peaking at dB.
When scaling optimization steps, we find that scaling the unsupervised models from SUP to SUPUx2 results in marginal or even negative performance changes.
In contrast, scaling from SSP to SSPUx2 provides dB, and LSP to LSPUx2 provides dB, showing supervision is crucial to unlock the benefit of multiple opt. steps per frame.
Our best-performing LSPUx2 scores over dB ERLE, doubling the SUP performance of dB.
When benchmarking against competing methods, we note that the SOTA supervised NKF method is most comparable. Our SSPU model matches NKF performance while using only one-fifth of the NKF MFLOP count. Our MSPU model further enhances all metrics and uses fewer MFLOPs. For our best-performing LSPUx2, we score dB ERLE, a dB improvement over NKF.
In perceptual metrics, our top-performing LSPUx2 model achieves dB in R-ERLE and an R-AEC-MOS of , while remaining real-time on a single CPU core.
Surprisingly, RTF scales non-linearly with MFLOPs and model size, showing untapped scaling potential.
TABLE II: Beamforming performance vs. computational cost.
Model     SI-SDR (dB)  SIR (dB)  SAR (dB)  STOI   MFLOPs  RTF
Mixture   -0.71        -         -         0.674  -       -
NLMSP     8.60         16.21     9.78      0.905  0.43    0.36
NLMSPU    8.84         16.54     10.00     0.910  0.47    0.47
RLSP      9.84         16.70     9.70      0.919  0.53    0.50
RLSPU     10.14        17.16     11.49     0.924  0.54    0.62
SUP       12.20        22.57     12.79     0.931  4.70    0.41
MUP       12.62        22.56     13.26     0.938  12.08   0.45
LUP       12.45        22.43     13.09     0.935  36.35   0.53
SSP       13.92        23.00     14.66     0.950  4.70    0.41
MSP       14.34        23.45     15.07     0.953  12.08   0.45
LSP       14.69        24.36     15.33     0.954  36.35   0.53
SSPU      15.46        25.69     16.09     0.956  4.74    0.51
MSPU      16.83        27.70     17.41     0.960  12.12   0.54
LSPU      17.22        28.37     17.80     0.962  36.39   0.62
SSPUx2    15.67        25.89     16.35     0.956  9.07    0.70
MSPUx2    17.06        28.42     17.63     0.961  23.83   0.76
LSPUx2    17.72        29.52     18.25     0.964  72.37   0.91
V-C GSC Beamforming Scaling and Benchmarking
Beamforming results are in Table II. Notably, SMS-AF improvements apply without any modifications. Here, all models assume access to a steering vector, which can be challenging to estimate in practice. Again, supervision and multi-step optimization yield significant performance gains. Our SSP model outperforms all baselines including LUP across all metrics. Our model scales reliably, with the LSP variant improving performance in all metrics. Scaling up the iterations to SSPUx2 yields larger gains across all metrics. Our largest and best model, LSPUx2, scores a remarkable 17.72 dB SI-SDR while still being real-time. Of note, the LSPU model has the same RTF as RLSPU, even though RLS uses fewer operations.
Again, we show that SMS-AF performance scales with both model capacity and optimization steps per frame.
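The scaling trend can be checked directly from the SI-SDR column of Table II. The short script below only performs arithmetic on values copied from the table; the dictionary and the helper `gain` are names introduced here for illustration.

```python
# SI-SDR (dB) values copied from Table II for the supervised models.
si_sdr_db = {
    "SSP": 13.92, "MSP": 14.34, "LSP": 14.69,
    "SSPU": 15.46, "MSPU": 16.83, "LSPU": 17.22,
    "SSPUx2": 15.67, "MSPUx2": 17.06, "LSPUx2": 17.72,
}


def gain(src, dst):
    """SI-SDR difference in dB between two Table II rows, rounded to 2 decimals."""
    return round(si_sdr_db[dst] - si_sdr_db[src], 2)


size_gain = gain("SSP", "LSP")        # model size only: 0.77 dB
step_gain = gain("LSP", "LSPUx2")     # extra optimization steps at size L: 3.03 dB
total_gain = gain("SSP", "LSPUx2")    # both combined: 3.8 dB
```

For these rows, adding optimization steps at a fixed model size contributes more SI-SDR than growing the model alone, and the two gains add up to the total.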
VI Conclusion
We introduce a method for neural network-based adaptive filter optimizers called supervised multi-step adaptive filters (SMS-AF). We extend meta-adaptive filtering methods with several advances that combine to reliably increase performance by leveraging more computation.
We evaluate our method on low latency, online AEC and GSC tasks, compare against many baselines and test on both synthetic and real data.
SMS-AF improves both subjective and objective metrics, achieving dB ERLE/SI-SDR gains compared to prior work, and increases the performance ceiling across AEC and GSC.
Furthermore, we relate our work to the Kalman filter and meta-AFs, giving insight for many other applications.
We believe scaling-up AFs is a promising direction and hope our results encourage future work on scalable, general purpose AFs.
References
[1] Bernard Widrow and Marcian E. Hoff, “Adaptive switching circuits,” Tech. Rep., Stanford University, 1960.
[2] V. John Mathews, “Adaptive polynomial filters,” IEEE Signal Processing Magazine (SPM), 1991.
[3] José Antonio Apolinário, José Antonio Apolinário, and R Rautmann, QRD-RLS adaptive filtering, Springer, 2009.
[4] Simon S. Haykin, Adaptive filter theory, Pearson, 2008.
[5] Lawrence R. Rabiner, Bernard Gold, and CK Yuen, Theory and application of digital signal processing, Prentice-Hall, 2016.
[6] Ethan Caballero, Kshitij Gupta, Irina Rish, and David Krueger, “Broken neural scaling laws,” in International Conference on Learning Representations (ICLR), 2022.
[7] Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, et al., “Training compute-optimal large language models,” arXiv preprint arXiv:2203.15556, 2022.
[8] Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, and Saining Xie, “A ConvNet for the 2020s,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[9] Minguk Kang, Jun-Yan Zhu, Richard Zhang, Jaesik Park, Eli Shechtman, Sylvain Paris, and Taesung Park, “Scaling up GANs for text-to-image synthesis,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[10] Jonah Casebeer, Jacob Donley, Daniel Wong, Buye Xu, and Anurag Kumar, “NICE-Beam: Neural integrated covariance estimators for time-varying beamformers,” arXiv:2112.04613, 2021.
[11] Thomas Haubner, Andreas Brendel, and Walter Kellermann, “End-to-end deep learning-based adaptation control for frequency-domain adaptive system identification,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022.
[12] Thomas Haubner and Walter Kellermann, “Deep learning-based joint control of acoustic echo cancellation, beamforming and postfiltering,” in IEEE European Signal Processing Conference (EUSIPCO), 2022.
[13] Amir Ivry, Israel Cohen, and Baruch Berdugo, “Deep adaptation control for acoustic echo cancellation,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022.
[14] Behrad Soleimani, Henning Schepker, and Majid Mirbagheri, “Neural-AFC: Learning-based step-size control for adaptive feedback cancellation with closed-loop model training,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023.
[15] Dong Yang, Fei Jiang, Wei Wu, Xuefei Fang, and Muyong Cao, “Low-complexity acoustic echo cancellation with neural Kalman filtering,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023.
[16] Jonah Casebeer, Nicholas J. Bryan, and Paris Smaragdis, “Auto-DSP: Learning to optimize acoustic echo cancellers,” in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2021.
[17] Jonah Casebeer, Nicholas J. Bryan, and Paris Smaragdis, “Meta-AF: Meta-learning for adaptive filters,” IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2022.
[18] Junkai Wu, Jonah Casebeer, Nicholas J. Bryan, and Paris Smaragdis, “Meta-learning for adaptive filters with higher-order frequency dependencies,” in IEEE International Workshop on Acoustic Signal Enhancement (IWAENC), 2022.
[19] Jonah Casebeer, Junkai Wu, and Paris Smaragdis, “Meta-AF echo cancellation for improved keyword spotting,” arXiv:2312.10605, 2023.
[20] Marcin Andrychowicz, Misha Denil, Sergio Gómez Colmenarejo, Matthew W. Hoffman, David Pfau, Tom Schaul, Brendan Shillingford, and Nando de Freitas, “Learning to learn by gradient descent by gradient descent,” in NeurIPS, 2016.
[21] Moritz Wolter and Angela Yao, “Complex gated recurrent neural networks,” in NeurIPS, 2018.
[22] Gerald Enzner and Peter Vary, “Frequency-domain adaptive Kalman filter for acoustic echo control in hands-free telephones,” Elsevier Signal Processing, 2006.
[23] Ross Cutler, Ando Saabas, Tanel Parnamaa, Marju Purin, Hannes Gamper, Sebastian Braun, Karsten Sorensen, and Robert Aichner, “ICASSP 2022 acoustic echo cancellation challenge,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022.
[24] Gerald Enzner, Herbert Buchner, Alexis Favrot, and Fabian Kuech, “Acoustic echo control,” in Academic press library in signal processing. Elsevier, 2014.
[25] Jon Barker, Ricard Marxer, Emmanuel Vincent, and Shinji Watanabe, “The third CHiME speech separation and recognition challenge: Dataset, task and baselines,” in Automatic Speech Recognition and Understanding Workshop (ASRU). IEEE, 2015.
[26] Jonathan Le Roux, Scott Wisdom, Hakan Erdogan, and John R. Hershey, “SDR–half-baked or well done?,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019.
[27] Emmanuel Vincent, Rémi Gribonval, and Cédric Févotte, “Performance measurement in blind audio source separation,” IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2006.
[28] Cees H. Taal, Richard C. Hendriks, Richard Heusdens, and Jesper Jensen, “An algorithm for intelligibility prediction of time–frequency weighted noisy speech,” IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2011.