Multi-Channel Speech Enhancement

Multi-channel speech enhancement
Chunjian Li
DICOM, Aalborg University
3/24/2006
Lecture notes for Speech Communications
Methods & applied fields
Dual-channel spectral subtraction
- noise reduction in speech
Adaptive Noise Canceling (ANC)
- noise reduction and interference elimination
- echo canceling
- adaptive beamforming
Blind Source Separation (BSS)
Blind Source Extraction (BSE)
Dual-channel spectral subtraction
- Hanson and Wong, ICASSP84.
The method
The exponent is chosen to be a = 1, based on listening tests and a spectral distortion measure.
The noisy phase is used in the reconstruction of the signal.
The estimate of the noise spectrum is either obtained from a reference channel or estimated from the noisy signal, assuming the SNR is very low (about -12 dB).
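A minimal sketch of the procedure these points describe, assuming the power-law subtraction rule |S_hat(f)|^a = |Y(f)|^a - |N_hat(f)|^a with the noise magnitude taken from a reference channel. The function name, the non-overlapping rectangular frames, and the clipping of negative differences to zero are illustrative assumptions, not details taken from Hanson and Wong.

```python
import numpy as np

def dual_channel_spectral_subtraction(noisy, noise_ref, a=1.0, frame_len=256):
    """Frame-wise magnitude subtraction using a reference noise channel.

    Assumptions (hypothetical, for illustration only): non-overlapping
    rectangular frames and half-wave rectification of negative differences.
    """
    noisy = np.asarray(noisy, dtype=float)
    noise_ref = np.asarray(noise_ref, dtype=float)
    n_frames = len(noisy) // frame_len
    out = np.zeros(n_frames * frame_len)
    for k in range(n_frames):
        seg = slice(k * frame_len, (k + 1) * frame_len)
        Y = np.fft.rfft(noisy[seg])        # noisy spectrum
        N = np.fft.rfft(noise_ref[seg])    # noise spectrum from the reference channel
        # |S_hat|^a = |Y|^a - |N|^a, clipped at zero
        diff = np.maximum(np.abs(Y) ** a - np.abs(N) ** a, 0.0)
        mag = diff ** (1.0 / a)
        # the noisy phase is reused for reconstruction
        out[seg] = np.fft.irfft(mag * np.exp(1j * np.angle(Y)), n=frame_len)
    return out
```

With a = 1 this is magnitude subtraction; a = 2 would give power subtraction. A practical implementation would normally use overlapping windowed frames.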
Revisiting the phase issue

To see the dependency of magnitude on phase:
\[
|\hat{S}(f)|^{a} = |S(f) + N(f)|^{a} - |N(f)|^{a}
\]
\[
|\hat{S}(f)| = |N(f)| \left[ \left( \frac{|S(f)|^{2}}{|N(f)|^{2}} + 1 + 2\,\frac{|S(f)|}{|N(f)|}\cos\theta \right)^{a/2} - 1 \right]^{1/a}
\]
where \theta is the phase difference between the two signals.

It is clear that the estimate of signal magnitude spectrum depends
on both the SNR and the phase difference. But phase is not estimated
in this method because the enhanced quality is acceptable.
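The dependence on the phase difference can be checked numerically for a single frequency bin. This is a small sketch, assuming the noise magnitude is known exactly and negative differences are clipped to zero; `estimated_magnitude` is a hypothetical helper name.

```python
import numpy as np

def estimated_magnitude(s_mag, n_mag, theta, a=1.0):
    """|S_hat| for one bin when Y = S + N and |N| is known exactly.

    theta is the phase difference between S and N (radians).
    Clipping at zero is an assumption made for this illustration.
    """
    # |S + N| from the law of cosines
    y_mag = np.sqrt(s_mag**2 + n_mag**2 + 2.0 * s_mag * n_mag * np.cos(theta))
    return np.maximum(y_mag**a - n_mag**a, 0.0) ** (1.0 / a)

# Same SNR (|S| = 1, |N| = 2), three different phase differences
for theta in (0.0, np.pi / 2, np.pi):
    print(f"theta = {theta:.2f} rad -> |S_hat| = {estimated_magnitude(1.0, 2.0, theta):.3f}")
```

Only when the two components are in phase does the estimate equal the true magnitude |S| = 1; for other phase differences it is smaller.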
Comments
The simplest (and a bit unrealistic) form of exploiting multiple channels.
Aims at improving intelligibility.
Significant intelligibility gains only at very low SNR (about -12 dB).
Unvoiced speech is not processed.
Adaptive Noise Canceling
First proposed by Widrow et al. [1] in 1975.

It is called adaptive because it uses an adaptive filter, for example one driven by the LMS algorithm.
The objective: estimate the noise in the primary channel using the noise recorded in the secondary (reference) channel, and subtract the estimate from the primary channel recording.
[1] B. Widrow, J. R. Glover, J. M. McCool et al., Adaptive noise cancelling: Principles and applications, Proceedings of the IEEE, vol. 63, pp. 1692-1716, Dec. 1975.
Signal model
Signal estimation
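The estimated signal:
\[
\hat{s}(n) = y(n) - \hat{d}_1(n), \qquad \hat{d}_1(n) = \sum_{i=0}^{M-1} h(i)\, d_2(n-i)
\]
where y(n) is the primary-channel recording and d_2(n) is the noise recorded in the secondary channel.
The optimization criterion:
\[
\hat{h} = \arg\min_{h} E\left\{ \left| y(n) - \sum_{i=0}^{M-1} h(i)\, d_2(n-i) \right|^{2} \right\}
\]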
Signal estimation
The minimization can be solved by applying the orthogonality principle:
\[
r_{yd_2}(\tau) - \sum_{i=0}^{M-1} h(i)\, r_{d_2}(\tau - i) = 0, \qquad \tau = 0, \ldots, M-1
\]
This can be solved in the same way as the normal equations, but it is usually solved by sequential algorithms such as the LMS algorithm. The advantages of the LMS are:
- No matrix inversion, low complexity
- Fully adaptive, suitable for non-stationary signal and noise
- Low delay
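For comparison with the sequential approach, the orthogonality condition can also be solved in one batch as a set of normal equations. The following is a minimal sketch, assuming time-averaged sample correlations in place of the true ones and zero initial conditions; `wiener_anc` and its argument names are hypothetical.

```python
import numpy as np

def wiener_anc(y, d2, M):
    """Batch solution of the normal equations R h = r from sample correlations.

    y  : primary channel (signal plus noise)
    d2 : reference channel (noise only)
    M  : number of filter taps
    """
    y = np.asarray(y, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    N = len(y)
    # Column i holds d2 delayed by i samples, so row n is [d2(n), ..., d2(n-M+1)]
    D = np.column_stack([np.concatenate([np.zeros(i), d2[:N - i]]) for i in range(M)])
    R = D.T @ D / N            # sample autocorrelation matrix of d2
    r = D.T @ y / N            # sample cross-correlation between y and d2
    h = np.linalg.solve(R, r)  # solve the linear system instead of inverting R
    return h, y - D @ h        # filter taps and enhanced signal
```

This needs the whole recording and an M-by-M linear solve, which is the cost the LMS avoids.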
LMS
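- It is a sequential, gradient descent minimization method.
- The estimate of the weights is updated each time a new sample is available:
\[
h_k = h_{k-1} - \mu_k\, g
\]
where \mu_k is the step size and the elements of the gradient vector g are:
\[
g(\tau) = \frac{\partial J}{\partial h(\tau)} = -2 \left[ r_{yd_2}(\tau) - \sum_{i=0}^{M-1} h(i)\, r_{d_2}(\tau - i) \right]
\]
with J = E\{ | y(n) - \sum_{i=0}^{M-1} h(i)\, d_2(n-i) |^{2} \} the mean-square error being minimized.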
LMS
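Or, in matrix form:
\[
g = -2\,(r_{yd_2} - R_{d_2} h)
\]
The most important trick in this sequential implementation is to approximate the correlation matrix and the cross-correlation vector by their instantaneous estimates:
\[
\hat{R}_{d_2} = d_2 d_2^{H}, \qquad \hat{r}_{yd_2} = d_2\, y(n)
\]
where d_2 = [d_2(n), d_2(n-1), \ldots, d_2(n-M+1)]^{T} is the vector of the M most recent reference samples.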
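Substituting the instantaneous estimates into the gradient gives g = -2 d_2 e(n), with e(n) = y(n) - h^T d_2, so each update costs only O(M) operations. A minimal sketch for real-valued signals, assuming a constant step size `mu`; the function and variable names are hypothetical.

```python
import numpy as np

def lms_anc(y, d2, M, mu):
    """Adaptive noise canceller using the LMS update h <- h + 2*mu*e*d.

    y  : primary channel (signal plus noise)
    d2 : reference channel (noise only)
    Returns the enhanced signal (the error sequence) and the final taps.
    """
    h = np.zeros(M)
    buf = np.zeros(M)                # [d2(n), d2(n-1), ..., d2(n-M+1)]
    s_hat = np.zeros(len(y))
    for n in range(len(y)):
        buf = np.roll(buf, 1)        # shift the delay line by one sample
        buf[0] = d2[n]
        e = y[n] - h @ buf           # subtract the current noise estimate
        # instantaneous gradient g = -2 * buf * e, so h - mu*g becomes:
        h = h + 2.0 * mu * e * buf
        s_hat[n] = e                 # the error is the enhanced sample
    return s_hat, h
```

The output of the canceller is the error signal itself: once the filter has converged, what remains after subtracting the estimated noise is the speech estimate.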
LMS
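The step size is often chosen empirically, as long as the following condition is satisfied for stability:
\[
0 < \mu < \frac{1}{\lambda_{\max}}
\]
where \lambda_{\max} is the largest eigenvalue of the matrix R_{d_2}. (With the factor 2 kept in the gradient g, the mean weight recursion is h_k = (I - 2\mu R_{d_2}) h_{k-1} + 2\mu r_{yd_2}, which converges when |1 - 2\mu\lambda| < 1 for every eigenvalue \lambda.)

The larger the step size, the faster the convergence, but also the larger the estimation variance.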
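The role of the largest eigenvalue can be illustrated with the exact-gradient recursion. This is a sketch under stated assumptions: the update is h_k = h_{k-1} - mu * g with g = -2 (r - R h), so the mean recursion is stable for 0 < mu < 1/lambda_max; the 2-by-2 matrix R and vector r below are made-up values chosen only for illustration.

```python
import numpy as np

def steepest_descent(R, r, mu, n_iter=500):
    """Iterate h <- h - mu * g with the exact gradient g = -2 (r - R h)."""
    h = np.zeros(len(r))
    for _ in range(n_iter):
        g = -2.0 * (r - R @ h)
        h = h - mu * g
    return h

# Hypothetical correlation matrix and cross-correlation vector
R = np.array([[1.0, 0.5],
              [0.5, 1.0]])
r = np.array([0.5, 0.2])

lam_max = np.linalg.eigvalsh(R).max()            # largest eigenvalue of R
h_opt = np.linalg.solve(R, r)                    # solution of the normal equations
h_ok = steepest_descent(R, r, 0.9 / lam_max)     # inside the bound: converges
h_bad = steepest_descent(R, r, 1.1 / lam_max)    # outside the bound: diverges
```

This noise-free recursion only illustrates the bound on the mean. The LMS uses noisy instantaneous gradients, so practical step sizes are usually much smaller than the bound.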
Comments
The LMS belongs to the family of stochastic gradient algorithms.
The algorithm is based on instantaneous estimates of the correlation functions, which have high variance. But the algorithm works well because of its iterative nature, which averages the estimates over time.
Low complexity: O(M), where M is the filter order.
Although the derivation is based on the WSS assumption, the algorithm is applicable to non-stationary signals, due to the sequential implementation.
Implementation issues of ANC
Microphones must be sufficiently separated in space or have an acoustic barrier between them.
Typically 1500 taps are needed => large misadjustment => pronounced echo => must use a small step size => long convergence time.
Different delays from the sources to the two microphones must be taken care of.
Frequency-domain LMS can reduce the number of taps needed.
ANC can be generalized to a multi-channel system, which can be seen as a generalized beamforming system.
Eliminating cross-talk
Cross-talk: if the speech signal is also captured in the reference channel, the ANC will suppress part of the signal. Cross-talk can be reduced by employing two adaptive filters within a feedback loop.
Beamforming
Compared to ANC, beamforming is truly a spatial filtering technique.
First, locate the source direction; then form a beam directed at the source.
The source location problem is an analogy of the spectral analysis problem, with the frequency domain replaced by the spatial domain.
A simple array model
Planar wave
Uniform linear array (ULA)
Sensor responses are identical and LTI
Sensors are omnidirectional
One parameter to estimate: the direction of arrival (DOA)
ULA
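The signal model:
\[
y(t) = a(\theta)\, s(t) + e(t)
\]
where the array transfer vector is
\[
a(\theta) = \left[ 1 \;\; e^{-j\omega_c \tau_2} \;\; \cdots \;\; e^{-j\omega_c \tau_m} \right]^{T}
\]
where \tau_k is the delay at sensor k with reference to the first sensor, m is the number of sensors, and \omega_c is the center frequency of the signal. By defining the spatial frequency as
\[
\omega_s = \omega_c \frac{d \sin\theta}{c}
\]
with d the sensor spacing, \theta the DOA and c the propagation speed of the wave, we can write the array transfer vector as
\[
a(\theta) = \left[ 1 \;\; e^{-j\omega_s} \;\; \cdots \;\; e^{-j(m-1)\omega_s} \right]^{T}
\]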
ULA
A direct analogy between frequency analysis and spatial analysis using the spatial frequency.
To avoid spatial aliasing:
\[
d \le \lambda / 2
\]
where \lambda is the signal wavelength.
All frequency analysis techniques can be applied to the DOA estimation problem.
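The array transfer vector and the aliasing condition can be illustrated as follows. This is a sketch under stated assumptions: the DOA is measured from broadside, the exponent uses a negative sign convention, and the spacing is given as the ratio d / wavelength, so that the spatial frequency is 2 * pi * (d / wavelength) * sin(theta). The function name is hypothetical.

```python
import numpy as np

def steering_vector(theta, m, d_over_lambda=0.5):
    """Array transfer vector a(theta) of an m-sensor uniform linear array.

    theta         : direction of arrival in radians, measured from broadside
    d_over_lambda : sensor spacing divided by the wavelength
    """
    omega_s = 2.0 * np.pi * d_over_lambda * np.sin(theta)   # spatial frequency
    return np.exp(-1j * omega_s * np.arange(m))

# Spacing of one wavelength: 0 and 90 degrees give the same vector (aliasing)
a_front = steering_vector(0.0, 8, d_over_lambda=1.0)
a_side = steering_vector(np.pi / 2, 8, d_over_lambda=1.0)
print(np.allclose(a_front, a_side))

# Half-wavelength spacing: the two directions are distinguishable
print(np.allclose(steering_vector(0.0, 8), steering_vector(np.pi / 2, 8)))
```

The vector has the same form as a sampled complex sinusoid, which is why spacing the sensors too far apart causes aliasing in the same way as sampling too slowly in time.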
Spatial filtering
Analogy between spatial filter and temporal filter
Spatial filtering
The spatially filtered signal:
\[
x(t) = h^{*} a(\theta)\, s(t)
\]
Objective: find the filter that passes undistorted the signals with a given DOA, and attenuates all the other DOAs as much as possible:
\[
\min_{h} h^{*} h \quad \text{subject to} \quad h^{*} a(\theta) = 1
\]
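This constrained minimization has the closed-form solution h = a(theta) / (a*(theta) a(theta)) = a(theta) / m, a standard result stated here as context rather than taken from the slides. A minimal sketch of the resulting weights and beam pattern, assuming a half-wavelength uniform linear array and the DOA measured from broadside; all function names are hypothetical.

```python
import numpy as np

def ula_steering(theta, m, d_over_lambda=0.5):
    """Array transfer vector of an m-sensor uniform linear array."""
    return np.exp(-2j * np.pi * d_over_lambda * np.sin(theta) * np.arange(m))

def beamformer_weights(theta0, m):
    """Minimum-norm filter with unit gain towards theta0: h = a / (a^H a)."""
    a0 = ula_steering(theta0, m)
    return a0 / np.vdot(a0, a0).real        # a^H a equals m for a ULA

def beam_pattern(h, thetas):
    """Power response |h^H a(theta)|^2 over a set of directions."""
    m = len(h)
    return np.array([abs(np.vdot(h, ula_steering(t, m))) ** 2 for t in thetas])

theta0 = np.deg2rad(20.0)                            # look direction
h = beamformer_weights(theta0, m=8)
thetas = np.linspace(-np.pi / 2, np.pi / 2, 181)     # 1-degree grid
pattern_db = 10.0 * np.log10(beam_pattern(h, thetas) + 1e-12)
```

The response is exactly one in the look direction and cannot exceed one elsewhere; the main lobe narrows as the number of sensors m grows.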
The beam pattern
Restrictions to beamforming
Very sensitive to array geometry; needs good calibration
Has only directivity, no selectivity in range or other location parameters
Frequency response is not flat
Ambient noise is assumed to be spatially white
Beam width (or selectivity) depends on the size of the array
Spatial aliasing problem
Blind Source Separation (BSS)
MIMO systems
Spatial processing techniques with no
knowledge of array geometry
Invisible beam
Arbitrarily high spatial resolution
Do not depend on signal frequency
Spatial noise is not assumed to be white
Not a spatial sampling system
Solutions to BSS
Independent Component Analysis (ICA) [2]
Independent Factor Analysis (IFA) [3]
[2] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis, John Wiley & Sons, 2001.
[3] H. Attias, Independent factor analysis, Neural Computation, 1999.
Independent component analysis (ICA)
Instantaneous mixing
The number of sensors is greater than or equal to the number of sources
No system noise
The sources (components) are independent of each other
The sources are non-Gaussian processes
ICA model
Cocktail party problem. Three sources, three sensors:
\[
\begin{aligned}
x_1(t) &= a_{11} s_1(t) + a_{12} s_2(t) + a_{13} s_3(t) \\
x_2(t) &= a_{21} s_1(t) + a_{22} s_2(t) + a_{23} s_3(t) \\
x_3(t) &= a_{31} s_1(t) + a_{32} s_2(t) + a_{33} s_3(t)
\end{aligned}
\]
Or, in matrix form:
\[
x = A s
\]
Neither s nor A is known, so the system cannot be solved by linear algebra alone.
If the sources are independent and non-Gaussian, the A matrix can be found by maximizing the non-Gaussianity of the estimated sources.
Contrast function
An iterative gradient method. First initialize the A matrix.
If the mixing matrix A is square and non-singular, move it to the left:
\[
A^{-1} x = s
\]
Calculate the non-Gaussianity of s, and find the next estimate of A that gives a higher non-Gaussianity. Iterate until convergence.
The contrast function is the objective function to maximize or minimize.
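One possible instance of such an iteration is sketched below, under several assumptions that are not in the slides: the contrast function is the absolute value of the kurtosis, the data are whitened first so that the unmixing matrix can be restricted to orthonormal rows, and the rows are estimated one at a time with deflation. The step size, iteration count, and all names are hypothetical choices.

```python
import numpy as np

def whiten(x):
    """Center x (channels x samples) and transform it to unit covariance."""
    x = x - x.mean(axis=1, keepdims=True)
    eigval, eigvec = np.linalg.eigh(np.cov(x))
    return (eigvec / np.sqrt(eigval)) @ eigvec.T @ x

def ica_kurtosis_gradient(x, step=0.1, n_iter=300, seed=0):
    """Estimate sources by gradient ascent on |kurtosis| of w^T z."""
    z = whiten(x)
    n = z.shape[0]
    rng = np.random.default_rng(seed)
    W = np.zeros((n, n))                      # rows are unmixing vectors
    for p in range(n):
        w = rng.standard_normal(n)
        for _ in range(n_iter):
            w -= W[:p].T @ (W[:p] @ w)        # stay orthogonal to earlier rows
            w /= np.linalg.norm(w)            # keep unit norm
            u = w @ z                         # current source estimate
            kurt = np.mean(u ** 4) - 3.0      # kurtosis of a unit-variance signal
            grad = (z * u ** 3).mean(axis=1)  # gradient direction of the kurtosis
            w = w + step * np.sign(kurt) * grad
        w -= W[:p].T @ (W[:p] @ w)
        W[p] = w / np.linalg.norm(w)
    return W @ z
```

The sources are recovered only up to permutation, sign and scale, which is inherent to the problem because neither s nor A is known.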
Maximizing non-Gaussianity
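Non-Gaussian is independent: by the central limit theorem, a mixture of independent sources tends to be closer to Gaussian than the sources themselves, so the least Gaussian outputs are the unmixed ones.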
Measuring non-Gaussianity
- by kurtosis
- by negentropy
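Kurtosis is the simpler of the two measures to compute. The sketch below uses the excess kurtosis of a normalized signal, E{x^4} - 3, which is zero for a Gaussian; the sample sizes and distributions are arbitrary illustrative choices and the function name is hypothetical.

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis: zero for Gaussian data, nonzero otherwise."""
    x = np.asarray(x, dtype=float)
    x = (x - x.mean()) / x.std()      # normalize to zero mean, unit variance
    return np.mean(x ** 4) - 3.0

rng = np.random.default_rng(0)
n = 100_000
gauss = rng.standard_normal(n)
u1 = rng.uniform(-1.0, 1.0, n)        # sub-Gaussian source
u2 = rng.uniform(-1.0, 1.0, n)
lap = rng.laplace(size=n)             # super-Gaussian source

for name, sig in [("gaussian", gauss), ("uniform", u1),
                  ("laplacian", lap), ("sum of two uniforms", u1 + u2)]:
    print(f"{name}: {excess_kurtosis(sig):+.2f}")
```

The sum of two independent uniform signals has a kurtosis closer to zero than either signal alone, which is the effect that ICA exploits.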
ICA methods
ICA by maximizing non-Gaussianity

ICA by Maximum Likelihood
ICA by minimizing mutual information
ICA by nonlinear decorrelation
Extensions to ICA
Noisy ICA
ICA with non-square mixing matrix
Independent Factor Analysis
Convolutive mixture
Methods using time structure
Blind Source Extraction
Only interested in one or a few sources out of many (feature extraction)
Saves computation
Useful when the exact number of sources is unknown
BSE
D. Mandic and A. Cichocki, An Online Algorithm For Blind Extraction Of Sources With Different
Dynamical Structures.