Formant Tracking Using LPC Root Solving
Motivation
Formants are useful in applications such as speech recognition, speech enhancement, noise reduction, and adaptive filters for hearing aids.
Formants
Formants are defined as the spectral peaks of the sound spectrum of the voice. In speech science and phonetics, formant is also used to mean an acoustic resonance of the human vocal tract.
Algorithm
S[n]
Pre-emphasis Filter
To improve the overall SNR across the frequency band, pre-emphasis increases the magnitude of the higher frequencies relative to the lower frequencies, which usually carry more energy. This algorithm uses a common pre-emphasis method: filtering the speech signal with a high-pass filter (HPF). This removes the contribution of the glottal waveform and the radiation load and redistributes the energy more evenly across the band. A first-order pre-emphasis high-pass filter is given by H[n] = S[n] - a1*S[n-1], where S[n] is the input speech signal, H[n] is the filter output, and a1 is the pre-emphasis coefficient.
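As a rough illustration, the first-order pre-emphasis filter H[n] = S[n] - a1*S[n-1] can be sketched in Python as follows. The function name `pre_emphasis` and the default a1 = 0.95 are assumptions made for this example, not values taken from the presentation.

```python
def pre_emphasis(signal, a1=0.95):
    """Apply out[n] = signal[n] - a1 * signal[n-1] (first sample passed through)."""
    if not signal:
        return []
    out = [signal[0]]
    for n in range(1, len(signal)):
        # Subtracting a scaled copy of the previous sample attenuates slowly
        # varying (low-frequency) content and boosts fast changes.
        out.append(signal[n] - a1 * signal[n - 1])
    return out


# A constant (DC) input is strongly attenuated after the first sample,
# while a sign-alternating (highest-frequency) input is amplified.
dc = pre_emphasis([1.0, 1.0, 1.0])
alternating = pre_emphasis([1.0, -1.0, 1.0])
```

With a1 close to 1 the filter behaves almost like a first difference, which is why low frequencies are suppressed most.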
Converting the real signal into an analytic signal has several advantages. The main one when working with adaptive filter banks is that the analytic signal provides the complex signal needed for the corresponding filtering. Sc[n] = SR[n] + j*SH[n], where Sc[n] is the analytic signal, SR[n] is the real signal, and SH[n] is the Hilbert transform of SR[n].
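One way to obtain Sc[n] = SR[n] + j*SH[n] is to zero the negative-frequency half of the spectrum and double the positive half. The sketch below does this with a plain DFT using only the Python standard library, so it is only practical for short frames; the function name `analytic_signal` is an assumption for this example, and a real implementation would normally use an FFT or an FIR Hilbert transformer instead.

```python
import cmath
import math


def analytic_signal(x):
    """Return the complex analytic signal of a real sequence x (O(N^2) DFT)."""
    n = len(x)
    # Forward DFT of the real input.
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and Nyquist for even n), double positive bins, zero negative bins.
    weighted = []
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            weight = 1.0
        elif 2 * k < n:
            weight = 2.0
        else:
            weight = 0.0
        weighted.append(weight * spectrum[k])
    # Inverse DFT gives SR[n] in the real part and SH[n] in the imaginary part.
    return [
        sum(weighted[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


# A cosine should map to cos + j*sin.
real_part = [math.cos(2 * math.pi * 4 * t / 32) for t in range(32)]
sc = analytic_signal(real_part)
```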
The adaptive band-pass filter suppresses interference from neighboring formant frequencies while tracking an individual formant frequency as it varies with time. Each filter therefore tracks only a single formant frequency. The adaptive band-pass filter consists of: 1) an All-Zero Filter (AZF) 2) a Dynamic Tracking Filter (DTF)
The AZF in each formant filter is an adaptive all-zero filter whose three zeros are always set to the previous formant frequency estimates from the other three formant filters. The filter's transfer function is:
The value of Kk[n] ensures unity gain and zero phase lag at the estimated formant frequency of the kth component. An additional zero is placed at the location of the pitch estimate to suppress the effect of the pitch.
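The following Python sketch illustrates one plausible form of such an all-zero filter: each zero is placed at r_z*exp(j*2*pi*f/fs) for the frequencies to be suppressed, and a complex gain normalizes the response to exactly 1 at the tracked formant. The zero radius r_z, the function names, and the example frequencies are assumptions for illustration; the presentation's own transfer function is not reproduced here.

```python
import cmath
import math


def frequency_response(taps, f_hz, fs):
    """Evaluate sum(taps[i] * z^-i) at z = exp(j*2*pi*f_hz/fs)."""
    w = 2 * math.pi * f_hz / fs
    return sum(b * cmath.exp(-1j * w * i) for i, b in enumerate(taps))


def azf_taps(suppress_hz, target_hz, fs, r_z=0.98):
    """Complex FIR taps with zeros at suppress_hz and unity gain at target_hz."""
    taps = [1 + 0j]
    for f in suppress_hz:
        zero = r_z * cmath.exp(2j * math.pi * f / fs)
        # Multiply the current polynomial in z^-1 by (1 - zero * z^-1).
        extended = taps + [0j]
        for i in range(1, len(extended)):
            extended[i] -= zero * taps[i - 1]
        taps = extended
    # Complex gain that forces unity magnitude and zero phase at target_hz.
    gain = 1 / frequency_response(taps, target_hz, fs)
    return [gain * b for b in taps]


# Filter for a formant near 500 Hz, suppressing three other formant estimates.
# A pitch estimate could be appended to the suppress list in the same way.
taps = azf_taps([1500, 2500, 3500], 500, 8000)
```

Choosing r_z slightly below 1 keeps the notches finite in depth, which makes the filter less sensitive to errors in the previous formant estimates.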
The Dynamic Tracking Filter (DTF) in each formant filter is a single pole dynamic tracking filter for which the pole location is always set to the previous value of the formant estimate. The transfer function of the kth DTF at index n is:
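A single-pole tracking filter of this kind can be sketched as the recursion y[n] = (1 - r_p)*x[n] + r_p*exp(j*2*pi*f/fs)*y[n-1], which has unity gain and zero phase at the pole frequency f. This is an illustrative form, not necessarily the exact transfer function used in the presentation; the pole radius r_p = 0.9, the function name, and keeping the pole fixed over a frame (rather than updating it every sample) are assumptions.

```python
import cmath
import math


def dynamic_tracking_filter(frame, formant_hz, fs, r_p=0.9, state=0j):
    """Filter a complex frame with a single pole at r_p * exp(j*2*pi*formant_hz/fs)."""
    pole = r_p * cmath.exp(2j * math.pi * formant_hz / fs)
    y = state
    out = []
    for x in frame:
        # The (1 - r_p) input scaling normalizes the gain to 1 at formant_hz.
        y = (1 - r_p) * x + pole * y
        out.append(y)
    return out


# A complex tone at the pole frequency passes unchanged once the transient
# has decayed; the same tone is attenuated if the pole sits 1 kHz away.
tone = [cmath.exp(2j * math.pi * 1000 * n / 8000) for n in range(400)]
on_target = dynamic_tracking_filter(tone, 1000, 8000)
off_target = dynamic_tracking_filter(tone, 2000, 8000)
```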
The voicing detector checks whether the window frame under consideration belongs to a voiced part of the speech signal. It does so by estimating the pitch period of the frame from its autocorrelation. For voiced speech from male and female speakers, this pitch period is expected to lie in the range of 4 ms to 9 ms.
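A minimal sketch of an autocorrelation-based voicing check is shown below. It searches for the strongest normalized autocorrelation peak at lags between 4 ms and 9 ms and treats the frame as unvoiced when no peak exceeds a threshold. The function name and the 0.3 peak threshold are assumptions for this example.

```python
import math


def estimate_pitch_period(frame, fs, min_period_s=0.004, max_period_s=0.009,
                          threshold=0.3):
    """Return the pitch period in seconds, or None if the frame looks unvoiced."""
    energy = sum(v * v for v in frame)
    if energy == 0:
        return None
    lo = int(round(min_period_s * fs))
    hi = min(int(round(max_period_s * fs)), len(frame) - 1)
    best_lag, best_value = None, threshold
    for lag in range(lo, hi + 1):
        # Autocorrelation at this lag, normalized by the zero-lag energy.
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag)) / energy
        if r > best_value:
            best_lag, best_value = lag, r
    return None if best_lag is None else best_lag / fs


# 30 ms frame at 8 kHz with a 200 Hz fundamental (5 ms period) and one harmonic.
fs = 8000
voiced = [math.sin(2 * math.pi * 200 * n / fs) + 0.5 * math.sin(2 * math.pi * 400 * n / fs)
          for n in range(240)]
period = estimate_pitch_period(voiced, fs)
```

A frame is then treated as voiced whenever the function returns a period rather than None.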
Energy Detector
After the speech signal is filtered using the adaptive band-pass filter bank, the energy of the signal in that window frame is calculated. The energy of the formant band must be higher than a specified energy threshold. LPC root solving is performed only if both conditions hold: the frame meets the minimum energy criterion and it belongs to a voiced part of the speech.
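The gating rule can be written compactly as below. This is a sketch under the assumption that frame energy is the sum of squared magnitudes of the (possibly complex) band-pass output; the function names and any threshold value are hypothetical.

```python
def frame_energy(frame):
    """Sum of squared magnitudes; works for real or complex samples."""
    return sum(abs(v) ** 2 for v in frame)


def should_run_lpc(frame, is_voiced, energy_threshold):
    """LPC root solving is attempted only for voiced frames with enough energy."""
    return bool(is_voiced) and frame_energy(frame) > energy_threshold


# Example: a voiced frame with energy 2.0 passes a threshold of 1.5.
run = should_run_lpc([1.0, 1.0], is_voiced=True, energy_threshold=1.5)
```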
Linear prediction analysis provides a good approximation of the vocal tract spectral envelope, especially in voiced regions of speech, where the all-pole model of LPC is used. During unvoiced, transient regions of speech, the LPC model is less effective than for voiced regions but still provides acceptable results. Linear prediction can be stated as finding the coefficients ak that give the best prediction of the speech sample S[n] in terms of the past samples S[n-k], i.e. that minimize the mean-squared prediction error. The prediction error of a linear predictor of order p is: E[n] = S[n] - sum_{k=1}^{p} ak*S[n-k]
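The sketch below shows the general idea of LPC root solving for a real-valued frame: the coefficients ak are found with the autocorrelation method (Levinson-Durbin recursion), the roots of z^p - a1*z^(p-1) - ... - ap are located, and each root angle is converted to a frequency. It uses only the standard library, so the roots are found with a simple Durand-Kerner iteration; the function names are assumptions, and a tracker that works on complex band-pass outputs would need the complex-valued version of the autocorrelation.

```python
import cmath
import math


def lpc_coefficients(frame, order):
    """Coefficients a_1..a_p minimizing the mean-squared prediction error."""
    n = len(frame)
    r = [sum(frame[i] * frame[i + lag] for i in range(n - lag))
         for lag in range(order + 1)]
    if r[0] == 0:
        raise ValueError("frame has zero energy")
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1 - k * k
    return a[1:]


def polynomial_roots(coeffs, iterations=200):
    """Roots of a monic polynomial (highest power first), Durand-Kerner method."""
    degree = len(coeffs) - 1
    roots = [(0.4 + 0.9j) ** k for k in range(degree)]
    for _ in range(iterations):
        for i in range(degree):
            value = 0j
            for c in coeffs:  # Horner evaluation at roots[i]
                value = value * roots[i] + c
            denom = 1 + 0j
            for j in range(degree):
                if j != i:
                    denom *= roots[i] - roots[j]
            roots[i] -= value / denom
    return roots


def formants_from_lpc(a, fs):
    """Frequencies (Hz) of the roots in the upper half of the z-plane."""
    roots = polynomial_roots([1.0] + [-c for c in a])
    return sorted(cmath.phase(z) * fs / (2 * math.pi) for z in roots if z.imag > 1e-6)
```

In practice the candidate frequencies would still need to be screened, for example by root magnitude or bandwidth, before being accepted as formants.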
Moving Average
The moving-average block computes the moving average of each formant frequency. If the segment is unvoiced or the energy of the formant band is below the threshold, the moving-average value is assigned as the formant estimate. Otherwise, when the speech is voiced and the energy is above the threshold, the formant estimate from the LPC analysis is assigned. The moving average of the formant frequency is given by:
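The decision rule can be sketched as follows. Because the averaging formula is not reproduced in the text, this example assumes a plain mean over the last few assigned values; the class name `FormantSmoother`, the window length, and the choice to feed fallback values back into the history are all assumptions.

```python
from collections import deque


class FormantSmoother:
    """Keeps a short history of one formant and falls back to its mean."""

    def __init__(self, initial_hz, window=5):
        self.history = deque([initial_hz], maxlen=window)

    def update(self, lpc_estimate_hz, voiced, energy_ok):
        average = sum(self.history) / len(self.history)
        # Trust the LPC estimate only for voiced frames with enough energy.
        estimate = lpc_estimate_hz if (voiced and energy_ok) else average
        self.history.append(estimate)
        return estimate


smoother = FormantSmoother(500.0, window=3)
first = smoother.update(520.0, voiced=True, energy_ok=True)    # LPC value used
second = smoother.update(900.0, voiced=False, energy_ok=True)  # falls back to mean
third = smoother.update(530.0, voiced=True, energy_ok=False)   # falls back to mean
```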
Results
The algorithm described above was applied to speech .wav files. Formant tracker performance on the database speech signal
Formant tracker performance on the database speech signal with background noise at an SNR of 40 dB
Discussion
Because the adaptive filter is started from initial values of the formant frequencies, its outputs also depend on these initial values. In the few cases where the actual formant frequency does not lie near the initial formant frequency given as input, the filters may introduce additional poles and zeros instead of removing the interfering ones. Such cases were found to be rare. Difficulty also arises if background noise or a sudden change in the formant frequencies causes the tracker to wander far from the true formant values. For this reason, it was necessary to place limits on the frequency range allowed for each formant.
References
[1] Bruce, Ian C., et al. "Robust formant tracking in noise." Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on. Vol. 1. IEEE, 2002.
[2] A. Rao and R. Kumaresan, "On decomposing speech into modulated components," IEEE Trans. Speech Audio Processing, vol. 8, no. 3, pp. 240-254, May 2000.
[3] Poonam Jindal, Algorithms for tracking formant frequencies of a continuous speech with speaker variability, Thesis.
[4] Snell, Roy C., and Fausto Milinazzo. "Formant location from LPC analysis data." Speech and Audio Processing, IEEE Transactions on 1.2 (1993): 129-134.
Demo on Matlab!!!
Thank you
Questions