Isolated Word Recognition Using LPC & Vector Quantization
M. K. Linga Murthy, G.L.N. Murthy
Asst. Professor, ECE, Lakireddy Balireddy College of Engineering, Andhra Pradesh, India, [email protected], [email protected]
Abstract
Speech recognition has long been regarded as a fascinating field in human-computer interaction and is one of the fundamental steps towards understanding human recognition and behaviour. This paper explicates the theory and implementation of speech recognition through a speaker-dependent, real-time isolated word recognizer. The approach is to first obtain feature vectors using LPC analysis, followed by vector quantization; the quantized vectors are then recognized by measuring the minimum average distortion. All speech recognition systems contain two main phases, a training phase and a testing phase. In the training phase the features of the words are extracted and the resulting templates are stored in a database; during the recognition phase the features extracted from the test utterance are compared with the templates in the database. The features of the words are extracted using LPC analysis, vector quantization is used to generate the codebooks, and the recognition decision is made on the basis of the matching score. MATLAB is used to implement this concept and to gain further understanding.
Index Terms: Speech Recognition, LPC, Vector Quantization, Code Book

1. INTRODUCTION
Speech is a natural mode of communication for people. We learn all the relevant skills during early childhood, without instruction, and we continue to rely on speech communication throughout our lives. It comes so naturally to us that we do not realize how complex a phenomenon speech is. Speech recognition, more commonly known as automatic speech recognition (ASR), is the process of interpreting human speech in a computer. A more technical definition is given by Jurafsky, who defines ASR as the building of systems for mapping acoustic signals to a string of words; he defines automatic speech understanding (ASU) as extending this goal to producing some sort of understanding of the sentence.

Speech recognition tasks vary along several dimensions. One dimension is how fluent, natural or conversational the speech is: isolated word recognition, in which each word is surrounded by some sort of pause, is much easier than recognizing continuous speech. Another dimension is channel and noise: commercial dictation systems, and much laboratory research in speech recognition, use high quality, head-mounted microphones. A final dimension is accent or speaker-class characteristics.

The objective of this paper is to recognize isolated words spoken by a speaker, and the results are useful for implementing recognition systems. The words or utterances are recorded through a microphone, stored in the workspace, and then processed using the MATLAB Signal Processing Toolbox. The processing involves pre-emphasis, frame blocking, autocorrelation analysis, LPC analysis and vector quantization.
1.1 Challenges
The general problem of automatic transcription of speech by any speaker in any environment is still far from solved, but recent years have seen ASR technology mature to the point where it is viable in certain limited domains.
1.2 Difficulties
One dimension of variation in speech recognition tasks is the vocabulary size.
Here we want to extract spectral features of an entire utterance or conversation, but the spectrum changes very quickly. Technically, we say that speech is a non-stationary signal, meaning that its statistical properties are not constant across time. Instead, we extract spectral features from a small window of speech that characterizes a particular subphone and for which we can assume the signal is stationary. This is done by using a window which is non-zero inside some region and zero elsewhere, running this window across the speech signal, and extracting the waveform inside it. A common window used in feature extraction is the Hamming window, which shrinks the values of the signal toward zero at the window boundaries, avoiding discontinuities.

Fig 2.2: Blocking of speech into overlapping frames. Typical values for N and M are 256 and 128 when the sampling rate of the speech is 6.67 kHz; these correspond to roughly 38-msec frames separated by about 19 msec, i.e. a frame rate of about 52 Hz.
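A minimal MATLAB sketch of the pre-emphasis, frame blocking and Hamming windowing steps is given below, using the N = 256, M = 128 and 6.67-kHz values quoted above; the placeholder signal, the 0.95 pre-emphasis coefficient and the variable names are illustrative assumptions rather than details taken from the paper.

% Pre-emphasis, frame blocking and Hamming windowing (illustrative sketch).
fs = 6670;                         % sampling rate from the paper (6.67 kHz)
N  = 256;                          % frame length in samples
M  = 128;                          % frame shift in samples
s  = randn(1, 2*fs);               % placeholder utterance; replace with a recorded word

s  = filter([1 -0.95], 1, s);      % pre-emphasis, H(z) = 1 - 0.95*z^-1 (assumed coefficient)
w  = hamming(N)';                  % Hamming window as a row vector
numFrames = floor((length(s) - N) / M) + 1;
frames = zeros(numFrames, N);      % one windowed frame per row
for l = 1:numFrames
    frames(l, :) = s((l-1)*M + (1:N)) .* w;   % extract and window frame l
end

Each row of frames then feeds the autocorrelation and LPC analysis described next.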
Each windowed frame is then autocorrelated, r(m) = Σ x(n) x(n+m), the sum running over n = 0 to N-1-m, for m = 0, 1, ..., p, where the highest autocorrelation lag, p, is the order of the LPC analysis. Typically, values of p from 8 to 16 have been used, with p = 10 being the value used for this system. A side benefit of the autocorrelation analysis is that the zeroth autocorrelation, r(0), is the energy of the frame.
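Continuing the sketch above, the autocorrelation and LPC analysis of each windowed frame could be carried out with the Signal Processing Toolbox functions xcorr and levinson as shown below; the order p = 10 comes from the paper, while the surrounding code and variable names are assumptions made for illustration.

% Autocorrelation analysis and LPC analysis of each windowed frame
% (illustrative sketch, continuing from the framing code above).
p = 10;                            % LPC order used for this system
A = zeros(numFrames, p + 1);       % LPC polynomial [1 -a1 ... -ap] per frame
E = zeros(numFrames, 1);           % frame energies
for l = 1:numFrames
    r = xcorr(frames(l, :), p);    % autocorrelation at lags -p .. p
    r = r(p + 1:end);              % keep lags 0 .. p
    E(l) = r(1);                   % zeroth autocorrelation = frame energy
    A(l, :) = levinson(r, p);      % Levinson-Durbin recursion
end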
2.6 LPC Parameter Conversion to Cepstral Coefficients:
A very important LPC parameter set, which can be derived directly from the LPC coefficient set, is the set of LPC cepstral coefficients, c(m), obtained from the LPC coefficients by a simple recursion. The cepstral coefficients, which are the coefficients of the Fourier transform representation of the log magnitude spectrum, have been shown to be a more robust and reliable feature set for speech recognition than the LPC coefficients, the PARCOR coefficients, or the log area ratio coefficients.
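A minimal sketch of the LPC-to-cepstrum recursion is shown below; the function name lpc2cep, the choice of Q and the handling of the levinson sign convention are assumptions made for illustration, not details given in the paper.

function c = lpc2cep(A, Q)
% LPC2CEP  Convert one LPC polynomial A = [1 A(2) ... A(p+1)], as returned
% by levinson, into Q LPC cepstral coefficients using the recursion
%   c(m) = a(m) + sum_{k=1}^{m-1} (k/m)*c(k)*a(m-k),    1 <= m <= p
%   c(m) =        sum_{k=m-p}^{m-1} (k/m)*c(k)*a(m-k),  m > p
% where a(1..p) are the predictor coefficients of H(z) = G/(1 - sum a_k z^-k).
p = length(A) - 1;
a = -A(2:end);                 % predictor coefficients a_1 .. a_p
c = zeros(1, Q);
for m = 1:Q
    acc = 0;
    for k = max(1, m - p):m - 1
        acc = acc + (k / m) * c(k) * a(m - k);
    end
    if m <= p
        c(m) = a(m) + acc;
    else
        c(m) = acc;
    end
end
end

Q is commonly chosen somewhat larger than p (for example around 12 to 15 when p = 10), so that the cepstral vector represents the spectral envelope smoothly.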
4. SYSTEM IMPLEMENTATION:
The training set for the vector quantizer was obtained by recording utterances of a set of isolated words. The words were recorded for two different speakers, and the recognition vocabulary consisted of the command words Forward, Back, Left, Right and Stop. Each word was spoken 10 times. The results obtained for the two speakers are compared in Fig 4.1.

Fig 4.1: Comparison of the isolated word recognition system for two speakers
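The codebook generation and the minimum-average-distortion decision described above could be sketched in MATLAB as follows; the LBG-style binary-split training routine, the codebook size L, the split factor and all function names are illustrative assumptions rather than the authors' exact implementation, and in practice each function would live in its own .m file.

function C = train_codebook(X, L, epsSplit)
% Train an L-codeword codebook (L assumed to be a power of two) from the
% T-by-d matrix X of cepstral training vectors of one word, using an
% LBG-style binary split followed by k-means refinement.
if nargin < 3, epsSplit = 0.01; end
C = mean(X, 1);                                   % start from the global centroid
while size(C, 1) < L
    C = [C * (1 + epsSplit); C * (1 - epsSplit)]; % split every codeword
    for iter = 1:20                               % k-means style refinement passes
        [~, idx] = min(sq_dist(X, C), [], 2);     % nearest codeword for each vector
        for j = 1:size(C, 1)
            if any(idx == j)
                C(j, :) = mean(X(idx == j, :), 1);
            end
        end
    end
end
end

function m = recognize(X, codebooks)
% Classify the test vectors X by the codebook giving minimum average distortion.
D = zeros(1, numel(codebooks));
for i = 1:numel(codebooks)
    D(i) = mean(min(sq_dist(X, codebooks{i}), [], 2));  % average distortion
end
[~, m] = min(D);                                  % minimum average distortion rule
end

function D = sq_dist(X, C)
% Squared Euclidean distance between every row of X and every row of C.
D = bsxfun(@plus, sum(X.^2, 2), sum(C.^2, 2)') - 2 * (X * C');
end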
CONCLUSION:
Isolated word recognition using linear predictive coding and vector quantization provides a basic framework for implementing speech recognition for isolated words, and the vector-quantization based method is very simple: LPC analysis extracts the features of the given words, and vector quantization is used for feature matching. Viewed as a pattern-recognition approach to speech recognition, the M codebooks are analogous to M sets of reference patterns (or templates), the dissimilarity measure is the average distortion of the test vectors against each codebook, and no explicit time alignment is required. In the implementation the results were found to be satisfactory considering the small amount of training data. The accuracy of the real-time system can be increased significantly by using an improved speech detection/noise elimination algorithm.
REFERENCES:
[1]. L. R. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, Prentice Hall (Signal Processing Series), 1993.
[2]. Richard O. Duda, Peter E. Hart and David G. Stork, Pattern Classification, John Wiley & Sons (Asia) Pte Ltd.
[3]. Y. Linde, A. Buzo and R. M. Gray, "An algorithm for vector quantizer design," IEEE Trans. Communications, COM-28, January 1980.
[4]. Mayukh Bhaowal and Kunal Chawla, "Isolated word recognition for English language using LPC, VQ & HMM," IIT, Allahabad, India.
[5]. Poonam Bansal, Amita Dev and Shail Bala Jain, "Automatic speaker identification using VQ," Medwell Journals, 6(9): 938-942, 2007.
[6]. Lawrence R. Rabiner, "Applications of speech recognition in the area of telecommunications," AT&T Labs, Florham Park, New Jersey, IEEE, 1997.
[7]. L. R. Rabiner, "Applications of Voice Processing to Telecommunications," Proc. IEEE, Vol. 82, No. 4, pp. 199-228, Feb. 1994.
BIOGRAPHIES:
Mr. M. K. Linga Murthy is currently working as Assistant Professor in ECE at LBRCE, Mylavaram. He completed his B.Tech in 2001 at SJCET, Yemmiganur, and his M.Tech in 2008 at MITS, Madanapalle. He has over six years of teaching experience and his research area is signal processing.
Mr. G.L.N. Murthy is currently working as Associate Professor in ECE at LBRCE, Mylavaram. He completed his B.Tech and M.Tech at JNTU, Anantapur, and is pursuing his PhD at SVU, Tirupathi. He has over 12 years of teaching experience and his research area is signal processing.