LPC and LPCC Method of Feature Extraction in Speech Recognition System


LPC AND LPCC METHOD OF FEATURE EXTRACTION IN SPEECH RECOGNITION SYSTEM
Harshita Gupta1, Divya Gupta2
Department of Computer science and engineering
Amity University Uttar Pradesh
Noida, India
[email protected] 1, [email protected]

Abstract— Automatic speech recognition (ASR) has been under the scrutiny of researchers for many years. A speech recognition system is able to listen to what we speak, interpret it, and perform actions according to the spoken information. Even after many detailed studies on the optimization of ASR and on various feature extraction techniques, the accuracy of the system is still a big challenge. The selection of a feature extraction technique depends entirely on the area of study. In this paper, the theory behind the feature extraction techniques LPC and LPCC is examined in detail. The goal of this paper is a comparative analysis of these two techniques.

General Terms — Performance; Speech; Technology; Accuracy

Keywords — Speech Recognition; Automatic Speech Recognition; Feature Extraction; LPC; LPCC

I. INTRODUCTION
The concept of human interaction with technology led to interest in, and subsequent research on, speech recognition. With the speedy development of computer software and hardware, speech recognition technology is now used in a broad range of applications such as mobile phone applications, video games, medical applications like deaf telephony, weather forecasting, agriculture, communication technology, robotics, national defense, education and automatic translation. The aim of researchers is to invent a computational technique on the basis of which the system can differentiate among the various words spoken by different speakers with unique accents under different environmental conditions.
Usually a speech recognition system delivers two types of important information: voice recognition and the spoken content [10][34][15]. Voice recognition refers to identifying the speaker rather than what is being said, and can be used to validate the identity of a speaker for security purposes. Speech recognizers, in contrast, focus on extracting the useful information that is independent of the speaker's identity.

This paper is structured as follows. Section II describes the speech recognition system. Section III introduces the concept of feature extraction. Section IV presents the detailed theory of LPC and LPCC. Section V gives a comparative analysis of LPC and LPCC. Section VI presents the conclusion, based on the comparison between LPC and LPCC, and the future scope.

II. SPEECH RECOGNITION SYSTEM
Automatic speech recognition analyzes and processes speech signals and converts them into a sequence of words by means of a method implemented as a computer program. In simple words, it is the process of translating spoken words into text. The components of a speech recognition system are Feature Extraction, Decoder, Pronunciation Models, Acoustic Models, and Speech Input.

a. Voice Input: Audio input to the system is recorded by a microphone.
b. Decoder: The decoding process searches the collection of candidate words and obtains the best match using the knowledge base. The decoder makes the final decision about the recognized speech utterance by combining and optimizing the information conveyed by the language and acoustic models.
c. Acoustic Models: After many years of study, the accuracy of speech recognition systems still remains the most significant topic. The accuracy of a speech system is determined by many known factors, and the acoustic model plays a very significant role in improving it. The two types of acoustic model are the phoneme model and the word model; they are implemented with methods such as HMM, support vector machines (SVM), dynamic Bayesian networks (DBN) and ANNs.
d. Language Models: Speech recognition is an application of natural language processing that attempts to capture the concepts of language modeling. These models help to select the correct word sequence by predicting the probability of the nth word from the (n-1) preceding words.
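The idea behind item d, predicting the nth word from its (n-1) predecessors, can be illustrated with a toy bigram model (n = 2). This is only a minimal sketch and not part of the original paper: the three-sentence corpus and the function names train_bigram and next_word_probability are hypothetical, and the probabilities are plain maximum-likelihood counts without any smoothing.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # "<s>" is an assumed start-of-sentence marker.
        words = ["<s>"] + sentence.lower().split()
        for prev, cur in zip(words, words[1:]):
            counts[prev][cur] += 1
    return counts


def next_word_probability(counts, prev, cur):
    """Maximum-likelihood estimate of P(cur | prev); 0.0 if prev was never seen."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return followers[cur] / total if total else 0.0


# Hypothetical toy corpus, for illustration only.
corpus = ["turn on the light", "turn off the light", "turn on the radio"]
model = train_bigram(corpus)
```

With this corpus, "on" follows "turn" in two of three cases, so a decoder using the model would prefer "turn on" over "turn off" when the acoustic evidence is ambiguous.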

978-1-4673-8203-8/16/$31.00 ©2016 IEEE
e. Pronunciation Models: In a speech recognition system, the term pronunciation refers to how the speech engine expects words to sound, that is, their phonetic representation. Multiple pronunciations can be associated with a single word, which makes it difficult for the system to recognize them well.

The basic system of speech recognition is summarized in Figure 1.

Fig.1. Speech Recognition System [8] [32]

Advancement in speech recognition, including feature extraction and speech matching, is shown below in Table I.

TABLE I. BACKGROUND OF SPEECH RECOGNITION [9, 14, 23]
Years | Name of Evolution
1950-1960 | Researchers at Bell Labs developed a digit recognizer that relied on formants but was restricted to single-speaker systems with vocabularies of around ten words.
1960-1980 | Medium vocabularies of 100-1000 words, Dynamic Time Warping and cepstral analysis evolved. Pattern recognition and feature extraction methods like LPC were introduced.
1980-1990 | Large vocabularies, ranging from 1000 words to unlimited, were introduced, and feature extraction methods like MFCC and PLP were developed.
1991-2000 | The technology evolved toward statistical knowledge of language and acoustic models. Vocal Tract Length Normalization and the HMM Toolkit were developed.
2001-2010 | Concatenative Synthesis, Machine Learning, Mixed-Initiative Dialog.
2010-present | After years of evolution, speech recognition technology has established a landmark in the marketplace by helping users in a number of ways.

III. FEATURE EXTRACTION
Feature extraction plays a crucial role in the speech recognition process, and drawing out valuable data from sample speech has been a crucial part of research for many years. The main purpose of this effort is to determine the performance level of different features such as MFCC, LPC and PLP; the selection among them plays a significant role in the recognition rate, which is the most important criterion for an effective speech recognition system.

Feature extraction helps to differentiate one speech sound from another. It transforms the unprocessed speech signal into a condensed but effective representation that is more stable than the original signal and from which the original signal can be reconstructed. In this paper, different existing techniques such as MFCC and MMFCC are studied, and a comparative study of LPC and LPCC is carried out for speech enhancement.

IV. FEATURE EXTRACTION TECHNIQUES
A. Linear Predictive Coding (LPC)
1. Introduction
The most popular approach for modeling human voice production is Linear Predictive Coding (LPC), which performs well in a clean environment but not in a noisy one. The parameters of speech signals are: pitch period, speech frame energy, formant, short-time spectrum and bandwidth [17]. LPC is a very important feature used in audio and speech signal processing; it draws out speech parameters such as spectra, pitch and formants. LPC is also known as a temporal approach, devised to model the resonant structure of the human vocal tract that produces the corresponding sound. The block diagram of LPC is depicted in Figure 2.
Steps for the computation of LPC are as follows:
a. Pre-Emphasis: First, the speech sample is passed through a filter in order to spectrally flatten the signal and make it less susceptible to finite-precision effects. The filter coefficient should be between 0.9 and 1.
b. Framing: After pre-emphasis, the resulting speech is divided into frames of M samples each, 20 to 40 ms long. There is a standard overlap of 10 ms between two adjacent frames to ensure stationarity between frames.
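Steps a and b can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper only gives the range of the filter coefficient, so the common first-order form y[n] = x[n] - alpha * x[n-1] is assumed here, and the function names, the default alpha of 0.95, the 25 ms frame length and the 8 kHz sample rate are all hypothetical choices.

```python
def pre_emphasis(signal, alpha=0.95):
    """Assumed first-order filter y[n] = x[n] - alpha * x[n-1], 0.9 <= alpha < 1."""
    if not signal:
        return []
    out = [signal[0]]  # the first sample has no predecessor and is kept as is
    for n in range(1, len(signal)):
        out.append(signal[n] - alpha * signal[n - 1])
    return out


def frame_signal(signal, sample_rate, frame_ms=25.0, overlap_ms=10.0):
    """Split a signal into fixed-length frames that overlap by overlap_ms."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = frame_len - int(round(sample_rate * overlap_ms / 1000.0))
    if hop <= 0:
        raise ValueError("overlap must be shorter than the frame length")
    frames = []
    # Trailing samples that do not fill a whole frame are dropped.
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append(signal[start:start + frame_len])
    return frames


# Usage: one second of a constant dummy signal sampled at 8 kHz (assumed rate).
dummy = [1.0] * 8000
emphasized = pre_emphasis(dummy)
frames = frame_signal(emphasized, sample_rate=8000)
```

At 8 kHz a 25 ms frame holds 200 samples and a 10 ms overlap is 80 samples, so consecutive frames start 120 samples apart.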

2016 6th International Conference - Cloud System and Big Data Engineering (Confluence) 499
c. Windowing: The resulting frames are multiplied by a Hamming window in order to minimize edge effects.
d. Computing the LPC: In the final step, the autocorrelation method is applied to the windowed frames of the speech sample. The highest autocorrelation lag used equals the order of the LPC analysis.

Fig.2. Block diagram of LPC [10, 25]

2. Methodology
LPC is one of the most powerful speech analysis methods. Its basic concept is that the speech sample at the current time can be approximated as a linear combination of past speech samples [33]. The method works by minimizing the squared difference between the original and the estimated speech signal over a finite interval [33].

Types of LPC [33]:
Different types of LPC are as follows:
• Code-Excited LPC (CELP)
• Pitch Excitation LPC
• Residual Excitation LPC
• Voice Excitation LPC
• Multi-Pulse Excitation LPC (MPLPC)

B. Linear Predictive Cepstral Coefficients (LPCC)
1. Introduction
LPCC assumes that the nature of the sound being produced is governed by the shape of the vocal tract. LPCC has become one of the most important features for estimating the common parameters of a speech signal, such as pitch period, speech frame energy and formants.
The aim of feature extraction is to represent the speech signal through a finite number of measures of the signal. With the help of LPC we can derive the LPC coefficients, which are then translated into cepstral coefficients; the LPC coefficients are obtained by the autocorrelation method. The block diagram of LPCC is depicted in Figure 3.
Steps for the computation of LPCC are as follows:
a. All the steps mentioned above for the computation of LPC are followed.
b. LPCC features are calculated by converting the LPC parameters into cepstrum coefficients (CCs). Once the predictor coefficient vector has been found, the cepstrum coefficients are easy to calculate: the LPC parameters are converted into CCs by means of a recursion.

Fig.3. Block diagram of LPCC [31]

2. Methodology
LPCC is derived from the LPC coefficients [29], whose computation is the first step in obtaining LPCC. In other words, LPCC stands for linear prediction coefficients in the cepstrum domain [19]. The underlying concept is that a speech sample at the current time can be taken as a linear combination of past speech samples.
The input speech signal first goes through pre-emphasis using a first-order high-pass filter, followed by framing and windowing. Because more of the energy is contained in the lower frequencies than in the higher ones, the signal is pre-emphasized to boost the energy at higher frequencies.

V. COMPARATIVE STUDY OF LPC AND LPCC

A. Based on Advantages and Disadvantages
TABLE II. COMPARISON OF MERITS AND DEMERITS
Feature Extraction Method | Merits | Demerits
LPC | Powerful method for feature extraction, as it is able to characterize the vocal tract well [4]. | LPC is not able to capture unvoiced and nasalized sounds accurately.
LPC | Effective technique for computation, as it permits encoding of good-quality speech at a very low bit rate [4]. | Poor performance of LPC in noisy environments.
LPC | Accurate estimates of speech parameters can be obtained with the help of LPC. | Not good for representing speech, as it assumes the signal is stationary within the given frame.
LPCC | Well known for good performance and relative simplicity. | LPCC is highly sensitive to quantization noise.
LPCC | LPCC is much more robust and reliable than LPC. | If an insufficient order is used, the performance of LPCC degrades [16].

B. Based on Computational Steps
The major steps, namely pre-processing (frame blocking and windowing), autocorrelation analysis and LPC analysis, are the same for both techniques, except that LPCC includes one extra step after these: LPC parameter conversion. So we can conclude that both can be derived from the same procedure.

Fig.4. Combined block diagram of LPC and LPCC

VI. CONCLUSION AND FUTURE SCOPE
The detailed study of many research papers shows that the technologies are still trying to achieve better accuracy for ASR. In this paper, we have presented a comparative analysis of features like LPC and LPCC. The most prominent finding of this study is that LPC is good at estimating the speech parameters precisely, while LPCC has good reliability and robustness. The process involved in extracting these features is the same, except that LPCC includes an extra step that converts the LPC coefficients into cepstral coefficients. The performance of both techniques degrades in noisy environments. Hybrid approaches are the future direction for improving the efficiency of ASR. In future, speech recognition may become speech understanding.

REFERENCES
[1] Ruchismita Tripathy, Hrudaya Kumar Tripathy, “Unalike Methodologies of Feature Extraction & Feature Matching in Speech Recognition”, 2014.
[2] M. A. Anusuya, S. K. Katti, “Speech Recognition by Machine: A Review”, International Journal of Computer Science and Information Security (IJCSIS), Vol. 6, No. 3, 2009, pp. 181-205.
[3] A. K. Paul, D. Das, M. M. Kamal, “Bangla Speech Recognition System Using LPC and ANN”, Seventh International Conference on Advances in Pattern Recognition, IEEE Xplore, Kolkata, Feb. 4-6, 2009, pp. 171-174.
[4] Sonia Sunny, David Peter S and K Poulose Jacob, “A Comparative Study of Parametric Coding and Wavelet Coding Based Feature Extraction Techniques in Recognizing Spoken Words”, CUBE 2012, September 3-5, 2012, Pune, Maharashtra, India.
[5] Thiang, Suryo Wijoyo, “Speech Recognition Using Linear Predictive Coding and Artificial Neural Network for Controlling Movement of Mobile Robot”, Proc. of Int. Conf. on Information and Electronics Engineering, IPCSIT vol. 6, IACSIT Press, Singapore, 2011.
[6] C. Demiroglu and T. Barnwell, “A Missing-Data Approach to Noise-Robust LPC Extraction for Voiced Speech Using Auxiliary Sensors”, ICASSP 2005.
[7] L. S. Chee, Ooi Chia Ai, and S. Yaacob, “Overview of Automatic Stuttering Recognition System”, in International Conference on Man-Machine Systems (ICoMMS 2009), Penang, Malaysia, 2009.
[8] Preeti Saini, Parneet Kaur, “Automatic Speech Recognition: A Review”, International Journal of Engineering Trends and Technology, Volume 4, Issue 2, 2013.
[9] Andre Gustavo Adami, “Automatic Speech Recognition: From the Beginning to the Portuguese Language”, tutorial paper published in proceedings of the 9th International Conference on Computational Processing of the Portuguese Language, April 27-30, Porto Alegre/RS, Brazil. www.inf.pucrs.br/~propor2010/proceedings/tutorials/Adami.pdf.
[10] Nidhi Desai, Prof. Kinnal Dhameliya, Prof. Vijayendra Desai, “Feature Extraction and Classification Techniques for Speech Recognition: A Review”, International Journal of Emerging Technology and Advanced Engineering (ISSN 2250-2459, ISO 9001:2008 Certified Journal), Volume 3, Issue 12, December 2013.
[11] M. A. Anusuya, S. K. Katti, “Front end Analysis of Speech Recognition: A review”, International Journal of Speech Technology, Springer, vol. 14, pp. 99-145, 2011.
[12] S. J. Arora and R. Singh, “Automatic Speech Recognition: A Review”, International Journal of Computer Applications, vol. 60, no. 9, December 2012.
[13] Utpal Bhattacharjee, “A Comparative Study of LPCC and MFCC Features for the Recognition of Assamese Phonemes”, International Journal of Engineering Research & Technology (IJERT), Vol. 2, Issue 1, January 2013, ISSN: 2278-0181.
[14] Jesse C. Hansen, “Modulation Based Parameters for Automatic Speech Recognition”, M.Tech Thesis, University of Rhode Island, 2003.

[15] Wei Hong, Zhou Hao, Yang Jian, “Speaker Recognition Based on the Combination of Vector Quantization Parameter Method”, Journal of Yunnan University, 24(2), pp. 96-100, 2002.
[16] P. Kumar, S. L. Lahudkar, “Automatic Speaker Recognition using LPCC and MFCC”, International Journal on Recent and Innovation Trends in Computing and Communication, ISSN: 2321-8169, Volume 3, Issue 4, April 2015.
[17] Xinxing Jing, Jinlong Ma, Jing Zhao, Haiyan Yang, “Speaker Recognition Based on Principal Component Analysis of LPCC and MFCC”, IEEE, 2014.
[18] Zhujianchen, Liuzengli, “Analysis of hybrid feature research based on extraction LPCC and MFCC”, 10th International Conference on Computational Intelligence and Security, 2014.
[19] M. G. Sumithra, A. K. Devika, “A Study on Feature Extraction Techniques for Text Independent Speaker Identification”, 2012 International Conference on Computer Communication and Informatics (ICCCI-2012), Jan. 10-12, 2012, Coimbatore, India.
[20] Jayanna H S, Mahadeva Prasanna S R, “Analysis, Feature Extraction, Modeling and Testing Techniques for Speaker Recognition”, IETE Tech Rev 2009; 26:181-90.
[21] N. S. Nehe (P.R.E.C. Loni), R. S. Holambe (S.G.G.S.I.E. & T, Nanded), “New Robust Subband Cepstral Feature for Isolated Word Recognition”, International Conference on Advances in Computing, Communication and Control (ICAC3’09).
[22] Kavita, S. Yadav, M. M. Mukhedkar, “Review on Speech Recognition”, International Journal of Science and Engineering, Vol. 1, No. 2, 2013, pp. 61-70.
[23] Wouter Gevaert, Georgi Tsenov, Valeri Mladenov, “Neural Networks used for Speech Recognition”, Journal of Automatic, Vol. 20, 2010.
[24] Vipulsangram K Kadam, Dr. Ravindra C Thool, “Optimization of Speech Recognition using LPC Technic”, IOSR Journal of Engineering (IOSRJEN), ISSN: 2250-3021, Volume 2, Issue 8 (August 2012), pp. 09-13.
[25] Yusnita M.A., Paulraj M.P., Sazali Yaacob, Shahriman Abu Bakar and A. Saidatul, “Malaysian English Accents Identification using LPC and Formant Analysis”, IEEE International Conference on Control System, Computing and Engineering, 2011.
[26] Wiqas Ghai and Navdeep Singh, “Literature Review on Automatic Speech Recognition”, International Journal of Computer Applications, vol. 41, no. 8, pp. 42-50, March 2012.
[27] A. N. Mishra, S. Mishra, M. Chandra, and S. N. Sharan, “Speech Recognition Using Linear Prediction Based Features”, National Seminar on Devices, Circuits & Communication, Mesra, Ranchi, India, November 6-7, 2008.
[28] Mostafa Hydari, Mohammad Reza Karami, Ehsan Nadernejad, “Speech Signals Enhancement Using LPC Analysis based on Inverse Fourier Methods”, Contemporary Engineering Sciences, Vol. 2, 2009, no. 1, pp. 1-15.
[29] Usha Sharma, Sushila Maheshkar and A. N. Mishra, “Study of Robust Feature Extraction Techniques for Speech Recognition System”, 1st International Conference on Futuristic Trend in Computational Analysis and Knowledge Management (ABLAZE 2015), 2015.
[30] A. N. Mishra, M. C. Shrotriya and S. N. Sharan, “Comparative Wavelet, PLP and LPC Speech Recognition Techniques on the Hindi speech digits Database”, ICDIP Singapore, 2010.
[31] N. S. Nehe and R. S. Holambe, “New Feature Extraction Methods Using DWT and LPC for Isolated Word Recognition”, conference, 11/2008.
[32] Rajesh Kumar Aggarwal and M. Dave, “Acoustic modeling problem for automatic speech recognition system: advances and refinements (Part II)”, Int J Speech Technol, pp. 297-308, 2011.
[33] Urmila Shrawankar, Dr. Vilas Thakare, “Techniques for feature extraction in speech recognition system: A Comparative Study”, http://arxiv.org/ftp/arxiv/papers/1305/1305.1145.pdf.
[34] Bishnu Prasad Das and Ranjan Parekh, “Recognition of Isolated Words using Features based on LPC, MFCC, ZCR and STE, with Neural Network Classifiers”, International Journal of Modern Engineering Research (IJMER), Vol. 2, Issue 3, May-June 2012.

