
AL-Nahrain University

College of Information Engineering


Communication Engineering Department

Speech Recognition Using Digital Signal Processing

By:
Mohammed Wahhab Abdulrazzaq
Abstract
• Speech recognition is the process of converting spoken words into text or commands that a
computer can understand.

• It involves several steps, including signal processing, feature extraction, and classification.

• Digital signal processing (DSP) plays a crucial role in speech recognition by processing and
analyzing the raw speech signal to extract relevant features such as pitch, zero-crossing level,
and short-term energy level.

• These features are then used to classify speech as male or female by means of a digital
low-pass filter and feature extraction.

• In this project, 13 audio samples were tested and classified as male or female.

• The tests showed that female speech has higher values than male speech for the
characteristics mentioned above.
Contents
• Introduction

• Problem to be Discussed

• Apparatus of Articulation

• Voice Production

• Speech Signal

• Speech Signal Characteristics

• DSP Recognition Techniques

• Speech Recognition Steps

• Simulation and Results

• Conclusion

• References
Introduction
• Speech recognition using DSP is the process of converting speech signals into
digital representations, which can be processed and analyzed to recognize
spoken words or phrases.

• The main goal of speech recognition is to enable machines to understand and
interpret human speech, and there are several ways to achieve this.

• The first step in speech recognition using DSP is to process the analog speech
signal, typically by digitizing it using an analog-to-digital converter (ADC). This
digital representation of the speech signal can then be processed using various
DSP techniques, such as filtering, feature extraction, and pattern recognition.
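
As a rough illustration of the digitizing step, the following Python sketch samples a continuous-time signal at a fixed rate and quantizes each sample to a signed integer, as an ADC would. The function name `sample_and_quantize`, the 8000 Hz sampling rate, the 16-bit resolution, and the 100 Hz test tone are assumptions made for this example and are not taken from the project.

```python
import math

def sample_and_quantize(analog, duration_s, fs_hz, bits=16):
    """Sample analog(t) every 1/fs_hz seconds and quantize to signed integers.

    `analog` is any function of time (in seconds) returning values in [-1, 1].
    """
    max_level = 2 ** (bits - 1) - 1          # 32767 for 16 bits
    num_samples = round(duration_s * fs_hz)
    samples = []
    for n in range(num_samples):
        value = analog(n / fs_hz)            # sampling: t = n / fs
        value = max(-1.0, min(1.0, value))   # clip to the converter's input range
        samples.append(round(value * max_level))  # quantization
    return samples

# Hypothetical input: a 100 Hz tone standing in for the microphone signal.
tone = lambda t: math.sin(2 * math.pi * 100 * t)
digital = sample_and_quantize(tone, duration_s=0.01, fs_hz=8000)
```

The resulting list of integers is the digital representation that the later filtering and feature-extraction stages operate on.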
Problem to be Discussed

This presentation explains how a machine can recognize whether speech
comes from a male or a female speaker using DSP techniques.
Apparatus of Articulation
• The apparatus of articulation and
resonance consists of structures
and cavities that extend from the
vocal cords (excluded) to the lips,
including the nasal cavity.
• The larynx is located in the
anterior neck, slightly below the
point where the pharynx divides
and gives rise to the separate
respiratory and digestive tracts.
Because of its location, the larynx
plays a critical role in normal
breathing, swallowing and
speaking.
Apparatus of Articulation (Cont…)

• The mouth, also called the oral cavity, plays a significant role in
human communication in addition to its primary role as the beginning
of the digestive system.

• While primary aspects of the voice are produced in the throat, the
tongue, lips, and jaw are also needed to produce the range of sounds
included in human language.

• The tongue is a muscle on the floor of the mouth that manipulates
food for chewing and swallowing (deglutition). A secondary function
of the tongue is speech.
Voice Production
• Voice is the result of a complex mechanism, called phonation.
• It is the result of the vibration of the vocal cords, which produces a
sound.
• This sound, the laryngeal fundamental tone, is then shaped by the resonance
cavities in the upper part of the larynx.
• During the phonation process, the aerodynamic energy generated by the
respiratory system is transformed into laryngeal acoustic energy; through the
rhythmic opening and closing of the glottis, the larynx behaves as an
energy transducer.
• The continuous air flow from the trachea is modulated into the glottal flow.
The glottal flow is then modified by the supraglottic structures.
• The glottal flow is composed of glottal pulses. The period of a glottal pulse
is the pitch period.
Voice Production (Cont…)
• The reciprocal of the pitch period is the fundamental frequency, also
known as pitch. The vocal tract works as a time-varying filter to the glottal
flow.
• The characteristics of the vocal tract include the frequency response, which
depends on the position of organs.
• The peak frequencies in the frequency response of the vocal tract are
formants, also known as formant frequencies.
• In signal processing, a voice signal is a convolution of a time-varying
stimulus and a time-varying filter. In particular, the time-varying stimulus is
the glottal flow, whereas the time-varying filter is the vocal tract.
• The figure below shows the voice production model used in signal
processing.
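
The convolution view of voice production can be sketched in a few lines of Python. This is a simplified illustration, not the project's model: the stimulus is an idealized impulse train, the vocal tract is reduced to a single fixed resonance over a short frame (so it is not time-varying here), and the names `glottal_impulse_train`, `formant_impulse_response`, and `convolve`, as well as the 8000 Hz sampling rate and the 700 Hz formant, are assumptions chosen for the example.

```python
import math

def glottal_impulse_train(num_samples, pitch_period):
    """Idealized glottal flow: one unit pulse every `pitch_period` samples."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(num_samples)]

def formant_impulse_response(formant_hz, bandwidth_hz, fs_hz, length):
    """Impulse response of a single resonance: an exponentially decaying cosine."""
    decay = math.exp(-math.pi * bandwidth_hz / fs_hz)
    omega = 2 * math.pi * formant_hz / fs_hz
    return [decay ** n * math.cos(omega * n) for n in range(length)]

def convolve(x, h):
    """Direct-form linear convolution y[n] = sum_k x[k] * h[n - k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        if xi == 0.0:
            continue  # skip silent samples of the excitation
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

fs = 8000                                   # assumed sampling rate in Hz
pitch_period = 40                           # 40 samples at 8 kHz -> 200 Hz pitch
source = glottal_impulse_train(120, pitch_period)
tract = formant_impulse_response(700, 100, fs, length=30)
voice = convolve(source, tract)
```

Each glottal pulse launches one copy of the vocal tract response, so the output repeats at the pitch period while its shape within each period is set by the filter.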
Speech Signal
• A speech signal is an acoustic waveform produced by the human voice
during speech.
• It is a complex sound wave that is created by the movement of the vocal
cords and the resonances of the vocal tract.
• The speech signal contains information about the linguistic content of
the speech, as well as the speaker's identity, emotional state, and other
characteristics.
• The speech signal is typically measured using a microphone or other
recording device, and it can be analyzed and processed using various
techniques from the field of digital signal processing.
• Speech signals are used in a wide variety of applications, including
speech recognition, speaker identification, and emotion recognition.
Speech Signal (Cont…)
• Voice is the result of the interaction between
three subsystems that constitute the vocal
apparatus:
• Power source: the air stream driven from the
lungs.
• Sound source: the modulated airflow
generated when the closing vocal folds
interrupt the air stream.
• Sound modifiers: the articulators that modify
the length and shape of the vocal tract and
thus its resonance frequencies.
Speech Signal Characteristics
• Speech signals contain many different features that are important for understanding
and processing human speech. Some of the most common features of speech signals
include:
First: Fundamental Frequency
1. The fundamental frequency (often abbreviated as "f0") is the lowest frequency that
a periodic waveform or sound contains. It is also sometimes referred to as the first
harmonic.
2. The fundamental frequency is an important characteristic of a sound or waveform,
as it determines the pitch of the sound.
3. In speech, the fundamental frequency is often referred to as the "pitch" of the
speaker's voice.
4. It is measured in hertz and can vary depending on a variety of factors, including the
speaker's age, gender, and emotional state.
5. In general, male voices have a lower fundamental frequency than female voices, and
the fundamental frequency tends to increase with the speaker's level of emotional
arousal.
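
One common way to estimate the fundamental frequency of a voiced frame is autocorrelation: the lag at which the frame best matches a delayed copy of itself is taken as the pitch period. The sketch below is an assumption about how this could be done in Python and is not necessarily the method used in the project; the function name `estimate_f0`, the 50 to 400 Hz search range, and the 8000 Hz sampling rate are illustrative choices.

```python
import math

def estimate_f0(samples, fs_hz, f_min=50.0, f_max=400.0):
    """Estimate the fundamental frequency (Hz) of a voiced frame by autocorrelation."""
    min_lag = int(fs_hz / f_max)             # shortest pitch period to consider
    max_lag = int(fs_hz / f_min)             # longest pitch period to consider
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Similarity between the frame and a copy of itself delayed by `lag`.
        value = sum(samples[n] * samples[n + lag]
                    for n in range(len(samples) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return fs_hz / best_lag                  # f0 is the reciprocal of the pitch period

fs = 8000                                    # assumed sampling rate in Hz
frame = [math.sin(2 * math.pi * 200 * n / fs) for n in range(800)]
f0 = estimate_f0(frame, fs)
```

The test frame is a pure 200 Hz tone, so the strongest autocorrelation peak falls at a lag of 40 samples. Real speech is noisier, and practical estimators usually add voicing detection and peak smoothing.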
Speech Signal Characteristics (Cont…)

Second: Short-term energy


1. Short-term energy is a measure of the energy in a speech signal
over a short time interval.
2. It is calculated by squaring the signal values over a short time
window and summing them.
3. Short-term energy can be used to detect voiced and unvoiced
speech segments and can be useful in speech segmentation,
speaker recognition, and emotion recognition.
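4. A standard way to calculate the short-term energy of the frame of N samples
ending at sample n is $E_n = \sum_{m=n-N+1}^{n} [x(m)\,w(n-m)]^2$, where $w$ is the
analysis window; for a rectangular window this is the plain sum of squares from point 2.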
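
The windowed sum of squares described in point 2 can be sketched in Python as follows. A rectangular window is assumed, and the function name `short_term_energy` and the parameters `frame_len` and `hop` are names chosen for this example only.

```python
def short_term_energy(samples, frame_len, hop):
    """Return the sum of squared samples for each frame of the signal."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame))   # rectangular window
    return energies

energies = short_term_energy([1.0, -1.0, 2.0, 0.0, 0.5, 0.5], frame_len=2, hop=2)
```

Frames with large energy typically correspond to voiced speech, while low-energy frames correspond to unvoiced speech or silence.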
Speech Signal Characteristics (Cont…)
Third: Number of zero crossings
1. The number of zero crossings in a speech signal is a measure of the number of
times the signal changes sign.
2. It can be used as a measure of the periodicity of the signal, which is related to
the pitch of the speech signal.
3. The number of zero crossings can be used in speech analysis and synthesis, as
well as in speaker recognition and emotion recognition.
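4. A standard way to calculate it over a frame of N samples ending at sample n is
$Z_n = \frac{1}{2} \sum_{m=n-N+1}^{n} \left| \operatorname{sgn}[x(m)] - \operatorname{sgn}[x(m-1)] \right|$,
where $\operatorname{sgn}[x]$ is $+1$ for $x \ge 0$ and $-1$ otherwise.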
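
Counting sign changes, as described in point 1, takes only a few lines of Python. In this sketch a sample equal to zero is treated as positive, which is a convention assumed here, and the function name `count_zero_crossings` is chosen for the example.

```python
def count_zero_crossings(samples):
    """Count sign changes between consecutive samples (zero counts as positive)."""
    signs = [1 if x >= 0 else -1 for x in samples]
    return sum(1 for prev, cur in zip(signs, signs[1:]) if prev != cur)

crossings = count_zero_crossings([0.3, -0.2, 0.1, 0.4, -0.5])
```

Dividing the count by the frame duration gives a zero-crossing rate, which allows frames of different lengths to be compared.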
DSP Recognition Techniques
Digital signal processing (DSP) techniques play a crucial role in speech recognition,
which is the process of automatically identifying spoken words or phrases. Some of the
key DSP techniques used in speech recognition include:
1. Digital filtering: Digital filters can be used to remove noise and other unwanted
signals from a speech signal. Common types of filters used in speech processing
include high-pass filters, low-pass filters, and bandpass filters.
2. Feature extraction: Feature extraction involves identifying key characteristics of a
speech signal that can be used to identify the spoken word or phrase. One of the
common features used in speech recognition is the fundamental frequency (pitch)
of the speech signal.
3. Hidden Markov Models (HMMs): HMMs are statistical models that can be used to
represent the sequence of speech sounds that make up a spoken word or phrase.
DSP Recognition Techniques (Cont…)
• In this project, speech recognition is based on feature extraction
and digital filtering with a low-pass filter.
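
A low-pass filter of the kind mentioned above can be sketched in Python as a windowed-sinc FIR design. This is one possible design and not necessarily the one used in the project; the Hamming window, the 101 taps, the 8000 Hz sampling rate, and the names `design_lowpass`, `apply_fir`, and `rms` are assumptions for illustration, while the 1000 Hz cutoff matches the simulation parameters.

```python
import math

def design_lowpass(cutoff_hz, fs_hz, num_taps=101):
    """Windowed-sinc FIR low-pass design with a Hamming window (odd num_taps)."""
    fc = cutoff_hz / fs_hz                   # cutoff in cycles per sample
    center = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        k = n - center
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]          # normalize to unity gain at 0 Hz

def apply_fir(taps, samples):
    """Filter `samples` with the FIR coefficients `taps` (causal convolution)."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k in range(min(n + 1, len(taps))):
            acc += taps[k] * samples[n - k]
        out.append(acc)
    return out

def rms(values):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))

fs = 8000                                    # assumed sampling rate in Hz
taps = design_lowpass(cutoff_hz=1000, fs_hz=fs)
low_tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(400)]
high_tone = [math.sin(2 * math.pi * 3000 * n / fs) for n in range(400)]
low_out = apply_fir(taps, low_tone)[100:]    # skip the start-up transient
high_out = apply_fir(taps, high_tone)[100:]
```

The 200 Hz tone lies in the passband and keeps its level, whereas the 3000 Hz tone lies in the stopband and is strongly attenuated. Since typical pitch values are far below 1000 Hz, such a filter removes high-frequency content before the features are extracted.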
Speech Recognition Steps
Step 1: Signal acquisition. The speech signal is acquired using a microphone or
other audio input device.

Step 2: Signal sampling. The signal is sampled to convert it to digital form.

Step 3: Signal filtering. Noise and other unwanted signals are removed.

Step 4: Feature extraction. Features are extracted from the preprocessed speech
signal.

Step 5: Feature analysis. The extracted features are analyzed and compared with
established reference levels.

Step 6: Signal recognition. The signal is judged to be male or female speech
according to the feature analysis.
Simulation and Results
• This project is implemented using the MATLAB simulation tool.
• Simulation Parameters:
1. 13 audio samples are tested.
2. Low-pass filter with cutoff frequency = 1000 Hz
3. fundamental_freq_level=135
4. zero_crossing_level=12
5. short_energy_level=0.5
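
The three levels listed above can be combined into a simple decision rule. The slides do not state how the features are combined, so the majority vote below is an assumption of this Python sketch, as are the function name `classify_speaker` and the example feature values; only the three threshold levels come from the simulation parameters.

```python
# Threshold levels taken from the simulation parameters above.
FUNDAMENTAL_FREQ_LEVEL = 135
ZERO_CROSSING_LEVEL = 12
SHORT_ENERGY_LEVEL = 0.5

def classify_speaker(fundamental_freq, zero_crossings, short_energy):
    """Label a recording 'female' if most features exceed their levels.

    The majority vote is an assumption of this sketch, not the project's rule.
    """
    votes = [
        fundamental_freq > FUNDAMENTAL_FREQ_LEVEL,
        zero_crossings > ZERO_CROSSING_LEVEL,
        short_energy > SHORT_ENERGY_LEVEL,
    ]
    return "female" if sum(votes) >= 2 else "male"

# Hypothetical feature values for two recordings.
label_a = classify_speaker(fundamental_freq=210, zero_crossings=20, short_energy=0.8)
label_b = classify_speaker(fundamental_freq=110, zero_crossings=8, short_energy=0.3)
```

A majority vote tolerates one feature falling on the wrong side of its level; requiring all three to agree would be a stricter alternative.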
Conclusion
• The speech signal differs from one person to another.
• The speech signal has several distinct characteristics.
• DSP techniques can be used to extract and analyze the signal
features.
• The results demonstrate that the female speech signal has higher
values of fundamental frequency, zero-crossing level, and short-term
energy than the male speech signal.
References
[1] M. Saundade and P. Kurle, "Speech Recognition using Digital Signal Processing", International
Journal of Electronics, Communication & Soft Computing Science and Engineering, Vol. 2, Issue
6, 2013.
[2] A. Palumbo, B. Calabrese, P. Vizza, N. Lombardo, A. Garozzo, M. Cannataro, F. Amato and P.
Veltri, "A Novel Portable Device for Laryngeal Pathologies Analysis and Classification",
Springer-Verlag Berlin Heidelberg, 2010.
[3] D. Jurafsky and J. H. Martin, "Speech and Language Processing", Second Edition, Prentice Hall, 2008.
[4] J. Laver and P. Trudgill, "Phonetic and linguistic markers in speech", Cambridge University Press,
Language in Society, 1972.
[5] S. K. Banchhor, "Text-Dependent Method for Gender Identification through
Synthesis of Voiced Segments", International Journal of Engineering Science and Technology
(IJEST), Vol. 3, No. 6, June 2011.
[6] D. Kovačić and E. Balaban, "Voice gender perception by cochlear implantees", Journal of the
Acoustical Society of America, September 2009.
[7] "Short Term Time Domain Processing of Speech", retrieved 19 February 2023 from
vlab.amrita.edu/?sub=3&brch=164&sim=857&cnt=1.
Thanks