Speech Recognition and Retrieving Using Fuzzy Logic System


Arjuwan M. Abduljawad Al-Jawadi


Technical College, Foundation of Technical, Engineering, Mosul, Iraq
(Received 21/4/2009, Accepted 1/11/2009)

Abstract
Speech analysis is one of the most interesting fields in digital signal processing. Much research has been done on it, using different tools and scientific programs, to analyze speech from production through processing, coding, and recognition. Chester, F. J. Taylor, and M. Doyle were the first to apply the analysis of speech signals [1]. In this research, female and male speech samples of the word 'Close' were used to build two systems, one based on a neural network and one on fuzzy logic, to distinguish male from female voices, and the results of the two systems were compared. The fuzzy logic system was then developed further, based on three features of the speaker's voice: the energy value of the signal, the power spectrum of the signal, and the vowel sound "O" in the word 'Close'. These features increase its ability to recognize an individual speaker and increase system security against intruders: the system recognizes the speech of one person, grants voice acceptance authority to that person, and denies access to others. The system showed good results during testing with samples of one person against other female and male samples.
Introduction
The speech signal is a highly redundant and non-stationary signal. These attributes make the speech signal very challenging to work with [2]. Speech recognition systems fall into two categories according to [3]:
1. Speaker-dependent systems that are used and often trained by one person.
2. Speaker-independent systems that can be used by anyone.
In general, speaker recognition can be subdivided into speaker identification (who is speaking?) and speaker verification (is the speaker who we think he or she is?). In addition, speaker identification can be closed-set (the speaker is always one of a closed set used for training) or open-set (speakers from outside the training set may be examined). Also, each variant may be implemented as text-dependent (the speaker must utter one of a closed set of words) or text-independent (the speaker may utter any type of speech), as in [4].
Human speech can be separated into two distinct sections: sound production and sound shaping. Sound production is caused by air passing across the vocal cords (as in "a", "e", and "o") or from a constriction in the vocal tract (as in "sss", "p", or "sh"). Sound production using the vocal cords is called voiced speech, and it was taken in this research as a feature for tracking the vowel sound during speech in male and female samples; unvoiced speech is produced by the tongue, lips, teeth, and mouth. In signal processing, sound production is called excitation. Sound shaping is a combination of the vocal tract, the placement of the tongue, lips, teeth, and the nasal passages. For each fundamental sound, or phoneme, of English, the shape of the vocal tract is somewhat different, which leads to a different sound. In signal processing, sound shaping is called filtering [3]. MATLAB was used in this work; it is a very useful program for processing non-stationary signals such as speech, and many systems have been designed with its tools, as in [5].
Materials and Methods
1. Recording the Speech Samples
The first step in the research was building a database of speech samples. This step required collecting both female and male samples, as well as samples of an individual female speaker. The word selected for recording the speakers' voices was 'Close', as its vowel sound can be heard clearly. A Dell laptop computer was used to record the speech signals for both male and female samples, using the Sound Recorder software of Windows XP and a Creative microphone, model HS-350.
2. Low-Pass Filter
It is very important to use a low-pass filter because the recorded speech samples contain noise produced during speaking. That noise usually lies at the higher frequencies, while the speech data lies at the lower frequencies. The filter was therefore used twice: first when reading the speech signal into the MATLAB environment, as shown in Fig (1), and second when calculating the power of the discrete Fourier transform, to separate the power amplitudes of the speech signal from the noise, as shown in Fig (2).

Fig (1) the Effect of the Low-Pass Filter on a sample of the Speech Signal

Fig (2) the Effect of the Low-Pass Filter on the Power of a sample of the Speech Signal
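The low-pass filtering step can be illustrated with a minimal sketch. The paper does not state the filter type or cutoff it used, so the first-order (single-pole) filter, the 1000 Hz cutoff, and the function names below are assumptions chosen only for illustration; the 8000 Hz sampling rate matches the sample time of 1/8000 given later in the paper.

```python
import math

def low_pass(samples, sample_rate_hz, cutoff_hz):
    """First-order (single-pole) IIR low-pass filter (an assumed filter type)."""
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)              # smoothing factor between 0 and 1
    out = []
    prev = 0.0
    for x in samples:
        prev = prev + alpha * (x - prev)  # move part of the way toward the new sample
        out.append(prev)
    return out

def rms(samples):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def tone(freq_hz, sample_rate_hz, n):
    """Unit-amplitude sine wave, used here as a stand-in test signal."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]

FS = 8000        # sampling rate in Hz
CUTOFF = 1000    # hypothetical cutoff in Hz, not taken from the paper

low = low_pass(tone(200, FS, 4000), FS, CUTOFF)    # low-frequency "speech-like" tone
high = low_pass(tone(3500, FS, 4000), FS, CUTOFF)  # high-frequency "noise-like" tone
print(round(rms(low), 3), round(rms(high), 3))
```

The low-frequency tone passes almost unchanged while the high-frequency tone is strongly attenuated, which is the behavior the section relies on when separating speech data from noise.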

3. Features Extraction
3.1- Pitch
Pitch is the most distinctive difference between male and female speakers. A person's pitch originates in the vocal cords, and the rate at which the vocal cords (folds) vibrate is the frequency of the pitch. Pitch differs between the sexes because of the size, mass, and tension of the laryngeal tract, which includes the vocal folds and the glottis (the spaces between and behind the vocal folds). Before puberty, the fundamental frequency, or pitch, of the human voice is about 250 Hz and the vocal fold length is about 10.4 mm. After puberty the human body grows to its full adult size, changing the dimensions of the larynx area. The vocal fold length in males increases to about 15-25 mm, while the female vocal fold length increases to about 13-15 mm. The average male pitch falls between 60 and 120 Hz, and the range of female pitch lies between 120 and 200 Hz. Females have a higher pitch range than males because their larynx is smaller [4]. This feature has been very helpful in distinguishing male from female voices in many studies, one of which used a fuzzy logic system, as in [6]. Based on the pitch feature, this research found that the amplitudes of the power of the speech signal are another feature that can be used to distinguish the male from the female voice, by applying the fast Fourier transform algorithm.
3.2- Fast Fourier Transform (FFT)
Discrete Fourier transforms (DFTs) with a million points are common in many applications. Modern signal and image processing applications would be impossible without an efficient method for computing the DFT, which transforms time-based or space-based data into frequency-based data [7]. The DFT was used as a feature extractor because the frequency magnitude contains information about the pitch and the formants. The spectral magnitude also holds a great deal of other information besides the pitch and the formant magnitudes. The DFT of a vector x of length N is another vector X of length N:

$$X(k) = \sum_{j=1}^{N} x(j)\, W_N^{(j-1)(k-1)}$$

$$x(j) = \frac{1}{N} \sum_{k=1}^{N} X(k)\, W_N^{-(j-1)(k-1)}$$

$$W_N = e^{-2\pi i / N}$$

Where,
x(j): The discrete input signal in the time domain
X(k): The discrete signal in the frequency domain
N: The number of samples
j: Sample number in the discrete time domain
k: Sample number in the frequency domain
The last equation is an Nth root of unity. When using FFT algorithms, a distinction is made between the window length and the transform length. The window length is the length of the input data vector. It is determined by, for example, the size of an external buffer. The transform length is the length of the output, the computed DFT. An FFT algorithm pads or chops the input to achieve the desired transform length. Fig (3) illustrates the two lengths.

Fig (3) The Fast Fourier Transform algorithm chops or pads the input
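The difference between window length and transform length can be sketched as follows. This is a direct, slow DFT written only to show the idea, not the FFT routine the paper used; the function names are hypothetical, and the indices here start at zero, whereas the DFT definition in the text counts from one.

```python
import cmath

def fit_to_length(x, n):
    """Chop x to n samples, or pad it with zeros up to n samples."""
    return list(x[:n]) + [0.0] * max(0, n - len(x))

def dft(x):
    """Direct DFT: X(k) = sum over j of x(j) * W^(j*k), zero-based indices."""
    n = len(x)
    w = cmath.exp(-2j * cmath.pi / n)   # W_N, an Nth root of unity
    return [sum(x[j] * w ** (j * k) for j in range(n)) for k in range(n)]

def idft(X):
    """Inverse DFT: x(j) = (1/N) * sum over k of X(k) * W^(-j*k)."""
    n = len(X)
    w = cmath.exp(-2j * cmath.pi / n)
    return [sum(X[k] * w ** (-j * k) for k in range(n)) / n for j in range(n)]

signal = [1.0, 2.0, 3.0, 4.0, 5.0]      # window length 5
padded = fit_to_length(signal, 8)       # transform length 8: zeros are appended
chopped = fit_to_length(signal, 4)      # transform length 4: the last sample is dropped
recovered = idft(dft(padded))           # the inverse transform returns the padded input
print(padded, chopped)
```

Padding changes only how finely the spectrum is sampled, while chopping discards data, so the transform length is normally chosen to be at least the window length.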

A program was written in MATLAB 7.6.0 (R2008a) to calculate the window length of the input signal, which is the speech signal. The execution time of the fast Fourier transform algorithm depends on the transform length; it is fastest when the transform length is a power of two. The algorithm allows estimating component frequencies in data from a discrete set of values sampled at a fixed rate. From the window length of the speech signal, the transform length was calculated by finding the smallest power of two that is greater than or equal to the absolute value of the window length [7]. The transformed signal was used to calculate the power of the DFT of the speech signal. The result is shown in a plot of power versus frequency, which is called a periodogram, as shown in Fig (4).

Fig (4) The Periodogram Plot of the speech signal
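The transform-length and periodogram computation described above can be sketched as follows. The power scaling (squared magnitude divided by the transform length), the test tone, and the function names are assumptions for illustration; the paper does not list its MATLAB commands, and a direct DFT stands in for the FFT here.

```python
import cmath
import math

def next_pow2_length(window_length):
    """Smallest power of two greater than or equal to the window length."""
    n = 1
    while n < window_length:
        n *= 2
    return n

def periodogram(x, sample_rate_hz):
    """Power versus frequency for the first half of the spectrum."""
    n = next_pow2_length(len(x))
    padded = list(x) + [0.0] * (n - len(x))        # pad up to the transform length
    w = cmath.exp(-2j * cmath.pi / n)
    half = n // 2
    spectrum = [sum(padded[j] * w ** (j * k) for j in range(n)) for k in range(half)]
    power = [abs(c) ** 2 / n for c in spectrum]    # assumed scaling
    freqs = [k * sample_rate_hz / n for k in range(half)]
    return freqs, power

FS = 8000                                          # sampling rate in Hz
x = [math.sin(2 * math.pi * 1000 * i / FS) for i in range(100)]  # 1000 Hz test tone
freqs, power = periodogram(x, FS)
peak_hz = freqs[power.index(max(power))]
print(next_pow2_length(100), peak_hz)
```

For a window of 100 samples the transform length becomes 128, and the strongest component of the periodogram falls at the frequency of the test tone.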

3.3- Spectrogram
Signals such as speech are composed of many different ranges of frequencies, so a frequency representation is necessary for interpreting a speech signal. The spectrogram is one of the well-known frequency representations of the original speech. The vertical axis corresponds to time. The intensity of the pattern at any instant of time corresponds to the energy level. The spectrogram lets users see how much energy a speech signal has in terms of the frequency scale. It is a useful tool for detecting voiced and unvoiced areas and for identifying the relevant frequencies that make up the speech. Many other kinds of research can be performed using the spectrogram. The amount of information a spectrogram can give is enormous, and many speech researchers can identify plain English text from a spectrogram [3]. In this research, the spectrogram tool in MATLAB was used because the power spectrum of a signal represents the contribution of every frequency of the spectrum to the power of the overall signal. It is also very helpful in noise cancellation and system identification.
The process started, using the MATLAB Signal Processing Blockset, by loading a speech signal of the word 'Close' from the MATLAB workspace for each sample. The speech signal was then separated into a number of segments called frames, with a sample time of 1/8000 and 80 samples per frame. The output buffer size per channel was 128, with a buffer overlap of 48. With these settings, the first output frame contains 48 initial-condition values followed by the first input frame. The second output frame contains the last 48 values from the previous frame followed by the 80 samples of the second input frame, and so on. The input signal was buffered into an output signal with 128 samples per frame to minimize the estimation noise added to the speech signal. After this step was completed, the Fourier transform of the signal was taken using the Periodogram block, which calculates a nonparametric estimate of the power spectrum of the speech signal. This operation depends on a windowing process such as Hamming, whose purpose is to emphasize the signal in some areas while making it less relevant in others. It was applied periodically to the speech signal, and two spectra were averaged at a time. The FFT length was set to 128, which is the number of samples per frame. Fig (5) shows the spectrogram system for the speech signal that was designed in the MATLAB Signal Processing Blockset [7]. The only changes made to the designed system concern how the input speech signal is loaded from the MATLAB workspace, using the Signal From Workspace block from the Simulink signal processing sources. Since the speech signal was produced as a two-dimensional signal, another change was made through the Selector block. The Vector Scope block was used to display the power spectrum of the speech signal, as shown in Fig (6), while the spectrogram of the speech signal was viewed using the Matrix Viewer, as shown in Fig (7). The speech signal is a sample of a female voice saying "Close".
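The buffering scheme described above (80 new samples per input frame, 128 samples per output frame, 48 samples of overlap) can be sketched in a few lines. The function name is hypothetical, and the initial-condition values are assumed to be zeros, since the paper does not state them.

```python
def buffer_with_overlap(samples, frame_size=128, overlap=48, initial=0.0):
    """Rebuffer a signal into overlapping frames.

    Each output frame holds `overlap` values carried over from the previous
    frame followed by `frame_size - overlap` new input samples.
    """
    hop = frame_size - overlap                 # 80 new samples per output frame
    history = [initial] * overlap              # assumed initial conditions
    frames = []
    for start in range(0, len(samples) - hop + 1, hop):
        new = list(samples[start:start + hop])
        frame = history + new
        frames.append(frame)
        history = frame[len(frame) - overlap:]  # last 48 values feed the next frame
    return frames

samples = list(range(1, 241))                  # three input frames of 80 samples
frames = buffer_with_overlap(samples)
print(len(frames), len(frames[0]))
```

The first frame starts with the 48 initial values, and every later frame starts with the last 48 values of the frame before it, which matches the description in the text.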

Fig (5) Viewing Spectrogram of the Speech Signal using MATLAB Signal Processing Block set

Fig (6) The Vector Scope Window displaying a Sequence of Power Spectrums, one for each Window of the
Original Speech Signal

Fig (7) The Matrix Viewer Window displaying the Spectrogram of the Speech Signal
From Fig (7), we can notice the harmonics that are visible in the speech signal when the vowel "o" is spoken in the word "Close", where most of the speech signal's energy is concentrated. As seen in Fig (7), the spectrogram is a color-based visualization of the evolution of the power spectrum of a speech signal as the signal is swept through time; it was calculated using the periodogram power spectrum estimation method.
4. Training Neural Networks for Speech Recognition
An Artificial Neural Network (ANN) is an information processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information. The key element of this paradigm is the novel structure of the information processing system. It is composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems. ANNs, like people, learn by example. An ANN is configured for a specific application, such as pattern recognition or data classification, through a learning process [8]. The best-known example of a neural network training algorithm is back propagation (Patterson, 1996; Haykin, 1994; Fausett, 1994). Modern second-order algorithms such as conjugate gradient descent and Levenberg-Marquardt (Bishop, 1995; Shepherd, 1997) are substantially faster (e.g., an order of magnitude faster) for many problems, but back propagation still has advantages in some circumstances, and is the easiest algorithm to understand. In back propagation, the gradient vector of the error surface is calculated. This vector points along the line of steepest descent from the current point, so we know that if we move along it a "short" distance, we will decrease the error. A sequence of such moves (slowing as we near the bottom) will eventually find a minimum of some sort. The difficult part is to decide how large the steps should be.
Large steps may converge more quickly, but may also overstep the solution or (if the error surface is very eccentric) go off in the wrong direction. A classic example of this in neural network training is where the algorithm progresses very slowly along a steep, narrow valley, bouncing from one side across to the other. In contrast, very small steps may go in the correct direction, but they also require a large number of iterations. In practice, the step size is proportional to the slope (so that the algorithm settles down in a minimum) and to a special constant: the learning rate. The correct setting for the learning rate is application-dependent, and is typically chosen by experiment; it may also be time-varying, getting smaller as the algorithm progresses. The algorithm therefore progresses iteratively, through a number of epochs. On each epoch, the training cases are each submitted in turn to the network, and target and actual outputs are compared and the error calculated. This error, together with the error surface gradient, is used to adjust the weights, and then the process repeats. The initial network configuration is random, and training stops when a given number of epochs elapses, or when the error reaches an acceptable level, or when the error stops improving. In the research, a back propagation neural network was used to build a network able to distinguish male from female samples based on the power amplitude response of the DFT of the speech samples; the results show that the power amplitude response for the male samples was higher than that for the female samples. The steps to create a neural-network-based recognizer are [9]:
1. Specify the phonetic categories that the network will recognize. In the research, these were the power amplitudes of the DFT for both male and female samples.
2. Find many samples of each of these categories in the speech data. In the research, the best female and male samples, judged by their power amplitudes, were used. The smallest power value for the female samples was assigned the target [0 1 0 1] and the largest power value for the male samples was assigned the target [0 0 1 1].
3. Train a network to recognize female and male samples. In the research, the Back Propagation Neural Network (BPNN) was trained for 462 epochs, starting with 50 hidden units. This number was not enough to satisfy the recognition process, so the number of hidden units was increased to 100 to enhance the results. Fig (8) shows the trained neural network with the setting parameters in the MATLAB nntool environment. Fig (9) shows the training state of the Neural Network.

Fig (8) The Back propagation nntool Neural Network

Fig (9) The Training State of the Back propagation Neural Network

Fig (10) The Regression State of the Back propagation Neural Network
4. Evaluate the network performance using a test set. In the research, the network was tested using 45 samples of male and female speech after the BPNN had been trained on the speech samples. Fig (11) shows the performance of the network.

Fig (11) The Performance of the Back propagation Neural Network
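The effect of the learning rate discussed in the back propagation section can be shown on a toy problem. The sketch below is not the paper's network: it runs plain gradient descent on an assumed two-weight error surface shaped like a steep, narrow valley, and the learning rates, starting point, and function names are chosen only for illustration.

```python
def error(w):
    """Assumed error surface: shallow along w[0], steep along w[1]."""
    return 0.5 * (w[0] ** 2 + 25.0 * w[1] ** 2)

def gradient(w):
    """Gradient of the error surface above."""
    return [w[0], 25.0 * w[1]]

def train(learning_rate, epochs, start=(4.0, 1.0)):
    """Run gradient descent and return the final error."""
    w = list(start)
    for _ in range(epochs):
        g = gradient(w)
        # Step against the gradient; step size is slope times learning rate.
        w = [wi - learning_rate * gi for wi, gi in zip(w, g)]
    return error(w)

initial_error = error([4.0, 1.0])
small = train(0.001, 100)     # correct direction, but slow progress
moderate = train(0.05, 100)   # converges close to the minimum
large = train(0.1, 100)       # oversteps across the valley and diverges
print(initial_error, small, moderate, large)
```

With the same number of epochs, the small rate reduces the error only partly, the moderate rate reaches the minimum, and the large rate makes the error grow, which is why the learning rate is usually chosen by experiment.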

5. Fuzzy Logic
Fuzzy set theory and fuzzy logic were conceived in 1965 by Lotfi Zadeh as a way of allowing uncertainty or vagueness to be represented mathematically. Fuzzy sets are a superset of classical sets. Each element in a fuzzy set is associated with a real number that represents the degree of membership of the element in the set. Fuzzy sets are usually expressed as a set of elements whose degrees of membership take truth values in the closed unit interval [0, 1]. The idea that a fuzzy set represents a concept within a context is further expanded by linguistic variables. A linguistic variable is assigned to a fuzzy region, a set of fuzzy sets that represent a complete concept. Fuzzy logic is also a superset of classical logic, which deals with propositions that are required to be either true or false. Fuzzy logic allows highly nonlinear, poorly understood, or mathematically complex systems to be modeled reliably and efficiently, and it deals well with noisy data. These characteristics, as in [10], suggested that fuzzy logic might be an effective tool for speech recognition. The Fuzzy Logic Toolbox for use with MATLAB is a tool for solving problems. It helps to create and edit fuzzy inference systems by using graphical tools or command-line functions [7]. In the research, the first fuzzy system program was built to distinguish male from female samples based on two rules:
1- If the power amplitude of the speech signal is a small value, then a female speaks.
2- If the power amplitude of the speech signal is a large value, then a male speaks.
The two rules helped the system to recognize the speaker as female or male. The system, built as a Mamdani system using the MATLAB toolbox 7.6.0 (R2008a), is shown in Fig (12), and Fig (13) shows the rule viewer of the system.

Fig (12) The Speech Signal Recognizer System

Fig (13) The Rule Viewer Speech Signal Recognizer System
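The two-rule system can be sketched as a small Mamdani-style inference in plain Python. The input range (0 to 0.5) and output range (0 to 10) follow the paper, but the membership function shapes, the centroid defuzzification, and all function names are assumptions; the paper does not give the shapes used in the MATLAB toolbox, so this sketch will not reproduce the reported output values.

```python
def ramp_down(x, lo, hi):
    """Membership that is 1 at or below lo and falls linearly to 0 at hi."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)

def ramp_up(x, lo, hi):
    """Membership that is 0 at or below lo and rises linearly to 1 at hi."""
    return 1.0 - ramp_down(x, lo, hi)

def speaker_score(power_amplitude, steps=1000):
    """Two-rule Mamdani inference with centroid defuzzification on 0..10."""
    small = ramp_down(power_amplitude, 0.0, 0.5)   # degree of "small" amplitude
    large = ramp_up(power_amplitude, 0.0, 0.5)     # degree of "large" amplitude
    num = 0.0
    den = 0.0
    for i in range(steps + 1):
        y = 10.0 * i / steps
        female = min(small, ramp_down(y, 0.0, 5.0))  # rule 1: small -> female
        male = min(large, ramp_up(y, 5.0, 10.0))     # rule 2: large -> male
        mu = max(female, male)                       # aggregate both rules
        num += y * mu
        den += mu
    return num / den if den else 5.0

for amplitude in (0.1, 0.25, 0.4):
    print(amplitude, round(speaker_score(amplitude), 2))
```

Small amplitudes pull the output toward the low (female) end of the scale and large amplitudes toward the high (male) end; the decision thresholds would then have to be tuned to whatever membership functions are actually used.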

Based on the extracted features (the power amplitude, the power spectrum, and the vowel sound), the above system was developed to recognize an individual female speaker against other male and female samples. The system was tested through several cases while changing the rules. A system of 24 rules was built, based on the values of the three extracted features, with three inputs representing the three features of the speech samples: the power amplitude of the DFT of the speech signal, the power spectrogram of the speech signal, and the signal energy concentrated in the harmonics of the vowel sound 'o' in the word "Close". The system was able to retrieve the required speech signal for the required speaker. The system is shown in Fig (14) and the rule viewer is shown in Fig (15), while Fig (16) shows the 24 rules that were used in the voice acceptance process.

Fig (14): The Individual Speech Signal Recognizer

Fig (15) The Rule Viewer of the Individual Speech Signal Recognizer

Fig (16) The Rule Editor of the Individual Speech Signal Recognizer
The Results
From the collected male and female speech samples, the back propagation neural network (BPNN) was tested on recognizing the speech signal using 50 hidden units. The target was [0 0 1 1] for the male samples (M) and [0 1 0 1] for the female samples (F). The output of training was [0 -1.5252e-011 1 1 ; 0 1 -3.0755e-011 1]. The results are shown in Table (1).
Table (1): The Testing Results of the BPNN with 50-Hidden Units
Recorded Speech   Female Speech Output [0 1 0 1]   Recorded Speech   Male Speech Output [0 0 1 1]
F1 0 0.9668 0.7927 1 M1 0 -0.4829 3.0105 1
F2 0 1.8415 0.0727 1 M2 0 1.5231 0.2324 1
F3 0 1.0240 0.7514 1 M3 0 0.1036 0.9397 1
F4 0 1.9650 -0.0614 1 M4 0 1.7554 0.1526 1
F5 0 0.23 -3.2551 1 M5 0 0.1681 0.9723 1
F6 0 1.0343 0.1921 1 M6 0 -0.2679 1.6114 1
F7 0 1.4096 0.0081 1 M7 0 1.8479 0.0667 1
F8 0 -0.0137 1.0178 1 M8 0 0.5563 1.0399 1
F9 0 0.2674 1.0538 1 M9 0 -0.4514 3.0156 1
F10 0 -0.092 2.8621 1 M10 0 1.6042 0.2876 1
F11 0 0.4892 1.0666 1 M11 0 -0.0309 1.0415 1
F12 0 0.4693 1.0444 1 M12 0 -0.3433 1.8869 1
F13 0 0.7087 0.9610 1 M13 0 0.0847 0.9392 1
F14 0 -0.2339 2.5984 1 M14 0 1.0315 -0.0808 1
F15 0 -0.5156 2.6285 1 M15 0 -0.3244 1.8158 1

Table (2): The Testing Results of the BPNN with 100-Hidden Units
Recorded Speech   Female Speech Output [0 1 0 1]   Recorded Speech   Male Speech Output [0 0 1 1]
F1 0 0.1377 0.2221 1 M1 0 -0.4829 3.0105 1
F2 0 -0.1339 0.0883 1 M2 0 1.145 0.4904 1
F3 0 0.1695 0.1702 1 M3 0 -0.1462 1.0112 1
F4 0 -0.1741 0.0207 1 M4 0 -0.0167 0.0333 1
F5 0 1.1421 0.2155 1 M5 0 -0.2039 1.0091 1
F6 0 1.4022 0.3869 1 M6 0 0.3437 1.0299 1
F7 0 0.4446 -0.2962 1 M7 0 -0.1422 0.0913 1
F8 0 -0.2209 0.9423 1 M8 0 -0.0849 0.6085 1
F9 0 0.9996 -0.0001 1 M9 0 0.9182 0.2003 1
F10 0 0.3235 0.5692 1 M10 0 0.1735 -0.0632 1
F11 0 -0.1221 0.6829 1 M11 0 0.0473 0.9996 1
F12 0 0.5046 0.9067 1 M12 0 0.0407 1.015 1
F13 0 -0.0033 0.5 1 M13 0 -0.1239 1.0089 1
F14 0 0.2133 -0.2771 1 M14 0 1.2357 0.2661 1
F15 0 0.5603 0.7935 1 M15 0 0.3921 1.0214 1
The BPNN was trained using the female and male samples for the specified number of epochs, as mentioned before. The speech samples were taken from the energy values extracted as a feature in the speech processing operation. The target values were assigned as [0 0 1 1] for male samples and [0 1 0 1] for female samples, and the BPNN was then tested against other female and male speech samples. The BPNN results are approximate values, as can be seen in the above tables. The BPNN was trained and tested using 50 hidden units, then 100 hidden units. Table (3) shows the recognition success of the BPNN with the different numbers of hidden units. As can be seen in Table (3), as the number of hidden units increased, the overall performance of the BPNN increased too (the male rate rose while the female rate fell slightly). (True) refers to the samples that were tested and produced the required output, while (False) refers to the samples that were tested and did not produce the required output.
Table (3): The Recognition Success Rates using BPNN
Recognition Success (50-hidden units) Recognition Success (100-hidden units)
Female (True) 40% Female (True) 33.3%
Female (False) 60% Female (False) 66.7%
Male (True) 60% Male (True) 80%
Male (False) 40% Male (False) 20%
The fuzzy inference system was built using MATLAB 7.6 (R2008a) as a speech recognizer for male and female samples. Two rules were used in the system to distinguish the male from the female samples, as mentioned above.
1- If the power amplitude of the speech signal is a small value, then a female speaks. The specified range is (0 – 0.5).
2- If the power amplitude of the speech signal is a large value, then a male speaks. The specified range is (0 – 0.5).
The output range for the speaker was (0 – 10): a female speaker resulted in the range (0 – 2.5), while a male speaker resulted in the range (2.6 and above). Like every system, some error may result. The results are shown in Table (4): the small power amplitude values resulted for female speakers, while the large power amplitude values resulted for male speakers. Table (5) shows the recognition success rate of the system.
Table (4): Speech Recognizer Fuzzy Inference System
Recorded Speech   Power Amplitude (Watt)   Fuzzy Output (Female Samples)   Recorded Speech   Power Amplitude (Watt)   Fuzzy Output (Male Samples)
F1 0.2826 2.27 M1 0.3368 4.07
F2 0.2647 2.07 M2 0.3209 3.47
F3 0.2816 2.26 M3 0.2673 2.1
F4 0.2573 2 M4 0.3126 3.06
F5 0.2056 1.72 M5 0.3935 2.86
F6 0.1132 1.83 M6 0.2645 2.09
F7 0.2337 1.84 M7 0.2911 2.41
F8 0.3033 4.01 M8 0.7108 3.74
F9 1.4565 2.72 M9 0.2709 2.13
F10 0.2929 3.74 M10 0.3420 3.94
F11 0.4648 2.45 M11 0.4163 2.68
F12 0.2875 2.97 M12 0.3235 3.61
F13 0.2424 2.35 M13 0.1728 1.69
F14 0.2426 1.89 M14 0.4102 2.7
F15 0.4980 3.69 M15 0.3262 3.77
Table (5): Speech Recognizer Fuzzy Inference System
Recognition Success (Females)   Recognition Success (Males)
True % 80%   True % 73.3%
False % 20%   False % 26.7%

But this system was not enough to recognize an individual speaker tested against other speakers, female and male. Fifteen samples were collected from an individual speaker, all belonging to one female, in a file named (I), and tested against the other samples. The system was developed using 24 rules based on the three extracted features, as shown in Fig (16). It has three inputs, the extracted features of each speech sample, and one output representing the voice acceptance, which is based on two cases. The first case produces "Access-Validate", in the range (0-5), for the authorized speaker who may access a work area or security system, which is the required speech sample (I). The second case produces "Access-not Validate", in the range (6-10), to reject a non-required or unauthorized voice, which is a female speech sample (F) or a male speech sample (M). The three inputs that represent the features extracted from the speech signal are:
- Power Amplitude, range (0 – 0.5) watt.
- Power Spectrogram, range (-100 – 0) dB.
- Vowel Sound 'O', range (-100 – 0) dB.
Table (6) shows the results of the fuzzy inference system that was designed to recognize only one authorized speaker (I) against other female speakers (F) and male speakers (M). Each speech sample passed through the speech signal processing operation for feature extraction: the power amplitude value was produced by the power detection algorithm using the DFT, the power spectrum value was produced from the power spectrogram in the MATLAB Signal Processing Blockset, and the vowel sound value resulted from reading the harmonics of the vowel sound 'O' in the word 'Close'. The speech samples were classified into female and male samples in the first fuzzy logic system using only two rules on the power amplitude values. More rules were then added to the fuzzy system, using the three extracted features, to make the system able to recognize only one specified speaker (I). Without the three grouped features, it would be difficult for the system to recognize the required speaker. In the BPNN this operation might be difficult and limited, so fuzzy logic gives more flexibility. Using the 24 rules produced a robust fuzzy logic system for recognizing an individual speaker. From the table, F1 represents a female sample with power amplitude (0.2826), power spectrum value (-4), and vowel sound value (-55). Its voice acceptance is (6.05), which lies in the range (6 - 10), that is, "Access-not Validate", because F1 is not the authorized speaker for the system. The same is true for sample M1. For sample I1, the power amplitude is (0.1853), the power spectrum (-5), and the vowel sound value (-50), and the voice acceptance result was (5.91); according to the 24 rules, its voice acceptance results in the range (0 – 5), which is "Access-Validate", and from this result it is recognized as the required authorized speaker for the system. The values of the other samples are all shown in Table (6). Table (7) shows the success rate for recognizing the required speaker (I).
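The mapping from the voice acceptance output to an access decision can be written down directly from the two ranges stated above. The function name and the "undetermined" label for values that fall between the two ranges are assumptions, since the paper does not say how such values are treated; the two example values are taken from Table (6).

```python
def access_decision(voice_acceptance):
    """Map the fuzzy voice acceptance output (0..10) to an access decision."""
    if 0.0 <= voice_acceptance <= 5.0:
        return "Access-Validate"          # authorized speaker (I)
    if 6.0 <= voice_acceptance <= 10.0:
        return "Access-not Validate"      # other female (F) or male (M) speakers
    return "undetermined"                 # assumed label; not defined in the paper

print(access_decision(6.05))   # sample F1 in Table (6)
print(access_decision(4.81))   # sample I2 in Table (6)
```

Keeping the decision step separate from the fuzzy inference makes it easy to adjust the accepted ranges without touching the rules.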

Table (6): Speech Recognizer Fuzzy Inference System for an Individual Speaker
Recorded Speech Power Amplitude Spectrogram Power Vowel Sound Voice Acceptance
F1 0.2826 -4 db -55 db 6.05
F2 0.2647 -15 db -70 db 5.35
F3 0.2816 -5 db -48 db 6.38
F4 0.2573 -30 db -60 db 4.85
F5 0.2056 -15 db -70 db 5.16
F6 0.1132 -10 db -50 db 5.26
F7 0.2337 -5 db -55 db 5.07
F8 0.3033 -5 db -60 db 7.56
F9 1.4565 -11 db -61 db 6.68
F10 0.2929 -3 db -60 db 7.75
F11 0.4648 -5 db -70 db 6.36
F12 0.2875 -6 db -46 db 7.54
F13 0.2424 -18 db -61 db 5.84
F14 0.2426 -18 db -60 db 5.13
F15 0.4980 -20 db -61 db 5.58
M1 0.3368 -2 db -60 db 7.52
M2 0.3209 -5 db -60 db 7.17
M3 0.2673 -10 db -69 db 5.63
M4 0.3126 -8 db -60 db 6.95
M5 0.3935 -5 db -61 db 7.63
M6 0.2645 -3 db -44 db 7.29
M7 0.2911 -10 db -56 db 6.31
M8 0.7108 0 db -61 db 7.08
M9 0.2709 -1 db -51 db 5.73
M10 0.3420 0 db -62 db 6.47
M11 0.4163 -5 db -66 db 7.86
M12 0.3235 -5 db -64 db 7.24
M13 0.1728 -11 db -50 db 4.81
M14 0.4102 -5 db -55 db 7.63
M15 0.3262 0 db -61 db 7.3
I1 0.1853 -5 db -50 db 5.91
I2 0.2795 -40 db -65 db 4.81
I3 0.2009 -40 db -79 db 4.81
I4 0.3798 -25 db -80 db 4.94
I5 0.4574 -45 db -71 db 4.93
I6 0.3030 -12 db -75 db 6.66
I7 0.2527 -25 db -75 db 4.81
I8 0.2567 -40 db -75 db 4.81
I9 0.1030 -35 db -75 db 4.81
I10 0.1851 -50 db -80 db 5.35
I11 0.1223 -44 db -75 db 4.81
I12 0.1191 -38 db -75 db 4.81
I13 0.1811 -40 db -65 db 4.82
I14 0.1144 -30 db -75 db 4.81
I15 0.1756 -40 db -80 db 4.82

Table (7): Speech Recognizer Fuzzy Inference System for an Individual Speaker
Recognition Success (Individual Speaker)
True % 91.1 %
False % 8.9 %

Discussion
Speech analysis is still a challenging area in recognition, classification, and retrieval, and much research has been done in speech analysis. As seen in the results section, fuzzy logic proved to be a successful system in both recognition and retrieval operations for an individual speaker, compared with the BPNN, which suffered from several problems in giving convincing recognition rates. Fuzzy logic is a flexible system where true and false decisions are taken based on membership functions. The first system was used to distinguish male from female samples, and it showed a good success rate. To increase the recognition ability of this system, more features of the speech signal had to be defined. Spectrogram analysis, which is used in speech analysis research as in [4], [11], was very helpful in defining two other robust features. These features define additional rules for the fuzzy inference system built with the MATLAB 7.6 (R2008a) tool, and they increase the system's security and authority against unauthorized speakers who are not allowed to enter the system or a specified work area. As seen from Table (6), the three features were the inputs of the fuzzy inference system, and based on 24 rules a decision was made on whether to give voice acceptance to the speaker. The system was named an "Individual Speaker Recognizer", as it was able to recognize a specified individual speaker and give that speaker "Access-Validate", while giving "Access-Denied" to other unauthorized speakers, whether they were male or female. The individual speaker in the research was taken from the female samples. Table (7) shows the recognition success rates of the system. Finally, recent studies have shown that fuzzy systems and neural networks are both part of a class of universal approximators of continuous functions. This similarity means that a fuzzy system can be replaced by some form of neural network and vice versa. Fig (17) shows the model assumption for recognizing and retrieving speech signal samples. As future work, this fuzzy logic system can be developed by connecting it to a microcontroller device, designing a complete recognizer system for an authorized speaker that performs orders such as open system, close system, and so on.

[Figure blocks: Recorded-speech Signal; Power Amplitude Processing; Fuzzy Logic Recognizer Speech System; Data Base speech samples; Features Extraction Processing; Fuzzy Logic Recognizer an Individual Speech System]
Fig (17) Model Assumption for Speech Recognition and Retrieval Operations

References

[1]. Chester, F. J. Taylor, and M. Doyle, "The Wigner Distribution in Speech Processing Applications", J. Franklin Inst., vol. 318, pp. 415-420, 1984.
[2]. Ho, K. Lai, and Octavian, C., "Speech Processing Workshop", Department of Electrical and Electronic Engineering, Part IV Project Report, 2003.
[3]. Jere, B., "Digital Signal Processing Applications Using the ADSP-2100 Family", edited by the Applications Engineering Staff of Analog Devices, DSP Division, PTR Prentice Hall, New Jersey, pp. 330-331.
[4]. Brain, J. Love, and Jennifer, V., and Xuening, S., "Automatic Speaker Recognition Neural Networks", Electrical and Computer Engineering Department, The University of Texas at Austin, Spring 2004.
[5]. Abdul-Bary, R., S., Soad, T. A., "MATLAB-Based Design and Implementation of Time-Frequency Analyzer", College of Electrical Eng., Technical Institute.
[6]. M. Liu, C. Wan, L. Wang, "Content-Based Audio Classification and Retrieval using a Fuzzy Logic System: towards Multimedia Search Engines".
[7]. The MathWorks, Inc., "MATLAB: The Language of Technical Computing", Version 7.6.0 (R2008a), USA.
[8]. Christos, S., and Dimitrios, S., "Neural Networks", http://www.emsl.pnl.gov:2080/docs/cie/neural/neural.homepage.html
[9]. John-Paul, H., and Ron, C., and Mark, F., and John, S., and Yonghong, Y., and Wei, W., "Training Neural Networks for Speech Recognition", Center for Spoken Language Understanding (CSLU), Oregon Graduate Institute of Science and Technology, January 29, 1998. http://speech.bme.ogi.edu/tutordemos/nnet_training/tutorial.html
[10]. Patrick M. Mills, "Fuzzy Speech Recognition", Bachelor of Science in Engineering, Swarthmore College, 1994; University of South Carolina, Thesis, 1996.
[11]. Dunia, J., Jamma, "Low Cost Wavelet Analyzer", Computer Engineering Department, Technical College/Mosul, Foundation of Technical College, Thesis, January 2005.

Abstract (translated from the Arabic)
Speaker voice recognition is one of the most interesting topics in the field of digital signal processing, and much research has been carried out in this field using various materials, tools, and scientific programs for the production, processing, coding, and recognition of speech. In this research, samples of female and male speakers' voices were used to build a system able to distinguish between them using neural networks and a fuzzy logic system. The fuzzy logic system was then used with three features of the speaker's voice, namely the energy value of the signal, the spectrum of the signal, and the vowel sound in the chosen word, so that it can identify a person and recognize the voice of one specified speaker apart from the voices of other speakers. The technique of the system relies on granting acceptance and authorization for the speaker to enter a specified system or work area, and on preventing unauthorized persons from penetrating the system or work area. During the test phase, the system gave good results using samples of a specified speaker's voice compared with samples of other female and male speakers' voices.