(IJCST-V9I2P18) :swati, Harpreet Kaur
ABSTRACT
This is a software in which the user controls computer functions and dictates text by voice. The project has two components: the first processes the signal captured by a microphone and converts words into signals; the second captures the signals and converts them back into words. Voice is the basic method of communication for interacting with other people. This technology helps systems respond correctly to human voices and provide valuable services, and communicating with a computer by voice commands is fast.
Keywords: Voice Recognition, Text to Speech
I. INTRODUCTION
Voice is one of the forms of communication and is used for interacting with people. These days there are many speech technologies serving different tasks. People prefer these kinds of services because communicating with a computer is faster than the alternatives. By developing a voice recognition and text to speech system this task can be accomplished: the computer translates voice into text, converting a signal into words and vice versa.

Such a system involves pre-processing, feature extraction, acoustic, language, and pattern matching steps. These work as follows:

1. Pre-processing: The voice signals, which are analog signals, are transformed into digital signals that are later used for processing. First, the digital signals are passed through filters for spectral flattening; this helps boost the signal energy at the higher frequencies.
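The flattening filter described above is commonly implemented as a first-order pre-emphasis filter. A minimal sketch in Python, assuming the usual form y[n] = x[n] − α·x[n−1] with an illustrative coefficient α = 0.95 (the paper gives no coefficient):

```python
import numpy as np

def pre_emphasis(x: np.ndarray, alpha: float = 0.95) -> np.ndarray:
    """Spectral flattening: y[n] = x[n] - alpha * x[n-1] boosts high-frequency energy."""
    return np.append(x[0], x[1:] - alpha * x[:-1])

# Toy digitized (analog-to-digital converted) voice signal.
x = np.array([0.0, 1.0, 1.0, 1.0, 0.5])
y = pre_emphasis(x)
# Slowly varying (low-frequency) stretches are attenuated; transitions keep their energy.
```

In practice this is applied to the sampled waveform before framing and feature extraction.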
II. VOICE RECOGNITION
Speech recognition is a technique used to identify spoken words and convert them into a format the machine understands. The technology is based on parameters [1] such as vocal sound and vocabulary. Each person has a different tone of voice, so our design supports different tone modes matching different speakers' voices.
1. Vocal Sound: One of the main roles of this project is recognizing the speaker's voice. Some people have a habit of speaking continuously, while others leave gaps between words, so the project needs to catch all kinds of tones and phrasings.
2. Vocabulary: Vocabulary is the most important part of understanding what a person is saying. The system needs to perform well, and the vocabulary determines the complexity of the system. Next we look at the basic speech recognition model and the speech to text conversion methods.

Speech Recognition Model:
The speech recognition model follows the pre-processing, feature extraction, acoustic, language, and pattern matching steps outlined above.

Fig.1. Architecture of Speech Recognition

2. Feature Extraction: In this process, parameters related to the speech signal are extracted from the waveforms; the main purpose of these parameters is to represent the input signal. The technique commonly used for this feature extraction process is:
Linear Predictive Coding (LPC): [2] This is the basic speech recognition technique. The digital signal is blocked into frames of some X samples, and each frame is processed to minimize discontinuities in the signal; the analysis is obtained in the last step. The signal then undergoes these steps: [3]
a. Pre-processing: the first step, in which the speech signal is converted into frames and a unique sample is produced.
b. Training: representative features are built from one or more training patterns corresponding to the speech signals.
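The frame blocking and analysis just described can be sketched with the autocorrelation method of LPC; the frame size X = 32, the model order 4, and the Hamming window are illustrative assumptions, not values from the paper:

```python
import numpy as np

def lpc_coefficients(frame: np.ndarray, order: int = 4) -> np.ndarray:
    """Autocorrelation-method LPC: solve the Toeplitz normal equations R a = r."""
    windowed = frame * np.hamming(len(frame))            # taper edges to minimize discontinuities
    full = np.correlate(windowed, windowed, mode="full")
    r = full[len(windowed) - 1 : len(windowed) + order]  # autocorrelation at lags 0..order
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1 : order + 1])          # predictor coefficients a[1..order]

# Block a toy digital signal into frames of X samples and analyse each frame.
X = 32
rng = np.random.default_rng(0)
signal = np.sin(0.3 * np.arange(128)) + 0.01 * rng.standard_normal(128)
frames = signal[: len(signal) // X * X].reshape(-1, X)
coeffs = [lpc_coefficients(f) for f in frames]
```

Each coefficient vector predicts a sample from its `order` predecessors; these vectors are the compact parameters that represent the input signal.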
Text to Speech Conversion:
Text to speech conversion follows these steps:
Processing of Text: First, the input text is analyzed; the system handles all abbreviations and matches the text, and finally the text is converted into a phonetic representation. After the conversion into phonetics, the next stage is speech synthesis. [10]

Speech Synthesis: [5] There are several speech synthesis techniques, as follows:
1. Formant Synthesis: Representations of speech are stored on the basis of parameters. For better performance a combination of cascade and parallel formant structures is used; the two structures can also be used individually, but the combination gives the best results.
2. Concatenative Synthesis: This type of speech synthesis generates a sequence of sounds from recordings of different speakers, stored in a database. The recorded units may be phones, diphones, or triphones, where a phone is a single unit of sound, a diphone runs from the midpoint of one phone to the midpoint of the next, and a triphone is a phone taken in sequence together with its neighbouring phones.

III. LANGUAGE TRANSLATION
There are many languages in India and all over the world, so we need applications and processes that convert text from one language to another. Machine Translation is the field of Artificial Intelligence that deals with translating from one language to another by means of a machine translation system. Some of the machine translation models are:
1. Rule Based Machine Translation: [8] The translation is generated by analyzing both the source and the target language, so the system consists of a set of rules. The most important rules concern grammar: syntax, semantics, and parts-of-speech features. Along with these grammar rules, the system should also contain a dictionary of words for translation.
2. Example Based Machine Translation: In this model we reuse texts that have already been translated. The translated texts are aligned with the original texts, so that the original sentences can be translated into any target language; to form a complete translation, all the phrases are put together.
3. Statistical Machine Translation: This model is characterized by the use of Machine Learning methods and treats translation as a mathematical problem.

Recognition Accuracy: Every sentence in the target language is a translation of a source-language sentence with some probability; the higher the probability, the higher the accuracy of the translation, and vice versa.

IV. OBSERVATIONS
Observations on the different techniques discussed above:

MODELS | TECHNIQUES | FINDINGS
Speech recognition: Feature Extraction | Linear Predictive Coding (LPC) | A feature extraction method is used; analysis is done using a fixed resolution along with a frequency scale.
Speech recognition: Pattern Matching | Template based | Errors such as segmentation or classification errors are avoided.
Speech recognition: Pattern Matching | Knowledge based | The system uses information such as phonetics.
Speech recognition: Pattern Matching | Neural based | Complicated recognition tasks use this kind of method.
Speech to text conversion | Artificial neural network based | Increases the accuracy of speech recognition. [4]
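Treating translation as a mathematical problem usually means the noisy-channel formulation: choose the target sentence e that maximizes P(e|f) ∝ P(f|e)·P(e). A toy scoring sketch in Python, with candidates and probabilities invented purely for illustration (the paper names no specific model or numbers):

```python
# Noisy-channel scoring: pick the candidate maximizing P(e) * P(f | e).
# The candidate sentences and probabilities below are invented for illustration.
candidates = {
    "the house is small": {"lm": 0.20, "tm": 0.30},  # language model P(e), translation model P(f|e)
    "the home is little": {"lm": 0.05, "tm": 0.40},
    "small is the house": {"lm": 0.01, "tm": 0.30},
}

def best_translation(cands: dict) -> str:
    # The highest-probability candidate is taken as the most accurate translation.
    return max(cands, key=lambda e: cands[e]["lm"] * cands[e]["tm"])

print(best_translation(candidates))
```

This is exactly the accuracy criterion stated for statistical translation: the higher the probability of a candidate, the higher the accuracy of the translation.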
Speech Recognition:
We need to receive real-time speech recognition results as the API processes the audio input. [17]
Speech to Text on-premises: when using Google's speech recognition technology on-premises, the system needs to have full control over the infrastructure and over speech data protection in our own data centers.

V. VOICE RECOGNITION AND TEXT TO SPEECH

Voice Recognition:
Technology in the field of communication is rapidly evolving. It has become very simple to use voice recognition to enter code, correct pronunciation, and dictate texts. A microphone icon is found on most on-screen keyboards, allowing users to quickly switch from typing to voice recognition. Speech recognition opens up a world of productive possibilities for certain disabled people who find it difficult or impossible to work with a mouse or keyboard. It may benefit people with physical disabilities and reduce the risk of repetitive strain injury from repeated typing or mouse use by freeing them from typing and keyboard use. Dyslexics can write more fluently, correctly, and concisely. Speech recognition is easy to use and is less painful than traditional handwriting or typing. Enabling voice recognition in systems and promoting its use in the workplace can be a "positive adjustment" for employers, avoiding discrimination against disabled employees and increasing their productivity.

Most devices with capable hardware have voice recognition, so higher-end phones and tablets have strong microphones that enable voice input. Computers, too, often have built-in cameras, microphones, and speakers, so voice recognition can be used instead of typing on a keyboard. At its most basic level, it offers a quick way to write on a computer, tablet, or smartphone: the user speaks through a headset, external microphone, or built-in microphone, and their words appear on the screen as text. This may be in a search engine text box, a chat or messenger programme, or an email or paper.

Speech recognition is a function of certain systems and programmes that can be set up to do more than just input text. It can be used to control computers: with the right configuration, simple spoken commands will start and shut down a computer, as well as open and run various programmes and applications. This is especially important for people with physical disabilities, who can control their devices using only voice commands. Speech recognition software is now integrated into many modern computers, laptops, and smartphones, although, depending on the system or device, specialist software can be required to achieve a high degree of control and functionality. A wide range of potential users will benefit greatly from voice recognition. Obviously, someone with a physical disability who finds typing challenging, painful, or impossible will benefit greatly from it. It may also help to reduce the risk of developing a repetitive strain injury (RSI) or to better control any upper limb condition.

Text to Speech:
Text-to-speech is a common assistive technology in which a monitor or tablet reads the words on the page out loud to the user. This technology is common among students who have reading disabilities, especially those who have trouble decoding. By receiving the words in audible form, the student can reflect on the meaning of the words rather than devoting all of their mental resources to deciphering the sentences. While this technology helps students overcome their reading challenges and gain access to instructional materials, it does not aid in the development of reading skills. The amount of TTS software installed on both Android and Apple devices has steadily increased in recent years. It has also become common in the office as a method to assist users in proofreading their work.

VI. CONCLUSION

We have learned about voice recognition and text to speech along with their techniques, as well as some of their applications and usages. From all the techniques we have studied, we can conclude:
Voice recognition, also known as speech to text, works well at converting a speech signal into text; the only drawback of this technique was its feasibility.
Text to speech makes use of parallel synthesis, which works as the best text to speech converter. In text to speech conversion we also learned about hybrid machine translation; this translation technique is widely used because rule-based and statistical machine translation techniques are used simultaneously.

ACKNOWLEDGEMENT

We humbly take this opportunity to express our gratitude to all of the guideposts who served as guiding lights through this project, resulting in the fruitful and satisfactory completion of this research.

REFERENCES

1. Bansal, Dipali, Neelam Turk, and Sunanda Mendiratta. "Automatic speech recognition by cuckoo search optimization based artificial neural network classifier." In 2015 International Conference on Soft Computing Techniques and Implementations (ICSCTI), pp. 29-34. IEEE, 2015.
2. Seide, Frank, Gang Li, Xie Chen, and Dong Yu. "Feature engineering in context-dependent deep neural networks for conversational speech transcription." In 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, pp. 24-29. IEEE, 2011.
3. Saksamudre, Suman K., P. P. Shrishrimal, and R. R. Deshmukh. "A review on different approaches for speech recognition system." International Journal of Computer Applications 115, no. 22 (2015).
4. Jadhav, Anuja, and Arvind Patil. "Real Time Speech to Text Converter for Mobile Users."
5. Mache, Suhas R., Manasi R. Baheti, and C. Namrata Mahender. "Review on text-to-speech synthesizer." International Journal of Advanced Research in Computer and Communication Engineering 4, no. 8 (2015): 54-59.
6. Kurzekar, Pratik K., Ratnadeep R. Deshmukh, Vishal B. Waghmare, and Pukhraj P. Shrishrimal. "A comparative study of feature extraction techniques for speech recognition system." International Journal of Innovative Research in Science, Engineering and Technology 3, no. 12 (2014): 18006-18016.
7. Kalyani, Aditi, and Priti S. Sajja. "A review of machine translation systems in India and different translation evaluation methodologies." International Journal of Computer Applications 121, no. 23 (2015).
8. Alawneh, Mouiad Fadiel, and Tengku Mohd Sembok. "Rule-based and example-based machine translation from English to Arabic." In 2011 Sixth International Conference on Bio-Inspired Computing: Theories and Applications, pp. 343-347. IEEE, 2011.
9. Rabiner, Lawrence. "Fundamentals of speech recognition." (1993).
10. Tokuda, Keiichi, Yoshihiko Nankaku, Tomoki Toda, Heiga Zen, Junichi Yamagishi, and Keiichiro Oura. "Speech synthesis based on hidden Markov models." Proceedings of the IEEE 101, no. 5 (2013): 1234-1252.
11. Anusuya, M. A., and Shriniwas K. Katti. "Speech recognition by machine, a review." arXiv preprint arXiv:1001.2267 (2010).
12. Price, Michael, James Glass, and Anantha P. Chandrakasan. "A low-power speech recognizer and voice activity detector using deep neural networks." IEEE Journal of Solid-State Circuits 53, no. 1 (2017): 66-75.
13. Fohr, Dominique, Odile Mella, and Irina Illina. "New paradigm in speech recognition: deep neural networks." In IEEE International Conference on Information Systems and Economic Intelligence, 2017.
14. Benkerzaz, Saliha, Youssef Elmir, and Abdeslam Dennai. "A study on automatic speech recognition." Journal of Information Technology Review 10, no. 3 (2019): 77-85.
15. Katyal, Anchal, Amanpreet Kaur, and Jasmeen Gill. "Automatic speech recognition: a review." International Journal of Engineering and Advanced Technology (IJEAT) 3, no. 3 (2014): 71-74.
16. Hain, Thomas, and Asmaa El Hannani. "Automatic Speech Recognition for Scientific Purposes."
17. Anusuya, M. A., and Shriniwas K. Katti. "Speech recognition by machine, a review." arXiv preprint arXiv:1001.2267 (2010).
18. Reddy, D. Raj. "Approach to computer speech recognition by direct analysis of the speech wave." The Journal of the Acoustical Society of America 40, no. 5 (1966): 1273-1273.
19. Weintraub, Mitch, Hy Murveit, Michael Cohen, Patti Price, Jared Bernstein, Gay Baldwin, and Don Bell. "Linguistic constraints in hidden Markov model based speech recognition." In International Conference on Acoustics, Speech, and Signal Processing, pp. 699-702. IEEE, 1989.
20. Abdulla, Waleed H. "HMM-based techniques for speech segments extraction." Scientific Programming 10, no. 3 (2002): 221-239.