Ammar Mohammed, MaGIC-X UTM-IRDA, Universiti Teknologi Malaysia, Johor Bahru, Malaysia, [email protected]
Mohd Shahrizal Bin Sunar, MaGIC-X UTM-IRDA, Universiti Teknologi Malaysia, Johor Bahru, Malaysia, [email protected]
Md.Sah Hj.Salam, Faculty of Computing, Universiti Teknologi Malaysia, Johor Bahru, Malaysia, [email protected]

Abstract - This paper describes challenges and solutions for building a successful voice search system as applied to Quranic verses. The paper describes the techniques used to deal with a finite vocabulary, how modelling completely in the voice domain for the language model and dictionary can avoid some system complexity, and how we built dictionaries, language models, and acoustic models in the framework. The Holy Quran is written in Arabic, one of the oldest languages in the world, which presents its own features and challenges when searching for Arabic-based content. Most search systems for the Holy Quran are organized around text words (contained in the target verses), but no system is organized around voice, although there is a need to search Quranic recitation. Speech recognition approaches are applied to build the dictionary and test the system, while MFCC and stemming techniques will be applied to find the stem of the word. The development of voice search for Quranic recitation led to a significant simplification of the original process of building a system for retrieving audio information related to the Qur'an; it also helps to build a system to serve people with special needs, such as the blind, the paralyzed, and others who cannot use the keyboard.

Keywords- voice search; Quranic recitation; Quranic acoustic; Speech recognition

The proliferation of smart phones and other small, Web-enabled mobile devices has spurred interest in voice search. Voice search, the ability to perform web searches simply by speaking, first appeared around 2008 on iPhone and Android phones in US English. Soon after, several researchers began focusing on developing voice search systems for other languages [1].

The Arabic language is not similar to US English; it is a Semitic language with a composite morphology. Arabic words are categorized as particles, nouns, or verbs. Unlike most western languages, Arabic script is written from right to left. There are 28 characters in Arabic. The characters are connected and do not start with a capital letter as in English. Most of the characters differ in shape based on their position in the sentence and on adjacent letters [2, 3].

The Quran is the holy book of Islam, originally written in Classical Arabic. It consists of 6236 verses divided into 114 chapters called surahs. Each surah also differs from the others in the number of verses (ayat) [3, 4].

Orthographic variations and the use of diacritics and glyphs in the representation of Classical Arabic increase the difficulty of stemming [5]. Many verses are similar and even identical. Searching for similar words (e.g., verses) could return thousands of verses which, when displayed completely or partly as a list, would make analysis and understanding difficult and confusing. Moreover, it would be visually impossible to instantly figure out the overall distribution of the identified or retrieved verses in the Quran [4].

II. LITERATURE REVIEW

Using our voice to access information has long been a part of science fiction. Today, with powerful smartphones, cloud-based computing, and speech recognition techniques, science fiction is becoming reality.

One of the important voice applications is voice search, a speech recognition technology that allows users to search by saying terms aloud rather than typing them into a search field. The information normally exists in a large database, and the query has to be compared with a field in the database to obtain the relevant information [6, 7].

Voice applications that recite the Holy Quran are available on the market. One of the most popular and commonly used is Quran Auto Reciter (QAR) [8], and some applications depend on speech recognition techniques, like Hafas [9] and Ehafiz [10, 11].

This study proposes a system with the ability to search Quranic recitation by voice. The expected benefit of a Quranic voice search system lies in retrieving audio information related to the Holy Quran, as well as in serving people with special needs who cannot use the keyboard.

In 2008, a new version of GOOG-411 was deployed which allowed (and encouraged) the user to state their need in a single utterance rather than in sequential utterances that split apart the location and the business. This was motivated by our desire to


