
Received December 29, 2021, accepted January 17, 2022, date of publication January 25, 2022, date of current version February 4, 2022.

Digital Object Identifier 10.1109/ACCESS.2022.3146320

Smart Assistive System for Visually Impaired People Obstruction Avoidance Through Object Detection and Classification

USMAN MASUD 1,2, TAREQ SAEED 3, HUNIDA M. MALAIKAH 4, FEZAN UL ISLAM 1, AND GHULAM ABBAS 1
1 Department of Electrical Communication Engineering, University of Engineering and Technology, Taxila 47050, Pakistan
2 Department of Electrical Communication Engineering, University of Kassel, 34127 Kassel, Germany
3 Nonlinear Analysis and Applied Mathematics (NAAM)-Research Group, Department of Mathematics, Faculty of Science, King Abdulaziz University, Jeddah 21589, Saudi Arabia
4 Department of Mathematics, Faculty of Sciences, King Abdulaziz University, Jeddah 21589, Saudi Arabia

Corresponding author: Usman Masud ([email protected])


This work was supported in part by the German Academic Exchange Service [Deutsche Akademische Austausch Dienst (DAAD)], and in part by the EU.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

ABSTRACT Recent progress in innovation is making life more prosperous, simpler and easier for the common individual. The World Health Organization (WHO) statistics indicate that a large number of people experience visual loss, because of which they encounter many difficulties in everyday jobs. Hence, our goal is to design a modest, secure, wearable, and versatile framework for the visually impaired to help them in their daily routines. For this, the plan is to make an effective system which will assist visually impaired people through obstacle detection and scene classification. The proposed methodology utilizes a Raspberry-Pi 4B, camera, ultrasonic sensor and Arduino, mounted on the stick of the individual. We take pictures of the scene and afterwards pre-process these pictures with the help of the Viola Jones and TensorFlow Object Detection algorithms. The said techniques are used to detect objects. We also used an ultrasonic sensor mounted on a servomotor to measure the distance between the blind person and obstacles. The presented research utilizes simple calculations for its execution, and detects the obstructions with a notably high efficiency. When contrasted with different frameworks, this framework is low-cost, convenient, and simple to wear.

INDEX TERMS Smart system, visual losses, biomedical sensor, object recognition, tensorflow, Viola Jones,
ultrasonic sensor.

I. INTRODUCTION
The life of any individual depends on the basic five senses, among which the ability of vision is probably the most important one. Visual impairment is a decreased ability to see, to a level that cannot be corrected by usual means, such as lenses or glasses. Visually impaired individuals [1] lack this sense of vision. Hence, fulfilling the daily tasks of life becomes extremely hard for them. This can lead to difficulties which can only be temporarily subdued by some assisting personnel, and cases exist where certain situations might be fatal, not only for the individual, but also for anyone in the surrounding environment [2]. With the innovations in the scientific area, much research has been proposed on the design of gadgets for visually impaired individuals. These gadgets [2]-[4] were simple and durable, but they were deficient in terms of usage and accuracy.

As the modern world relies on computers and artificial intelligence, these techniques have become more reliable and efficient. However, many gaps remain in these technologies. During our study of previous literature, we found that RFID sensors, Logitech cameras, and embedded systems [5] were used in the past to design efficient systems. We also know that visual impairment diseases are increasing with the passage of time. Therefore, our motivation in this project is to make a system which will assist visually impaired people by classifying scenes and avoiding obstacles through object detection, so that they can lead a life like other people in the world. If there is an obstacle in the user's path, the system will track it and inform the user about it.

During this process, there are numerous obstacles that can appear across the length of the distance [5], [6].


TABLE 1. List of symbols.

This information is important for the user. For instance, if there is a stool across the path of the user, it must be recognized. Otherwise, the user can be harmed. This obstacle must be recognized in time, so that it can be avoided afterwards [7]-[10]. On the contrary, if there is something as small as a pebble, it can be ignored by the system. The reason is that sighted persons also ignore small things that are not likely to cause any significant harm. The system will also recognize the scenes [2], [11], [12] for the intended user. All this information will be communicated in the form of voice to the person, in order to avoid any damage.

II. RELATED WORK
Several techniques are used for the navigation of visually impaired individuals by health departments, for instance, Electronic Orientation Aids (EOAs), Place Locator Devices (PLDs) and Electronic Travel Aids (ETAs) [13]-[15]. The suggested technology not only provides individuals with directions, but it also monitors the health of visually impaired persons. A clever ETA is proposed in this study [13], where an ultrasonic sensor is used to emit and echo sound waves that can detect an obstruction at 5 to 35 cm. The suggested approach detects things at the ground and waist levels, up to a maximum of 2 m, using two specified placements of ultrasonic sensors. There is another approach that is based on non-intrusive wearable devices [16]. Here, three-dimensional and semantic data is used to represent the scenes in front of the user, and the information is then transmitted to the user through a text-to-speech system or haptic feedback.

A small and lightweight transmission system that aids the blind in sign interpretation is presented in [17], in which wireless transmission across the visible light frequency range has been proposed for communication. To communicate between a sign and a wearable smart glass, a light transmission mechanism works in this manner. In another system [18], the Internet of Things (IoT) has been used, which is heavily reliant on data science and analytics. Data collected may be analyzed and utilized to identify impediments and improve basic navigation with haptic and vocal feedback [19]. Using this approach, identifying everyday things becomes less complicated.

Infrared energy cannot be seen with the naked eye, but it may be observed using special cameras that convert light into visible colors. Using this technique [20], infrared technology is utilized to improve vision impairment, utilizing mobile phones and an application that runs on Android-based smartphones. This leads to picture captioning, which refers to the process of creating a written explanation for an image. It employs Machine Learning techniques such as Natural Language processing. For our dataset photos, the National Digital Library Federation (NDLF) [21] provides relevant and accurate captions. This technique has an accuracy of 82.39%.

Because of signal losses, a Global Positioning System (GPS) does not operate effectively indoors [22], [23]. To deal with such situations, there is a technique called Pedestrian Dead Reckoning (PDR) that estimates a pedestrian's position using inertial sensors [2], [25]. Inertial sensors include motion sensors (accelerometers) and rotation sensors (gyroscope sensors) to uninterruptedly measure, by dead reckoning, the position, the orientation, and the velocity (direction and speed of movement) of a moving object. Using this technology, people with vision impairments may walk around securely indoors. As it uses GPS for outdoor navigation and the PDR technique for indoor navigation, it cannot identify objects and assist the visually impaired properly. Thus, it cannot be implemented locally.

A deep learning approach [24] is used to evaluate a safe and reliable procedure through the data of RGB images and establish a semantic map that helps in getting good results in both indoor and outdoor scenes. In a similar manner, the smart glass system [26], which uses a convolutional neural network to transmit auditory signals through headphones, can read the name of a drug. Because of the complexity and limited processing resources, visual object recognition still remains an unresolved topic.

Embedded systems are utilized in walking sticks by vision-impaired people in another technique [27], where notification is done by speech over headphones or speakers, with a voice recorder and playback IC. A Raspberry Pi 4 was used, which has improved computational capability and is connected to the Pi camera, GPS, and GSM modules. With the aid of GPS, this gadget [28] tracks the user's present location. When impediments are identified in the path, it also emits a warning sound.

In another work [29], Zhe Wang developed indoor navigation structures for visually impaired users that use RGB (Red, Green, Blue) and depth cameras to obtain consistent pictures. The collected images were then pre-processed using multi-scaling technology, and the output was delivered to the user through a speaker. This structure gives a precision of 73.6%.


RGB depth cameras are used by Tian to recognize steps, including pedestrian crossing lines [30], [31]. The Hough Transformation calculation was used for setting up the pictures, and afterwards classification is performed utilizing an SVM, which achieved an accuracy of 93.90%.

In another investigation, James [32] created a scene enhancement structure that employs the Simultaneous Localization and Mapping (SLAM) method to detect deterrents. Camera and glasses transfer images to a PDA [34], [36], which then sends these images to the client through an earphone. Investigators created a PC vision-based assistance for visually disabled people that employs a Logitech camera for image collection [38]. The images were then processed using the Hough Transform computation, and the extracted features were classified using a Fuzzy Logic Classifier.

A multi-object detection technique uses Artificial Intelligence and smart navigation for visually impaired people, in which multiple images of objects that are highly relevant to the visually impaired are used to train a deep learning model [37]. The Point Locus Wearable GPS Pathfinder System has been intended to assist individuals with visual disabilities to travel outside [39]. In a language of vibration, the framework communicates with the client so it can direct the user. This uses the concept of the feeling of touch for visually impaired clients.

III. CONSIDERATIONS FOR SYSTEM DESIGN
For the sake of optimal performance in our system, we tried different hardware to design a system that will help visually impaired people. The hardware which is selected must be efficient in terms of battery life, night vision, and weight. Similarly, we discussed and explored different techniques, but we selected the simplest and easiest-to-use algorithms for implementation. Our proposed project utilizes a Raspberry-Pi 4B, camera, ultrasonic sensor and Arduino [6].

The proposed block diagram of our scheme is shown in figure 1. Our system measures the distance between the blind person and the obstacle along the path, and detects objects using the Viola Jones algorithm [7] and the TensorFlow object detection API [8]. In the first step, this system measures the distance from the obstacle, using the ultrasonic sensor and Arduino for this purpose. We programmed the ultrasonic sensor and Arduino in such a way that it gives a beeping sound when there is an obstacle along its path. We used a buzzer for the beeping sound. First it checks for an obstacle on the front side, and if there is one, it gives a beeping sound. After checking the front side, it checks the right side and then the left side for obstacles, on similar grounds. At this level, it is important to differentiate the obstacles on the left and right side, respectively. We handle this situation by changing the number of sounds. If there is an obstacle on the left side, it gives two beeping sounds, and for the right side it gives three beeping sounds. In this way, the user can differentiate between the placements of the obstacle on either side. Further, we programmed a different type of sound for each side so the blind person can easily differentiate each side from the other. In addition, for a situation where the user comes across obstacles on every side, it gives four beeping sounds (a sketch of this coding is given after the component list below).

At the same time, our system is also taking pictures from the scene, and afterwards sends these pictures for pre-processing. The system pre-processes these pictures with the help of the Viola Jones algorithm and the TensorFlow Object Detection algorithm [9].

We used Viola Jones for the face detection purpose. In this algorithm [10], the input image is converted into a grayscale image and then features are extracted by using Haar-like features [11]. The features are extracted by comparing pixels of different parts of the face. Then the sum of these pixels is calculated by using an integral image. Afterwards, we trained the Ada-boost classifier [12] to identify different features from the said image. This cascade classifier differentiates between face and non-face regions. Hence, by using the four steps of the Viola Jones algorithm mentioned above, we can detect the face from the image.

In the TensorFlow object detection API, the classifier first divides the picture into a large number of small bounding boxes and then features are extracted from each bounding box. Afterwards, the overlapping boxes are merged into a single box. In this way, objects are detected by using TensorFlow Object Detection. The presented research utilizes simple calculations for its execution. When contrasted with different frameworks, this framework is low-cost, convenient, reliable and simple to wear.

So, our next step is the hardware implementation of the proposed system, which uses the following components. We have assembled the components on a stick, including a Raspberry Pi 4 (4GB RAM), ultrasonic sensor (HC-SR04), Raspberry Pi camera (V2), servo motor (SG90) and a battery as a power source. The software part of the project, on which experimental procedures are done, is performed on a computer having 8GB RAM, a 4-core processor, and a default graphics card (Intel GeForce GT 730M). Keeping these specifications in mind, our computational process does not seem to lag or cause any time delays. After training, our Raspberry Pi performed even better. Further details about the working of the hardware part of our project are listed below.
1) We use a Raspberry Pi camera to take pictures of the scenario, whether it is an object or a person. This camera is mounted on the stick at some height so that it can capture the video in real time. We have consulted an eye specialist, and according to his recommendations we placed the ultrasonic sensor at ground level, as it helps the detection of bumps, stairs, and hurdles in the pathway, while the camera is positioned in the middle of the human body. As we know, a normal lens camera captures the scene at 120 degrees from all sides. So, at this position, it will capture the images perfectly and also helps avoiding any contact with the user's hands or body.
2) The servo motor is attached under the ultrasonic sensor so that it rotates in such a way that it tracks the left and right of the person in order to clear the path.


FIGURE 1. Proposed block diagram of assistive navigation system.

3) The battery is also used as a power source in order to provide the required voltage for the operation of the system; it can be recharged and has an impressive amount of battery time.
4) The Raspberry Pi board behaves like a central processing unit for observing and controlling the gadgets joined to it.
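To make the direction coding described in this section concrete, the following minimal sketch reproduces the beep patterns (one beep for the front, two for the left, three for the right, and four when obstacles appear on every side). This is an illustrative Python sketch only; in the actual system the buzzer is driven by the Arduino, and buzzer_on/buzzer_off are hypothetical stand-ins for the real pin-level calls.

    import time

    def buzzer_on():
        # Hypothetical stand-in for driving the buzzer pin high
        # (handled by the Arduino in the actual hardware).
        print("beep", end=" ", flush=True)

    def buzzer_off():
        # Hypothetical stand-in for driving the buzzer pin low.
        pass

    # Direction coding as described in Section III.
    BEEP_CODES = {"front": 1, "left": 2, "right": 3, "all_sides": 4}

    def alert(direction, beep_time=0.2, gap=0.2):
        """Emit the number of short beeps assigned to the given direction."""
        for _ in range(BEEP_CODES[direction]):
            buzzer_on()
            time.sleep(beep_time)
            buzzer_off()
            time.sleep(gap)

    alert("left")  # two beeps: obstacle on the left side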

A. WORKING SCHEME
The camera takes a picture, and sends it to the board along with the estimated data. The pictures taken by the camera and the snags recognized by the ultrasonic sensor can be checked by the board for identifying objects around individuals, as shown in Figure 2.

FIGURE 2. The upper part of the setup is shown. The camera is connected with the Raspberry Pi module. The HDMI port connects with the computer, and electricity is provided through the power cable.

When the ultrasonic sensor detects any object that comes in front of the person, it will trigger the buzzer. At the same time, it will check the way on the left and right sides of the individual, and will advise the individual to turn towards the left or right, wherever there is more space in the pathway. This ultrasonic sensor is utilized to gauge the distance of the impediments, and therefore identifies the hindrances in the environment to give the data regarding obstacle detection to the visually impaired individuals. In Figure 3, an ultrasonic sensor is mounted on a servo-motor (SG-90) [40] that rotates it by 90 degrees (clockwise and anticlockwise) towards both sides. In this way, this ultrasonic sensor detects an object from a distance of approximately 100 cm, which is equal to one meter. This range can be increased or decreased depending upon the requirements, but as in our system, it gives better results at the distance of one meter. The whole module is connected to a battery which supplies a power of 10 Volts. We have attached our ultrasonic sensor to an Arduino, because our sensor is mounted on a servomotor which operates with a speed of 0.1 second per 60 degrees (0.1 s/60 degrees) and its operating voltage is 4.8 V. In this manner, we have used a buck-boost voltage regulator for the whole module, which consists of an Arduino Uno, servomotor, ultrasonic sensor,


and the battery of the smart cane. For further clarification, when we use the servomotor on the Pi, its movement fluctuates a lot. However, we needed it to move exactly 90 degrees right (clockwise) and 90 degrees left (anticlockwise), and then come back to its original position if no object has been detected. On account of these technical conditions, we used the Arduino separately for the sensor.
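The range reading itself follows from the echo timing of the HC-SR04: the sensor reports the round-trip travel time of an ultrasonic pulse, so the distance is half the product of the pulse width and the speed of sound. The following sketch shows only this conversion; the 5.83 ms example pulse width is an illustrative value chosen to land at the one-meter range discussed above.

    SPEED_OF_SOUND_CM_PER_S = 34300  # in air, at roughly 20 degrees Celsius

    def echo_to_distance_cm(pulse_width_s):
        # The echo pulse covers the trip to the obstacle and back,
        # hence the division by two.
        return pulse_width_s * SPEED_OF_SOUND_CM_PER_S / 2

    print(round(echo_to_distance_cm(0.00583)))  # about 100 cm, i.e. one meter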

FIGURE 3. The lower part of the setup is shown, in which the ultrasonic sensor is connected with the servomotor. For the sake of completeness, the battery and clamps have been indicated as well.
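The scan cycle that combines the servomotor and the sensor can then be outlined as below. This is a sketch under stated assumptions: read_distance_cm and set_servo_angle are hypothetical callables wrapping the Arduino-driven sensor and servo, and the 0.15 s settling time follows from the SG-90 speed of 0.1 s per 60 degrees quoted in the text.

    import time

    RANGE_CM = 100   # detection range used in our system (one meter)
    SETTLE_S = 0.15  # an SG-90 at 0.1 s per 60 degrees needs ~0.15 s for 90

    def scan_for_obstacles(read_distance_cm, set_servo_angle):
        """Check the front (0 deg), left (+90 deg) and right (-90 deg),
        then return the servo to its original position."""
        blocked = {}
        for side, angle in (("front", 0), ("left", 90), ("right", -90)):
            set_servo_angle(angle)
            time.sleep(SETTLE_S)  # let the servomotor settle
            blocked[side] = read_distance_cm() < RANGE_CM
        set_servo_angle(0)  # come back to the original position
        return blocked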

We use a Raspberry Pi camera to take pictures of the scenario, whether it is an object or a person. This camera is mounted on the stick at some height so that it can capture the video in real time. We have consulted two eye specialists (physicians) and discussed this task with them in detail, elaborating the technical features. The discussion reveals that visually impaired personnel are very much concerned about their movement, and the device must be developed to help them. According to their recommendations, we placed the ultrasonic sensor at ground level, as it helps the detection of bumps, stairs, and hurdles in the pathway, thereby helping the person to check the area of movement. On the contrary, the camera, being a sensitive device, must be placed at a location that is subject to the least vibrations, and away from unintentional touch. In the light of these facts, it is positioned in the middle of the cane, which makes its distance apparently farthest from the human body. Another fact worth mentioning is that a normal lens camera is capable of capturing the scenes at 120 degrees from all sides, a fact that can be perfectly made use of when it has maximum area in the surrounding. Therefore, at this position, it will capture the images perfectly and also helps in avoiding any contact with the user's hands or body. This is illustrated in figure 4.

FIGURE 4. The complete cane (sum of figures 2 and 3) is shown, with its components placed at specific locations for the handicapped, and the reasons for the placement being justified.

After taking the picture, it is very important to remove noise and haze from these pictures. The reason is that these pictures have a lower degree of perception due to natural impacts on the camera sensor from the environment. There are several steps which are involved before the picture goes to the classification stage. First, the image is imported in the form of arrays. As some of the images which are taken from the camera differ in size, we have established a standard size for all images being imported in the algorithm.

Afterwards, for noise removal, we used a Gaussian Blur [43] that smoothens the images, thereby resulting in a slight reduction in the size of the images. This Gaussian filter acts

as a low pass filter and removes high frequency signals. As the picture is in the form of a two-dimensional array, the Gaussian function applies a transformation to each pixel in the image, thereby resulting in a Gaussian distribution. In the case of two dimensions, the multiplication of both Gaussian functions [44] (one in each dimension) results in

G(x, y) = \frac{1}{2\pi\sigma^{2}} \exp\left(-\frac{x^{2} + y^{2}}{2\sigma^{2}}\right). (1)

Here, \sigma represents the standard deviation of the Gaussian distribution. As we are discussing the Gaussian function in the two-dimensional plane, here x and y represent the coordinates located on the X-plane and Y-plane, respectively.

The next step is the segmentation of the image, in which the background is separated from the objects, which helps to improve our region of interest. This process confines the image by improving edge detection and curves, thereby resulting in a better pre-processed dataset. In this way, the picture is now ready to be categorized further, explicitly undergoing the classification process.

IV. CLASSIFICATION THROUGH VIOLA JONES ALGORITHM
The Viola-Jones object detection framework [7], [33], which was introduced in 2001 by Paul Viola and Michael Jones, can be trained to identify a wide range of item types. The detection framework looks for characteristics that include the sums of picture pixels inside rectangular regions. Viola and Jones' features are typically more complicated, since they all rely on more than one rectangular region. The value of each feature is equal to the summation of pixels inside the plain rectangles less the pixel density within the darkened rectangles. This type of rectangular feature is primitive. Furthermore, this scheme adapts to vertical and horizontal properties, but with a significantly coarser response.

The Viola Jones algorithm is used with slight modification in the framework, as shown in figure 5, and provides much better results. The framework described has achieved remarkable results because of its implementation and integration in OpenCV. The cascaded classifier, which is being used, has combined the features efficiently. The framework is able to detect all the parts without any remarkable error. However, it must be pointed out at this level that it takes more time for processing them, so there is a need to reduce the time delay and the load on our processor. This is important for the development of the sensor, as it would be used for visually impaired people. To enhance our idea, we classify the features that are being selected through AdaBoost [47]. For this purpose, we selected a number of edge-type horizontal features and line features. Afterwards, we boosted them in order to get a strong classification. This is necessary because, when there are numerous weak classifiers being used for face detection, we merge all those weak classifiers and convert them into a strong classifier. This in turn resulted in better accuracy and less time consumption.

All parts of the image can be distinguished in this way, as they are not merged with each other. It gives an output that has a very good quality with appropriate attributes. This procedure for a precise recognition of the upper body area (object) works in particular on the enhancement of the face, and diminishes the difficult assignment of identifying upper body areas from unconstrained still pictures and video. The cascade object detector utilizes the Viola-Jones face detection calculation to recognize an individual's body and identify the face object. This model recognizes the upper body area, which is characterized as the region that constitutes the head and shoulders, as well as a sort of arrangement of the chest area and the face. Furthermore, the eyes are recognized as well, dependent on the assumption that they are darker than other parts of the face. Hence the eyes are detected through rectangular line Haar features. The detection of eyes through little changes in the integral image becomes possible in this manner. It contains less noise and haze. Therefore, the accuracy of this model is good and has provided us very remarkable results.

A. DETECTION SCHEME
1) First, import the images into the system.
2) Apply pre-processing techniques on the images.
3) Pass the images through the Viola Jones algorithm.
4) Detect the faces using this algorithm.
5) If the facial features are detected by the algorithm in the image, then classify it as a face.
6) If the facial features are not detected, then discard the said image.
7) Check for any possible faces to detect in the image.
8) If any further facial features are found, classify each of them as an image, similar to the steps that have been provided above.
9) If there are no more facial features which could be detected by the system, terminate the program.
This algorithm is divided into four stages.
1) Haar Feature Selection
2) Creating an Integral Image
3) Adaboost Training
4) Cascading Classifiers

B. HAAR FEATURES
All human faces have certain characteristics in common. Haar Features can be used to match these regularities. Human faces have a few features, such as the eye region being darker than the upper cheeks and the nose bridge being whiter than the eyes. Numerous qualities combine to produce some matchable face features like eyes, lips, and nose, and their values are determined by directed gradients of pixel intensities. Viola and Jones utilized two-rectangle features. There are three types: two, three, and four-rectangles,

F(\text{Haar}) = \sum F_{\text{White}} - \sum F_{\text{Black}}, (2)

where F_{\text{White}} represents the pixels in the white area and F_{\text{Black}} represents the pixels in the black area.
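Since the framework is integrated in OpenCV, the pre-processing and face detection steps described in this section can be sketched in a few lines. The 640x480 standard size, the 5x5 Gaussian kernel, and the detectMultiScale parameters are illustrative choices rather than values reported in this work; the pretrained haarcascade_frontalface_default.xml model ships with OpenCV.

    import cv2

    def detect_faces(image_path):
        img = cv2.imread(image_path)
        img = cv2.resize(img, (640, 480))       # standard size for all images
        img = cv2.GaussianBlur(img, (5, 5), 0)  # low-pass smoothing, eq. (1)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
        # Each detection is an (x, y, w, h) box; non-face regions are
        # rejected early by the staged (cascaded) classifiers.
        return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)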

FIGURE 5. Demonstration of various steps that are involved in the Viola Jones Algorithm.

C. CREATING AN INTEGRAL IMAGE
The integral image is an image representation that analyses rectangle features in real time, thereby providing them a performance edge over more comprehensive versions. Because each feature's rectangular region is always next to at least one other rectangle, any two-rectangle feature, any three-rectangle feature, and any four-rectangle feature may be derived in six array references, eight array references, and nine array references, respectively.
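As a minimal sketch of how the integral image delivers this speed, the summed-area table below evaluates any rectangle sum from four array references, and a two-rectangle Haar feature (eq. (2)) from the difference of two such sums. The NumPy formulation is our own illustration, not code from the paper.

    import numpy as np

    def integral_image(gray):
        # ii[r, c] holds the sum of all pixels above and to the left of
        # (r, c); a zero row and column are padded in so the formula below
        # also works on the image borders.
        return np.pad(gray, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)

    def rect_sum(ii, top, left, h, w):
        # Sum over any rectangle from four references into the table.
        return (ii[top + h, left + w] - ii[top, left + w]
                - ii[top + h, left] + ii[top, left])

    def two_rect_feature(ii, top, left, h, w):
        # Eq. (2): sum over the white half minus sum over the black half.
        white = rect_sum(ii, top, left, h, w // 2)
        black = rect_sum(ii, top, left + w // 2, h, w // 2)
        return white - black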

D. ADABOOST TRAINING
To choose the most useful characteristics and train classifiers that make use of them, the object identification framework leverages a variation of the learning method AdaBoost [47]. This approach creates a strong classifier by combining weighted simple weak classifiers in a linear way.

E. CASCADED CLASSIFIER
A Cascade Classifier is a multi-stage classifier that is capable of fast and precise detection. Each stage consists of a strong classifier that is produced by the AdaBoost Algorithm. The number of weak classifiers in a strong classifier grows as it progresses through the stages. A sequential (stage-by-stage) evaluation is performed on an input. If a classifier for a given stage returns a negative test, the input is automatically removed. If the result is favorable, the input is passed on to the next stage. This multi-stage method, according to Viola and Jones [7], allows for the development of simpler classifiers, which can subsequently be used to quickly reject most negative (non-face) data while focusing on positive (facial) data [32]. In other words, we are discriminating the face from any other thing in the image that is being analyzed. This procedure is diagrammatically illustrated in figure 6.

FIGURE 6. Detection of objects using the cascaded classifier.

V. RESULTS AND DISCUSSION
To recognize frontal upright faces, a multilayer cascaded classifier was trained. A set of face and non-face training pictures is used to train the classifier. The Adaboost training technique is used to train each classifier in the cascade. Similar to the discussion for Haar features, there are two types of classification of the image pixels. One is F_White and the other is F_Black; a pixel in the image is compared to the gray scale, and if its value is more than 0.5, it is assigned as F_Black, and if its value is less than 0.5, it is assigned as F_White. The main reason behind this is the fact that, as the system works in binary values, we classify the data as a binary object whose value can either be 0 or 1.

Following the face detection experiments, different facial feature detectors are evaluated in the same way. Since they are embedded within the face, their results are shown individually to indicate that they belong to patterns of smaller size than a face or a head.

Five distinct targets are considered in the outcomes, as shown in the graph in figure 8: the detection of the left eye, the right eye, the two eyes together, the nose, and the mouth. A comparison is demonstrated between the obtained results and the algorithm accuracies. Thus, we compared our classifications with the original Viola Jones algorithm, as shown in Figure 8.

This model was utilized to assess the eye recognition, and the initial attempts have shown significant outcomes. Each of the classifiers identifies over 90% of the pictures once the face has been distinguished. Similarly, objects were also detected through trained classifiers. Using sequential evaluation, a maximum efficiency of 92% is achieved for detection.

The Coco (Common Objects in Context) dataset [35] is being used in our system for object detection. This is a large-scale dataset that has been used for object identification, image captioning, image segmentation and key point detection in numerous applications. In our system, we used Coco 2017, which consists of a diverse range of 123K images.

To check the performance of our system, we took a few images from the camera to see whether it identifies them correctly or not. We were satisfied and surprised to see that our system gives us quite impressive results, as in the pictures shown below.

In Figure 9, our system detected a suitcase, as shown in the picture, with a probability of 97% (which is shown as 0.97 in the user interface of object detection). Here, the decimal in points represents the probability of the detected object, which is basically derived by dividing the efficiency by one hundred. Moreover, a comparison is also demonstrated to identify the difference between both of these algorithms. We can see a clear difference between figures 9a and 9b. In figure 9a, there is less opacity and contrast between the foreground object and the background, in comparison with figure 9b. After the processing is done, noise has been removed from the image and hence the image becomes clearer.

Similarly, in figure 10, our system detected the keyboard which is in front of the camera with a probability of 76%. Here, our accuracy was slightly compromised, for the reason that the taken image was not precisely in the frame, which makes it difficult for the algorithm to identify it as a keyboard. After discussion among our technical team, another reason could be the difference in reflection from the various keys of the keyboard, as these might be recognized as more than a single object.


FIGURE 7. Demonstration of Haar features. The face is detected, and afterwards, the eyes, the nose and the lips are recognized.

FIGURE 8. Accuracy comparison between proposed and Viola Jones algorithm.

This is the only drawback of the system, in the sense that it identifies the object in this particular situation, regardless of the frame. The underlying reason is that when a person who is visually impaired is moving along with the stick, the camera will move as well, and shake abruptly. This effect can cause the picture to be out of frame, or distortion in the captured image. Moreover, we can see an enhanced result in figure 10b, which is clearer than figure 10a. However, after numerous attempts, we found that the keyboard was successfully identified, which is the main


FIGURE 9. Comparison between the captured image in both cases, with a clear difference between the results.

FIGURE 10. Comparison between the captured image in both cases, with a clear difference between the results.

objective in this situation, helping the person to recognize and avoid obstacles along the specific path.

Similarly, our system detected the image of a backpack with a probability of 0.97, which means an efficiency of 97%, as shown in the results in figure 11. Both images are compared, and the second one shows better results. It has been observed that it contains much less noise and haze, and has a better level of brightness on the foreground object. As a result, we can obviously see that the backpack in fig 11b has an enhanced view compared to the one in 11a. This may not seem to be a worthwhile outcome at first for a normal person; however, after communicating with three visually handicapped persons, the situation becomes very important. A clear demarcation between the hindering object and the surrounding must be identified, in order for the person to pave a way to walk along in an easy way.

In figure 12, we have shown that our system detected a bottle from an image which has been taken from the scenario, thereby giving an efficiency of 70%. The reason behind this is obvious: the picture of the bottle is not in a proper frame. The picture shows elements of colourlessness. As a result, the system finds it hard to differentiate between the background


FIGURE 11. Comparison between the captured image in both cases, with a clear difference between the results.

FIGURE 12. Comparison between the captured image in both cases, with a clear difference between the results.

area and the area which is covered by the bottle. To further investigate such scenarios in our system, we have tried to use another camera which is not capable of night vision, and the outcomes had been worse. After discussing with the technical staff of both cameras, we opted for the night vision camera, which lacks a bit of accuracy, but there is more brightness in the image. However, it successfully detects the periphery of the object, which, again, is the main aim for the visually handicapped personnel.

In figure 13, our system detected a cell phone in the hands of the user with an efficiency of 84%. In this image, we got a better efficiency in detecting cellphones because there is a variation in the images of cellphones on which the dataset is trained, thus better results are obtained. Moreover, we can see that figure 13a looks less bright and contains hue and moist noise in the image. On the contrary, in fig 13b, the image contains an enhanced clarity. This clearness becomes a pathway for better detection and distinction in terms of performance. In comparison to the results relating to the detection of the keyboard, the efficiency is better, as the fluctuation in colours does not occur on a frequent basis. We checked this by repeating this experiment four times, and the results were found to be consistently accurate.
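The readout behind these percentages can be sketched as follows: a COCO-trained detector loaded from TensorFlow Hub returns bounding boxes, class identifiers, and confidence scores in [0, 1], and a score multiplied by one hundred gives the percentage quoted in the figures (0.97 becomes 97%). The model handle and the 0.70 cut-off are illustrative assumptions on our part, not values fixed by this work.

    import numpy as np
    import tensorflow as tf
    import tensorflow_hub as hub

    # Illustrative model choice: an SSD MobileNet v2 detector trained on COCO.
    detector = hub.load("https://tfhub.dev/tensorflow/ssd_mobilenet_v2/2")

    def detect(image):
        """image: HxWx3 uint8 array; returns boxes, classes, percentages."""
        batch = tf.convert_to_tensor(image[np.newaxis, ...])
        out = detector(batch)
        scores = out["detection_scores"][0].numpy()
        keep = scores >= 0.70  # e.g. the keyboard detected at 0.76 passes
        return (out["detection_boxes"][0].numpy()[keep],
                out["detection_classes"][0].numpy()[keep],
                np.round(scores[keep] * 100))  # 0.97 is displayed as 97%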


FIGURE 13. Comparison between the captured image in both cases, with a clear difference between the results.

FIGURE 14. Comparison between the captured image in both cases, with a clear difference between the results.

To enhance our detection scheme, we resorted to checking numerous obstructions in a single image. In figure 14, our system detected multiple objects, a bed and a chair, within a single image, thereby giving probabilities of 0.71 and 1.00, which mean efficiencies of 71% and 100%, respectively. The perspective being shown in this image is that the system can detect multiple objects without any difficulty, as we can see that the chair can be perfectly detected. As we can see in figure 14a, there is a lack of transparency in the image, as well as sharp edges for certain pixels, which makes it less clear or blurred. On the contrary, in figure 14b, there is no blurriness or haziness in the image, and it has a clear view. In light of these facts, these are some of the bonus points of our system which will help the system to detect the obstructions in a better way, and will give worthwhile assistance to the visually impaired user.

VI. CONCLUSION
This work is related to the design of a system for the visually impaired person that could help their lifestyle in a much better way. The system combines the functions of various components to create a multifunctional device for the blind and vision impaired. The device is built in such a way that it may be used on the go. We used the Viola Jones algorithm for


detection purpose, as the detection framework looks for characteristics that include the sums of picture pixels inside rectangular regions. The Viola Jones algorithm is considered to be more complicated, as more than one rectangular feature is involved in the process, but it provides an ease of implementation under a confined dataset. When obstacles are identified in the path, the gadget will issue a warning through sound and haptic feedback. Because all the data is saved and processed on the Raspberry Pi, it does not require internet access to operate. This is an extra benefit, given that the internet might not be consistently available along the user's pathway. Under the scope of the various circumstances that have been encountered in this work, the overall system gives us an average efficiency of about 91%, which is also a great enhancement for our project. Moreover, it has a rechargeable battery whose time is around 24 hours, so the user can recharge it during the night. As the system is integrated with VNC Viewer, it can be connected to the cell phone of the person.

To extend this work, we plan to add a text-to-speech system and embed it with the GSM module, so that the blind person can actually hear the directions in the form of voice. In this way, the user can connect with the family and loved ones by sharing the precise location through GSM. This can also be helpful if the visually impaired person loses the specific route that has to be followed. For connecting the text-to-speech system with the cell phone, the user can use a paid application like the KNFB reader [45], [46], which can be used to convert text to voice. With a single tap, the KNFB reader transforms text into high-quality voice, providing accurate, rapid, and efficient access to single and multi-page texts. We believe that this system can be a steppingstone for greater advancements in helping visually impaired individuals. This would enhance possibilities for helping not only people with eyesight problems, but also paves the way to enhance the ideas of biomedical sensors for elderly citizens [48].

ACKNOWLEDGMENT
Besides the family's moral support, the detailed discussion on a continual basis with the involved technical members of the group, as well as the vendors of the used gadgetry, is greatly appreciated.

REFERENCES
[1] H. Sharma, M. Tripathi, A. Kumar, and M. S. Gaur, "Embedded assistive stick for visually impaired persons," in Proc. 9th Int. Conf. Comput., Commun. Netw. Technol. (ICCCNT), Jul. 2018, pp. 1-6, doi: 10.1109/ICCCNT.2018.8493707.
[2] S. K. Jarraya, W. S. Al-Shehri, and M. S. Ali, "Deep multi-layer perceptron-based obstacle classification method from partial visual information: Application to the assistance of visually impaired people," IEEE Access, vol. 8, pp. 26612-26622, 2020, doi: 10.1109/ACCESS.2020.2970979.
[3] K. Naqvi, B. Hazela, S. Mishra, and P. Asthana, "Employing real-time object detection for visually impaired people," in Data Analytics and Management (Lecture Notes on Data Engineering and Communications Technologies), vol. 54, A. Khanna, D. Gupta, Z. Pólkowski, S. Bhattacharyya, and O. Castillo, Eds. Singapore: Springer, 2021, pp. 285-299, doi: 10.1007/978-981-15-8335-3_23.
[4] U. Masud, "Investigations on highly sensitive optical semiconductor laser based sensorics for medical and environmental applications," in The Nanonose, vol. 3862195554. Kassel, Germany: Kassel Univ. Press, 2015.
[5] N. E. Albayrak, "Object recognition using TensorFlow," in Proc. IEEE Integr. STEM Educ. Conf. (ISEC), Aug. 2020, pp. 1-5, doi: 10.1109/ISEC49744.2020.9397835.
[6] Q. Zhou, J. Qin, X. Xiang, Y. Tan, and N. N. Xiong, "Algorithm of helmet wearing detection based on AT-YOLO deep mode," Comput., Mater. Continua, vol. 69, no. 1, pp. 159-174, 2021.
[7] M. Castrillón, O. Déniz, D. Hernández, and J. Lorenzo, "A comparison of face and facial feature detectors based on the Viola-Jones general object detection framework," Mach. Vis. Appl., vol. 22, no. 3, pp. 481-494, 2010.
[8] M. N. Chaudhari, M. Deshmukh, G. Ramrakhiani, and R. Parvatikar, "Face detection using Viola Jones algorithm and neural networks," in Proc. 4th Int. Conf. Comput. Commun. Control Autom. (ICCUBEA), Aug. 2018, pp. 1-6, doi: 10.1109/ICCUBEA.2018.8697768.
[9] N. Sakic, M. Krunic, S. Stevic, and M. Dragojevic, "Camera-LIDAR object detection and distance estimation with application in collision avoidance system," in Proc. IEEE 10th Int. Conf. Consum. Electron. (ICCE-Berlin), Nov. 2020, pp. 1-6, doi: 10.1109/ICCE-Berlin50680.2020.9352201.
[10] N. N. Mohd and R. Latchmanan, "An ultrasonic sensing system for assisting visually impaired person," J. Teknol., vol. 78, nos. 7-4, pp. 1-5, Jul. 2016, doi: 10.11113/jt.v78.9433.
[11] Getting the Model From TensorFlow Hub. [Online]. Available: https://www.coursera.org/lecture/advanced-computer-vision-with-tensorflow/getting-the-model-from-tensorflow-hub-tZDzl
[12] Z. Li, F. Song, B. C. Clark, D. R. Grooms, and C. Liu, "A wearable device for indoor imminent danger detection and avoidance with region-based ground segmentation," IEEE Access, vol. 8, pp. 184808-184821, 2020, doi: 10.1109/ACCESS.2020.3028527.
[13] Face Detection System Based on Viola-Jones Algorithm, International Journal of Science and Research (IJSR), Ahmedabad, Gujarat, 2016.
[14] U. Masud and M. I. Baig, "Investigation of cavity length and mode spacing effects in dual-mode sensor," IEEE Sensors J., vol. 18, no. 7, pp. 2737-2743, Apr. 2018.
[15] U. Masud, M. I. Baig, and A. Zeeshan, "Automatization analysis of the extremely sensitive laser-based dual-mode biomedical sensor," Lasers Med. Sci., vol. 35, no. 7, pp. 1531-1542, Dec. 2019, doi: 10.1007/s10103-019-02945-8.
[16] Z. Bauer, A. Dominguez, E. Cruz, F. Gomez-Donoso, S. Orts-Escolano, and M. Cazorla, "Enhancing perception for the visually impaired with deep learning techniques and low-cost wearable sensors," Pattern Recognit. Lett., vol. 137, pp. 27-36, Dec. 2020, doi: 10.1016/j.patrec.2019.03.008.
[17] U. Pilania, A. Kaushik, Y. Vohra, and S. Jadaun, Smart Blind Stick for Blind People (Lecture Notes in Networks and Systems). Springer, 2021. [Online]. Available: https://books.google.com.pk/books?id=71IzEAAAQBAJ&dq=U.+Pilania,+A.+Kaushik,+Y.+Vohra,+and+S.+Jadaun,+Smart+Blind+Stick+for+Blind+People+(Lecture+Notes+in+Networks+and+Systems)&source=gbs_navlinks_s
[18] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in Proc. Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 2001, pp. 1-5.
[19] M. Aghagolzadeh, H. Soltanian-Zadeh, and B. N. Araabi, "Multiscale face detection: A new approach to robust face detection," in Proc. Int. Conf. Fuzzy Syst., Jul. 2006, pp. 1229-1234.
[20] K. S. Yadav and J. Singha, "Facial expression recognition using modified Viola-John's algorithm and KNN classifier," Multimedia Tools Appl., vol. 79, no. 19, pp. 13089-13107, May 2020. [Online]. Available: https://link.springer.com/article/10.1007/s11042-019-08443
[21] I. Chawla, "Face detection & recognition using tensor flow: A review," Int. J. Comput. Technol., vol. 18, pp. 7381-7388, Nov. 2018.
[22] A. Nishajith, J. Nivedha, S. S. Nair, and J. Mohammed Shaffi, "Smart cap-wearable visual guidance system for blind," in Proc. Int. Conf. Inventive Res. Comput. Appl. (ICIRCA), Jul. 2018, pp. 275-278.


[23] O. Butt, T. Saeed, H. Elahi, U. Masud, and U. Ghafoor, "A predictive approach to optimize a HHO generator coupled with solar PV as a standalone system," Sustainability, vol. 13, Oct. 2021, Art. no. 12110, doi: 10.3390/su132112110.
[24] Y. Lin, K. Wang, W. Yi, and S. Lian, "Deep learning based wearable assistive system for visually impaired people," in Proc. IEEE/CVF Int. Conf. Comput. Vis. Workshop (ICCVW), Oct. 2019, pp. 2549-2557, doi: 10.1109/ICCVW.2019.00312.
[25] Embedded Assistive Stick for Visually Impaired Persons. [Online]. Available: https://ieeexplore.ieee.org/document/8493707/
[26] P. Adarsh, P. Rathi, and M. Kumar, "YOLO V3-tiny: Object detection and recognition using one stage improved model," in Proc. 6th Int. Conf. Adv. Comput. Commun. Syst. (ICACCS), Mar. 2020, pp. 687-694.
[27] P. Banerjee, Face Space Boundary Selection for Face Detection and Recognition. Boca Raton, FL, USA: CRC Press, 2015, pp. 161-186.
[28] G. Hua, "Face recognition by discriminative orthogonal rank-one tensor decomposition," in Recent Advances in Face Recognition, 2008.
[29] R. Khan, A. Ghafoor, and N. I. Rao, "A blind image adaptive watermarking scheme for audio using wavelet transform," in Proc. Int. Conf. Digit. Image Process., Mar. 2009, pp. 61-72.
[30] L. Dan, L. Chang, Z. Wei-Dong, Y. Tao, and L. Y. Hong, "Supervised non-negative tensor learning with orthogonal constraint for face detection," J. Converg. Inf. Technol., vol. 8, no. 3, 2013.
[31] S. Vaidya, N. Shah, N. Shah, and R. Shankarmani, "Real-time object detection for visually challenged people," in Proc. 4th Int. Conf. Intell. Comput. Control Syst. (ICICCS), May 2020, pp. 311-316, doi: 10.1109/ICICCS48265.2020.9121085.
[32] H. Li, "Improving the generalization capability of face spoofing detection," Ph.D. dissertation, Nanyang Technol. Univ., Singapore, 2018. [Online]. Available: https://dr.ntu.edu.sg/handle/10356/89790
[33] N. S. Ahmad, N. L. Boon, and P. Goh, "Multi-sensor obstacle detection system via model-based state-feedback control in smart cane design for the visually challenged," IEEE Access, vol. 6, pp. 64182-64192, 2018, doi: 10.1109/ACCESS.2018.2878423.
[34] H. Ahmad, A. Tonelli, M. Crepaldi, C. Martolini, E. Capris, and M. Gori, "Audio-visual thumble (AVT): A low-vision rehabilitation device using multisensory feedbacks," in Proc. 42nd Annu. Int. Conf. Eng. Med. Biol. Soc. (EMBC), 2020, pp. 3913-3916, doi: 10.1109/EMBC44109.2020.9175475.
[35] Common Objects in Context: Object Detection. Accessed: Oct. 2, 2002. [Online]. Available: https://cocodataset.org/#overview
[36] U. Masud, M. Ali, and M. Ikram, "Calibration and stability of highly sensitive fibre based laser through relative intensity noise," Phys. Scr., vol. 95, Feb. 2020, Art. no. 055505, doi: 10.1088/1402-4896/ab7540.
[37] R. Joshi, S. Yadav, M. Dutta, and C. Travieso-Gonzalez, "Efficient multi-object detection and smart navigation using artificial intelligence for visually impaired people," Entropy, vol. 22, p. 941, Oct. 2020, doi: 10.3390/e22090941.
[38] S. Singh and B. Singh, "Intelligent walking stick for elderly and blind people," Int. J. Eng. Res. Technol., vol. 9, no. 3, Mar. 2020. [Online]. Available: https://www.ijert.org/intelligent-walking-stick-for-elderly-and-blind-people
[39] Z. Xu, H. Chen, and Z. Li, "Blind image deblurring using group sparse representation," Digit. Signal Process., vol. 102, Oct. 2020, Art. no. 102736, doi: 10.1016/j.dsp.2020.102736.
[40] Servo Motor SG-90. Accessed: Oct. 24, 2018. [Online]. Available: https://components101.com/motors/servo-motor-basics-pinout-datasheet
[41] Buck-Boost Converter: What is it. Accessed: Oct. 5, 2020. [Online]. Available: https://www.electrical4u.com/buck-boost-converter/
[42] Raspberry Pi 4: Your Tiny, Dual-Display, Desktop Computer. Accessed: Aug. 15, 2020. [Online]. Available: https://www.raspberrypi.com/products/raspberry-pi-4-model-b/
[43] M. J. Jones and P. Viola, Fast Multi-View Face Detection. Cambridge, MA, USA: Mitsubishi Electric Research Laboratories, 2003.
[44] M. Karimi, N. Soltanian, S. Samavi, K. Najarian, N. Karimi, and S. M. R. Soroushmehr, "Blind stereo image quality assessment inspired by brain sensory-motor fusion," Digit. Signal Process., vol. 91, pp. 91-104, Aug. 2019, doi: 10.1016/j.dsp.2019.03.004.
[45] The History of KNFB Reader. Accessed: Aug. 20, 2021. [Online]. Available: https://knfbreader.com/about-knfb-reader
[46] L. Neat, R. Peng, S. Qin, and R. Manduchi, "Scene text access: A comparison of mobile OCR modalities for blind users," in Proc. 24th Int. Conf. Intell. User Interface, Mar. 2019, pp. 197-207, doi: 10.1145/3301275.3302271.
[47] Y. Freund and R. E. Schapire, "A short introduction to boosting," Tech. Rep., 1999. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.5846
[48] U. Masud, F. Jeribi, M. Alhameed, F. Akram, A. Tahir, and M. Y. Naudhani, "Two-mode biomedical sensor build-up: Characterization of optical amplifier," Comput., Mater. Continua, vol. 70, no. 3, pp. 5487-5489, 2022.

USMAN MASUD received the B.Sc. degree in electrical engineering from the University of Engineering and Technology, Taxila, in 2005, the M.S. degree in electrical engineering, in Germany, in 2010, and the Ph.D. degree, in 2014. His areas of expertise include laser systems, biomedical sensors, spectroscopic applications, and wireless networks. He is involved in multiple research areas at the moment and finds deep interest in laser-based biomedical applications. He has been an active member of the Verband der Elektrotechnik, Elektronik und Informationstechnik e.V. (VDE) for several years.

TAREQ SAEED received the B.S. degree in mathematics from King Abdulaziz University (KAU), Jeddah, Saudi Arabia, the M.S. degree in financial mathematics from the University of Wollongong, and the Ph.D. degree from Griffith University, in 2018. Currently, he is working as an Assistant Professor with the Mathematics Department, KAU.

HUNIDA M. MALAIKAH received the B.Sc. degree from King Abdulaziz University, and the M.Sc. degree in applied numerical computation and the Ph.D. degree in financial mathematics from The University of Manchester, U.K. His research interests include stock price prediction, the stock price distribution, volatility models, numerical solutions for fractional differential equations and stochastic differential equations, and financial models.

FEZAN UL ISLAM received the bachelor's degree from the Faculty of Electrical and Electronics Engineering, University of Engineering and Technology, Taxila.

GHULAM ABBAS received the bachelor's degree from the Faculty of Electrical and Electronics Engineering, University of Engineering and Technology, Taxila.