Development of An Android Application For Recognizing Handwritten Text On Mobile Devices
By
AKARSH SAXENA (1508210013)
DIVYA GUPTA (1508210045)
AAKASH KUMAR (1508210001)
BABITA (1508210037)
(2019-20)
ACKNOWLEDGEMENT
We extend our sincere gratitude to VIKAS KUMAR (Head of Dept.) for giving invaluable knowledge and wonderful technical guidance. We are thankful to all the faculty members of the institute at Moradabad for their valuable guidance and technical comments. We are highly grateful to them for their guidance, constant encouragement and moral support.
We wish to express our gratitude to all other faculty members of the Computer Science & Engineering Department, who with their encouraging and most valuable suggestions have contributed directly or indirectly towards the completion of this project report. We owe a debt of gratitude to our parents for their consistent support and meaningful suggestions.
ABSTRACT
Technological advancements continue to push further the limits of human outreach in various fields of technology. One such field is OCR (Optical Character Recognition).
In this fast-paced world there is an immense urge for the digitalization of printed documents and for documenting information directly in digital form, and there is still some gap in this area even today. OCR techniques, and their continuous improvement from time to time, are trying to fill this gap. This project is about devising an algorithm for the recognition of handwritten characters, also known as HCR (Handwritten Character Recognition).
TABLE OF CONTENTS
Chapter 1: Introduction
1.1: Objective
1.2: Approach
Chapter 2: Technologies Used
2.1: Android Studio
2.2: Flask
2.3: Python
Chapter 3: Neural Networks
3.1: Introduction
Chapter 4: Pre-processing
Chapter 5: Simulation and Results
5.1: Simulation
5.2: Results
Chapter 6: Modules of Project
6.1: Adaptive Thresholding
6.2: Dilation
6.3: Segmentation
6.4: Feature Extraction
6.5: Neural Network
Chapter 7: Adaptive Thresholding
Chapter 8: Segmentation
Chapter 9: Character Normalization
Chapter 10: Feature Extraction
Chapter 11: Classification Engine
Chapter 12: Language Translation
Chapter 13: Software Requirements Specification
13.1: Introduction
13.1.1: Purpose
13.1.2: Scope
13.1.3: References
13.1.4: Overview
13.4.2.1: Security
13.4.2.2: Reliability
13.4.2.3: Maintainability
13.4.2.4: Portability
Project Overview
References
LIST OF FIGURES
Fig. 6.1: Modules
CHAPTER 1
INTRODUCTION
The idea is to devise efficient algorithms which take input in a digital image format. The image is then processed for better comparison, after which the processed image is compared with an already available set of font images. The last step gives a prediction of the character along with a percentage accuracy.
1.1 OBJECTIVE
The objective of this project is to identify handwritten characters with the use of neural networks. We have to construct a suitable neural network and train it properly. The program should be able to extract the characters one by one and map the target output for training purposes. After automatic processing of the image, the training dataset has to be used to train a “classification engine” for recognition purposes. The program code has to be written in Java and supported with a Graphical User Interface (GUI) using Android.
Another objective of this project is language translation, so that this application can also be used as a language translator.
1.2 APPROACH
• Feature extraction
• Recognition
• Language translation
CHAPTER 2
TECHNOLOGIES USED
2.1 ANDROID STUDIO
Android Studio is the official integrated development environment (IDE) for Google's Android operating system, built on JetBrains' IntelliJ IDEA software and designed specifically for Android development. It is available for Windows, macOS and Linux based operating systems. It is a replacement for the Eclipse Android Development Tools (ADT) as the primary IDE for native Android application development.
Features
• Lint tools to catch performance, usability, version compatibility and other problems
• Built-in support for Google Cloud Platform, enabling integration with Firebase Cloud Messaging and Google App Engine
• Android Virtual Device (Emulator) to run and debug apps in Android Studio
Android Studio supports all the same programming languages of IntelliJ (and CLion), e.g. Java, C++, and more with extensions, such as Go; and Android Studio 3.0 or later supports Kotlin and "all Java 7 language features and a subset of Java 8 language features that vary by platform version".

2.2 FLASK
Flask is a micro web framework written in Python. It is classified as a microframework because it does not require particular tools or libraries. It has no database abstraction layer, form validation, or any other components where pre-existing third-party libraries provide common functions. However, Flask supports extensions that can add application features as if they were implemented in Flask itself. Extensions exist for object-relational mappers, form validation, upload handling, various open authentication technologies and several common framework-related tools. Extensions are updated far more regularly than the core Flask program.
Applications that use the Flask framework include Pinterest, LinkedIn, and the community web page for Flask itself.
2.3 PYTHON
Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python has a design philosophy that emphasizes code readability and provides constructs that enable clear programming on both small and large scales. Van Rossum led the language community until stepping down in July 2018.
Python interpreters are available for many operating systems. CPython, the reference implementation of Python, is open-source software and has a community-based development model. Python and CPython are managed by the non-profit Python Software Foundation.
CHAPTER 3
NEURAL NETWORKS
3.1 INTRODUCTION
An early phase of Neural Network was developed by Warren McCulloch and Walter Pitts
in 1943 which was a computational model based on Mathematics and algorithm. This
model paved the way for research which was focused on the application of Neural
Networks in Artificial Intelligence.
An artificial neural network is basically a mesh of a large number of interconnected cells. The arrangement of the cells is such that each cell receives an input and drives an output to subsequent cells. Each cell has a pre-defined function that maps its inputs to its output.
The diagram below is a block diagram that depicts the structure and workflow of the created Artificial Neural Network. The neurons are interconnected with each other in a serial manner. The network consists of a number of hidden layers depending upon the resolution of comparison of the inputs with the dataset.
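As an illustration of this cell model, the following minimal Java sketch computes the output of a single cell from its inputs. It is not taken from the project code; the sigmoid activation, the weights and the bias term are assumptions made here for illustration only.

```java
// Minimal sketch of one cell (neuron) of an artificial neural network.
// The sigmoid activation and the weighted-sum form are illustrative assumptions.
public class Neuron {
    private final double[] weights; // one weight per input connection
    private final double bias;

    public Neuron(double[] weights, double bias) {
        this.weights = weights;
        this.bias = bias;
    }

    // Weighted sum of the inputs followed by a sigmoid activation.
    public double output(double[] inputs) {
        double sum = bias;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * inputs[i];
        }
        return 1.0 / (1.0 + Math.exp(-sum)); // squashes the sum into (0, 1)
    }
}
```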
Below is the set of 26 capital English letters, each written in binary form as a 7×5 matrix. The seven rows of each matrix are listed from top to bottom:
A: 00100 01010 01010 10001 11111 10001 10001
B: 11110 10001 10001 11110 10001 10001 11110
C: 01110 10001 10000 10000 10000 10001 01110
D: 11110 10001 10001 10001 10001 10001 11110
E: 11111 10000 10000 11110 10000 10000 11111
F: 11111 10000 10000 11110 10000 10000 10000
G: 01110 10001 10000 10000 10111 10001 01110
H: 10001 10001 10001 11111 10001 10001 10001
I: 01110 00100 00100 00100 00100 00100 01110
J: 11111 00100 00100 00100 00100 10100 01000
K: 10001 10010 10100 11000 10100 10010 10001
L: 10000 10000 10000 10000 10000 10000 11111
M: 10001 11011 10101 10001 10001 10001 10001
N: 10001 11001 11001 10101 10011 10011 10001
O: 01110 10001 10001 10001 10001 10001 01110
P: 11110 10001 10001 11110 10000 10000 10000
Q: 01110 10001 10001 10001 10101 10010 01101
R: 11110 10001 10001 11110 10100 10010 10001
S: 01110 10001 10000 01110 00001 10001 01110
T: 11111 00100 00100 00100 00100 00100 00100
U: 10001 10001 10001 10001 10001 10001 01110
V: 10001 10001 10001 10001 10001 01010 00100
W: 10001 10001 10001 10001 10101 11011 10001
X: 10001 10001 01010 00100 01010 10001 10001
Y: 10001 10001 01010 00100 00100 00100 00100
Z: 11111 00001 00010 00100 01000 10000 11111
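In the training code, each such 7×5 bitmap can be flattened into a 35-element input vector for the network. A minimal Java sketch of this (the class layout and helper name are illustrative assumptions, not the project's actual code; the bitmap of 'A' is taken from the set above):

```java
// Illustrative sketch: the ideal 7x5 bitmap of 'A' flattened into a
// 35-element input vector for the neural network (row-major order).
public class IdealCharacters {
    static final int[][] LETTER_A = {
        {0, 0, 1, 0, 0},
        {0, 1, 0, 1, 0},
        {0, 1, 0, 1, 0},
        {1, 0, 0, 0, 1},
        {1, 1, 1, 1, 1},
        {1, 0, 0, 0, 1},
        {1, 0, 0, 0, 1}
    };

    // Flatten a 7x5 bitmap into a single vector, one entry per pixel.
    static double[] toInputVector(int[][] bitmap) {
        double[] v = new double[bitmap.length * bitmap[0].length];
        int k = 0;
        for (int[] row : bitmap) {
            for (int pixel : row) {
                v[k++] = pixel;
            }
        }
        return v;
    }
}
```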
CHAPTER 4
PRE-PROCESSING
Pre-processing of the sample image involves a few steps, which are as follows:
Binarization
Binarization of an image converts it into an image which has only pure black and pure white pixel values in it. Basically, during the binarization of a grey-scale image, pixels with an intensity lower than half of the full intensity value get a zero value, converting them into black pixels, and the remaining pixels get the full intensity value, converting them into white pixels.
Inversion
Inversion is a process in which each pixel of the image gets a colour which is the inverse of its previous colour. This process is the most important one, because any character on a sample image can only be extracted efficiently if it contains only one colour which is distinct from the background colour. Note that it is only required if the objects we have to identify are of a darker intensity on a lighter background.
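A minimal Java sketch of these two steps on a grey-scale image stored as a 2D intensity array (the array representation and the fixed mid-intensity threshold are assumptions for illustration; the project computes the threshold with the Otsu method described later):

```java
// Illustrative sketch of binarization and inversion on a grey-scale image
// held as a 2D array of intensities in [0, 255].
public class PreProcessing {

    // Pixels below the threshold become black (0), the rest white (255).
    static int[][] binarize(int[][] grey, int threshold) {
        int[][] out = new int[grey.length][grey[0].length];
        for (int y = 0; y < grey.length; y++) {
            for (int x = 0; x < grey[0].length; x++) {
                out[y][x] = grey[y][x] < threshold ? 0 : 255;
            }
        }
        return out;
    }

    // Each pixel gets the inverse of its previous value.
    static int[][] invert(int[][] img) {
        int[][] out = new int[img.length][img[0].length];
        for (int y = 0; y < img.length; y++) {
            for (int x = 0; x < img[0].length; x++) {
                out[y][x] = 255 - img[y][x];
            }
        }
        return out;
    }
}
```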
The flow chart shown below illustrates the physical meaning of the processes that are
mentioned above:
The features of a character depict the morphological and spatial characteristics in the image. Feature extraction is a method of extracting the features of characters from the sample image. There are basically two types of feature extraction:
Statistical feature extraction
Structural feature extraction
Boxing
This is the process of creating a boundary around the characters identified in an image. This helps by making the cropping of characters easier. After boxing, the characters are cropped out and stored as input variables for recognition.
Reshaping and Resizing
Reshaping is done to change the dimensions of the acquired character into the desired shape. Resizing is done to reduce the size of the characters to a particular minimum level.
CHAPTER 5
SIMULATION AND RESULTS
5.1 SIMULATION
First of all, the image on which the characters are written by hand is required. Below is an example of one case in which an image sample is taken.
Also, the size of each character image should be maintained at 7×5, because the ideal character set is defined as a set of images with 7×5 2D matrices of binary values.
To achieve this, the images are first reshaped to a 7:5 aspect ratio and then resized into a 7×5 image.
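A minimal Java sketch of this resizing step using the standard java.awt image classes (the use of BufferedImage here is an assumption for illustration; on Android the equivalent would be Bitmap.createScaledBitmap):

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

public class Resize {
    // Scale a cropped character image down to the 5x7 (width x height)
    // format expected by the ideal character set.
    static BufferedImage to7x5(BufferedImage original) {
        BufferedImage resized =
                new BufferedImage(5, 7, BufferedImage.TYPE_BYTE_BINARY);
        Graphics2D g = resized.createGraphics();
        g.drawImage(original, 0, 0, 5, 7, null); // scales to 5x7 pixels
        g.dispose();
        return resized;
    }
}
```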
The input is fed through the network, which traverses each neuron as it compares the input image with each neuron and gives a value in terms of a percentage of similarity between the input image and the neurons.
The neuron with the highest percentage of similarity to the input image is considered, or estimated, as the most favourable output, i.e. the one most likely to match that input.
In our case a network with 26 neurons and one hidden layer is enough.
It is important to note that the network will not be immune to noisy handwritten input if it is not trained properly. In other words, if the network is not trained with a noisy set of characters along with the ideal set of characters, the network will not show the correct output every time. In fact, handwritten characters are always irregular. So, to make the network identify irregularly shaped characters properly, we must train the network with a noisy set of characters. In this case a set of noisy characters is obtained by programmatically adding noise with some non-zero value of mean and variance.
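A minimal Java sketch of generating such a noisy training sample (the use of Gaussian noise via java.util.Random, the clamping to [0, 1] and the parameter values are assumptions for illustration):

```java
import java.util.Random;

public class NoisyTraining {
    // Add Gaussian noise with the given mean and standard deviation to an
    // ideal input vector, clamping each value back into [0, 1].
    static double[] addNoise(double[] ideal, double mean, double stdDev) {
        Random rng = new Random();
        double[] noisy = new double[ideal.length];
        for (int i = 0; i < ideal.length; i++) {
            double value = ideal[i] + mean + stdDev * rng.nextGaussian();
            noisy[i] = Math.min(1.0, Math.max(0.0, value));
        }
        return noisy;
    }
}
```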
5.2 RESULTS
After proper training and testing of the network, the pixelated 7×5 image of ‘A’ is fed to the network as input. The output we get is the resultant 2D matrix plot, the same as the character ‘A’ from the ideal dataset which was fed to the network as the training dataset.
CHAPTER 6
MODULES OF PROJECT
Basically, our project has been divided into five modules, which are as follows:
• ADAPTIVE THRESHOLDING
• DILATION
• SEGMENTATION
• FEATURE EXTRACTION
• NEURAL NETWORK
Fig.6.1 Modules
6.1 ADAPTIVE THRESHOLDING
• Firstly, the optimal threshold for binarization is computed by using the Otsu method.
• The threshold is calculated to separate the handwriting from the background.
• With this threshold, the image is converted to black and white, thus highlighting the handwritten characters which it contains.
6.2 DILATION
• The value of the output pixel is the maximum value of all pixels in the
neighborhood. In a binary image, a pixel is set to 1 if any of the neighboring pixels
have the value 1.
• Morphological dilation makes objects more visible and fills in small holes in
objects.
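A minimal Java sketch of binary dilation with a 3×3 neighbourhood (the structuring-element size is an assumption for illustration):

```java
public class Dilation {
    // Binary dilation: a pixel becomes 1 if any pixel in its 3x3
    // neighbourhood is 1 (the maximum value of the neighbourhood).
    static int[][] dilate(int[][] img) {
        int h = img.length, w = img[0].length;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                for (int dy = -1; dy <= 1 && out[y][x] == 0; dy++) {
                    for (int dx = -1; dx <= 1; dx++) {
                        int ny = y + dy, nx = x + dx;
                        if (ny >= 0 && ny < h && nx >= 0 && nx < w
                                && img[ny][nx] == 1) {
                            out[y][x] = 1;
                            break;
                        }
                    }
                }
            }
        }
        return out;
    }
}
```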
6.3 SEGMENTATION
• The next step is to segment the areas corresponding to the letters of the handwritten words from the image converted to black and white.
• For this we scan the image from left to right and from bottom to top and, on finding a black pixel, consider it as the initial area delimiting the character of which it is a part.
• This area is further expanded in three directions, namely top, left and right, so as to include the rest of the pixels that are part of the handwritten character.
• Save the new area.
• Convert the new area into a matrix of 0s and 1s.
6.4 FEATURE EXTRACTION
This module was designed to extract the features from the segmented areas of the image containing the characters to be recognized, traits that serve to distinguish an area corresponding to one letter from an area corresponding to other letters. To begin with, the first n components of the discrete cosine transformation of a segmented area are considered to be the features that describe it. In the next phase, certain statistical details of the area are added to the discrete cosine transformation components to define its features.
6.5 NEURAL NETWORK
It is mainly used for the purpose of machine learning, i.e., we train the machine on a particular problem by giving it the inputs with their respective outputs. Once the machine is trained for that kind of input, we can use it as a solution to that problem.
So the main task here is to train the machine and create the dataset for that.
It mainly consists of two methods:
ANN_TRAIN(String arg[]) – It is used for training the ANN, and the path of the file with its name is passed as the parameter to it. It generates a model file which is used as a reference to the trained dataset.
ANN_TEST(String arg[]) – It is used for testing purposes, and the paths of the model file, input file and output file are passed as parameters to it. It writes the result of the input in the output file.
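A hypothetical usage sketch of these two methods (the class name HandwritingANN and the file paths are illustrative assumptions; only the two method names come from this report):

```java
public class RecognitionDemo {
    public static void main(String[] args) {
        // Train the ANN from a labelled dataset file; produces a model file.
        // HandwritingANN and the paths below are hypothetical.
        HandwritingANN.ANN_TRAIN(new String[] {"dataset/train.txt"});

        // Classify new inputs against the trained model and write the
        // recognized characters to the output file.
        HandwritingANN.ANN_TEST(new String[] {
                "dataset/model.bin", "dataset/input.txt", "dataset/output.txt"});
    }
}
```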
CHAPTER 7
ADAPTIVE THRESHOLDING
This method solves the problem of finding the optimum threshold that minimises the error of classifying a background pixel as belonging to the foreground and vice versa (Cheriet et al., 2007). Without loss of generality, handwriting is defined as being the dark characters placed on a light background. For an image with grey levels in G = {0, 1, ..., L-1}, the handwriting and the background can be represented by two classes, as follows: C1 = {0, 1, ..., t} and C2 = {t+1, t+2, ..., L-1}. Denoting by w1(t) and w2(t) the probabilities of the two classes separated by a threshold t, and by σ1²(t) and σ2²(t) the variances of these classes, the within-class variance is σw²(t) = w1(t)σ1²(t) + w2(t)σ2²(t). Otsu shows that minimising the within-class variance is equivalent to maximising the between-class variance, so the optimal threshold is the t that maximises σb²(t) = w1(t)w2(t)[μ1(t) − μ2(t)]², where μ1 and μ2 are the means of the two classes.
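A minimal Java sketch of the Otsu threshold computed from a 256-bin grey-level histogram (a standard formulation of the method; the pre-computed histogram input is an assumption):

```java
public class Otsu {
    // Return the threshold t that maximises the between-class variance
    // w1(t) * w2(t) * (mu1(t) - mu2(t))^2 over a 256-bin histogram.
    static int threshold(int[] histogram) {
        long total = 0, weightedSum = 0;
        for (int i = 0; i < 256; i++) {
            total += histogram[i];
            weightedSum += (long) i * histogram[i];
        }
        long sumBelow = 0, countBelow = 0;
        double bestVariance = -1.0;
        int bestT = 0;
        for (int t = 0; t < 256; t++) {
            countBelow += histogram[t];
            if (countBelow == 0) continue;
            long countAbove = total - countBelow;
            if (countAbove == 0) break;
            sumBelow += (long) t * histogram[t];
            double mu1 = (double) sumBelow / countBelow;
            double mu2 = (double) (weightedSum - sumBelow) / countAbove;
            // Proportional to w1 * w2 * (mu1 - mu2)^2.
            double variance =
                    (double) countBelow * countAbove * (mu1 - mu2) * (mu1 - mu2);
            if (variance > bestVariance) {
                bestVariance = variance;
                bestT = t;
            }
        }
        return bestT;
    }
}
```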
CHAPTER 8
SEGMENTATION
The solution for the segmentation of the areas of the characters in the image was given by an implementation of a new algorithm that scans the image from left to right and from bottom to top and, on finding a black pixel, considers it as the initial area delimiting the character of which it is a part. This area is further expanded in three directions, namely top, left and right, so as to include the rest of the pixels that are part of the handwritten character. Expansion in one direction is stopped when, among the new pixels brought in by that expansion, there is no black one. Expansion in that direction is resumed when the expansions in the other directions bring new black pixels to its border.
This process ends either when no more expansions in any direction can be done or when the algorithm finishes scanning the entire picture.
1 - Scan the image from left to right and from bottom to top;
2 - For each black pixel encountered which is not part of an area already found, do:
2.1 - Initialise the area with that pixel and mark the top, left and right directions for expansion;
2.2 - Expand the area by one pixel in every direction marked for expansion;
2.3 - Unmark a direction if its expansion brings in no black pixel, and mark it again when the expansions in the other directions bring new black pixels to its border;
2.4 - Repeat steps 2.2 - 2.3 as long as there is at least one direction marked for expansion;
2.5 - Save the new area in a list and advance the current pixel coordinates over this one;
2.6 - Resume the algorithm from step 2.
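A condensed Java sketch of this expansion loop for a single character area (the Rect helper and the bound checks are illustrative assumptions; the full algorithm additionally tracks areas already found). Re-checking all three directions on every pass also models the "resume" rule, since a direction that stopped is tried again after the others have grown the border:

```java
public class CharacterSegmenter {
    // Bounding box of one character area: rows [top..bottom], cols [left..right].
    static class Rect { int top, bottom, left, right; }

    // Grow the area around a seed black pixel upwards, leftwards and
    // rightwards until no expansion brings in new black pixels.
    static Rect expand(int[][] img, int seedY, int seedX) {
        Rect r = new Rect();
        r.top = r.bottom = seedY;
        r.left = r.right = seedX;
        boolean grew = true;
        while (grew) {
            grew = false;
            if (r.top > 0 && rowHasBlack(img, r.top - 1, r.left, r.right)) {
                r.top--; grew = true;            // expand upwards
            }
            if (r.left > 0 && colHasBlack(img, r.left - 1, r.top, r.bottom)) {
                r.left--; grew = true;           // expand leftwards
            }
            if (r.right < img[0].length - 1
                    && colHasBlack(img, r.right + 1, r.top, r.bottom)) {
                r.right++; grew = true;          // expand rightwards
            }
        }
        return r;
    }

    static boolean rowHasBlack(int[][] img, int y, int x0, int x1) {
        for (int x = x0; x <= x1; x++) if (img[y][x] == 1) return true;
        return false;
    }

    static boolean colHasBlack(int[][] img, int x, int y0, int y1) {
        for (int y = y0; y <= y1; y++) if (img[y][x] == 1) return true;
        return false;
    }
}
```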
CHAPTER 9
CHARACTER NORMALIZATION
Normalization (Cheriet et al., 2007) is a process that results in regulating the size,
position and shape of the segmented images of the characters so as to reduce the
variation in size of the images belonging to the same class thus facilitating the
extraction of features and increasing the accuracy of classification. Mainly there are two
types of methods: linear and non-linear.
In the so-called “Aspect Ratio Adaptive Normalization” (ARAN), the aspect ratio R2 of the normalized character is computed adaptively from the aspect ratio R1 of the original character, defined as:
R1 = min(W1, H1) / max(W1, H1),
using one of the functions in Table 9.1. In implementing this method, the normalized character image is placed over a plane of flexible size W2 × H2, and then this plane is moved so that it is superimposed on the standard plane by aligning the centres. If the image fills one dimension of the normalized standard plane, then L2 = max(W2, H2) is considered to be equal to the standard size, and the other dimension is centred in the standard plane. With R2 and L2, we can calculate min(W2, H2) = R2 · L2 using the formula given above. Thus, we can obtain the size of the normalized character.
Table 9.1: Functions for Aspect Ratio Mapping
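A minimal Java sketch of this computation. The square-root mapping R2 = √R1 is just one mapping of the kind listed in Table 9.1, chosen here as an illustrative assumption, and the 32-pixel standard plane is likewise assumed:

```java
public class Aran {
    static final int STANDARD_SIZE = 32; // side of the standard plane (assumed)

    // Compute the normalized character size (width, height) from the
    // original size using Aspect Ratio Adaptive Normalization.
    static int[] normalizedSize(int w1, int h1) {
        double r1 = (double) Math.min(w1, h1) / Math.max(w1, h1);
        double r2 = Math.sqrt(r1);      // one possible aspect ratio mapping
        int l2 = STANDARD_SIZE;         // the dimension that fills the plane
        int short2 = (int) Math.round(r2 * l2); // min(W2, H2) = R2 * L2
        return (w1 >= h1) ? new int[] {l2, short2} : new int[] {short2, l2};
    }
}
```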
Coordinate transformation from the original plane of the character to the normalized one is done using forward or backward mapping. If we denote the original image by f(x, y) and the normalized one by g(x′, y′), the normalized image is generated by
g(x′, y′) = f(x, y),
where the mapped coordinates are given by x′ = x′(x, y) and y′ = y′(x, y) (forward mapping) or by x = x(x′, y′) and y = y(x′, y′) (backward mapping).
In the case of the forward mapping, the coordinates x and y take discrete values but x′ and y′ do not necessarily do so, while in the case of the backward mapping the reverse is true. Furthermore, in the case of forward mapping the mapped coordinates do not necessarily occupy all the space in the normalized plane. Thus, to use the normalization we need to implement mesh coordinates and pixel interpolation. By meshing, the mapped coordinates (x′, y′) are approximated by the nearest integers.
In the case of the mesh in the forward mapping, the discrete coordinates (x, y) scan the original image pixels and the pixel value f(x, y) is assigned to all the pixels of the normalized plane that fall within the corresponding mesh cell. The forward mapping is mostly used because meshing the mapped coordinates can be done easily.
The functions for the forward and backward mapping are given in Table 9.2. In the simplest, linear case they are given by
x′ = αx, y′ = βy (forward) and x = x′/α, y = y′/β (backward),
where α = W2/W1 and β = H2/H1 are the horizontal and vertical scaling ratios.
Table 9.2: Functions for Coordinate Mapping
For extracting the features that define the characters in the image we used the discrete cosine transformation (Watson, 1994), which is a technique that converts a signal into its elementary frequency components. Each line of M pixels from an image can be represented as a sum of M weighted cosine functions, assessed in discrete points, as shown by the following equation (in the one-dimensional case):
F(u) = c(u) Σ(x=0..M-1) f(x) cos[(2x + 1)uπ / 2M], u = 0, 1, ..., M-1,
where c(0) = √(1/M) and c(u) = √(2/M) for u > 0.
It can be said that the transformed matrix elements with lower indices correspond to coarser details in the image and those with higher indices to finer details. Therefore, if we analyze the matrix T obtained by processing different blocks of an image, we see that in the upper left corner of the matrix we have high values (positive or negative) and that, the further we move towards the bottom right corner, the more the values decline, tending to 0. The next step is the actual selection of certain elements in the array. The first operation that can be done is to order the elements of the matrix into a one-dimensional array so as to highlight as many zero values as possible. The ordering is done by reading the matrix in zigzag. To extract the necessary features for character recognition we can select the first N values from this array. As N increases, so does the recognition accuracy, but that happens at the expense of increasing the training time of the support vector machine.
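A minimal Java sketch of this zigzag reading of the coefficient matrix and the selection of the first N values (a standard traversal; the matrix is assumed square for brevity):

```java
public class ZigZag {
    // Read a square coefficient matrix in zigzag order and keep the
    // first n values as the feature vector.
    static double[] firstN(double[][] t, int n) {
        int size = t.length;
        double[] features = new double[n];
        int count = 0;
        // Each anti-diagonal d holds the elements with i + j == d;
        // alternating the direction per diagonal gives the zigzag order.
        for (int d = 0; d < 2 * size - 1 && count < n; d++) {
            for (int k = 0; k <= d && count < n; k++) {
                int i = (d % 2 == 0) ? d - k : k;
                int j = d - i;
                if (i < size && j < size) {
                    features[count++] = t[i][j];
                }
            }
        }
        return features;
    }
}
```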
CHAPTER 10
FEATURE EXTRACTION
This module was designed to extract the features from the segmented areas of the image containing the characters to be recognized, traits that serve to distinguish an area corresponding to one letter from an area corresponding to other letters. To begin with, the first n components of the discrete cosine transformation of a segmented area are considered to be the features that describe it. In the next phase, certain statistical details of the area are added to the discrete cosine transformation components to define its features:
• mean of the horizontal positions of all the "on" pixels relative to the centre of the
image and to its width;
• mean of the vertical positions of all the "on" pixels relative to the centre of the
image and to its height;
• mean product between the square of horizontal and vertical distances between all
"on" pixels;
• mean product between the square of vertical and horizontal distances between all
"on" pixels;
• mean number of margins met by scanning the image from left to right;
• sum of vertical positions of the margins met by scanning the image from left to
right;
• mean number of margins met by scanning the image from bottom to top;
• sum of horizontal positions of the margins met by scanning the image from top to
bottom.
One last operation implemented by this module is the normalization of the results obtained up until now, so that they correspond to the format accepted by the support vector machine module.
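A minimal Java sketch of one of these statistical features, the mean horizontal position of the "on" pixels relative to the centre and width of the image (the exact normalization is an assumption for illustration; the other features follow the same pattern):

```java
public class StatisticalFeatures {
    // Mean horizontal position of all "on" pixels, measured relative to
    // the image centre and normalized by the image width.
    static double meanHorizontalPosition(int[][] img) {
        int h = img.length, w = img[0].length;
        double sum = 0;
        int on = 0;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (img[y][x] == 1) {
                    sum += (x - (w - 1) / 2.0) / w; // offset from centre, scaled
                    on++;
                }
            }
        }
        return on == 0 ? 0 : sum / on;
    }
}
```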
CHAPTER 11
CLASSIFICATION ENGINE
The module offers the possibility of selecting different types of kernel functions, such as the sigmoid, RBF and linear functions, and of setting the various parameters of these kernels (Hsu et al., 2010). After setting the type of kernel and its parameters, the
support vector machine is trained with the set of features given by the other modules.
Once the training is over, the support vector machine can be used to classify new sets of
characters.
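A minimal Java sketch of these three kernel functions (the parameter names gamma and coef0 follow the usual SVM conventions and are assumptions here, as is the dot-product helper):

```java
public class Kernels {
    static double dot(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    // Linear kernel: K(u, v) = u . v
    static double linear(double[] u, double[] v) {
        return dot(u, v);
    }

    // RBF kernel: K(u, v) = exp(-gamma * ||u - v||^2)
    static double rbf(double[] u, double[] v, double gamma) {
        double dist2 = 0;
        for (int i = 0; i < u.length; i++) {
            double d = u[i] - v[i];
            dist2 += d * d;
        }
        return Math.exp(-gamma * dist2);
    }

    // Sigmoid kernel: K(u, v) = tanh(gamma * (u . v) + coef0)
    static double sigmoid(double[] u, double[] v, double gamma, double coef0) {
        return Math.tanh(gamma * dot(u, v) + coef0);
    }
}
```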
ANN_TRAIN(String arg[]) – It is used for training the ANN, and the path of the file with its name is passed as the parameter to it. It generates a model file which is used as a reference to the trained dataset.
ANN_TEST(String arg[]) – It is used for testing purposes, and the paths of the model file, input file and output file are passed as parameters to it. It writes the result of the input in the output file.
CHAPTER 12
LANGUAGE TRANSLATION
Translation is the communication of meaning from one language (the source) to another
language (the target). Translation refers to written information, whereas interpretation
refers to spoken information.
The purpose of translation is to convey the original tone and intent of a message, taking
into account cultural and regional differences between source and target languages.
Translation has been used by humans for centuries, beginning after the appearance of
written literature. Modern-day translators use sophisticated tools and technologies to
accomplish their work, and rely heavily on software applications to simplify and
streamline their tasks.
CHAPTER 13
SOFTWARE REQUIREMENTS SPECIFICATION
13.1 INTRODUCTION
13.1.1 Purpose
The objective of this project is to identify handwritten characters with the use of neural networks. We have to construct a suitable neural network and train it properly. The program should be able to extract the characters one by one and map the target output for training purposes. After automatic processing of the image, the training dataset has to be used to train a “classification engine” for recognition purposes. The program code has to be written in Java and supported with a Graphical User Interface (GUI) using Android.
13.1.2 Scope
In this fast-paced world, there is an immense urge for the digitalization of printed documents and for documenting information directly in digital form, and there is still some gap in this area even today. OCR techniques, with their continuous improvement from time to time, are trying to fill this gap. This project is about devising an algorithm for the recognition of handwritten characters, also known as HCR (Handwritten Character Recognition), leaving aside the types of OCR that deal with the recognition of computer- or typewriter-printed characters.
13.1.3 References
• https://ieeexplore.ieee.org
• https://www.kaggle.in/datasets
• https://developer.android.com/docs
13.1.4 Overview
The rest of the SRS examines the specifications of Handwritten Text Recognition
and Translation. Section 2 of the SRS presents the overall description of the
Handwritten Text Recognition and Translation. Section 3 outlines the detailed,
specific functional, performance, system and other related requirements of the
Handwritten Text Recognition and Translation.
• An Android smartphone
• The proposed software installed, with all permissions granted
• Minimum API level 15
• Minimum Android version 4.0.3
• No additional hardware required
The user's performance will depend on the speed of their Internet connection as well as on the performance of the network. The project will be designed to be compatible with all major Internet browsers and for a lowest common denominator of phone performance, so as to be widely accessible.
13.4.2.1 Security
13.4.2.2 Reliability
13.4.2.3 Maintainability
13.4.2.4 Portability
The use case diagram representation of the project is shown in the figure.
PROJECT OVERVIEW
REFERENCES
[1] Eugen-Dumitru Tautu and Florin Leon (May 7, 2012), "Handwritten Text Recognition Using Artificial Neural Networks".
[3] Plamondon, Réjean, and Sargur N. Srihari. "Handwritten Text Recognition: A Comprehensive Survey." IEEE Transactions on Pattern Analysis and Machine Intelligence 22.1 (2000): 63-84.
[4] Madhvanath, Sriganesh, and Venu Govindaraju. "The Role of Holistic Paradigms in Precision Text Recognition." IEEE Transactions on Pattern Analysis and Machine Intelligence 23.2 (2001): 149-164.
[5] Park, Jaehwa, Venu Govindaraju, and Sargur N. Srihari. "Handwritten Text Recognition in a Hierarchical Feature Space." IEEE Transactions on Pattern Analysis and Machine Intelligence 22.4 (2000): 400-407.
[6] Cohen, G., Afshar, S., Tapson, J., & van Schaik, A. (2017). "Handwritten Text Recognition Using Advanced Techniques Involving Deep Learning".
[7] Monali Paul, Santosh K. Vishwakarma, Ashok Verma (2015), "Analysis of Handwritten Text Using a Data Mining Approach", International Conference on Computational Intelligence and Communication Networks.
[8] A.T.M. Shakil Ahamed, Navid Tanzeem Mahmood, Nazmul Hossain, Mohammad Tanzir Kabir, Kallal Das, Faridur Rahman, Rashedur M. Rahman (2015), "Applying Data Mining Techniques to Recognize Handwritten Text by Collecting Samples from Various Writers", (SNPD) IEEE/ACIS International Conference.