Arabic OCR Report


Sana’a University

Faculty of Engineering
Mechatronics Department

MSD-2

Arabic Handwritten Characters Recognition


Project

Supervised By:

Adopted By:
In this project we studied Arabic handwritten character recognition using
OpenCV and a Convolutional Neural Network (CNN). Handwritten text is scanned
as an image, and the characters in it are recognized and converted into
letters, words, or text.

Handwritten Arabic character recognition systems face several challenges,
including the unlimited variation in human handwriting and the need for large
public databases. In this work, a deep learning architecture is applied to
recognizing Arabic handwritten characters. A Convolutional Neural Network
(CNN) is a special type of feed-forward multilayer network trained in
supervised mode. The CNN was trained and tested on our database, which
contains 16,800 handwritten Arabic characters. Optimization methods were
implemented to increase the performance of the CNN.
CNNs typically use the following types of layers:
Input layer – This layer takes the raw image data as it is.
Convolutional layer – This layer computes the convolutions between the
neurons and the various patches in the input. The convolutional layer
basically computes the dot product between the weights and a small
patch in the output of the previous layer.

Rectified Linear Unit layer – This layer applies an activation function to
the output of the previous layer. This function is usually something like
max(0, x). This layer is needed to add non-linearity to the network so that
it can approximate a wide range of functions.

Pooling layer – This layer samples the output of the previous layer
resulting in a structure with smaller dimensions. Pooling helps us to keep
only the prominent parts as we progress in the network. Max pooling is
frequently used in the pooling layer, where we pick the maximum value
in a given KxK window.
Fully Connected layer – This layer computes the output scores in the last
layer. The resulting output is of the size 1x1xL, where L is the number of
classes in the training dataset.
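The layer types above can be illustrated with a small NumPy sketch. It is only a toy forward pass under assumed values: the 6x6 input, the 3x3 filter, the 2x2 pooling window, and the random fully connected weights are all made up for illustration and are not taken from the project's model.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image; each output value is the dot product
    between the kernel weights and one image patch (the kernel is not
    flipped, as is usual in CNN libraries)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

def relu(x):
    """Rectified Linear Unit: max(0, x) applied element-wise."""
    return np.maximum(0, x)

def max_pool(x, k):
    """Keep the maximum of each non-overlapping k x k window."""
    h, w = x.shape[0] // k, x.shape[1] // k
    trimmed = x[:h * k, :w * k]
    return trimmed.reshape(h, k, w, k).max(axis=(1, 3))

# Toy input and filter (assumed values, for illustration only).
image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

features = max_pool(relu(conv2d_valid(image, kernel)), 2)

# Fully connected layer: one score per class (L = 28 here), random weights.
rng = np.random.default_rng(0)
weights = rng.normal(size=(28, features.size))
scores = weights @ features.ravel()
```

Here the 6x6 input shrinks to 4x4 after the convolution and to 2x2 after pooling, and the fully connected step turns those four values into L = 28 class scores.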

Fig.1 (CNN-Layers)

Building a machine learning model involves a few primary steps:

• Extracting features:
o Preprocessing, choosing features, and exploring and manipulating the
dataset.
• Splitting the dataset:
o Splitting the dataset into a training dataset and a testing dataset.
• Training the model:
o The training dataset is fed into a chosen machine learning model or
technique, such as a neural network.
• Evaluating the model:
o Evaluating the performance of the trained model by:
▪ Calculating the average error of the predictions,
▪ Calculating the percentage of samples the model predicts within a
10% margin,
▪ Determining the performance threshold once the metrics are chosen.
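The split and evaluate steps can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's code: the helper names, the 20% test fraction, and the toy data are assumptions, and accuracy is used as the evaluation metric because the task here is classification.

```python
import numpy as np

def train_test_split(features, labels, test_fraction=0.2, seed=0):
    """Shuffle the samples and hold out a fraction of them for testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))
    n_test = int(round(len(features) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return (features[train_idx], features[test_idx],
            labels[train_idx], labels[test_idx])

def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label matches the true label."""
    true_labels = np.asarray(true_labels)
    predicted_labels = np.asarray(predicted_labels)
    return float(np.mean(true_labels == predicted_labels))

# Toy data: 50 samples with 2 features each and 5 classes (made up).
x = np.arange(100).reshape(50, 2)
y = np.arange(50) % 5
x_train, x_test, y_train, y_test = train_test_split(x, y)
```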

• Data Exploration:
o Import libraries necessary for this project:
o Load Arabic Letters dataset files into data frames:

o Convert CSV values to an image by writing a method that can be used later
whenever we want to visualize an image from its pixel values:
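A conversion method of this kind might look like the sketch below. It assumes each CSV row holds the 32x32 = 1024 pixel values of one image in row-major order; the function name and the validation check are hypothetical, and the real dataset files may need an extra transpose or flip to display upright.

```python
import numpy as np

IMAGE_SIZE = 32  # the report works with 32x32 grayscale images

def row_to_image(pixel_row, size=IMAGE_SIZE):
    """Turn one flat row of pixel values into a size x size 8-bit image."""
    pixels = np.asarray(pixel_row, dtype=np.uint8)
    if pixels.size != size * size:
        raise ValueError(f"expected {size * size} pixels, got {pixels.size}")
    return pixels.reshape(size, size)

# Stand-in row; in practice it would be one row of the loaded data frame.
example_row = np.arange(IMAGE_SIZE * IMAGE_SIZE) % 256
example_image = row_to_image(example_row)
```

A row could come from a data frame loaded with `pandas.read_csv`, and the resulting array can be displayed with `matplotlib.pyplot.imshow(example_image, cmap="gray")`.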
• Data Preprocessing:
o Image Normalization:

o Encoding Categorical Labels:
▪ From the label CSV files we can see that the labels are categorical
values, so this is a multi-class classification problem.
▪ The outputs take the following form: the letters from 'أ' to 'ي' are
assigned category numbers from 0 to 27.
▪ Here we encode these category values using one-hot encoding with
Keras.
▪ One-hot encoding transforms each integer into a binary vector that
contains a single '1', with all remaining elements '0'.

o Reshaping Input Images to 32x32x1:
▪ When using TensorFlow as the backend, Keras CNNs require a 4D
array (which we'll also refer to as a 4D tensor) as input, with
shape (nb_samples, rows, columns, channels),
▪ where nb_samples corresponds to the total number of images
(or samples), and rows, columns, and channels correspond to
the number of rows, columns, and channels of each image,
respectively.
▪ So, we reshape the input images to a 4D tensor with shape
(nb_samples, 32, 32, 1), as we use grayscale images of 32x32
pixels.
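The preprocessing steps above can be sketched with NumPy alone. Two things here are assumptions rather than facts from the report: normalization is taken to mean scaling 8-bit pixel values to the range [0, 1], and the one-hot helper is a NumPy stand-in for the Keras utility (`keras.utils.to_categorical`) that the report mentions.

```python
import numpy as np

NUM_CLASSES = 28  # labels 0..27, one per Arabic letter
IMAGE_SIZE = 32

def normalize(pixels):
    """Scale 8-bit pixel values from [0, 255] to [0, 1] (assumed scheme)."""
    return np.asarray(pixels, dtype=np.float32) / 255.0

def one_hot(labels, num_classes=NUM_CLASSES):
    """Turn integer labels into rows with a single 1 and zeros elsewhere."""
    labels = np.asarray(labels, dtype=int).ravel()
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

def to_4d(flat_images, size=IMAGE_SIZE):
    """Reshape (nb_samples, size*size) to (nb_samples, size, size, 1)."""
    return np.asarray(flat_images).reshape(-1, size, size, 1)

# Stand-in data: 5 blank images and 5 labels.
flat = np.zeros((5, IMAGE_SIZE * IMAGE_SIZE), dtype=np.uint8)
x = to_4d(normalize(flat))
y = one_hot([0, 5, 12, 20, 27])
```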
• Designing Model Architecture:
o The first hidden layer is a convolutional layer. It has 16 feature
maps, each of size 3×3, and uses the relu activation function. This
layer receives the input, expecting images with the structure
outlined above.
o The second layer is Batch Normalization, which mitigates the problem
of feature distributions varying across the training and test data,
which breaks the IID assumption. We use it for two benefits: faster
learning and higher overall accuracy.
o The third layer is the MaxPooling layer. MaxPooling layer is used to
down-sample the input to enable the model to make assumptions
about the features so as to reduce overfitting. It also reduces the
number of parameters to learn, reducing the training time.
o The next layer is a Regularization layer using dropout. It is configured
to randomly exclude 20% of neurons in the layer in order to reduce
overfitting.
o Another hidden layer with 32 feature maps of size 3×3 and a relu
activation function, to capture more features from the image.
o Further hidden layers with 64 and 128 feature maps of size 3×3 and a
relu activation function, to capture the complex patterns in the
image that will later describe the letters.
o More MaxPooling, Batch Normalization, Regularization and
GlobalAveragePooling2D layers.
o The last layer is the output layer, with one neuron per output class.
It uses the softmax activation function because we have multiple
classes. Each neuron gives the probability of its class.
o We used categorical_crossentropy as the loss function because this is
a multi-class classification problem, and accuracy as the metric for
evaluating the performance of our neural network.
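The architecture described above could be written in Keras roughly as follows. This is a sketch under assumptions, not the project's exact model: the "same" padding, the 2x2 pooling window, the placement of Batch Normalization, MaxPooling, and Dropout after every convolution, and the "random_uniform" initializer (standing in for the "uniform" initializer named in the tuning results) are all guesses that the text does not confirm.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(num_classes=28, input_shape=(32, 32, 1)):
    """Four convolutional blocks followed by global pooling and softmax."""
    model = keras.Sequential()
    model.add(keras.Input(shape=input_shape))
    for filters in (16, 32, 64, 128):
        # 3x3 convolution with relu, as described in the report.
        model.add(layers.Conv2D(filters, (3, 3), padding="same",
                                activation="relu",
                                kernel_initializer="random_uniform"))
        model.add(layers.BatchNormalization())
        model.add(layers.MaxPooling2D(pool_size=2))  # assumed 2x2 window
        model.add(layers.Dropout(0.2))               # drop 20% of neurons
    model.add(layers.GlobalAveragePooling2D())
    # One output neuron per class; softmax yields class probabilities.
    model.add(layers.Dense(num_classes, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_model()
```

With these assumed settings the spatial size halves in each block (32, 16, 8, 4, 2) before global average pooling reduces each of the 128 feature maps to a single value.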

• Model Summary and Visualization:


o Keras supports plotting the model through the keras.utils.vis_utils
module, which provides utility functions to plot a Keras model using
graphviz:
• Parameters Tuning:
o We will try different models with different parameters to find the best
parameter values.
o From the above results we can see that the best parameters are:
▪ Optimizer: Adam
▪ Kernel_initializer: uniform
▪ Activation: relu
o Let's create the model with the best parameters obtained.

• Training the Model:


o Fitting the Model.
o Train the model using batch_size=30 to reduce memory use and make
training faster. We first train the model for 15 epochs to see what
accuracy we obtain.
o Plotting Loss and Accuracy Curves with Epochs:
o Load the Model with the Best Validation Loss:
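The training steps above might be wired together as in the sketch below. The batch size of 30 and the 15 epochs come from the report; everything else is an assumption made so the example runs on its own: the 20% validation split, the checkpoint file name, the random stand-in data, and the deliberately tiny model.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def train(model, x_train, y_train, weights_path,
          epochs=15, batch_size=30, validation_split=0.2):
    """Fit the model, keep the weights of the epoch with the lowest
    validation loss, and reload those weights before returning."""
    checkpoint = keras.callbacks.ModelCheckpoint(
        weights_path, monitor="val_loss",
        save_best_only=True, save_weights_only=True)
    history = model.fit(x_train, y_train,
                        batch_size=batch_size, epochs=epochs,
                        validation_split=validation_split,
                        callbacks=[checkpoint], verbose=0)
    model.load_weights(weights_path)  # best validation loss seen so far
    return history

# Random stand-in data and a tiny model, only to exercise the function.
rng = np.random.default_rng(0)
x = rng.random((60, 32, 32, 1)).astype("float32")
y = keras.utils.to_categorical(rng.integers(0, 28, size=60), num_classes=28)

tiny_model = keras.Sequential([
    keras.Input(shape=(32, 32, 1)),
    layers.Conv2D(4, (3, 3), activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(28, activation="softmax"),
])
tiny_model.compile(optimizer="adam", loss="categorical_crossentropy",
                   metrics=["accuracy"])

weights_path = os.path.join(tempfile.mkdtemp(), "best.weights.h5")
history = train(tiny_model, x, y, weights_path, epochs=2)
```

The returned `history.history` dictionary holds the per-epoch `loss`, `accuracy`, `val_loss`, and `val_accuracy` values, which are the series one would plot for the loss and accuracy curves.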

• Test the Model:


o We get a test accuracy of 97.71% after training for 15 epochs.
o After increasing the number of training epochs, we get:


• Testing the Model again:
• Saving the Final Model:


o Let's save the model in JSON format so it can be used later instead of
creating the model again from scratch:
o Save the model weights to a file:
o If we want to load the model with the last obtained weights at any
time, we run the following code cell:
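Saving the architecture as JSON and the weights in a separate file, then restoring both, could look like the sketch below. The helper names, the file names, and the small stand-in model are hypothetical; the report's own code cells are not shown in the text.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def save_model(model, json_path, weights_path):
    """Write the architecture as JSON and the weights to a separate file."""
    with open(json_path, "w") as f:
        f.write(model.to_json())
    model.save_weights(weights_path)

def load_saved_model(json_path, weights_path):
    """Rebuild the architecture from JSON, then restore the weights."""
    with open(json_path) as f:
        model = keras.models.model_from_json(f.read())
    model.load_weights(weights_path)
    return model

# Small stand-in model, only to demonstrate the round trip.
original = keras.Sequential([
    keras.Input(shape=(32, 32, 1)),
    layers.Conv2D(4, (3, 3), activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(28, activation="softmax"),
])

folder = tempfile.mkdtemp()
json_path = os.path.join(folder, "model.json")
weights_path = os.path.join(folder, "model.weights.h5")

save_model(original, json_path, weights_path)
restored = load_saved_model(json_path, weights_path)

sample = np.random.default_rng(0).random((4, 32, 32, 1)).astype("float32")
```

A model restored this way carries no optimizer or loss settings, so it would need to be compiled again before further training or evaluation with `evaluate`.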

• Predict Image Classes:


o Define a method that takes a model, the data, and optionally its true
labels (for use in testing), and returns the classes the given model
predicts for the given data:

o Define a method that prints all metrics (precision, recall, f1-score,
and support) for each class in the dataset:
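The two methods described above can be approximated with NumPy as follows. This is a sketch, not the project's code: the function names are hypothetical, the model is assumed to expose a Keras-style `predict` that returns class probabilities, and the true labels are assumed to be one-hot encoded. scikit-learn's `classification_report` prints the same four metrics and may be what the project used.

```python
import numpy as np

def get_predicted_classes(model, data, labels=None):
    """Return the most probable class per sample; if one-hot labels are
    given, also print the fraction of correct predictions."""
    probabilities = np.asarray(model.predict(data))
    predicted = np.argmax(probabilities, axis=1)
    if labels is not None:
        true = np.argmax(np.asarray(labels), axis=1)
        print(f"accuracy: {np.mean(predicted == true):.4f}")
    return predicted

def per_class_metrics(true_labels, predicted_labels, num_classes=28):
    """Compute precision, recall, f1-score, and support for each class."""
    true_labels = np.asarray(true_labels)
    predicted_labels = np.asarray(predicted_labels)
    report = {}
    for c in range(num_classes):
        true_positive = int(np.sum((predicted_labels == c) & (true_labels == c)))
        predicted_count = int(np.sum(predicted_labels == c))
        support = int(np.sum(true_labels == c))
        precision = true_positive / predicted_count if predicted_count else 0.0
        recall = true_positive / support if support else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        report[c] = {"precision": precision, "recall": recall,
                     "f1-score": f1, "support": support}
    return report

def print_metrics(report):
    """Print one line of metrics per class."""
    for c, m in report.items():
        print(f"{c:>3}  precision={m['precision']:.2f}  "
              f"recall={m['recall']:.2f}  f1-score={m['f1-score']:.2f}  "
              f"support={m['support']}")
```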
❖ Notes:
o Since there are 28 Arabic letters in total, their labels are
represented by the integers 0-27 for the letters 'أ' to 'ي'.
o From the resulting output we see that there are some errors in the
predicted letters or labels.
o The dataset is fairly small, so more training data may be needed to
predict letter labels reliably, even though the accuracy is fairly
high.

Common machine learning methods usually apply a combination of a feature
extractor and a trainable classifier. The use of CNN leads to significant
improvements across different machine-learning classification algorithms. Our
proposed CNN gives an average misclassification error of 5.1% on the testing
data.

o www.packt.com
o Artificial Intelligence with Python, Second Edition, Alberto
Artasanchez, Prateek Joshi
o https://www.kaggle.com/mloey1/ahcd1
