Offline Handwriting Recognition With Emphasis On Character Recognition A Comprehensive Survey
Abstract—Handwriting has continued to be a means of communication as well as a talisman of an individual. It has also been a tool to record information. Machine recognition of handwriting has found its presence in PDAs, in postal addresses on envelopes, in amounts on bank checks, and in handwritten notes and form fields. Character recognition is a process by which a computer recognizes letters, numbers or symbols and turns them into digital form. It has gained a lot of use in pattern recognition and is one of the most popular and challenging areas of research.

Keywords—Character, Character Recognition, Preprocessing, Segmentation and Classification.

I. INTRODUCTION

A. Nature of Handwriting
An individual's style of writing by hand with a writing instrument (pen or pencil) is known as handwriting [1]. Handwriting was developed to serve two purposes: a) to enlarge human memory, serving the need to store information permanently, and b) to aid communication, since knowledge was transferred verbally from one generation to the next before the advent of handwriting [1]. Communication was facilitated with the help of symbols and the rules assigned to the language for combining them. These symbols are now known as characters, which are combined using linguistic rules to form words.

B. History of Handwriting
Before the onset of handwriting, culture, stories, norms, rituals, etc. were passed verbally from one generation to the next. As cultures evolved, humans felt the need to standardize communication. The standardization was achieved by using pictographs, an evolved form of simple drawings, which marked the beginning of handwriting [2].
The first systematic handwriting system was the “Sumerian” pictographic system, which used clay tablets. This system was then modified into an evolved form, called “Cuneiform”, in 3200 B.C. The earliest alphabetical system was developed by the “Phoenicians” in the eleventh century B.C. This system lacked vowels and consisted of 22 letters [3]. The Hebrew and Aramaic scripts are heavily influenced by this system.
The survival of handwriting is based on the use of copybooks and writing methods, such as the Palmer method, used to teach handwriting. The reason that handwriting still has the upper hand over digital devices is its convenience and ease of use compared to keyboards. Handwriting serves as a talisman and the standard of conformity of an individual [4] and hence can be used for handwriting verification.

C. Death of Handwriting
Handwriting facilitated communication using messages as tools formed using linguistic rules. Handwriting manifests credibility, since the classification of anything into pre-history and history is based on the presence of handwritten records. However, in recent times computer ownership has been on the rise: while only 8% of American households owned a computer in 1984, by 2011 75% of households had some form of computing device, which certainly shows increased use of typing [5].

The Washington Post says that cursive handwriting, which was a mandatory part of elementary education, has been disappearing. The Common Core Standards in 2011 abolished obligatory teaching of cursive handwriting after the 1st grade.
In August 2013, the Netherlands witnessed the commencement of iPad schools, which relied heavily on digital education. It was not mandatory for students to be present in class. The Dutch government also gave the green signal to the idea. The creator of iPad schools, Dehond, stated that only 4% of the entire coursework relied on handwritten material [7].

D. Handwriting Analysis
The analysis of handwriting (graphoanalysis) examines the physical characteristics and patterns of handwriting with the purpose of identifying the writer and/or indicating the psychological state at the time of writing, and involves physical characteristics of the writer [8]. Graphology is considered a pseudoscience [9] [10] [11] [12] [13]. The four forms of analysis on handwriting are as follows. Handwriting recognition is the task of transforming graphical marks into symbolic representations as understood by the language. For English orthography, this symbolic representation is the 8-bit ASCII representation of characters. The characters of most written languages of the world can be represented in 16-bit Unicode [14] format. Handwriting interpretation is the mechanism of connoting the
handwritten text, and handwriting identification implies determining the author of the handwritten text from a pool of writers, considering that each writer’s handwriting is unique. Signature verification involves determining whether the signature belongs to a given person or not. Identification and verification are heavily used in forensic science [15], to determine the nature and character of a specific writer [16].
Recognition and interpretation may be used in daily life, an example being a pharmacist decoding the medicine written on a prescription. The techniques are primarily used to eradicate variations or possibilities. For applying these techniques, knowledge of the subject domain is mandatory.

E. Handwriting Input
There are two approaches to providing handwritten input. The first, “Off-line”, involves scanning handwritten text or printed information and converting it into digital form. The second, called “On-Line”, involves writing with a pen-shaped device called a stylus on an electronic surface called a digitizer, or on a personal digital assistant with an LCD. The strokes (used to form handwritten text) are analysed by software, which treats them as electronic ink.

In online handwritten input, the information (two-dimensional coordinates of the successive points) is stored as a function of time. In the case of offline input, a static image is used to extract the data (the luminance of the points).
In terms of storage requirement of raw data, the online system needs less space. The data required for a cursively written word is a few hundred bytes in the online case and a few hundred kilobytes in the offline scenario. In the offline case, if an 8.27 x 11.69 inch page is scanned at a resolution of 12.8 megapixels (4128 x 3096), the scanned image is about 1.6 MB (a figure consistent with storing one bit per pixel: 4128 x 3096 = 12,780,288 pixels, or 1,597,536 bytes). The resolution is determined by the smallest font size that needs reliable recognition, as well as by the bandwidth required for transmission and storage.
The recognition rate of online recognition systems is much higher than that of offline systems. For the on-line, unconstrained, handwritten word recognition problem, top-choice recognition rates of 95 percent, 85 percent and 78 percent have been recorded for lexicon sizes of 10, 100 and 1000 respectively [17]. In the off-line case, a top-choice recognition rate of 80 percent is recorded with purely cursive words and a 21,000-word lexicon [18].

II. HANDWRITING GENERATION AND PERCEPTION

Handwriting is a learned and practiced skill that involves coordination of various sub-systems of our central nervous system, called the motor system [19]. The first step in the production of handwriting is at the semantic level, where the writer intends to write a message. At the lexical and syntactic level, the intended message is transformed into words formed using the lexicon (linguistic symbols) and combined using correct syntax (linguistic rules). When the individual graphemes are known, the writer selects specific allographs. Below this level, the allographs are transformed into movement patterns of our hand.
There are two approaches to handwriting modeling: bottom-up and top-down [20]. Bottom-up models focus on features of human handwriting such as slant, pressure and velocity, and try to reproduce them from observation. The top-down approach focuses on the psychological aspects of handwriting such as motor learning, motor movement, planning, etc.

III. OFFLINE HANDWRITING RECOGNITION

In offline handwriting recognition, the data to be recognized have been scanned and stored as an image. The system relies on prior knowledge of the domain, where task-specific constraints are available. To address the limitations of offline handwriting recognition, several systems have been proposed. A complete overview of the early work can be found in [Bunke (2003); Plamondon and Srihari (2000); Steinherz et al. (1999); Suen et al. (2000); Vinciarelli (2002)] [23].

[K. Sirlantzis and Hoque (2001)] proposed a multiple classifier system trained on the Freeman chain-code representation of the character contours combined with sn-tuple classifiers, which increased the recognition rate. Gunter and Bunke (2004b) performed an in-depth investigation of how varying the size of the training set affects classification using a multiple classifier system. All these systems placed some constraints on the classifiers; to overcome this, [Bertolami & Bunke (2005)] proposed a model using multiple classifiers for unconstrained handwritten line recognition. Algorithms were then developed for each stage of recognition to achieve precision. For preprocessing, to improve the image, an algorithm to normalize the intensity of background light using an adaptive linear function was proposed. This algorithm was based on approximation and was primarily used for recognition of historical data [59].
Other stages such as normalization witnessed methods to correct structural properties of handwritten text using the gradient orientation of the digitized image. Slant and skew can be corrected using this method [60].

A. Comparability of Recognition Rates
A number of studies have been published whose recognition rates range from 50% to even a perfect 100% (in certain conditions). However, a perfect system does not exist. For example, a postal address pertaining to a country can be recognized using knowledge of the pincodes to improve the accuracy. But for that, a certain level of pre-processing of the available image is required.
Three factors govern recognition and affect the comparability of systems: the considered recognition task, which determines the overall complexity of the problem; the data set, which may differ in terms of quality and quantity; and the quantity of data used to train and test the system.
The performance of a system crucially depends on the considered task. For example, in the case of isolated digit recognition the performance is usually higher than in unconstrained handwritten text line recognition. When
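The Freeman chain-code representation of character contours mentioned in Section III can be illustrated with a short sketch. It assumes the contour has already been traced as an ordered list of 8-connected (row, col) pixel coordinates. The function names, the numbering convention (0 = east, counting counter-clockwise) and the direction histogram used as a fixed-length feature are illustrative assumptions; the cited work may encode and use the codes differently.

```python
# Freeman 8-direction codes, counter-clockwise from east, keyed by (d_row, d_col).
# Rows grow downward in image coordinates, so "north" is d_row = -1.
# The numbering convention is an assumption made for this sketch.
DIRECTIONS = {
    (0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
    (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7,
}


def freeman_chain_code(contour):
    """Encode an ordered list of 8-connected (row, col) contour points."""
    codes = []
    for (r0, c0), (r1, c1) in zip(contour, contour[1:]):
        step = (r1 - r0, c1 - c0)
        if step not in DIRECTIONS:
            raise ValueError(f"points {(r0, c0)} and {(r1, c1)} are not 8-neighbours")
        codes.append(DIRECTIONS[step])
    return codes


def direction_histogram(codes):
    """Normalised 8-bin histogram of chain codes (hypothetical feature vector)."""
    counts = [0] * 8
    for code in codes:
        counts[code] += 1
    total = len(codes) or 1  # avoid division by zero for an empty contour
    return [count / total for count in counts]


# Closed contour around a 2 x 2 block of pixels: right, down, left, up.
square = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
codes = freeman_chain_code(square)
features = direction_histogram(codes)
```

The chain code depends only on the steps between neighbouring contour points, so it is unchanged when the character is shifted within the image; the histogram discards the order of the steps and is only one possible way of turning the code into classifier input.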
document and damaged text. It is ideal for processing stock market data or finding trends in graphical patterns.

The main approaches of offline handwritten word recognition can be divided into two classes: holistic and
work. Extensive surveys on isolated handwritten character recognition can be found in [46], [47], [48], [49].

V. APPLICATIONS OF OFFLINE HANDWRITING RECOGNITION

The most prominent applications of offline handwriting recognition are in postal address interpretation, bank cheques and forms.

A. Handwritten Address Interpretation
Handwritten address interpretation is used to classify a letter based on its location. The address consists of name, country, state, city, pincode, street address and phone number [57], [58]. Handwritten address interpretation uses a database as a source of information, i.e. a collection of all the state, city and pincode information. A physical implementation of the system has been installed in the United States Postal Service.

B. Bank Cheque Recognition
A bank cheque recognition system consists of various operations such as machine-printed numeral recognition, signature verification, courtesy amount recognition and legal amount recognition. The first step is image acquisition: the image is acquired by a scanner. The next step is machine-printed numeral recognition, involving the bank identification code, bank agency identification code, check number and customer account number. The courtesy amount field, legal amount field and signature field are processed to remove the background and guidelines. The courtesy amount recognition module recognizes numerals by hypothesis and verification. The legal amount recognition module includes 3 recognizers. The amount validation module accepts an amount if it is greater than a threshold. Signature verification uses the signature image and the information from the database of users.
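As a rough illustration of the database lookup described under Handwritten Address Interpretation, the sketch below scores every pincode in a small database against per-digit recognizer outputs and keeps the best match. Everything here is hypothetical: the database entries, the function name best_pincode, the per-digit confidence dictionaries and the floor value for unlisted digits are assumptions for illustration, not the method of the installed postal system.

```python
# Hypothetical postal database: valid pincode -> (city, state).
POSTAL_DB = {
    "11001": ("City A", "State X"),
    "11061": ("City B", "State X"),
    "22001": ("City C", "State Y"),
}


def best_pincode(digit_candidates, database, floor=1e-3):
    """Return (pincode, score) for the best-scoring database entry, or None.

    digit_candidates holds one dict per digit position, mapping a candidate
    digit to the recognizer's confidence. Digits the recognizer did not
    propose receive the small `floor` confidence (an assumed smoothing value).
    """
    best = None
    for pincode in database:
        if len(pincode) != len(digit_candidates):
            continue  # wrong number of digits, cannot match
        score = 1.0
        for digit, candidates in zip(pincode, digit_candidates):
            score *= candidates.get(digit, floor)
        if best is None or score > best[1]:
            best = (pincode, score)
    return best


# Made-up recognizer output for a five-digit pincode.
digit_candidates = [
    {"1": 0.9},
    {"1": 0.8},
    {"0": 0.9},
    {"0": 0.6, "6": 0.4},
    {"7": 0.55, "1": 0.45},
]

# Taking the top choice for each digit independently gives an invalid code.
raw_top_choice = "".join(max(c, key=c.get) for c in digit_candidates)

match = best_pincode(digit_candidates, POSTAL_DB)
city, state = POSTAL_DB[match[0]]
```

In this toy input the digit-by-digit top choice is not a valid pincode, while restricting the search to database entries recovers a valid one. This is the sense in which knowledge of the pincodes can improve accuracy.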
VI. CONCLUSION

This paper addresses handwriting, its history, the current state of the art and its various uses in depth. This paper also highlights the generation of handwriting in depth. This paper surveys character recognition, the methods developed to date in brief, and the practical utility and recognition of those methods. This paper also brings into the limelight the practical implementation of offline handwritten character recognition in the USA, and its limitations, with scope to improve the recognition rates in the field.
[35]. G. Kim, V. Govindaraju, S.N. Srihari, "An Architecture for Handwritten Text Recognition Systems," Int'l J. Document Analysis and Recognition, vol. 2, pp. 37-44, 1999.
[36]. U. Mahadevan, R.C. Nagabhushanam, "Gap Metrics for Word Separation in Handwritten Lines," Proc. Third Int'l Conf. Document Analysis and Recognition, pp. 124-127, Aug. 1995.
[37]. G. Seni, E. Cohen, "External Word Segmentation of Off-Line Handwritten Text Lines," Pattern Recognition, vol. 27, no. 1, pp. 41-52, 1994.
[38]. G. Kim, V. Govindaraju, "Handwritten Phrase Recognition as Applied to Street Name Images," Pattern Recognition, vol. 31, no. 1, pp. 41-51, 1998.
[39]. G. Kim, V. Govindaraju, "A Lexicon Driven Approach to Handwritten Word Recognition for Real Time Applications," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 4, pp. 366-379, Apr. 1997.
[40]. J. Franke, L. Lam, R. Legault, C. Nadal, C.Y. Suen, "Experiments with CENPARMI Data Base Combining Different Classification Approaches," Proc. Third Int'l Workshop Frontiers of Handwriting Recognition (IWFHRIII), pp. 305-311, May 1995.
[41]. T.K. Ho, J.J. Hull, S.N. Srihari, "Decision Combination in Multiple Classifiers Systems," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 16, no. 1, pp. 66-75, Jan. 1994.
[42]. Y.S. Huang, C.Y. Suen, "The Behavior-Knowledge Space Method for the Combination of Multiple Classifiers," Proc. IEEE Conf. Computer Vision Pattern Recognition, pp. 347-352, 1993.
[43]. J. Kittler, M. Hatef, R.P.W. Duin, "Combining Classifiers," Proc. 13th Int'l Conf. Pattern Recognition, pp. 897-901, Aug. 1996.
[44]. R.K. Powalka, N. Sherkat, R.J. Whitrow, "Multiple Recognizer Combination Topologies," Handwriting and Drawing Research: Basic and Applied Issues, pp. 329-342, 1996.
[45]. R. Casey, E. Lecolinet, "A Survey of Methods and Strategies in Character Segmentation," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 18, pp. 690-706, 1996.
[46]. A. Amin, "Off-Line Character Recognition: A Survey," Proc. Fourth Int'l Conf. Documents Analysis and Recognition (ICDAR '97), pp. 596-599, Aug. 1997.
[47]. V.K. Govindan, A.P. Shivaprasad, "Character Recognition: A Review," Pattern Recognition, vol. 23, no. 7, pp. 671-683, 1990.
[48]. S. Mori, K. Yamamoto, M. Yasuda, "Research on Machine Recognition of Handprinted Characters," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 6, no. 4, pp. 386-405, Apr. 1984.
[49]. C.Y. Suen, M. Berthod, S. Mori, "Automatic Recognition of Handprinted Characters: The State of the Art," Proc. IEEE, vol. 68, pp. 469-487, 1980.
[50]. J.C. Simon, "Off-Line Cursive Word Recognition," Proc. IEEE, vol. 80, no. 7, pp. 1150-1160, 1992.
[51]. Y.S. Huang, C.Y. Suen, "The Behavior-Knowledge Space Method for the Combination of Multiple Classifiers," Proc. IEEE Conf. Computer Vision Pattern Recognition, pp. 347-352, 1993.
[52]. R. Plamondon, F.J. Maarse, "An Evaluation of Motor Models of Handwriting," IEEE Trans. Systems Man and Cybernetics, vol. 19, no. 5, pp. 1060-1072, 1989.
[53]. M. Gilloux, J.M. Bertille, M. Leroux, "Recognition of Handwritten Words in a Limited Dynamic Vocabulary," Proc. Third Int'l Workshop Frontiers of Handwriting Recognition (IWFHRIII), pp. 417-422, May 1995.
[54]. R. Bozinovic, S.N. Srihari, "Off-Line Cursive Script Recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 11, no. 1, pp. 68-83, 1989.
[55]. M. Eden, "Handwriting and Pattern Recognition," IRE Trans. Information Theory, vol. 8, 1962.
[56]. P.D. Gader, M. Mohammed, J.H. Chiang, "Handwritten Word Recognition with Character and Inter Character Neural Networks," IEEE Trans. System Man and Cybernetics, vol. 27, no. 1, pp. 158-164, 1997.
[57]. E. Cohen, J.J. Hull, S.N. Srihari, "Understanding Handwritten Text in a Structured Environment: Determining ZIP Codes from Addresses," Int'l J. Pattern Recognition and Artificial Intelligence, vol. 5, pp. 221-264, 1991.
[58]. A.C. Downton, R.W.S. Tregidgo, C.G. Leedham, Hendrawan, "Recognition of Handwritten British Postal Addresses," From Pixels to Features III, pp. 129-144, 1992.
[59]. Z. Shi, V. Govindaraju, "Historical document image enhancement using background light intensity normalization," Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), 26 Aug. 2004.
[60]. Changming Sun, Deyi Si, "Skew and slant correction for document images using gradient direction," Proceedings of the Fourth International Conference on Document Analysis and Recognition, 8-20 Aug. 1997.