Yerrijdnewpaper

1. INTRODUCTION
Advances in technology have led to increased usage of smartphones, tablets and digital
cameras, resulting in large collections of heterogeneous data consisting of video images, natural
scene images and web-based images with text. These images contain useful text that can be used
in numerous applications such as machine translation, safe driving, license plate
tracking and recognition, navigation for the blind, spot identification and house number tracking from maps.
Text detection and recognition in video and scene images is a major research issue, as it provides
the basis and significant clues for content-based retrieval applications.
Given an image, the goal of text detection is to determine whether text is present and, if so, to return
its location. Text recognition then identifies the text in these regions and generates the corresponding
character strings. In other words, the text detection task is to find a minimum-sized region of interest
that contains all of the text in the image. A complete text detection and recognition system finds all
text areas in an image, marks their boundaries and outputs the sequence of characters associated with their content.
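If we assume a detector has already labelled each pixel as text (1) or non-text (0), the "minimum-sized region of interest" described above reduces to the tightest axis-aligned box around all text pixels. The function name `min_text_region` and the binary-mask input are illustrative assumptions, not part of any cited method; returning `None` corresponds to the "no text present" outcome of detection.

```python
def min_text_region(mask):
    """Return the smallest axis-aligned box (top, left, bottom, right), inclusive,
    containing every text pixel of a binary mask, or None if no text is present."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # detection outcome: the image contains no text
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]


# Hypothetical 5x6 mask: 1 marks a pixel classified as text.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
print(min_text_region(mask))  # (1, 1, 2, 4)
```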
Fig 1: Text detection and recognition in scene image
Types of text images:

1. Document images
2. Born-digital images
3. Heritage images
4. Superimposed/caption text
5. Scene images: text that naturally exists in the scene at the time the image or video is captured is
referred to as scene text. Scene text includes sign boards, traffic information, advertisements,
banners, number plates, location name boards, etc.
Scene text reading supports many computer vision applications, such as image retrieval and intelligent
transportation.
Scene text varies in the following characteristics:
Style: Text in images appears either in printed block letters or in handwritten cursive form.
Size: Text can occupy any proportion of the image area.
Spacing: Inter-character and inter-word spacing, together with the size of the text, can make
detection difficult.
Colour: Text in scene images can appear in multiple font colours.
Background: Scene images have complex backgrounds in which text is embedded and sometimes
merges with the background. Hence, detecting text against a complex background at low resolution
is challenging.
Fig 2: Scene text images with variations in background, font colour and orientation
Due to variations in text size, colour and font style, multiple text orientations, complex
backgrounds and geometric distortions in images, detectors tend to miss true text
regions. Further, the growth of Optical Character Recognition (OCR) systems has enabled
computers to read text from images. However, since images may contain many non-character textures,
it is difficult for OCR to read text directly, so character strings must first be extracted from the images.
Recently, researchers have proposed several approaches for detecting and recognizing
text in natural scene images. In this paper, we first present an extensive review of the recent literature on
text detection and recognition in scene images. Secondly, we detail the various datasets used
for the study. Thirdly, we compare the performance of various algorithms for scene text
detection and recognition.
2. Structural problems of scene text recognition (STR):
Text detection:
Text recognition:
End-to-end text system:
3. Related work on text detection and recognition in scene images:

Text embedded in images is a rich source of semantic information that can be extracted and used for
a variety of applications. Scene text detection has been an active research area in
computer vision and deep learning, and many state-of-the-art approaches and models have been
proposed by the scene text community. This survey is organized in three parts:
(1) text detection, (2) text recognition and (3) end-to-end recognition.
3.1 Text detection

Text detection algorithms are concerned with locating and identifying the regions of text in
an image. Text detection is an important part of scene understanding; its objective is to find a set of
sub-regions in the image that contain text and minimal non-text content. In general, text detection can be
done in three ways: (1) locating text in bounding boxes, (2) text extraction by binarizing the scene
image so that all text pixels are foreground and the rest are background, and (3) text region proposal
methods that give multiple candidate text bounding boxes. The three main categories of text detection
approaches are region-based, texture-based and proposal-based approaches.
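The binarization route above can be illustrated with Otsu's global thresholding method, which picks the grey level that maximises the between-class variance of foreground and background. Otsu is our choice of example technique, not one named in this survey; the sketch also assumes 8-bit greyscale input stored as nested lists and, by default, dark text on a light background (the hypothetical `dark_text` flag flips this). Real scene images with uneven lighting often need local (adaptive) thresholds instead.

```python
def otsu_threshold(pixels):
    """Otsu's method: return the grey level t (0-255) that maximises the
    between-class variance of the classes {p <= t} and {p > t}."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of intensities of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image, dark_text=True):
    """Map a 2-D greyscale image to 1 = text (foreground), 0 = background."""
    t = otsu_threshold([p for row in image for p in row])
    if dark_text:
        return [[1 if p <= t else 0 for p in row] for row in image]
    return [[1 if p > t else 0 for p in row] for row in image]


# Hypothetical 3x4 image: dark strokes (25-40) on a light background (198-210).
image = [
    [200, 210, 205, 198],
    [30, 25, 210, 40],
    [202, 35, 199, 207],
]
print(binarize(image))  # [[0, 0, 0, 0], [1, 1, 0, 1], [0, 1, 0, 0]]
```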
3.1.1 Region based algorithms
Region-based algorithms are of two types, namely sliding window and connected component
methods. Sliding window approaches detect text in scene images by extracting a large number of image
rectangles (windows) of different scales, across all positions of the image, for feature
extraction. The sliding window approach is effective in detecting text regions, but its drawback
is high computational cost, since windows of many sizes have to be evaluated at every position.
Connected component methods first group pixels with similar size, colour and stroke width into
connected components and then classify them as text or non-text based on high-level features
extracted using deep convolutional networks. Maximally stable extremal regions
(MSER) and the stroke width transform (SWT) are two connected component methods. These are
more efficient than sliding window methods and are capable of detecting text at the pixel level.
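The contrast between the two region-based families can be sketched in a few lines. `sliding_windows` enumerates square windows at several scales and every stride position, which makes the computational cost visible as a count; `connected_components` groups 4-connected foreground pixels of a binary mask with a breadth-first flood fill. Both function names, the square windows, the stride and the binary-mask input are simplifying assumptions: real connected component methods such as MSER or SWT group pixels by intensity stability or stroke width, and the text/non-text classification step is not shown.

```python
from collections import deque


def sliding_windows(height, width, sizes, stride):
    """Yield (top, left, size) for every square window of each size that fits in
    the image, stepping `stride` pixels; the count grows with scales x positions."""
    for size in sizes:
        for top in range(0, height - size + 1, stride):
            for left in range(0, width - size + 1, stride):
                yield top, left, size


def connected_components(mask):
    """Label 4-connected foreground pixels of a binary mask; return a list of
    components, each a list of (row, col) pixels, in row-major discovery order."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, pixels = deque([(r, c)]), []
                while queue:  # breadth-first flood fill from the seed pixel
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                components.append(pixels)
    return components


# A 64x64 image scanned with 16- and 32-pixel windows at stride 8.
print(len(list(sliding_windows(64, 64, sizes=(16, 32), stride=8))))  # 74

mask = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
print([len(comp) for comp in connected_components(mask)])  # [3, 2]
```

Even this tiny image needs 74 window evaluations, while the mask pass visits each pixel once, which is the efficiency argument made above.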
Arpit Jain et al. 2014 [17] proposed an end-to-end system for text detection and
recognition from videos. Maximally Stable Extremal Regions (MSER), which can detect text against very
poorly illuminated backgrounds and under viewpoint variations, are used to extract candidate regions.
A Support Vector Machine (SVM) classifier then classifies these regions as text or non-text using shape
descriptors, whose dimensionality is reduced using the Partial Least Squares (PLS) technique to improve performance.
Finally, the detected text is binarized and sent to OCR for text recognition. The proposed
approach is efficient at detecting pixel-level text and at the word recognition task.
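To show what the PLS step does, the sketch below implements single-response PLS (PLS1) with the NIPALS algorithm in plain Python: each latent component is the direction in descriptor space with maximal covariance with the text/non-text label, and new samples are projected by repeating the same deflation. This is a generic textbook formulation, not necessarily the exact variant used in [17]; the labels (+1 text, -1 non-text), the 4-D "shape descriptors" and the names `pls1_fit` and `pls1_transform` are hypothetical. The SVM that would be trained on the reduced scores is not shown.

```python
def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X, y, n_components):
    """Fit PLS1 with NIPALS. X: list of descriptor vectors; y: labels
    (+1 text, -1 non-text). Returns (x_mean, weights, loadings)."""
    n, d = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(d)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(d)] for row in X]  # centred X, deflated below
    f = [v - y_mean for v in y]                                  # centred y, deflated below
    weights, loadings = [], []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(d)]  # proportional to E^T f
        norm = _dot(w, w) ** 0.5
        if norm == 0:
            break  # no remaining covariance with the labels
        w = [v / norm for v in w]
        t = [_dot(row, w) for row in E]  # latent scores of the training samples
        tt = _dot(t, t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(d)]  # X loadings
        c = _dot(f, t) / tt  # y loading
        E = [[E[i][j] - t[i] * p[j] for j in range(d)] for i in range(n)]
        f = [f[i] - c * t[i] for i in range(n)]
        weights.append(w)
        loadings.append(p)
    return x_mean, weights, loadings


def pls1_transform(x, model):
    """Project one descriptor vector onto the fitted PLS components."""
    x_mean, weights, loadings = model
    e = [v - m for v, m in zip(x, x_mean)]
    scores = []
    for w, p in zip(weights, loadings):
        t = _dot(e, w)
        scores.append(t)
        e = [ei - t * pi for ei, pi in zip(e, p)]  # same deflation as in training
    return scores


# Hypothetical descriptors: feature 0 separates text (+1) from non-text (-1).
X = [[0.9, 0.1, 0.5, 0.3], [0.8, 0.4, 0.2, 0.6], [0.95, 0.3, 0.7, 0.1],
     [0.1, 0.2, 0.6, 0.4], [0.2, 0.5, 0.3, 0.2], [0.05, 0.1, 0.4, 0.5]]
y = [1, 1, 1, -1, -1, -1]
model = pls1_fit(X, y, n_components=2)
Z = [pls1_transform(x, model) for x in X]  # 4-D descriptors reduced to 2-D scores
```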
Juli P et al. 2016 [5] used the stroke width transform (SWT) to detect text in natural scenes. A
deskewing algorithm is applied so that text can be detected irrespective of the image's
orientation. The algorithm is able to detect text of any font, orientation, direction and scale.
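The deskewing step is not specified in detail here, so the sketch below uses one common approach as an assumption: estimate the skew as the principal axis of the text-pixel coordinates (the leading eigenvector of their 2x2 covariance matrix, in closed form) and rotate the points by the opposite angle. It assumes (x, y) coordinates with y pointing upwards; with image row indices (y pointing down) the sign of the angle flips. A principal-axis estimate also cannot distinguish a text line from vertical text, so it is only a starting point. The names `estimate_skew` and `deskew` are hypothetical.

```python
import math


def estimate_skew(points):
    """Estimate the dominant text orientation in degrees (counter-clockwise
    from the x-axis) as the principal axis of the (x, y) text-pixel coordinates."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Closed-form angle of the leading eigenvector of [[sxx, sxy], [sxy, syy]].
    return math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy))


def deskew(points):
    """Rotate the points about their centroid by minus the estimated skew,
    so that the principal text axis becomes horizontal."""
    theta = -math.radians(estimate_skew(points))
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    c, s = math.cos(theta), math.sin(theta)
    return [(mx + c * (x - mx) - s * (y - my), my + s * (x - mx) + c * (y - my))
            for x, y in points]


# Hypothetical text baseline tilted by 30 degrees.
tilted = [(t * math.cos(math.radians(30)), t * math.sin(math.radians(30))) for t in range(10)]
print(round(estimate_skew(tilted), 6))  # 30.0
```

After `deskew(tilted)`, the estimated skew of the rotated points is approximately zero, i.e. the baseline is horizontal and can be passed on to detection or OCR.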