
Text Segmentation and Recognition for Enhanced Image Spam Detection: An Integrated Approach
Mallikka Rajalingam
EAI/Springer Innovations in Communication and Computing


Series Editor
Imrich Chlamtac, European Alliance for Innovation, Ghent, Belgium
Editor’s Note
The impact of information technologies is creating a new world yet not fully
understood. The extent and speed of economic, life style and social changes already
perceived in everyday life are hard to estimate without understanding the technological
driving forces behind them. This series presents contributed volumes featuring the
latest research and development in the various information engineering technologies
that play a key role in this process.
The range of topics, focusing primarily on communications and computing
engineering, includes, but is not limited to, wireless networks; mobile communication;
design and learning; gaming; interaction; e-health and pervasive healthcare; energy
management; smart grids; internet of things; cognitive radio networks; computation;
cloud computing; ubiquitous connectivity; and, more generally, smart living, smart
cities, Internet of Things and more. The series publishes a combination of expanded
papers selected from hosted and sponsored European Alliance for Innovation (EAI)
conferences that present cutting edge, global research as well as provide new
perspectives on traditional related engineering fields. This content, complemented
with open calls for contribution of book titles and individual chapters, maintains
Springer’s and EAI’s high standards of academic excellence. The audience
for the books consists of researchers, industry professionals, advanced level students
as well as practitioners in related fields of activity, including information and
communication specialists, security experts, economists, urban planners, doctors,
and in general representatives of all those walks of life affected by and contributing to
the information revolution.
Indexing: This series is indexed in Scopus, Ei Compendex, and zbMATH.

About EAI
EAI is a grassroots member organization initiated through cooperation between
businesses, public, private and government organizations to address the global
challenges of Europe’s future competitiveness and link the European Research
community with its counterparts around the globe. EAI reaches out to hundreds of
thousands of individual subscribers on all continents and collaborates with an
institutional member base including Fortune 500 companies, government
organizations, and educational institutions, providing a free research and innovation
platform.
Through its open free membership model EAI promotes a new research and
innovation culture based on collaboration, connectivity and recognition of excellence
by the community.

More information about this series at http://www.springer.com/series/15427


Mallikka Rajalingam
Department of Computer Science & Engineering
Bharathidasan University
Tiruchirappalli, India

ISSN 2522-8595     ISSN 2522-8609 (electronic)


EAI/Springer Innovations in Communication and Computing
ISBN 978-3-030-53046-4    ISBN 978-3-030-53047-1 (eBook)
https://doi.org/10.1007/978-3-030-53047-1

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Switzerland AG 2021
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

This book proposes an efficient spam detection technique that combines character
segmentation, recognition and classification (CSRC) to detect whether an email
(text- or image-based) is a spam mail or not. The work is presented as a fourfold
process. First, the text characters are extracted from the image by a segmentation
process which combines the discrete wavelet transform (DWT) and skew detection.
Thus, image features of a specific shape can be isolated and regular curves such as
circles, lines and ellipses can be detected. Second, the text characters are recognized
via a text recognition and visual feature extraction approach which relies on contour
analysis with an improved local binary pattern (LBP). Third, the extracted text
features are classified using improvised K-nearest neighbour (KNN) and support
vector machine (SVM) classifiers, and the text data for both the classification and
regression processes are analysed. Fourth, the performance of the proposed method
is validated using metrics such as sensitivity, specificity, precision, recall, F-measure,
accuracy, error rate and correct rate.
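
The evaluation measures named above can all be derived from the four counts of a binary confusion matrix. The following is a minimal sketch under stated assumptions, not the book's own implementation: the function name `classification_metrics` is hypothetical, spam is treated as the positive class, and "correct rate" is assumed here to be a synonym for accuracy.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common measures from confusion-matrix counts.

    tp: spam classified as spam      fp: ham classified as spam
    tn: ham classified as ham        fn: spam classified as ham
    (Assumption: spam is the positive class.)
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    def ratio(numerator, denominator):
        # Report 0.0 instead of dividing by zero for an empty class.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)          # identical to sensitivity
    specificity = ratio(tn, tn + fp)
    accuracy = ratio(tp + tn, total)
    f_measure = ratio(2 * precision * recall, precision + recall)

    return {
        "sensitivity": recall,
        "specificity": specificity,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,            # assumed equal to "correct rate"
        "error_rate": 1.0 - accuracy,
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    for name, value in classification_metrics(40, 10, 45, 5).items():
        print(f"{name}: {value:.4f}")
```

Sensitivity and recall are the same quantity under two names, which is why the sketch computes them once.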

Contents

1 Introduction
1.1 Introduction
1.2 Characteristics of Image Spam
1.3 Problem Statement
1.4 Objectives
1.5 Motivation
1.6 Research Contribution
1.7 Research Scope
1.8 Novelty and Significance
1.9 Outline of the Chapters
References
2 Review of Literature
2.1 Character Segmentation
2.1.1 Classifier-Based Approach
2.1.2 Artificial Neural Networks Classifier
2.1.3 Support Vector Machines Classifier
2.1.4 Decision Tree
2.1.5 Non-Classifier-Based Approach
2.2 Character Recognition
2.2.1 Pre-processing
2.2.2 OCR-Based Character Recognition
2.2.3 Low-Level Image Features
2.2.4 Text Extraction
2.2.5 Other Studies
2.3 OCR Technique
2.3.1 Low-Level Image Feature
2.3.2 Text Extraction
2.4 Deep Learning Methods for Spam Detection
2.5 Prototypes
2.5.1 HoneySpam
2.5.2 Phonetic String Matching
2.5.3 ProMail
2.5.4 Zombie-Based Approach
2.5.5 SMTP Logs Mining Approach
2.6 Previous Works
2.6.1 Integrated Approach
2.7 Research Gap
2.8 Summary
References
3 Methodology
3.1 Introduction
3.2 Proposed Design
3.2.1 Data Set
3.2.2 Corpus
3.2.3 Preprocessing
3.3 Experimental Set-Up and Performance Evaluation
3.3.1 Performance Evaluation Measures: Character Segmentation and Recognition
3.4 Summary
References
4 Character Segmentation
4.1 Introduction
4.2 Proposed Hybrid-Based Character Segmentation
4.2.1 RGB to Greyscale
4.2.2 Binarization and Removal of Connected Components
4.2.3 Discrete Wavelet Transform (DWT)
4.2.4 Hough-Based Line and Character Segmentation
4.2.5 Spatial Frequency Correlation
4.2.6 Overall Hybrid Algorithm
4.3 Experimental Results and Analysis
4.3.1 Experimental Set-Up
4.3.2 Experimental Task
4.3.3 Results of Preprocessing Component
4.3.4 Results of Character Segmentation Component
4.4 Summary
References
5 Character Recognition
5.1 Introduction
5.2 Proposed Method: Using a Combination of Text Recognition and Visual Feature Extraction for Character Recognition
5.2.1 Contour Analysis
5.2.2 Improved Local Binary Pattern
5.3 Experiment
5.3.1 Experimental Set-Up
5.3.2 Results of Thinning/Contour Extraction
5.3.3 Results of Vector Representation
5.3.4 Results of Average Gradient Magnitude of Contour Pixels
5.3.5 Results of Gradient Direction Variance of Contour Pixels
5.3.6 Results of Number of Contour Pixels
5.3.7 Results of Character Recognition
5.4 Summary
References
6 Classification/Feature Extraction Using SVM and K-NN Classifier
6.1 Introduction
6.2 Proposed Method: A Complete Character Segmentation Detection
6.2.1 Feature Extraction
6.2.2 SVM
6.2.3 Nearest Neighbour Search
6.3 Experiment
6.3.1 Experimental Set-Up
6.3.2 Results of K-NN and SVM Classifier
6.4 Summary
Reference
7 Experimentation and Result Discussion
7.1 Introduction
7.2 Evaluation
7.3 Experimentation
7.4 Results Discussions
7.4.1 HAM Images
7.5 Summary
References
8 Conclusion
Appendixes
Index
Chapter 1
Introduction

1.1 Introduction

With the advancement of the internet, the use of email has increased, and email has
become one of the fastest modes of communication. However, this growth has led to
an increased rate of spam-related problems all over the world. According to Rekha
and Negi [Rek, 14], around 90% of the emails that arrive in users’ mailboxes are
spam emails, which contain junk information that tends to affect users’ normal
computing activities. While spam emails are generally based on advertising content,
in many cases they also contain malicious code and viruses which might harm the
users’ accounts [Fir, 10]. As technologies for detecting email spam advanced,
spammers developed image spamming, which complicates the detection of spam in
image mails. Though previous researchers have attempted to develop novel
techniques for the detection of image spam, there is still a need for an efficient image
spam detection system that remains scalable regardless of the type of image spam
that is sent. In this regard, this chapter presents a brief overview of the research
topic, the concepts of spam detection, the characteristics of image spam, the
challenges in developing image spam detection, the contribution of the research and
the outline followed in the remaining chapters.
Email is the most prominent way of communicating with others. The number of
email accounts worldwide rose from 3.3 billion in 2012 to 4.3 billion in 2016
[Rad, 12], a yearly growth rate of 6%. With such widespread usage, protecting
email against fraudulent activities has become an important task. Unwanted emails
sent to users are considered spam messages. A spam mail is defined as an
unsolicited, irrelevant or unwanted mail message received by users [Kam, 10].
Spam emails usually contain commercial or profit-seeking campaigns for dubious
products, dating services, get-rich-quick schemes and advertising. Spam emailing is
also used to spread malicious code or viruses and is intended for financial fraud or
phishing. Spamming is considered to cause losses over the internet, especially when
it turns malicious for business organizations. Many of these losses are collateral
damage, not aimed at a particular network or organization. Spam emails occupy
network bandwidth during transmission. They also consume users’ time spent
searching through their mail. Statistical reports show that, as of December 2014,
spam messages accounted for 66.41% of email traffic worldwide, with Asia
constituting 54% of that total [Sta, 17]. A study by Biggio et al. [Big, 11] reveals
that most users receive more spam emails than non-spam emails.

M. Rajalingam, Text Segmentation and Recognition for Enhanced Image Spam
Detection, EAI/Springer Innovations in Communication and Computing,
https://doi.org/10.1007/978-3-030-53047-1_1

Spam is unwanted, unsolicited commercial email sent in bulk, directly or indirectly,
by an unknown sender; it clutters inboxes and burdens email servers [Bos, 14;
Das, 14]. Emails that the recipient does not wish to receive are called spam emails.
Typically, a huge number of identical messages is forwarded to many recipients by
email. The growing volume of such spam is creating grave problems for internet
service providers, internet users and the entire internet backbone network. One
instance is denial of service, where spammers direct enormous traffic at an email
server and thereby delay valid messages from reaching their intended recipients.
Spam emails not only squander resources such as bandwidth, storage space and
computational power, but may also contain deceitful schemes and false offers. In
addition, the time and effort of email recipients are wasted, because they have to
pick out valid emails among the spam and take steps to get rid of the spam. Handling
and categorizing spam is an extremely hard job. Moreover, no single pattern can
deal with the problem, as fresh spam appears continuously and is repeatedly and
actively modified. As a result, such messages escape detection, which adds a further
obstacle to accurate detection [Rek, 15].

1.2 Characteristics of Image Spam

Though text-based spam emails are detected by most email spam detection methods,
spammers have identified a new route for sending spam messages: images. Sending
spam messages through images is called image spamming, and images embedded
with spam content are known as spam images or image spam. Most algorithms find
it easy to identify spam in text email. However, doing the same for image spam
emails is a daunting task. A spam image carries a message which is intended to reach
client systems and be displayed there. A further complication is that even effective
spam detection techniques may also block ham (legitimate) messages, an error
known as a false positive [Meh, 08]. The characteristics of image spam are shown in
Table 1.1.

Table 1.1 Characteristics of image spam

Characteristic: Image spam is text messages with noise
Description: All spam image emails contain text messages which are intended to
depict the information shared by the spammer. Most spam images are
advertisements and are generally blacklisted (e.g. Cialis pills, drug store, stock tip)

Characteristic: Image spam messages are distinct from one another
Description: Spammers take utmost care to uniquely design each spam image using
sophisticated algorithms so as to ensure that the image spam is unique. Several
techniques have been utilized to arrange the elements in the image spam email, such
as noise, background, colours of the fonts and so on. Adding these features makes
the image spam harder to identify [Fum, 06]

Characteristic: I-spam messages use HTML effectively
Description: I-spam utilizes MIME for transporting the attached image data together
with HTML formatting and text which is non-suspicious. Such text is different from
what is actually present in the email

Characteristic: I-spam messages are different from natural images
Description: The colour space of natural images is smooth and hence is distinct from
image spam messages, which generally contain sharp and clear objects

However, detecting image spam is difficult because the message, or tokens
(characters), is embedded within the image. The embedded characters need to be
extracted and converted into ASCII form. Recognizing characters within an image is
challenging because it involves two image processing steps: character segmentation,
which marks each character in the image, and character recognition, which converts
each marked character into ASCII form. In the final step, the ASCII text is ready to
be processed for identifying spam emails. Detecting spam emails, especially image
spam as shown in Fig. 1.1, is the focus of the present research and is a challenging
task compared with conventional spam detection techniques.

1.3 Problem Statement

The problem of spam detection has acquired immense attention, and specific
challenges such as text classification or categorization require attention. Though
researchers have addressed such challenges in a generic manner, the following
problems remain:
1. Spammers all over the world keep creating new techniques to spam through
images and text.
2. Text embedded in images is deliberately subjected to noise, such as background
patterns, colour and font variations and irregular font sizes, so as to reduce the
chances of being identified as spam by filtering techniques.
Fig. 1.1 Sample spam email. (a) Text-based spam email; (b) image-based spam email

Hence, an algorithm that appropriately detects image spam emails is needed, and
this became the premise of the present research. However, this requires a
combination of algorithms and the development of a system which can
appropriately detect image-based spam mail. In this regard, any image-based spam
detection method involves three major processing steps. Firstly, character
segmentation is the preliminary task performed in the process flow of spam
detection. Character segmentation is the process that marks, or segments, every
character in the image. According to Casey and Lecolinet [Cas, 96], character
segmentation is a procedure in which an image is decomposed into sub-images, each
containing an individual symbol of the text. Character segmentation, the first
procedure in the proposed system, should take into consideration several criteria,
which are as follows (adapted from [Cas, 96]):

Steps for Character Segmentation


1. Identify the character patterns in the image spam by their resemblance to
symbols known to the system.
2. Character pattern matches should be accurate. For example, both ‘cl’
and ‘d’ may look the same in an image.
3. Cursive character patterns should also be identified accurately.

Owing to differences in text style, size and alignment, low contrast and complex
background images, segmentation becomes a demanding task, which implies the
need for an algorithm that can detect the lines and curves separating each letter in
the image.
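
To make the idea of decomposing an image into per-symbol sub-images concrete, the following is a minimal sketch of a vertical projection profile, a basic textbook segmentation technique. It is an illustrative assumption, not the hybrid method proposed in this book: the function name `segment_columns` is hypothetical, and the input is assumed to be a single, already binarized text line given as rows of 0 (background) and 1 (ink).

```python
def segment_columns(binary_rows):
    """Split a binarized text line into column ranges, one per glyph.

    binary_rows: list of equal-length rows; nonzero means ink.
    Returns a list of (start, end) column pairs, with end exclusive.
    """
    if not binary_rows:
        return []
    width = len(binary_rows[0])
    # A column belongs to a glyph if any pixel in that column is ink.
    has_ink = [any(row[col] for row in binary_rows) for col in range(width)]

    spans = []
    start = None
    for col, filled in enumerate(has_ink):
        if filled and start is None:
            start = col                    # a glyph begins
        elif not filled and start is not None:
            spans.append((start, col))     # a blank column ends the glyph
            start = None
    if start is not None:
        spans.append((start, width))       # glyph touching the right edge
    return spans


if __name__ == "__main__":
    line = [
        [0, 1, 1, 0, 0, 1, 0],
        [0, 1, 0, 0, 0, 1, 0],
    ]
    print(segment_columns(line))
```

A sketch like this only separates glyphs that have at least one blank column between them. Touching or cursive characters, and the ‘cl’ versus ‘d’ ambiguity listed above, are exactly the cases where it fails, which is why more robust line and curve detection is called for.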
Once each character/object in the image is segmented, the next step is to identify
the marked object and convert it to a character (ASCII form). This is known as
character recognition. Character recognition involves classifying the input
information on the basis of the requirements that the system imposes during such
classification. It is performed with the understanding that the recognition decision
will not always be accurate, but character recognition techniques should employ
algorithms that can recognize a character with better accuracy. This is better
explained as follows:
‘Assume a set M of objects which are segregated into n different non-intersecting
subsets known as characters or object classes. Each character is designated by a
character description x which should be compiled as a multi-dimensional vector.
Object description should not necessarily be unique and may correspond with other
classes of objects’ [Nad, 15]. In conventional documents, characters are typically
monochrome on a fixed background; character recognition in images is potentially
far more complicated, since it must cope with other possible variations such as
changes in background, lighting, texture and font.
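
The formulation above, in which each object is described by a multi-dimensional vector x and must be assigned to one of n classes, can be illustrated with a plain k-nearest-neighbour vote. This is a minimal sketch under stated assumptions: it uses the standard, unmodified k-NN rule with Euclidean distance rather than the improvised classifier developed in this book, and the function name `knn_classify` and the two-dimensional feature vectors are hypothetical.

```python
import math
from collections import Counter


def knn_classify(x, train_vectors, train_labels, k=3):
    """Assign feature vector x to the majority class of its k nearest neighbours."""
    if len(train_vectors) != len(train_labels):
        raise ValueError("each training vector needs exactly one label")
    if not 1 <= k <= len(train_vectors):
        raise ValueError("k must be between 1 and the number of training vectors")

    # Rank training samples by Euclidean distance to the query vector.
    ranked = sorted(
        range(len(train_vectors)),
        key=lambda i: math.dist(x, train_vectors[i]),
    )
    # Majority vote among the k closest samples.
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Made-up 2-D descriptions for two character classes.
    vectors = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(knn_classify((0.2, 0.1), vectors, labels))
    print(knn_classify((5.5, 5.2), vectors, labels))
```

Because descriptions of different classes may overlap, a vote of this kind can be wrong for vectors lying between classes, which matches the remark that a recognition decision is not always accurate.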
Once character segmentation and character recognition are fully operational, the
next step is to combine them into a single image spam detection system. The
combined system should enable identification of an image mail as ham or spam.
The refined extracted characters should be preprocessed for email detection.

1.4 Objectives

The objectives of this research are as follows:

• To design an efficient character segmentation, recognition and spam detection
algorithm for image spam emails, using improvised DWT and Hough transforms
along with spatial frequency cross-correlation for automatic segmentation,
contour analysis with an improved local binary pattern for text recognition, and
improvised SVM and KNN classifiers for visual feature extraction.
• To analyse the proposed technique’s performance using precision, F-measure,
recall and accuracy.
• To evaluate the limitations of the proposed research and thereby recommend
directions for future research.
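
As background for the DWT named in the first objective, the following is a minimal sketch of one level of a two-dimensional Haar wavelet decomposition. It is an assumption-laden illustration, not the improvised DWT of this book: it uses the simplest (Haar) wavelet with plain averaging instead of orthonormal scaling, requires even image dimensions, and the names `haar_dwt2` and `pair_rows` are hypothetical.

```python
def haar_dwt2(image):
    """One level of a 2-D Haar decomposition using averages and differences.

    image: list of equal-length rows of numbers, with even height and width.
    Returns (approx, horizontal_change, vertical_change, diagonal_change),
    each half the height and half the width of the input.
    """
    if not image or not image[0]:
        raise ValueError("image must be non-empty")
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image height and width must be even")

    # Step 1: within each row, average and difference adjacent pixel pairs.
    low = [[(r[c] + r[c + 1]) / 2 for c in range(0, cols, 2)] for r in image]
    high = [[(r[c] - r[c + 1]) / 2 for c in range(0, cols, 2)] for r in image]

    # Step 2: combine vertically adjacent rows of each intermediate band.
    def pair_rows(band, sign):
        return [
            [(top + sign * bottom) / 2 for top, bottom in zip(band[r], band[r + 1])]
            for r in range(0, rows, 2)
        ]

    approx = pair_rows(low, +1)              # local 2x2 mean
    vertical_change = pair_rows(low, -1)     # intensity change between rows
    horizontal_change = pair_rows(high, +1)  # intensity change between columns
    diagonal_change = pair_rows(high, -1)    # change along both directions
    return approx, horizontal_change, vertical_change, diagonal_change


if __name__ == "__main__":
    # A 2x2 patch with a dark left column and a bright right column.
    for band in haar_dwt2([[0, 1], [0, 1]]):
        print(band)
```

The detail bands are zero in flat regions and non-zero where intensity changes, which is one reason wavelet coefficients are commonly used to locate text strokes against a background.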

1.5 Motivation

The number of spam messages is increasing, hindering the normal operations of
mail users. As new techniques were developed to restrict text-based spam messages,
spammers identified new techniques in which spam messages are embedded in
images and sent to email users. Though there is an immense literature that has
attempted to mitigate the issues arising from image spam, there is still an
unaddressed gap: the inability of the proposed algorithms and techniques to
distinguish spam emails from legitimate emails. A need persists for a novel
technique which can recognize image spam emails. This motivated the researcher to
review the various techniques used to date and to develop a novel algorithm-based
technique to recognize ham and spam image mails.

1.6 Research Contribution

Nowadays the number of online spamming cases is increasing, which is hazardous to safe internet utilization. Spam created in excessive amounts reduces information quality and is a concern for web users. Spammers utilize image-based email to collect private data and perform phishing attacks. There is hence a need to develop a system which can reliably detect image spam while letting ham images through, which is the motivation for the present research. In this context, various techniques were examined from the literature, and the solution to the image spamming issue is a combination of techniques which, taken together, could contribute to better image spam identification.
The major contribution of the current work is the development of an image-based spam detection system that combines character segmentation, recognition and classification (CSRC) and can detect whether an email (text and image based) is a spam mail or not. In this regard, the present research is presented with three methods. The proposed methods are distinct from each other and present three contributions to the body of knowledge, thereby achieving the outlined research objectives. The contributions of this study are as follows:
• The major aim of this study is to solve the problem of spam detection from images by proposing a novel image spam filtering technique that is scalable and adaptable. The detection approach extracts the embedded text along with colour, texture and shape features, which are used to estimate similarity with the query image. The extracted features are used to train the classifier that classifies the online message as spam or legitimate.
• We propose a novel unified framework for image spam detection based on the combination of robust, improvised DWT and Hough transforms along with spatial frequency cross-correlation for automatic segmentation, contour analysis with an improved local binary pattern for text recognition, and improvised SVM and KNN classifiers for visual feature extraction. Thus, the present research proposes automatic, consistent and rapid segmentation, feature extraction and classification to detect spam from images and text. The proposed method was compared with other traditional methods.
• A novel algorithm, DWT with skew detection, was proposed for character segmentation. Character segmentation from images is done using DWT, which includes morphological dilation operators and logical AND operators to remove the non-text regions, and Hough transforms along with spatial frequency cross-correlation. Further, to reduce the size of images, skew detection that applies a fusion of the Hough transform with spatial frequency cross-correlation was proposed. Earlier skew detection algorithms, such as Hough transforms, clustering, projection profiles, wavelet decompositions, morphology, moments, space parallelograms and Fourier analysis, work on the assumption that images are black and white and are tuned for documents in which text is prominent and aligned in parallel straight lines. However, these algorithms could not provide an exact solution when such assumptions do not hold. The proposed fusion-based method considers the image's structure or texture and applies a threshold to separate it into polygons or connected areas.
• The research proposes contour analysis with an improved local binary pattern (LBP) for text recognition and visual feature extraction. To acquire the image's smooth contours, a double filter bank consisting of a Laplacian pyramid (LP) and a directional filter bank (DFB) provides better multiscale decomposition and removes the low frequencies. LBP considers the effects of central pixels and presents complete structure patterns to enhance the discriminative ability.
• The extracted features are classified using SVM with a KNN classifier. KNN is used to extract features by finding the nearest neighbours, and SVM analyses the data for classification and regression.
• The proposed methods have both training and testing phases.

1.7 Research Scope

The goal of this research is to improve the accuracy of email spam detection. More precisely, the present research assesses the different methods that are capable of identifying individual text and image-based emails; however, image-based spam detection is the main focus of the research. The project hence limits its scope to identifying image-based spam emails and does not intend to identify the entity that actually spreads spam messages. Email legitimacy is determined by the proposed approach. Furthermore, the proposed approach is a new contribution to secure email usage, as the detection accuracy of the proposed technique outperformed the existing approaches.

1.8 Novelty and Significance

The novelty of the present research lies in combining several techniques of character segmentation and recognition, wherein spam images are recognized using shape-based feature extraction methods. The combination of DWT and Hough transforms with template matching and contour analysis is a relatively new method in this field, and the proposed model is hypothesized to bring better results in terms of spam detection accuracy. The method is also significant in offering insights for future researchers. In addition, to improve the performance of the segmentation and recognition processes, additional methods are used, such as spatial frequency cross-correlation and an improved local binary pattern.

1.9 Outline of the Chapters

The book is organized into eight chapters with appendices.

Chapter 1 outlines an introduction to text and image-based email classification, followed by the motivations and problem statement, research objectives, research scope, novelty and significance of the research.
Chapter 2 is the literature review, which identifies the strengths and limitations of the current text and image-based email classification approaches by assessing and exploring previous studies. It first elaborates on the concepts and definitions pertaining to the present research topic and gives an overview of spam detection, character segmentation and character recognition methods. Furthermore, the chapter provides a detailed description of image-based email detection techniques. The limitations of the previous studies examined are explained in detail in the research gap and the summary of the chapter.
Chapter 3 covers the information on the data sets used and the preprocessing steps performed in the present research.
Chapter 4 presents in detail an enhanced character segmentation algorithm that improves the detection efficiency of image-based emails using a hybrid approach of DWT and Hough transform methods with a pixel count analysis technique; in addition, spatial frequency cross-correlation is used to improve the segmentation process so that text is segmented from the image-based email efficiently. The components of the algorithm and their functions are discussed. The experimental results of this algorithm are also presented and compared with related methods from the literature in terms of segmentation accuracy, by which the performance of the algorithm is assessed.
Chapter 5 presents in detail the processes involved in character recognition. The segmented characters are corrected using skew detection and correction. A combined approach of template matching and contour analysis is used to recognize the characters, with error corrections and an improved local binary pattern applied. The components of this algorithm and their functions are discussed. In addition, the experimental outcomes of the technique are presented and compared with related methods in terms of recognition accuracy, as a means to examine the accuracy of the proposed algorithm.
Chapter 6 presents in detail a detection algorithm for image-based ham/spam emails that performs classification and feature extraction using SVM and KNN classifiers. The structure and texture of an image are examined, and the detection technique encompasses optimisation, nearest neighbour search, handling of inconsistent constraints and error corrections. The proposed technique's performance is also assessed.
Chapter 7 discusses the entire approach and the different algorithms used, followed by testing of the entire system based on parameters such as False Positive (FP), False Negative (FN), True Positive (TP), True Negative (TN), Recall and Precision, which are used to evaluate the performance of the proposed work.
Chapter 8 concludes the investigation and offers recommendations for future work related to this research.
The 'Appendix' section covers snippets of code used in the image-based ham/spam detection approach.

Chapter 2
Review of Literature

This chapter reviews spam emails and the variety of existing email classification techniques used for spam detection, with an in-depth analysis of their strengths and weaknesses. Furthermore, the chapter elucidates the concept of text-based email classification with various machine learning approaches including Naïve Bayes, Decision Tree and SVM (Support Vector Machine). The chapter also presents a detailed description of image segmentation, character recognition and image-based email detection, which is regarded as the foundation of the present research.

2.1 Character Segmentation

A segmentation technique divides a digital image into multiple segments. The main purpose of this technique is to represent the image in a simpler form that is easier to analyse. Segmenting cursive characters from an image is a difficult task, and segmenting low-resolution characters is also challenging in document image processing. Detecting empty space in the document image is a further difficulty in the character segmentation process.
Machine learning is a subfield of artificial intelligence that aims to make technologies capable of learning in a manner similar to the human brain. Learning, in this sense, means observing, understanding and representing information about statistical phenomena. Unsupervised learning algorithms try to find hidden regularities (clusters) or to identify anomalies in data, such as spam messages or network intrusions. Email filtering may rely on a bag of words or on subject-line analysis. Email classification is typically separated into several subtasks, of which two are especially important. Firstly, the collection of data and its representation are often problem-specific (i.e. email messages). Secondly, email feature selection and feature reduction attempt to decrease the number of features for the remaining steps of the task. The email classification phase then finds the actual mapping between the training and testing sets. Machine learning techniques used to serve the aforementioned tasks are elaborated in the following section. Character segmentation is categorized into two subsections: classifier-based and non-classifier-based techniques.

© The Editor(s) (if applicable) and The Author(s), under exclusive license to
Springer Nature Switzerland AG 2021
M. Rajalingam, Text Segmentation and Recognition for Enhanced Image Spam
Detection, EAI/Springer Innovations in Communication and Computing,
https://doi.org/10.1007/978-3-030-53047-1_2

2.1.1 Classifier-Based Approach

2.1.1.1 Naive Bayes Classifier

The first Naïve Bayes classifier for spam recognition was proposed in 1998. A Bayesian classifier works on dependent events: the probability of an event occurring in the future can be inferred from earlier occurrences of the same event [Alm, 11]. This technique classifies spam emails by analysing the words in the mail and checking which words occur frequently in spam and which in ham. If words that occur frequently in spam are found in the mail, the received email is more likely to be declared spam. The Naïve Bayes classifier has become a widespread technique for email filtering, and a Bayesian filter must be trained to work effectively. Each word has a certain probability of occurring in ham or spam email in the database. If the combined probability over all the words exceeds a certain threshold, the filter assigns the email to one of the two groups, spam or ham. In effect, all statistics-based spam filters use Bayesian probability calculation to combine individual tokens' statistics into an overall score [Awa, 11a], and the filtering decision is made on the basis of this overall score. The statistic of interest for a token T is its spamminess (spam rating), computed as shown below:

\[ S[T] = \frac{C_{\mathrm{spam}}(T)}{C_{\mathrm{spam}}(T) + C_{\mathrm{ham}}(T)} \tag{2.1} \]

where Cspam(T) and Cham(T) are the numbers of spam and ham messages containing token T, respectively. To estimate the probability for a message M with tokens {T1, …, TN}, one needs to combine the individual tokens' spamminess to evaluate the overall message spamminess [Awa, 11a]. A simple way to make classifications is to compute the product of the individual tokens' spamminess, S[M], and to compare it with the product of the individual tokens' hamminess, H[M]:

\[ S[M] = \prod_{i=1}^{N} S[T_i], \qquad H[M] = \prod_{i=1}^{N} \bigl(1 - S[T_i]\bigr) \tag{2.2} \]

An email message is marked as spam if the overall spamminess product S[M] is higher than the hamminess product H[M]. This reasoning is used in the algorithm given below [Awa, 11b].

Stage 1: Training

Parse each email into its constituent tokens.
Compute a probability for each token W:

\[ S[W] = \frac{C_{\mathrm{spam}}(W)}{C_{\mathrm{ham}}(W) + C_{\mathrm{spam}}(W)} \tag{2.3} \]

Store the spamminess values in a database.

Stage 2: Filtering

For each message M:
    While (M not at end) do
        Scan the message for the next token Ti
        Query the database for the spamminess S(Ti)
        Update the accumulated message probabilities S[M] and H[M]
    Compute the overall message filtering indication by:
        I[M] = f(S[M], H[M]),
    where f is a filter-dependent function, such as I[M] = (1 + S[M] − H[M])/2.
    If I[M] > threshold
        mark the message as spam
    else
        mark the message as ham
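
The two stages above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the cited works: the function names `train_spamminess` and `classify_message`, whitespace tokenization, the neutral score of 0.5 for unseen tokens and the threshold of 0.5 are all assumptions made for the example.

```python
from collections import Counter


def train_spamminess(messages):
    """Stage 1: build a token -> spamminess table following Eq. (2.3).

    `messages` is an iterable of (text, is_spam) pairs.
    """
    spam_counts, ham_counts = Counter(), Counter()
    for text, is_spam in messages:
        # Count each token once per message: C(W) is the number of
        # messages that contain W, not the number of occurrences.
        tokens = set(text.lower().split())
        (spam_counts if is_spam else ham_counts).update(tokens)
    vocabulary = set(spam_counts) | set(ham_counts)
    return {w: spam_counts[w] / (spam_counts[w] + ham_counts[w])
            for w in vocabulary}


def classify_message(text, spamminess, threshold=0.5, unknown=0.5):
    """Stage 2: accumulate S[M] and H[M], then apply I[M] = (1 + S - H) / 2."""
    s_m, h_m = 1.0, 1.0
    for token in text.lower().split():
        s = spamminess.get(token, unknown)  # assumption: neutral score if unseen
        s_m *= s
        h_m *= 1.0 - s
    indicator = (1.0 + s_m - h_m) / 2.0
    return ("spam" if indicator > threshold else "ham"), indicator


training = [
    ("win cash now", True),
    ("cheap cash offer", True),
    ("meeting agenda attached", False),
    ("lunch meeting tomorrow", False),
]
table = train_spamminess(training)
print(classify_message("cash offer", table))
print(classify_message("meeting tomorrow", table))
```

Note that a token seen only in spam (or only in ham) drives one of the two products to zero, so a practical filter would usually smooth the counts; that refinement is outside what the text describes.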
The authors of [Vij, 18] used a Naïve Bayes-based classifier with a three-layer framework for detecting bulk spam email. They experimented with a real-time data set for the detection of legitimate and spam email. To improve the accuracy, features were extracted based on bucket classification. A Self Acknowledgeable Internet Mail System was implemented to determine the status of the sender's mail.

2.1.1.2 K-Nearest Neighbour Classifier

The k-nearest neighbour (K-NN) classifier is an instance-based classifier: it uses the training records themselves for comparison rather than an explicit class representation, such as the class profiles used by other classifiers. As such, there is no real training phase. Whenever a new record needs to be classified, the k most similar records (neighbours) are found, and if a large enough proportion of them belong to a certain group, the new record is assigned to that group. Moreover, finding the nearest neighbours can be sped up using conventional indexing techniques. A message is classified as spam or ham according to the class of the messages that are closest to it. The comparison between the vectors is performed in real time [Ger, 17]. This is the idea of the k-nearest neighbour algorithm:

Stage 1: Training

Store the training messages.

Stage 2: Filtering

Given a message x, determine its k-nearest neighbours among the messages in the training set. If there are more spam messages than ham messages among these neighbours, classify the given message as spam. Otherwise, classify it as ham.
The use of an indexing technique reduces the comparison time to a complexity of O(m), where m is the sample size. The technique is also referred to as a memory-based classifier, as all the training examples are stored in memory [Pat, 13b; Kho, 07]. One issue with the algorithm as presented is that it has no parameter that could be tuned to reduce the number of false positives. However, the issue is easily dealt with by changing the classification rule to the following l/k-rule: if l or more messages among the k-nearest neighbours of x are spam, classify x as spam; otherwise, classify it as a legitimate email. The k-nearest neighbour rule has found wide use in general classification tasks. In addition, it is one of the few universally consistent classification rules.
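
A minimal sketch of the k-nearest neighbour filter with the l/k-rule is given below. It assumes that messages have already been converted to numeric feature vectors and that Euclidean distance is the similarity measure; neither choice is specified in the text, and the name `knn_classify` is hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(x, training, k=3, l=None):
    """Label x as spam if at least l of its k nearest neighbours are spam.

    `training` is a list of (vector, label) pairs with label "spam" or "ham".
    With l left as None the plain majority rule (more than k/2) is used.
    """
    if l is None:
        l = k // 2 + 1
    neighbours = sorted(training, key=lambda item: euclidean(x, item[0]))[:k]
    spam_votes = sum(1 for _, label in neighbours if label == "spam")
    return "spam" if spam_votes >= l else "ham"


training = [
    ((0, 0), "ham"), ((0, 1), "ham"), ((1, 0), "ham"),
    ((5, 5), "spam"), ((5, 6), "spam"), ((6, 5), "spam"),
]
print(knn_classify((5.2, 5.1), training, k=3))
print(knn_classify((2, 2), training, k=5, l=2))
print(knn_classify((2, 2), training, k=5, l=3))
```

Raising l makes the filter more conservative: the borderline point (2, 2) has only two spam neighbours out of five, so it is flagged with l = 2 but passed as ham with l = 3, which is how the rule trades missed spam for fewer false positives.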

2.1.2 Artificial Neural Networks Classifier

An artificial neural network (ANN), also called a 'neural network' (NN), is a computational model based on biological neural networks that works on the principle of learning by example [Meh, 17]. An NN is an adaptive system consisting of an interconnected group of artificial neurons that changes its structure based on the information that flows through the network during the learning phase. Although there are several types of neural networks, the classical kinds are the perceptron and the multilayer perceptron.
The idea of the perceptron is to find a linear function of the feature vector, f(x) = wᵀx + b, such that f(x) > 0 for vectors of one class [Mar, 09] and f(x) < 0 for vectors of the other class. Here, w = (w1, w2, …, wm) is the vector of coefficients of the function, and b is the bias. Denoting the classes by the numbers +1 and −1, the algorithm searches for a decision function d(x) = sign(wᵀx + b). Perceptron learning is carried out by an iterative algorithm that begins with arbitrarily chosen parameters (w0, b0) of the decision function and updates them iteratively. On the nth iteration of the algorithm, a training sample (x, c) is chosen such that the current decision function does not classify it correctly (i.e. sign(wnᵀx + bn) ≠ c). The parameters (wn, bn) are then updated using the rule:

\[ w_{n+1} = w_n + c\,x \tag{2.4} \]

\[ b_{n+1} = b_n + c \tag{2.5} \]

The algorithm stops when a decision function is found that correctly classifies all the training samples. The above description is used in the algorithm given below [Car, 06]:

2.1.2.1 Stage 1: Training

Initialize w and b (to random values or to 0).
Find a training example (x, c) for which sign(wᵀx + b) ≠ c.
    If there is no such example, then training is finished:
    save the final w and b and halt.
    Otherwise, go to the next step.
Update (w, b): w := w + cx, b := b + c. Go to the previous step.

2.1.2.2 Stage 2: Filtering

Given a message x, determine its class as sign(wᵀx + b).
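
The perceptron training and filtering stages can be sketched as follows. The sketch assumes numeric feature vectors, initial parameters of zero and the convention sign(0) = −1; the cap `max_epochs` is an added assumption so that the loop also terminates on data that are not linearly separable, a case the description above does not cover.

```python
def sign(a):
    """Return +1 for positive input and -1 otherwise."""
    return 1 if a > 0 else -1


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def train_perceptron(samples, max_epochs=1000):
    """Apply w := w + c*x, b := b + c to every misclassified sample (x, c).

    `samples` is a list of (vector, c) pairs with c in {+1, -1}.
    """
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(max_epochs):
        updated = False
        for x, c in samples:
            if sign(dot(w, x) + b) != c:
                w = [wi + c * xi for wi, xi in zip(w, x)]
                b += c
                updated = True
        if not updated:  # every sample is classified correctly
            break
    return w, b


def predict(w, b, x):
    """Filtering stage: the class of x is sign(w.x + b)."""
    return sign(dot(w, x) + b)


samples = [((2, 2), 1), ((3, 1), 1), ((-1, -2), -1), ((-2, -1), -1)]
w, b = train_perceptron(samples)
print(w, b)
print(predict(w, b, (4, 4)), predict(w, b, (-3, -3)))
```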

2.1.3 Support Vector Machines Classifier

Support vector machines are based on the concept of decision planes that define decision boundaries [Tia, 12]. A decision plane separates a set of objects into classes with different memberships. The SVM modelling algorithm finds an optimal hyperplane with the maximum margin to separate two classes, which requires solving the optimization problem given below:
Maximize

\[ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \tag{2.6} \]

Subject to

\[ \sum_{i=1}^{n} \alpha_i y_i = 0, \qquad 0 \le \alpha_i \le b, \quad i = 1, 2, \ldots, n \]

where αi is the weight of training sample xi. If αi > 0, xi is called a support vector. b is a regularization parameter used to trade off the training accuracy against the model complexity so that superior generalization capability can be achieved. The kernel function K measures the similarity between two samples. A popular choice is the radial basis function (RBF) kernel:

\[ K(x_i, x_j) = \exp\bigl(-\gamma \lVert x_i - x_j \rVert^2\bigr), \quad \gamma > 0 \]

After the weights are determined [Say, 11], a test sample x is classified by

\[ y = \operatorname{sign}\Bigl(\sum_{i=1}^{n} \alpha_i y_i K(x_i, x)\Bigr), \qquad \operatorname{sign}(a) = \begin{cases} +1, & \text{if } a > 0 \\ -1, & \text{otherwise} \end{cases} \tag{2.7} \]
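
As an illustration of the decision rule, the sketch below evaluates an RBF kernel and the sign of the weighted kernel sum for a test sample. The weights are not trained here: the support vectors, the values of αi and the value of γ are made-up inputs chosen only to show the arithmetic, and the names `rbf_kernel` and `svm_decide` are hypothetical.

```python
import math


def rbf_kernel(xi, xj, gamma=0.5):
    """RBF kernel: exp(-gamma * ||xi - xj||^2) with gamma > 0."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * squared_distance)


def svm_decide(x, support_vectors, alphas, labels, gamma=0.5):
    """Classify x as the sign of sum_i alpha_i * y_i * K(x_i, x)."""
    a = sum(alpha * y * rbf_kernel(sv, x, gamma)
            for sv, alpha, y in zip(support_vectors, alphas, labels))
    return 1 if a > 0 else -1


# Assumed values for illustration; they satisfy sum(alpha_i * y_i) = 0.
support_vectors = [(0.0, 0.0), (4.0, 4.0)]
alphas = [1.0, 1.0]
labels = [-1, 1]

print(svm_decide((3.5, 3.5), support_vectors, alphas, labels))
print(svm_decide((0.5, 0.2), support_vectors, alphas, labels))
```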

A cross-validation procedure is carried out to determine the values of <γ, b> on the training data set. Cross-validation estimates the generalization capability on new samples that are not in the training data set. A k-fold cross-validation randomly splits the training data set into k approximately equal-sized subsets, leaves out one subset, builds a classifier on the remaining samples and then evaluates the classification performance on the left-out subset [And, 17]. This procedure is repeated k times, once for each subset, to obtain the cross-validation performance over the whole training data set. If the training data set is large, a small subset can be used for cross-validation to reduce the computing costs. The algorithm given below can be used in the classification process.
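
The splitting step of k-fold cross-validation can be sketched as below. The generator name `k_fold_indices` and the fixed random seed are assumptions for the example; fitting and scoring a classifier on each split is left to the caller.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, held_out_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # random split
    folds = [indices[i::k] for i in range(k)]     # k roughly equal subsets
    for i in range(k):
        held_out = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, held_out


splits = list(k_fold_indices(10, 3))
for train, held_out in splits:
    print(len(train), len(held_out))
```

In a parameter search, each candidate pair <γ, b> would be scored by averaging the held-out accuracy over the k splits, and the pair with the best average would be kept.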

Input: Sample x to classify;
       Training set T = {(x1, y1), (x2, y2), …, (xn, yn)};
       Number of nearest neighbours k.
Output: Decision yp ∈ {−1, 1}.
Find the k samples (xi, yi) with minimal values of K(xi, xi) − 2 * K(xi, x).
Train an SVM model on the k selected samples.
Classify x using this model to get the result yp.
Return yp.
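
The neighbour-selection step of this algorithm ranks training samples by K(xi, xi) − 2K(xi, x), which is the squared distance between xi and x in the kernel's feature space with the constant term K(x, x) dropped. The sketch below covers only that step, assuming an RBF kernel with an arbitrary γ; training the SVM on the selected samples would be delegated to an SVM library and is not shown. The name `select_neighbours` is hypothetical.

```python
import math


def rbf_kernel(a, b, gamma=0.5):
    """RBF kernel: exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def select_neighbours(x, training, k, kernel=rbf_kernel):
    """Return the k samples with the smallest K(xi, xi) - 2 * K(xi, x)."""
    def score(item):
        xi, _label = item
        return kernel(xi, xi) - 2.0 * kernel(xi, x)
    return sorted(training, key=score)[:k]


training = [((0, 0), -1), ((1, 0), -1), ((5, 5), 1), ((6, 5), 1)]
nearest = select_neighbours((5.5, 5.0), training, k=2)
print(nearest)
```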

2.1.4 Decision Tree

A decision tree is a predictive model that expands a tree of decisions and their possible consequences, including chance event outcomes and resource costs. The outcome of a decision tree can be discrete or, as in the case of regression trees, continuous. Conjunctions of features lead to the classifications at the different levels of the tree [Sar, 12]. Popular decision tree learning algorithms are C4.5, ID3 and J48.
The decision tree generated by C4.5 can be used for various classification problems. At each node of the tree, the algorithm chooses an attribute that can further split the samples into subsets. Each leaf node represents a classification or decision. Some premises guide this algorithm, such as the following [Chr, 10]:
• If all cases are of the same class, then the tree is a leaf, and the leaf is returned labelled with that class.
• Compute the potential information for each attribute (based on the probabilities of each case having a particular value for the attribute).
• Compute the information gain for each attribute (based on the probabilities of each case with a particular value for the attribute being of a particular class).
• Depending on the current selection criterion, find the best attribute to branch on.
J48 is an open-source implementation of C4.5. The decision tree is built by analysing data nodes that are used to evaluate the significance of the existing features. J48 builds decision trees from a set of training data using the concept of information entropy. It examines the normalized information gain that results from choosing an attribute for splitting the data, using the fact that each attribute of the data can be used to make a decision by splitting the data into smaller subsets. The J48 classifier classifies recursively until each leaf is pure, meaning that the data have been classified as close to perfectly as possible [Mah, 13]. J48 builds its trees in the same way as ID3. The training data is a set S = s1, s2, … of already classified samples. Each sample si = x1, x2, … is a vector, where x1, x2, … represent attributes or features of the sample. The training data is augmented with a vector C = c1, c2, …, where c1, c2, … represent the class to which each sample belongs. At each node of the tree, J48 chooses the attribute of the data that most effectively splits its set of samples into subsets enriched in one class or the other. Its criterion is the normalized information gain (difference in entropy) that results from choosing an attribute for splitting the data. The attribute with the highest normalized information gain is chosen to make the decision. The J48 algorithm then recurses on the smaller sublists [Kum, 17b].
This algorithm has a few base cases:
• All the samples in the list belong to the same class. When this happens, it simply creates a leaf node for the decision tree saying to choose that class.
• None of the features provides any information gain. In this case, J48 creates a decision node higher up the tree using the expected value of the class.
• An instance of a previously unseen class is encountered. Again, J48 creates a decision node higher up the tree using the expected value.
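
The attribute-selection criterion can be illustrated with a short sketch that computes entropy and the normalized information gain (gain ratio) for categorical attributes. It covers only the choice of the splitting attribute at a single node, not tree construction or pruning; the toy attributes `has_link` and `length` and the function names are invented for the example.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(rows, labels, attribute):
    """Information gain of splitting on `attribute`, divided by split info."""
    total = len(rows)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(part) / total * entropy(part)
                    for part in partitions.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy([row[attribute] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, labels, attributes):
    """Pick the attribute with the highest normalized information gain."""
    return max(attributes, key=lambda a: gain_ratio(rows, labels, a))


rows = [
    {"has_link": "yes", "length": "short"},
    {"has_link": "yes", "length": "long"},
    {"has_link": "no", "length": "short"},
    {"has_link": "no", "length": "long"},
]
labels = ["spam", "spam", "ham", "ham"]
print(best_attribute(rows, labels, ["has_link", "length"]))
```

Here `has_link` separates the two classes perfectly, so its gain ratio is 1, whereas `length` leaves each subset evenly mixed and has a gain ratio of 0.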

2.1.5 Non-Classifier-Based Approach

2.1.5.1 Discrete Wavelet Transform

A character segmentation algorithm for real-time DSP-based licence plate (LP) recognition can be built on the 2D Haar wavelet transform (WT) [Wri, 17]. Enhanced image edges and improved LP region detection, suitable for real-time application, are the components of the algorithm. The Haar WT detects three kinds of edges using one filter, whereas conventional methods such as Sobel would need more than one mask for the task. DWT is a special case of sub-band filtering, and the computation is carried out using a filter bank: the signal is passed through high-pass and low-pass filters concurrently to generate the filtered output. The LP detection technique detects edges in the LP region through greyscale variation, and the Haar edges are compared with the greyscale variations to validate the edges. If the edges match, a rectangle connecting the edges is drawn. Histogram analysis verifies the character extraction and computes the bounding box. The experimental results showed that the 2D Haar WT improved character segmentation: the method identified the maximum number of edges in the image, with less noise and an increased character segmentation ratio. Character segmentation in licence plates remains challenging in the presence of raindrops, number plates broken in accidents or uneven luminance.
The discrete wavelet transform and a gradient method have also been used to extract text from images. The input image is preprocessed, and the Daubechies DWT is applied to obtain edges and texture of three different types [Sya, 14]. Compared with the Haar wavelet, the Daubechies wavelet contains a higher frequency coefficient spectrum. The signal is decomposed into the LL, HL, LH and HH sub-bands of the frequency domain. In the high-contrast text regions, a gradient difference technique is applied to distinguish them from non-text regions. Non-textual information is then removed using Otsu thresholding. The drawback of this method is that pixels with values lower than the global threshold are treated as noise, which leads to removal of parts of the text region. Moreover, elimination of false positives remains a challenging task.
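
To make the sub-band decomposition concrete, the sketch below performs one level of a 2D Haar transform on a small greyscale image stored as a list of rows. It uses the Haar wavelet rather than Daubechies for brevity and averages pairs instead of applying the orthonormal 1/√2 scaling; both are simplifications assumed for the example. Sub-band naming conventions also differ between sources, so the labels in the comments describe only this sketch.

```python
def haar_1d(values):
    """One Haar level: pairwise averages (low-pass) and differences (high-pass)."""
    low = [(values[i] + values[i + 1]) / 2.0 for i in range(0, len(values), 2)]
    high = [(values[i] - values[i + 1]) / 2.0 for i in range(0, len(values), 2)]
    return low, high


def filter_columns(matrix):
    """Apply haar_1d down every column and return (low, high) matrices."""
    lows, highs = [], []
    for column in zip(*matrix):
        low, high = haar_1d(list(column))
        lows.append(low)
        highs.append(high)
    # Transpose back so that the results are again lists of rows.
    return [list(r) for r in zip(*lows)], [list(r) for r in zip(*highs)]


def haar_2d(image):
    """Single-level 2D Haar transform of an image with even width and height.

    Returns (ll, lh, hl, hh), where the first letter refers to the filter
    applied along the rows and the second to the filter along the columns.
    """
    row_low, row_high = [], []
    for row in image:
        low, high = haar_1d(row)
        row_low.append(low)
        row_high.append(high)
    ll, lh = filter_columns(row_low)
    hl, hh = filter_columns(row_high)
    return ll, lh, hl, hh


ll, lh, hl, hh = haar_2d([[4, 2], [2, 0]])
print(ll, lh, hl, hh)
```

The LL band holds a half-resolution approximation of the image, while the three detail bands respond to intensity changes in different directions, which is what makes them useful as edge indicators for text regions.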

2.1.5.2 Hough Transform

Text segmentation in document images can be based on Hough transform techniques [Sah, 10]. For document image recognition, the image is acquired by digitizing the document manually with a scanner. The image is preprocessed to convert colour images to greyscale. Otsu's method is applied to binarize the image, and edges are detected. The Hough transform is then applied to extract lines and words as sets of connected components, which are stored as BMP files for performance analysis [Gur, 13].
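
The voting idea behind the Hough transform for straight lines can be sketched as follows. Each edge pixel (x, y) votes for every line ρ = x cos θ + y sin θ that passes through it, and cells of the (ρ, θ) accumulator with many votes correspond to lines such as text baselines. The one-degree angular resolution, the unit ρ bins and the function name `hough_votes` are assumptions for this example, and binarization and edge detection are taken as already done.

```python
import math


def hough_votes(points, n_theta=180, rho_step=1.0):
    """Accumulate votes in (rho_bin, theta_index) cells for each edge pixel."""
    votes = {}
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            cell = (round(rho / rho_step), t)
            votes[cell] = votes.get(cell, 0) + 1
    return votes


# Ten edge pixels lying on the horizontal line y = 3.
points = [(x, 3) for x in range(10)]
votes = hough_votes(points)
print(votes[(3, 90)], max(votes.values()))
```

With this coarse binning, several neighbouring angle cells around 90° also collect all ten votes, so a practical detector would typically use finer bins or pick local maxima of the accumulator.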

The generalized Hough transform (GHT) has been applied to the segmentation of printed Arabic documents [Aye, 17]. The voting process makes the Hough transform robust to missing edge points. Characters are segmented by recognition: an indexed dictionary is created for character recognition, and a dynamic sliding window technique is used to recognize cursive Arabic characters. The method is based on identifying the first and last characters of the sub-words, after which the middle characters are detected. For each last character stored in the dictionary, the same method is repeated from the left boundary of the first character to recognize the characters in the middle. GHT can be used in OCR not only to recognize characters but also to exploit this property for Arabic cursive script without a separate segmentation phase [Isl, 16]. For experimentation, printed Arabic characters of different fonts and sizes were used, and a recognition accuracy of 93% was achieved. Ali et al. [Ali, 15] proposed a document processing concept using an optical character recognition system. The concept works by storing the document in computer storage, then reading the content and finally searching the content. To process information in languages other than English, they used character recognition software.

2.1.5.3 Integrated Approach

A combined method of licence plate detection and character segmentation from an
image, employing the Harris corner detector, is suggested by Panchal et al.
[Pan, 16]. Owing to its open structure, Automatic Licence Plate Recognition (ALPR)
has become an important focus of research. Many systems have been presented for
licence plate recognition, each with its own specific strengths and limitations.
The important steps in an ALPR system are licence plate localization, segmentation
and recognition. The Harris corner algorithm proves robust under changing motion
and illumination conditions. The accuracy of licence plate localization is
carried forward to the segmentation stage. Segmentation is carried out by
connected component analysis combined with pixel count, aspect ratio and height
of characters.
Good-quality and challenging images were used for experimentation, and a
segmentation accuracy of 93.84% was obtained.
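
The segmentation stage described above (connected components filtered by pixel count, aspect ratio and height) can be illustrated as follows. This is a hedged sketch, not the cited authors' implementation: the 4-connectivity, the function names and all threshold values are assumptions chosen for the toy example.

```python
from collections import deque


def connected_components(binary):
    """Label 4-connected regions of 1-pixels.

    Returns a list of dicts holding the bounding box (left, top, right, bottom)
    and the pixel count of each region, in raster-scan order of discovery.
    """
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            queue = deque([(y, x)])
            seen[y][x] = True
            count = 0
            top, bottom, left, right = y, y, x, x
            while queue:  # breadth-first flood fill
                cy, cx = queue.popleft()
                count += 1
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            comps.append({"box": (left, top, right, bottom), "pixels": count})
    return comps


def looks_like_character(comp, min_pixels=4, min_height=3, max_aspect=1.0):
    """Keep regions that are large enough, tall enough and not too wide.

    The default thresholds are illustrative only.
    """
    left, top, right, bottom = comp["box"]
    width, height = right - left + 1, bottom - top + 1
    return (comp["pixels"] >= min_pixels
            and height >= min_height
            and width / height <= max_aspect)
```

Filtering the components of a binarized plate image with `looks_like_character` leaves the candidate character regions; real thresholds would have to be tuned to the plate size and font.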

2.1.5.4 Projection Profile-Based Technique

The projection profile-based method is a text segmentation procedure applied
directly to run-length compressed, printed English text documents [Jav, 13]. Line
segmentation is carried out using the projection profile method. Segmentation
into words and characters is then achieved by tracking the white runs along the
base region of the text line. Throughout the procedure, a run-based region
growing method is used in the neighbourhood of the white runs to track the
vertical gaps between the characters. After the character gaps in the whole text
line have been found, word gaps are distinguished from character gaps by
calculating the mean character gap. Based on the spatial positions of the
detected words and characters, their respective compressed segments are then
extracted. In experiments, the procedure was tested on 1083 compressed text
lines, and F-measures of 97.93% and 92.86% were obtained for word and character
segmentation, respectively.
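
The idea of projection profiles and of separating word gaps from character gaps by the mean gap can be sketched as below. Note the assumptions: the cited work operates directly on run-length compressed data, whereas this illustration uses an uncompressed binary image (1 = ink), and the function names are hypothetical.

```python
def horizontal_profile(binary):
    """Number of ink pixels in each row; zero rows separate text lines."""
    return [sum(row) for row in binary]


def runs_of_ink(profile):
    """Return (start, end) pairs, end exclusive, of consecutive non-zero entries."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_line(line_rows):
    """Split one text line into words, each a list of character column ranges.

    Gaps wider than the mean gap are treated as word gaps, the others as
    character gaps.
    """
    vertical_profile = [sum(col) for col in zip(*line_rows)]
    chars = runs_of_ink(vertical_profile)
    if not chars:
        return []
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(chars, chars[1:])]
    if not gaps:
        return [chars]
    mean_gap = sum(gaps) / len(gaps)
    words, current = [], [chars[0]]
    for gap, char in zip(gaps, chars[1:]):
        if gap > mean_gap:
            words.append(current)
            current = [char]
        else:
            current.append(char)
    words.append(current)
    return words
```

Applying `runs_of_ink` to the horizontal profile gives the text lines; `segment_line` is then called on the rows of each line.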
A character segmentation procedure employing the projection profile-based
method was first introduced by Rodrigues et al. [Rod, 01]. They presented a
primary-view decision tree algorithm for cursive script recognition that uses
the histogram as a projection profile. A postal code image was scanned and
converted into a two-dimensional matrix representation, to which a group of
algorithms was applied to give a complete segmentation. The problems encountered
related to image quality and handling, such as noise, distortion, variation in
style, shift of the character, size of the character, rotation, variation in
thickness and variation in texture. For experimentation, 200-dpi images were
employed with a total of 4320 digits, presuming 8 by strap, of which the
algorithm correctly extracted 3788.
A hybrid method of text segmentation employing edge and texture feature
information was suggested by Patel and Tiwari [Pat, 13a]. Texture features such as
homogeneity, contrast and energy differ between text and non-text regions, and
are employed to discern the text area in an image. Edge-based features also have
several useful properties: gradient magnitudes generally take higher values at
the edges of characters, even when the text is embedded in images.
Step 1: Convert the colour image to greyscale using
Y = 0.299 * Red + 0.587 * Green + 0.114 * Blue.
Step 2: Perform edge detection with the 3*3 Sobel operator.
Step 3: Apply a threshold to eliminate weak edges.
Step 4: Divide the edge image into non-overlapping blocks of m*m pixels.
Step 5: Compute the mean magnitude per pixel and the mean gradient magnitude
per pixel.
Step 6: Divide the filtered grey image into m*m non-overlapping blocks.
Here, a high-pass filter is employed to suppress the background.
Step 7: Estimate the homogeneity and contrast features in the 0°, 45°, 90° and
135° directions for every block of the first image using the grey-level
co-occurrence matrix.
Step 8: Compute the mean of homogeneity and contrast for every block.
Step 9: Filter the text blocks using the edge-based and texture features.
Step 10: Combine the resulting text blocks.
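
The edge-based part of this procedure (Steps 1 to 5) can be sketched as follows. This is an illustrative pure-Python version under stated assumptions, not the authors' code: the image is taken as rows of (R, G, B) tuples, the standard luma weights 0.299, 0.587 and 0.114 are used, border pixels are left at zero, and all function names are hypothetical.

```python
def to_grey(rgb):
    """Step 1: weighted luminance of each (R, G, B) pixel."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in rgb]


def sobel_magnitude(grey):
    """Step 2: 3x3 Sobel gradient magnitude; border pixels stay at 0."""
    h, w = len(grey), len(grey[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (grey[y - 1][x + 1] + 2 * grey[y][x + 1] + grey[y + 1][x + 1]
                  - grey[y - 1][x - 1] - 2 * grey[y][x - 1] - grey[y + 1][x - 1])
            gy = (grey[y + 1][x - 1] + 2 * grey[y + 1][x] + grey[y + 1][x + 1]
                  - grey[y - 1][x - 1] - 2 * grey[y - 1][x] - grey[y - 1][x + 1])
            mag[y][x] = (gx * gx + gy * gy) ** 0.5
    return mag


def suppress_weak(mag, threshold):
    """Step 3: set edge responses below the threshold to zero."""
    return [[m if m >= threshold else 0.0 for m in row] for row in mag]


def block_means(mag, m):
    """Steps 4-5: mean magnitude per pixel in each non-overlapping m x m block.

    Rows and columns that do not fill a whole block are ignored.
    """
    h, w = len(mag), len(mag[0])
    means = []
    for by in range(0, h - h % m, m):
        row = []
        for bx in range(0, w - w % m, m):
            total = sum(mag[y][x]
                        for y in range(by, by + m)
                        for x in range(bx, bx + m))
            row.append(total / (m * m))
        means.append(row)
    return means
```

Blocks with a high mean gradient magnitude are candidate text blocks; the texture features of Steps 6 to 8 would then be computed per block before the final filtering.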
Character recognition methods are further divided into two broad groups:
methods based on OCR devices and methods based on low-level image features.
These, together with text extraction, are discussed in the next section.
After endeavouring in vain to induce opticians, both in London
and Birmingham, (where the instrument was exhibited in 1849 to the
British Association,) to construct the lenticular stereoscope, and
photographers to execute binocular pictures for it, I took with me to
Paris, in 1850, a very fine instrument, made by Mr. Loudon in
Dundee, with the binocular drawings and portraits already
mentioned. I shewed the instrument to the Abbé Moigno, the
distinguished author of L’Optique Moderne, to M. Soleil and his son-
in-law, M. Duboscq, the eminent Parisian opticians, and to some
members of the Institute of France. These gentlemen saw at once
the value of the instrument, not merely as one of amusement, but as
an important auxiliary in the arts of portraiture and sculpture. M.
Duboscq immediately began to make the lenticular stereoscope for
sale, and executed a series of the most beautiful binocular
Daguerreotypes of living individuals, statues, bouquets of flowers,
and objects of natural history, which thousands of individuals flocked
to examine and admire. In an interesting article in La Presse,[23] the
Abbé Moigno gave the following account of the introduction of the
instrument into Paris:—
“In his last visit to Paris, Sir David Brewster intrusted the models
of his stereoscope to M. Jules Duboscq, son-in-law and successor of
M. Soleil, and whose intelligence, activity, and affability will extend
the reputation of the distinguished artists of the Rue de l’Odeon, 35.
M. Jules Duboscq has set himself to work with indefatigable ardour.
Without requiring to have recourse to the binocular camera, he has,
with the ordinary Daguerreotype apparatus, procured a great number
of dissimilar pictures of statues, bas-reliefs, and portraits of
celebrated individuals, &c. His stereoscopes are constructed with
more elegance, and even with more perfection, than the original
English (Scotch) instruments, and while he is shewing their
wonderful effects to natural philosophers and amateurs who have
flocked to him in crowds, there is a spontaneous and unanimous cry
of admiration.”
While the lenticular stereoscope was thus exciting much interest
in Paris, not a single instrument had been made in London, and it
was not till a year after its introduction into France that it was
exhibited in England. In the fine collection of philosophical
instruments which M. Duboscq contributed to the Great Exhibition of
1851, and for which he was honoured with a Council medal, he
placed a lenticular stereoscope, with a beautiful set of binocular
Daguerreotypes. This instrument attracted the particular attention of
the Queen, and before the closing of the Crystal Palace, M. Duboscq
executed a beautiful stereoscope, which I presented to Her Majesty
in his name. In consequence of this public exhibition of the
instrument, M. Duboscq received several orders from England, and a
large number of stereoscopes were thus introduced into this country.
The demand, however, became so great, that opticians of all kinds
devoted themselves to the manufacture of the instrument, and
photographers, both in Daguerreotype and Talbotype, found it a most
lucrative branch of their profession, to take binocular portraits of
views to be thrown into relief by the stereoscope. Its application to
sculpture, which I had pointed out, was first made in France, and an
artist in Paris actually copied a statue from the relievo produced by
the stereoscope.
Three years after I had published a description of the lenticular
stereoscope, and after it had been in general use in France and
England, and the reflecting stereoscope forgotten,[24] Mr.
Wheatstone printed, in the Philosophical Transactions for 1852, a
paper on Vision, in which he says that he had previously used “an
apparatus in which prisms were employed to deflect the rays of light
proceeding from the pictures, so as to make them appear to occupy
the same place;” and he adds, “I have called it the refracting
stereoscope.”[25] Now, whatever Mr. Wheatstone may have done
with prisms, and at whatever time he may have done it, I was the
first person who published a description of stereoscopes both with
refracting and reflecting prisms; and during the three years that
elapsed after he had read my paper, he made no claim to the
suggestion of prisms till after the great success of the lenticular
stereoscope. The reason why he then made the claim, and the only
reason why we do not make him a present of the suggestion, will
appear from the following history:—
In the paper above referred to, Mr. Wheatstone says,—“I
recommend, as a convenient arrangement of the refracting
stereoscope for viewing Daguerreotypes of small dimensions, the
instrument represented, (Fig. 4,) shortened in its length from 8
inches to 5, and lenses 5 inches focal distance, placed before and
close to the prisms.”[26] Although this refracting apparatus, which is
simply a deterioration of the lenticular stereoscope, is recommended
by Mr. Wheatstone, nobody either makes it or uses it. The semi-
lenses or quarter-lenses of the lenticular stereoscope include a
virtual and absolutely perfect prism, and, what is of far more
consequence, each lens is a variable lenticular prism, so that, when
the eye-tubes are placed at different distances, the lenses have
different powers of displacing the pictures. They can thus unite
pictures placed at different distances, which cannot be done by any
combination of whole lenses and prisms.
In the autumn of 1854, after all the facts about the stereoscope
were before the public, and Mr. Wheatstone in full possession of all
the merit of having anticipated Mr. Elliot in the publication of his
stereoscopic apparatus, and of his explanation of the theory of
stereoscopic relief, such as it was, he thought it proper to revive the
controversy by transmitting to the Abbé Moigno, for publication in
Cosmos, an extract of a letter of mine dated 27th September 1838.
This extract was published in the Cosmos of the 15th August 1854,
[27] with the following illogical commentary by the editor.

“We were a thousand times wrong to credit our
illustrious friend, Sir David Brewster, with the
invention of the refracting stereoscope. Mr.
Wheatstone has, in fact, placed in our hands a letter
dated, would one believe it, the 27th September 1838,
in which we read these words written by the illustrious
Scottish philosopher: ‘I have also stated (to Lord
Rosse[28]) that you promised to order for me your
stereoscope, both with reflectors and prisms.’ The
refracting stereoscope is therefore, as much as the
reflecting stereoscope, the stereoscope of Mr.
Wheatstone, who had invented it in 1838, and was
having it constructed at that time for Sir David
Brewster himself. What Sir David Brewster devised,
and it is a very ingenious idea, the glory of which Mr.
Wheatstone never disputed with him, is to form the
two prisms of the refracting stereoscope from the two
halves of a single lens.”
That the reader may form a correct idea of the conduct of Mr.
Wheatstone in making this claim indirectly, and in a foreign journal,
whose editor he has willingly misled, I must remind him that I first
saw the reflecting stereoscope at the meeting of the British
Association at Newcastle, in the middle of August 1838. It is proved
by my letter that he and I then conversed on the subject of prisms,
which at that time he had never thought of. I suggested prisms for
displacing the pictures, and Mr. Wheatstone’s natural reply was, that
they must be achromatic prisms. This fact, if denied, may be proved
by various circumstances. His paper of 1838 contains no reference
to prisms. If he had suggested the use of prisms in August 1838, he
would have inserted his suggestion in that paper, which was then
unpublished; and if he had only once tried a prism stereoscope, he
never would have used another. On my return to Scotland, I ordered
from Mr. Andrew Ross one of the reflecting stereoscopes, and one
made with achromatic prisms; but my words do not imply that Mr.
Wheatstone was the first person who suggested prisms, and still less
that he ever made or used a stereoscope with prisms. But however
this may be, it is a most extraordinary statement, which he allows the
Abbé Moigno to make, and which, though made a year and a half
ago, he has not enabled the Abbé to correct, that a stereoscope with
prisms was made for me (or for any other person) by Mr. Ross. I
never saw such an instrument, or heard of its being constructed: I
supposed that after our conversation Mr. Wheatstone might have
tried achromatic prisms, and in 1848, when I described my single
prism stereoscope, I stated what I now find is not correct, that I
believed Mr. Wheatstone had used two achromatic prisms. The
following letter from Mr. Andrew Ross will prove the main fact that he
never constructed for me, or for Mr. Wheatstone, any refracting
stereoscope:—
“2, Featherstone Buildings,
28th September 1854.
“Dear Sir,—In reply to yours of the 11th instant, I
beg to state that I never supplied you with a
stereoscope in which prisms were employed in place
of plane mirrors. I have a perfect recollection of being
called upon either by yourself or Professor
Wheatstone, some fourteen years since, to make
achromatized prisms for the above instrument. I also
recollect that I did not proceed to manufacture them in
consequence of the great bulk of an achromatized
prism, with reference to their power of deviating a ray
of light, and at that period glass sufficiently free from
striæ could not readily be obtained, and was
consequently very high-priced.—I remain, &c. &c.

“Andrew Ross.
“To Sir David Brewster.”

Upon the receipt of this letter I transmitted a copy of it to the
Abbé Moigno, to shew him how he had been misled into the
statement, “that Mr. Wheatstone had caused a stereoscope with
prisms to be constructed for me;” but neither he nor Mr. Wheatstone
have felt it their duty to withdraw that erroneous statement.
In reference to the comments of the Abbé Moigno, it is necessary
to state, that when he wrote them he had in his possession my
printed description of the single-prism, and other stereoscopes,[29] in
which I mention my belief, now proved to be erroneous, that Mr.
Wheatstone had used achromatic prisms, so that he had, on my
express authority, the information which surprised him in my letter.
The Abbé also must bear the responsibility of a glaring
misinterpretation of my letter of 1838. In that letter I say that Mr.
Wheatstone promised to order certain things from Mr. Ross, and the
Abbé declares, contrary to the express terms of the letter, as well as
to fact, that these things were actually constructed for me. The letter,
on the contrary, does not even state that Mr. Wheatstone complied
with my request, and it does not even appear from it that the
reflecting stereoscope was made for me by Mr. Ross.
Such is a brief history of the lenticular stereoscope, of its
introduction into Paris and London, and of its application to
portraiture and sculpture. It is now in general use over the whole
world, and it has been estimated that upwards of half a million of
these instruments have been sold. A Stereoscope Company has
been established in London[30] for the manufacture and sale of the
lenticular stereoscope, and for the production of binocular pictures
for educational and other purposes. Photographers are now
employed in every part of the globe in taking binocular pictures for
the instrument,—among the ruins of Pompeii and Herculaneum—on
the glaciers and in the valleys of Switzerland—among the public
monuments in the Old and the New World—amid the shipping of our
commercial harbours—in the museums of ancient and modern life—
in the sacred precincts of the domestic circle—and among those
scenes of the picturesque and the sublime which are so
affectionately associated with the recollection of our early days, and
amid which, even at the close of life, we renew, with loftier
sentiments and nobler aspirations, the youth of our being, which, in
the worlds of the future, is to be the commencement of a longer and
a happier existence.
CHAPTER II.
ON MONOCULAR VISION, OR
VISION WITH ONE EYE.

In order to understand the theory and construction of the
stereoscope we must be acquainted with the general structure of the
eye, with the mode in which the images of visible objects are formed
within it, and with the laws of vision by means of which we see those
objects in the position which they occupy, that is, in the direction and
at the distance at which they exist.
Every visible object radiates, or throws out in all directions,
particles or rays of light, by means of which we see them either
directly by the images formed in the eye, or indirectly by looking at
images of them formed by their passing through a small hole, or
through a lens placed in a dark room or camera, at the end of which
is a piece of paper or ground-glass to receive the image.
In order to understand this let H be a very small pin-hole in a
shutter or camera, MN, and let RYB be any object of different colours,
the upper part, R, being red, the middle, Y, yellow, and the lower
part, B, blue. If a sheet of white paper, br, is placed behind the hole
H, at the same distance as the object RB is before it, an image, br,
will be formed of the same size and the same colours as the object
RB. As the particles or rays of light move in straight lines, a red ray
from the middle part of R will pass through the hole H and illuminate
the point r with red light. In like manner, rays from the middle points
of Y and B will pass through H and illuminate with yellow and blue
light the points y and b. Every other point of the coloured spaces, R,
Y, and B, will, in the same manner, paint itself, as it were, on the
paper, and produce a coloured image, byr, exactly the same in form
and colour as the object RYB. If the hole H is sufficiently small no ray
from any one point of the object will interfere with or mix with any
other ray that falls upon the paper. If the paper is held at half the
distance, at b′y′ for example, a coloured image, b′y′r′, of half the size,
will be formed, and if we hold it at twice the distance, at b″r″ for
example, a coloured image, b″y″r″, of twice the size, will be painted
on the paper.
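
The proportions stated above follow from similar triangles, and may be checked with a short calculation. The sketch below is an illustration only: it assumes an ideal pin-hole on the axis, measures heights from that axis, and uses the hypothetical function name `pinhole_image`; the three object points are labelled R, Y and B here.

```python
def pinhole_image(points, object_distance, screen_distance):
    """Project object points through an ideal pin-hole.

    `points` maps a label to its height above the axis through the hole.
    The result maps each label to its height on the screen; the sign is
    reversed because rays cross at the hole, so the image is inverted.
    """
    scale = screen_distance / object_distance
    return {label: -height * scale for label, height in points.items()}


coloured_object = {"R": 1.0, "Y": 0.0, "B": -1.0}  # red above, blue below

same_size = pinhole_image(coloured_object, 10.0, 10.0)   # paper as far as the object
half_size = pinhole_image(coloured_object, 10.0, 5.0)    # paper at half the distance
double_size = pinhole_image(coloured_object, 10.0, 20.0) # paper at twice the distance
```

With the paper at the same distance as the object the red point falls as far below the axis as it stood above it; at half or twice the distance the image is half or twice the size.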

Fig. 4.
As the hole H is supposed to be so small as to receive only one
ray from every point of the object, the images of the object, viz., br,
b′r′, b″r″, will be very faint. By widening the hole H, so as to admit
more rays from each luminous point of RB, the images would
become brighter, but they would become at the same time indistinct,
as the rays from one point of the object would mix with those from
adjacent points, and at the boundaries of the colours r, y, and b, the
one colour would obliterate the other. In order, therefore, to obtain
sufficiently bright images of visible objects we must use lenses,
which have the property of forming distinct images behind them, at a
point called their focus. If we widen the hole H, and place in it a lens
whose focus is at y, for an object at the same distance, HY, it will
form a bright and distinct image, br, of the same size as the object
RB. If we remove the lens, and place another in H, whose focus is at
y′, for a distance HY, an image, b′r′, half of the size of RB, will be
formed at that point; and if we substitute for this lens another, whose
focus is at y″, a distinct image, b″r″, twice the size of the object, will
be formed, the size of the image being always to that of the object as
their respective distances from the hole or lens at H.
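
The three lenses of this experiment may be worked out numerically. The sketch assumes the standard thin-lens relation 1/f = 1/u + 1/v, which the text does not state, with u the distance of the object and v that of the image from the lens; the function names are illustrative.

```python
def focal_length(object_distance, image_distance):
    """Focal length of a thin lens that images an object at distance u
    onto a screen at distance v, from 1/f = 1/u + 1/v (an assumption here)."""
    return 1.0 / (1.0 / object_distance + 1.0 / image_distance)


def magnification(object_distance, image_distance):
    """Ratio of image size to object size: as their distances from the lens."""
    return image_distance / object_distance


u = 10.0  # distance of the object from the lens, in any unit
for v in (10.0, 5.0, 20.0):  # screen at the same, half and twice the distance
    print(v, focal_length(u, v), magnification(u, v))
```

Under this assumption a different lens is required for each position of the paper, while the size of the image remains proportional to its distance from the lens, as stated in the text.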
With the aid of these results, which any person may confirm by
making the experiments, we shall easily understand how we see
external objects by means of the images formed in the eye. The
human eye, a section and a front view of which is shewn in Fig. 5, a,
is almost a sphere. Its outer membrane, abcde, or mno, Fig. 5, b,
consists of a tough substance, and is called the sclerotic coat, which
forms the white of the eye, a, seen in the front view. The front part of
the eyeball, cxd, which resembles a small watch-glass, is perfectly
transparent, and is called the cornea. Behind it is the iris, cabe, or c
in the front view, which is a circular disc, with a hole, ab, in its centre,
called the pupil, or black of the eye. It is, as it were, the window of
the eye, through which all the light from visible objects must pass.
The iris has different colours in different persons, black, blue, or
grey; and the pupil, ab, or h, has the power of contracting or
enlarging its size according as the light which enters it is more or
less bright. In sunlight it is very small, and in twilight its size is
considerable. Behind the iris, and close to it, is a doubly convex lens,
df, or ll in Fig. 5, b, called the crystalline lens. It is more convex or
round on the inner side, and it is suspended by the ciliary processes
at lc, lc′, by which it is supposed to be moved towards and from h,
in order to accommodate the eye to different distances, or obtain
distinct vision at these distances. At the back of the eye is a thin
pulpy transparent membrane, rr o rr, or vvv, called the retina, which,
like the ground-glass of a camera obscura, receives the images of
visible objects. This membrane is an expansion of the optic nerve o,
or a in Fig. 5, a, which passes to the brain, and, by a process of
which we are ignorant, gives us vision of the objects whose images
are formed on its expanded surface. The globular form of the eye is
maintained by two fluids which fill it,—the aqueous humour, which
lies between the crystalline lens and the cornea, and the vitreous
humour, zz, which fills the back of the eye.
Fig. 5, A.

Fig. 5, B.
But though we are ignorant of the manner in which the mind
takes cognizance through the brain of the images on the retina, and
may probably never know it, we can determine experimentally the
laws by which we obtain, through their images on the retina, a
knowledge of the direction, the position, and the form of external
objects.
If the eye MN consisted only of a hollow ball with a small aperture
H, an inverted image, ab, of any external object AB would be formed
on the retina ror, exactly as in Fig. 4. A ray of light from A passing
through H would strike the retina at a, and one from B would strike
the retina at b. If the hole H is very small the inverted image ab would
be very distinct, but very obscure. If the hole were the size of the
pupil the image would be sufficiently luminous, but very indistinct. To
remedy this the crystalline lens is placed behind the pupil, and gives
distinctness to the image ab formed in its focus. The image,
however, still remains inverted, a ray from the upper part A of the
object necessarily falling on the lower part a of the retina, and a ray
from the lower part B of the object upon the upper part b of the
retina. Now, it has been proved by accurate experiments that in
whatever direction a ray AHa falls upon the retina, it gives us the
vision of the point A from which it proceeds, or causes us to see that
point, in a direction perpendicular to the retina at a, the point on
which it falls. It has also been proved that the human eye is nearly
spherical, and that a line drawn perpendicular to the retina from any
point a of the image ab will very nearly pass through the
corresponding point A of the object AB,[31] so that the point A is, in
virtue of this law, which is called the Law of visible direction, seen in
nearly its true direction.
When we look at any object, AB, for example, we see only one
point of it distinctly. In Fig. 5 the point D only is seen distinctly, and
every point from D to A, and from D to B, less distinctly. The point of
distinct vision on the retina is at d, corresponding with the point D of
the object which is seen distinctly. This point d is the centre of the
retina at the extremity of the line aha, called the optical axis of the
eye, passing through the centre of the lens lh, and the centre of the
pupil. The point of distinct vision d corresponds with a small hole in
the retina called the Foramen centrale, or central hole, from its being
in the centre of the membrane. When we wish to see the points a
and b, or any other point of the object, we turn the eye upon them,
so that their image may fall upon the central point d. This is done so
easily and quickly that every point of an object is seen distinctly in an
instant, and we obtain the most perfect knowledge of its form, colour,
and direction. The law of distinct vision may be thus expressed.
Vision is most distinct when it is performed by the central point of the
retina, and the distinctness decreases with the distance from the
central point. It is a curious fact, however, that the most distinct point
d is the least sensitive to light, and that the sensitiveness increases
with the distance from that point. This is proved by the remarkable
fact, that when an astronomer cannot see a very minute star by
looking at it directly along the optical axis dd, he can see it by
looking away from it, and bringing its image upon a more sensitive
part of the retina.
But though we see with one eye the direction in which any object
or point of an object is situated, we do not see its position, or the
distance from the eye at which it is placed. If a small luminous point
or flame is put into a dark room by another person, we cannot with
one eye form anything like a correct estimate of its distance. Even in
good light we cannot with one eye snuff a candle, or pour wine into a
small glass at arm’s length. In monocular vision, we learn from
experience to estimate all distances, but particularly great ones, by
various means, which are called the criteria of distance; but it is only
with both eyes that we can estimate with anything like accuracy the
distance of objects not far from us.
The criteria of distance, by which we are enabled with one eye to
form an approximate estimate of the distance of objects are five in
number.
1. The interposition of numerous objects between the eye and the
object whose distance we are appreciating. A distance at sea
appears much shorter than the same distance on land, marked with
houses, trees, and other objects; and for the same reason, the sun
and moon appear more distant when rising or setting on the horizon
of a flat country, than when in the zenith, or at great altitudes.
2. The variation in the apparent magnitude of known objects,
such as man, animals, trees, doors and windows of houses. If one of
two men, placed at different distances from us, appears only half the
size of the other, we cannot be far wrong in believing that the
smallest in appearance is at twice the distance of the other. It is
possible that the one may be a dwarf, and the other of gigantic
stature, in which case our judgment would be erroneous, but even in
this case other criteria might enable us to correct it.
3. The degree of vivacity in the colours and tints of objects.
4. The degree of distinctness in the outline and minute parts of
objects.
5. To these criteria we may add the sensation of muscular action,
or rather effort, by which we close the pupil in accommodating the
eye to near distances, and produce the accommodation.
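
The second of these criteria admits of a simple calculation. The sketch below is an illustration under stated assumptions: the object is seen face-on, its true size is known, and the function names are hypothetical.

```python
import math


def angular_size(true_size, distance):
    """Angle in radians subtended at the eye by an object seen face-on."""
    return 2.0 * math.atan(true_size / (2.0 * distance))


def distance_from_angle(assumed_size, angle):
    """Distance at which an object of the assumed size subtends the angle."""
    return assumed_size / (2.0 * math.tan(angle / 2.0))


near = angular_size(1.8, 10.0)  # a man of 1.8 units at 10 units
far = angular_size(1.8, 20.0)   # the same man at twice the distance
ratio = far / near              # close to one half for small angles
```

The judgment fails exactly as the text describes when the assumed size is wrong: a dwarf of half the stature at 10 units subtends the same angle as the man at 20, so that `distance_from_angle(1.8, angular_size(0.9, 10.0))` gives about 20 and not 10.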
With all these means of estimating distances, it is only by
binocular vision, in which we converge the optical axes upon the
object, that we have the power of seeing distance within a limited
range.
But this is the only point in which Monocular is inferior to
Binocular vision. In the following respects it is superior to it.
1. When we look at oil paintings, the varnish on their surface
reflects to each eye the light which falls upon it from certain parts of
the room. By closing one eye we shut out the quantity of reflected
light which enters it. Pictures should always be viewed by the eye
farthest from windows or lights in the apartment, as light diminishes
the sensibility of the eye to the red rays.
2. When we view a picture with both eyes, we discover, from the
convergency of the optic axes, that the picture is on a plane surface,
every part of which is nearly equidistant from us. But when we shut
one eye, we do not make this discovery; and therefore the effect with
which the artist gives relief to the painting exercises its whole effect
in deceiving us, and hence, in monocular vision, the relievo of the
painting is much more complete.
This influence over our judgment is beautifully shewn in viewing,
with one eye, photographs either of persons, or landscapes, or solid
objects. After a little practice, the illusion is very perfect, and is aided
by the correct geometrical perspective and chiaroscuro of the
Daguerreotype or Talbotype. To this effect we may give the name of
Monocular Relief, which, as we shall see, is necessarily inferior to
Binocular Relief, when produced by the stereoscope.
3. As it very frequently happens that one eye has not exactly the
same focal length as the other, and that, when it has, the vision by
one eye is less perfect than that by the other, the picture formed by
uniting a perfect with a less perfect picture, or with one of a different
size, must be more imperfect than the single picture formed by one
eye.
CHAPTER III.
ON BINOCULAR VISION, OR
VISION WITH TWO EYES.

We have already seen, in the history of the stereoscope, that in
the binocular vision of objects, each eye sees a different picture of
the same object. In order to prove this, we require only to look
attentively at our own hand held up before us, and observe how
some parts of it disappear upon closing each eye. This experiment
proves, at the same time, in opposition to the opinion of Baptista
Porta, Tacquet, and others, that we always see two pictures of the
same object combined in one. In confirmation of this fact, we have
only to push aside one eye, and observe the image which belongs to
it separate from the other, and again unite with it when the pressure
is removed.
It might have been supposed that an object seen by both eyes
would be seen twice as brightly as with one, on the same principle
as the light of two candles combined is twice as bright as the light of
one. That this is not the case has been long known, and Dr. Jurin
has proved by experiments, which we have carefully repeated and
found correct, that the brightness of objects seen with two eyes is
only ¹/₁₃th part greater than when they are seen with one eye.[32]
The cause of this is well known. When both eyes are used, the
pupils of each contract so as to admit the proper quantity of light; but
the moment we shut the right eye, the pupil of the left dilates to
nearly twice its size, to compensate for the loss of light arising from
the shutting of the other.[33]
Fig. 6.
This beautiful provision to supply the proper quantity of light when
we can use only one eye, answers a still more important purpose,
which has escaped the notice of optical writers. In binocular vision,
as we have just seen, certain parts of objects are seen with both
eyes, and certain parts only with one; so that, if the parts seen with
both eyes were twice as bright, or even much brighter than the parts
seen with one, the object would appear spotted, from the different
brightness of its parts. In Fig. 6, for example, (see p. 14,) the areas
bfi and cgi, the former of which is seen only by the left eye, d, and
the latter only by the right eye, e, and the corresponding areas on
the other side of the sphere, would be only half as bright as the
portion figh, seen with both eyes, and the sphere would have a
singular appearance.
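The division of the sphere into parts seen by both eyes and parts seen by one only can be sketched numerically. What follows is a minimal illustration and not a construction taken from the text: it assumes a horizontal section through the centre of a sphere placed on the middle line between the eyes, and the function name, the eye separation of 2.5 inches, and the other figures are all hypothetical.

```python
import math

def sphere_visibility(radius, distance, eye_separation):
    """Return (both_deg, one_eye_only_deg) for a horizontal section of a sphere.

    both_deg         : width of the arc seen by both eyes
    one_eye_only_deg : width of the arc, on each side, seen by a single eye
    All lengths are in the same (assumed) unit, e.g. inches.
    """
    # Distance from the centre of the sphere to either eye.
    centre_to_eye = math.hypot(distance, eye_separation / 2)
    if centre_to_eye <= radius:
        raise ValueError("the eye must lie outside the sphere")
    # Half-width of the arc visible to one eye, limited by the tangent rays.
    half_arc = math.acos(radius / centre_to_eye)
    # Angle at the centre of the sphere between the directions of the two eyes.
    offset = 2 * math.atan2(eye_separation / 2, distance)
    both = max(0.0, 2 * half_arc - offset)
    one_only = min(offset, 2 * half_arc)
    return math.degrees(both), math.degrees(one_only)

both, one = sphere_visibility(radius=1.0, distance=10.0, eye_separation=2.5)
print(f"seen by both eyes: {both:.1f} deg; by one eye only, each side: {one:.1f} deg")
```

Under these assumptions the strip seen by one eye alone has an angular width equal to the angle which the two eyes subtend at the centre of the sphere, so it widens as the sphere is brought nearer.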
It has long been, and still is, a vexed question among
philosophers, how we see objects single with two eyes. Baptista
Porta, Tacquet, and others, got over the difficulty by denying the fact,
and maintaining that we use only one eye, while other philosophers
of distinguished eminence have adopted explanations still more
groundless. The law of visible direction supplies us with the true
explanation.
Fig. 7.
Let us first suppose that we look with both eyes, r and l, Fig. 7,
upon a luminous point, d, which we see single, though there is a
picture of it on the retina of each eye. In looking at the point d we
turn or converge the optical axes dhd, d′h′d, of each eye to the point
d, an image of which is formed at d in the right eye r, and at d′ in the
left eye l. In virtue of the law of visible direction the point d is seen in
the direction dd with the eye r, and in the direction d′d with the eye
l, these lines being perpendicular to the retina at the points d, d′.
The one image of the point d is therefore seen lying upon the other,
and consequently seen single. Considering d, then, as a single point
of a visible object ab, the two eyes will see the points a and b single
by the same process of turning or converging upon them their optical
axes, and so quickly does the point of convergence pass backward
and forward over the whole object, that it appears single, though in
reality only one point of it can be seen single at the same instant.
The whole picture of the line ab, as seen with one eye, seems to
coincide with the whole picture of it as seen with the other, and to
appear single. The same is true of a surface or area, and also of a
solid body or a landscape. Only one point of each is seen single; but
we do not observe that other points are double or indistinct, because
the images of them are upon parts of the retina which do not give
distinct vision, owing to their distance from the foramen or point
which gives distinct vision. Hence we see the reason why distinct
vision is obtained only on one point of the retina. Were it otherwise
we should see every other point double when we look fixedly upon
one part of an object. If in place of two eyes we had a hundred,
capable of converging their optical axes to one point, we should, in
virtue of the law of visible direction, see only one object.
The most important advantage which we derive from the use of
two eyes is to enable us to see distance, or a third dimension in
space. That we have this power has been denied by Dr. Berkeley,
and many distinguished philosophers, who maintain that our
perception of distance is acquired by experience, by means of the
criteria already mentioned. This is undoubtedly true for great
distances, but we shall presently see, from the effects of the
stereoscope, that the successive convergency of the optic axes upon
two points of an object at different distances, exhibits to us the
difference of distance when we have no other possible means of
perceiving it. If, for example, we suppose g, d, Fig. 7, to be separate
points, or parts of an object, whose distances are go, do, then if we
converge the optical axes hg, h′g upon g, and next turn them upon
d, the points will appear to be situated at g and d at the distance gd
from each other, and at the distances og, od from the observer,
although there is nothing whatever in the appearance of the points,
or in the lights and shades of the object, to indicate distance. That
this vision of distance is not the result of experience is obvious from
the fact that distance is seen as perfectly by children as by adults;
and it has been proved by naturalists that animals newly born
appreciate distances with the greatest correctness. We shall
afterwards see that so infallible is our vision of near distances, that a
body whose real distance we can ascertain by placing both our
hands upon it, will appear at the greater or less distance at which it is
placed by the convergency of the optical axes.
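The relation between the convergency of the optical axes and the distance of the point looked at may be put in figures. The sketch below is only an illustration under stated assumptions: the point is taken to lie on the middle line between the eyes, the eye separation of 2.5 inches is an assumed value, and both function names are hypothetical.

```python
import math

def convergence_angle_deg(distance, eye_separation=2.5):
    """Angle between the two optical axes when both eyes fix a point on the
    middle line at `distance` (same unit as `eye_separation`)."""
    return math.degrees(2 * math.atan2(eye_separation / 2, distance))

def distance_from_angle(angle_deg, eye_separation=2.5):
    """Inverse relation: the distance at which the axes meet for a given angle."""
    return (eye_separation / 2) / math.tan(math.radians(angle_deg) / 2)

# The angle falls off quickly as the point recedes, which agrees with the
# remark that other criteria of distance take over at great distances.
for d in (6, 12, 60, 600):
    print(f"distance {d:>4} in: axes inclined {convergence_angle_deg(d):6.3f} deg")
```

Two points such as g and d in Fig. 7 would, on this reckoning, be distinguished by the difference of the two angles computed for their respective distances.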
We are now prepared to understand generally, how, in binocular
vision, we see the difference between a picture and a statue, and
between a real landscape and its representation. When we look at a
picture of which every part is nearly at the same distance from the
eyes, the point of convergence of the optical axes is nearly at the
same distance from the eyes; but when we look at its original,
whether it be a living man, a statue, or a landscape, the optical axes
are converged in rapid succession upon the nose, the eyes, and the
ears, or upon objects in the foreground, the middle and the remote
distances in the landscape, and the relative distances of all these
points from the eye are instantly perceived. The binocular relief thus
seen is greatly superior to the monocular relief already described.
Since objects are seen in relief by the apparent union of two
dissimilar plane pictures of them formed in each eye, it was a
supposition hardly to be overlooked, that if we could delineate two
plane pictures of a solid object, as seen dissimilarly with each eye,
and unite their images by the convergency of the optical axes, we
should see the solid of which they were the representation. The
experiment was accordingly made by more than one person, and
was found to succeed; but as few have the power, or rather the art,
of thus converging their optical axes, it became necessary to
contrive an instrument for doing this.
The first contrivances for this purpose were, as we have already
stated, made by Mr. Elliot and Mr. Wheatstone. A description of
these, and of others better fitted for the purpose, will be found in the
following chapter.
CHAPTER IV.
DESCRIPTION OF THE OCULAR, THE REFLECTING,
AND THE LENTICULAR STEREOSCOPES.

Although it is by the combination of two plane pictures of an
object, as seen by each eye, that we see the object in relief, yet the
relief is not obtained from the mere combination or superposition of
the two dissimilar pictures. The superposition is effected by turning
each eye upon the object, but the relief is given by the play of the
optic axes in uniting, in rapid succession, similar points of the two
pictures, and placing them, for the moment, at the distance from the
observer of the point to which the axes converge. If the eyes were to
unite the two images into one, and to retain their power of distinct
vision, while they lost the power of changing the position of their
optic axes, no relief would be produced.
This is equally true when we unite two dissimilar photographic
pictures by fixing the optic axes on a point nearer to or farther from
the eye. Though the pictures apparently coalesce, yet the relief is
given by the subsequent play of the optic axes varying their angles,
and converging themselves successively upon, and uniting, the
similar points in each picture that correspond to different distances
from the observer.
As very few persons have the power of thus uniting, by the eyes
alone, the two dissimilar pictures of the object, the stereoscope has
been contrived to enable them to combine the two pictures, but it is
not the stereoscope, as has been imagined, that gives the relief. The
instrument is merely a substitute for the muscular power which
brings the two pictures together. The relief is produced, as formerly,
solely by the subsequent play of the optic axes. If the relief were the
effect of the apparent union of the pictures, we should see it by
looking with one eye at the combined binocular pictures—an
experiment which could be made by optical means; but we should
look for it in vain. The combined pictures would be as flat as the
combination of two similar pictures. These experiments require to be
made with a thorough knowledge of the subject, for when the eyes
are converged on one point of the combined picture, this point has
the relief, or distance from the eye, corresponding to the angle of the
optic axes, and therefore the adjacent points are, as it were, brought
into a sort of indistinct relief along with it; but the optical reader will
see at once that the true binocular relief cannot be given to any other
parts of the picture, till the axes of the eyes are converged upon
them. These views will be more readily comprehended when we
have explained, in a subsequent chapter, the theory of stereoscopic
vision.

The Ocular Stereoscope.


We have already stated that objects are seen in perfect relief
when we unite two dissimilar photographic pictures of them, either by
converging the optic axes upon a point so far in front of the pictures
or so far beyond them, that two of the four images are combined. In
both these cases each picture is seen double, and when the two
innermost of the four, thus produced, unite, the original object is
seen in relief. The simplest of these methods is to converge the
optical axes to a point nearer to us than the pictures, and this may
be best done by holding up a finger between the eyes and the
pictures, and placing it at such a distance that, when we see it
single, the two innermost of the four pictures are united. If the finger
is held up near the dissimilar pictures, they will be slightly doubled,
the two images of each overlapping one another; but by bringing the
finger nearer the eye, and seeing it singly and distinctly, the
overlapping images will separate more and more till they unite. We
have, therefore, made our eyes a stereoscope, and we may, with
great propriety, call it the Ocular Stereoscope. If we wish to magnify
the picture in relief, we have only to use convex spectacles, which
will produce the requisite magnifying power; or what is still better, to
magnify the united pictures with a powerful reading-glass. The two
single images are hid by advancing the reading-glass, and the other
two pictures are kept united with a less strain upon the eyes.
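The place at which the finger, or the point of convergence, must stand in order that the two innermost images may unite follows from similar triangles. The sketch below is offered as an illustration only: it assumes the two pictures lie side by side in a plane facing the observer, symmetrically about the middle line, and the function name, the parameter names, and the eye separation of 2.5 inches are hypothetical.

```python
def fusion_distance(picture_distance, picture_separation,
                    eye_separation=2.5, crossed=True):
    """Distance from the eyes at which the optical axes must meet so that the
    two pictures are united.

    picture_distance   : distance of the plane of the pictures from the eyes
    picture_separation : distance between the centres of the two pictures
    crossed=True       : axes meet in front of the pictures (right eye on the
                         left-hand picture, left eye on the right-hand one)
    crossed=False      : axes meet beyond the pictures (each eye on its own side)
    """
    if crossed:
        return (picture_distance * eye_separation
                / (eye_separation + picture_separation))
    if picture_separation >= eye_separation:
        # The axes would have to be parallel or divergent to unite the pictures.
        raise ValueError("the pictures are too far apart to unite beyond them")
    return (picture_distance * eye_separation
            / (eye_separation - picture_separation))

# Pictures 12 inches off with centres 2.5 inches apart: the finger stands half-way.
print(fusion_distance(12, 2.5))                  # 6.0
# Centres 1.25 inches apart, united by looking beyond them.
print(fusion_distance(12, 1.25, crossed=False))  # 24.0
```

The second case shows why union by looking beyond the pictures is possible only when their centres are nearer together than the eyes themselves.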
As very few people can use their eyes in this manner, some
instrumental auxiliary became necessary, and it appears to us
strange that the simplest method of doing this did not occur to Mr.
Elliot and Mr. Wheatstone, who first thought of giving us the help of
an instrument. By enabling the left eye to place an image of the left-
hand picture upon the right-hand picture, as seen by the naked eye,
we should have obtained a simple instrument, which might be called
the Monocular Stereoscope, and which we shall have occasion to
describe. The same contrivance applied also to the right eye, would
make the instrument Binocular. Another simple contrivance for
assisting the eyes would have been to furnish them with a minute
opera-glass, or a small astronomical telescope about an inch long,
which, when held in the hand or placed in a pyramidal box, would
unite the dissimilar pictures with the greatest facility and perfection.
This form of the stereoscope will be afterwards described under the
name of the Opera-Glass Stereoscope.

Fig. 8.
Description of the Ocular Stereoscope.
A stereoscope upon the principle already described, in which the
eyes alone are the agent, was contrived, in 1834, by Mr. Elliot, as we
have already had occasion to state. He placed the binocular
pictures, described in Chapter I., at one end of a box, and without
the aid either of lenses or mirrors, he obtained a landscape in perfect
relief. I have examined this stereoscope, and have given, in Fig. 8,
an accurate though reduced drawing of the binocular pictures
executed and used by Mr. Elliot. I have also united the two original
pictures by the convergency of the optic axes beyond them, and
have thus seen the landscape in true relief. To delineate these
binocular pictures upon stereoscopic principles was a bold
undertaking, and establishes, beyond all controversy, Mr. Elliot’s
claim to the invention of the ocular stereoscope.
If we unite the two pictures in Fig. 8, by converging the optic axes
to a point nearer the eye than the pictures, we shall see distinctly the
stereoscopic relief, the moon being in the remote distance, the cross
in the middle distance, and the stump of a tree in the foreground.
If we place the two pictures as in Fig. 9, which is the position they
had in Mr. Elliot’s box, and unite them by looking at a point beyond
them, we shall also observe the stereoscopic relief. In this position
Mr. Elliot saw the relief without any effort, and even without being
conscious that he was not viewing the pictures under ordinary vision.
This tendency of the optic axes to a distant convergency is so rare
that I have met with it only in one person.
Fig. 9.
As the relief produced by the union of such imperfect pictures
was sufficient only to shew the correctness of the principle, the
friends to whom Mr. Elliot shewed the instrument thought it of little
interest, and he therefore neither prosecuted the subject, nor
published any account of his contrivance.
Mr. Wheatstone suggested a similar contrivance, without either
mirrors or lenses. In order to unite the pictures by converging the
optic axes to a point between them and the eye, he proposed to
place them in a box to hide the lateral image and assist in making
them unite with the naked eyes. In order to produce the union by
looking at a point beyond the picture, he suggested the use of “a pair
of tubes capable of being inclined to each other at various angles,”
the pictures being placed on a stand in front of the tubes. These
contrivances, however, though auxiliary to the use of the naked
eyes, were superseded by the Reflecting Stereoscope, which we
shall now describe.

Description of the Reflecting Stereoscope.


This form of the stereoscope, which we owe to Mr. Wheatstone,
is shewn in Fig. 10, and is described by him in the following terms:
—“aa′ are two plane mirrors, (whether of glass or metal is not
stated,) about four inches square, inserted in frames, and so
adjusted that their backs form an angle of 90° with each other; these
mirrors are fixed by their common edge against an upright b, or,
which was less easy to represent in the drawing, against the middle
of a vertical board, cut away in such a manner as to allow the eyes
to be placed before the two mirrors. c, c′ are two sliding boards, to
which are attached the upright boards d, d′, which may thus be
removed to different distances from the mirrors. In most of the
experiments hereafter to be detailed it is necessary that each upright
board shall be at the same distance from the mirror which is opposite
to it. To facilitate this double adjustment, I employ a right and a left-
handed wooden screw, r, l; the two ends of this compound screw
pass through the nuts e, e′, which are fixed to the lower parts of the
upright boards d, d′, so that by turning the screw pin p one way the
two boards will approach, and by turning it the other they will
recede from each other, one always preserving the same distance as
the other from the middle line f; e, e′ are pannels to which the
pictures are fixed in such manner that their corresponding horizontal
lines shall be on the same level; these pannels are capable of sliding
backwards or forwards in grooves on the upright boards d, d′. The
apparatus having been described, it now remains to explain the
manner of using it. The observer must place his eyes as near as
possible to the mirrors, the right eye before the right-hand mirror, and
the left eye before the left-hand mirror, and he must move the sliding
pannels e, e′ to or from him till the two reflected images coincide at
the intersection of the optic axes, and form an image of the same
apparent magnitude as each of the component pictures. The picture
will, indeed, coincide when the sliding pannels are in a variety of
different positions, and, consequently, when viewed under different
inclinations of the optic axes, but there is only one position in which
the binocular image will be immediately seen single, of its proper
magnitude, and without fatigue to the eyes, because in this position
only the ordinary relations between the magnitude of the pictures on
the retina, the inclination of the optic axes, and the adaptation of the
eye to distinct vision at different distances, are preserved. In all the
experiments detailed in the present memoir I shall suppose these
relations to remain undisturbed, and the optic axes to converge
about six or eight inches before the eyes.

Fig. 10.
“If the pictures are all drawn to be seen with the same inclination
of the optic axes the apparatus may be simplified by omitting the
screw rl, and fixing the upright boards d, d′ at the proper distance.
The sliding pannels may also be dispensed with, and the drawings
themselves be made to slide in the grooves.”
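The manner in which two mirrors at right angles bring the side pictures together may be sketched by reflecting points in a plane. This is a simplified plan view and not Mr. Wheatstone's own reckoning: the common edge of the mirrors is placed at the origin, the observer is supposed to look along the +y axis, each mirror is treated as a line at 45° to the middle line, and every name and figure in the code is hypothetical.

```python
import math

def reflect(point, mirror_dir):
    """Mirror image of `point` in a line through the origin along `mirror_dir`."""
    px, py = point
    ux, uy = mirror_dir
    norm = math.hypot(ux, uy)
    ux, uy = ux / norm, uy / norm
    dot = px * ux + py * uy
    return (2 * dot * ux - px, 2 * dot * uy - py)

# Assumed plan view: the two mirrors meet at the origin at 90 degrees.
RIGHT_MIRROR = (1.0, 1.0)
LEFT_MIRROR = (-1.0, 1.0)

def virtual_images(board_distance, panel_offset=0.0):
    """Reflected positions of the centres of the left and right pictures.

    board_distance : distance of each upright board from the middle line
    panel_offset   : how far the sliding panels are moved along the boards
    """
    left_picture = (-board_distance, panel_offset)
    right_picture = (board_distance, panel_offset)
    return reflect(left_picture, LEFT_MIRROR), reflect(right_picture, RIGHT_MIRROR)

left_img, right_img = virtual_images(board_distance=7.0)
print(left_img, right_img)  # both close to (0, 7): the images coincide
```

In this simplified plan, moving the panels by a small amount separates the two reflected images horizontally by twice that amount, which is one way of picturing how sliding the panels brings the images to coincide.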
The figures to which Mr. Wheatstone applied this instrument were
pairs of outline representations of objects of three dimensions, such
as a cube, a cone, the frustum of a square pyramid, which is shewn
on one side of e, e′ in Fig. 10, and in other figures; and he employed
them, as he observes, “for the purpose of illustration, for had either
shading or colouring been introduced it might be supposed that the
effect was wholly or in part due to these circumstances, whereas, by
leaving them out of consideration, no room is left to doubt that the
entire effect of relief is owing to the simultaneous perception of the
two monocular projections, one on each retina.”
“Careful attention,” he adds, “would enable an artist to draw and
paint the two component pictures, so as to present to the mind of the
observer, in the resultant perception, perfect identity with the object
represented. Flowers, crystals, busts, vases, instruments of various
kinds, &c., might thus be represented, so as not to be distinguished
by sight from the real objects themselves.”
This expectation has never been realized, for it is obviously
beyond the reach of the highest art to draw two copies of a flower or
a bust with such accuracy of outline or colour as to produce “perfect
identity,” or anything approaching to it, “with the object represented.”
Photography alone can furnish us with such representations of
natural and artificial objects; and it is singular that neither Mr. Elliot
nor Mr. Wheatstone should have availed themselves of the
well-known photographic process of Mr. Wedgwood and Sir Humphry
Davy, which, as Mr. Wedgwood remarks, wanted only “a method of
preventing the unshaded parts of the delineation from being coloured
by exposure to the day, to render the process as useful as it is
elegant.” When the two dissimilar photographs were taken they
could have been used in the stereoscope in candle-light, or in faint
daylight, till they disappeared, or permanent outlines of them might
have been taken and coloured after nature.
Mr. Fox Talbot’s beautiful process of producing permanent
photographs was communicated to the Royal Society in January
1839, but no attempt was made till some years later to make it
available for the stereoscope.
In a chapter on binocular pictures, and the method of executing
them in order to reproduce, with perfect accuracy, the objects which
they represent, we shall recur to this branch of the subject.
Upon obtaining one of these reflecting stereoscopes as made by
the celebrated optician, Mr. Andrew Ross, I found it to be very ill
adapted for the purpose of uniting dissimilar pictures, and to be
imperfect in various respects. Its imperfections may be thus
enumerated:—
1. It is a clumsy and unmanageable apparatus, rather than an
instrument for general use. The one constructed for me was 16½
inches long, 6 inches broad, and 8½ inches high.
2. The loss of light occasioned by reflection from the mirrors is
very great. In all optical instruments where images are to be formed,
and light is valuable, mirrors and specula have been discontinued.
Reflecting microscopes have ceased to be used, but large
telescopes, such as those of Sir W. and Sir John Herschel, Lord
Rosse, and Mr. Lassell, were necessarily made on the reflecting
principle, from the impossibility of obtaining plates of glass of
sufficient size.
3. In using glass mirrors, of which the reflecting stereoscope is
always made, we not only lose much more than half the light by the
reflections from the glass and the metallic surface, and the absorbing
power of the glass, but the images produced by reflection are made
indistinct by the oblique incidence of the rays, which separates the
image produced by the glass surface from the more brilliant image
produced by the metallic surface.
4. In all reflections, as Sir Isaac Newton states, the errors are
greater than in refraction. With glass mirrors in the stereoscope, we
have four refractions in each mirror, and the light transmitted through
twice the thickness of the glass, which lead to two sources of error.
5. Owing to the exposure of the eye and every part of the
apparatus to light, the eye itself is unfitted for distinct vision, and the
binocular pictures become indistinct, especially if they are
Daguerreotypes,[34] by reflecting the light incident from every part of
the room upon their glass or metallic surface.
6. The reflecting stereoscope is inapplicable to the beautiful
binocular slides which are now being taken for the lenticular
stereoscope in every part of the world, and even if we cut in two
those on paper and silver-plate, they would give, in the reflecting
instrument, converse pictures, the right-hand part of the picture
being placed on the left-hand side, and vice versa.
7. With transparent binocular slides cut in two, we could obtain
pictures by reflection that are not converse; but in using them, we
would require to have two lights, one opposite each of the pictures,
which can seldom be obtained in daylight, and which it is
inconvenient to have at night.
Owing to these and other causes, the reflecting stereoscope
never came into use, even after photography was capable of
supplying binocular pictures.
As a set-off against these disadvantages, it has been averred
that in the reflecting stereoscope we can use larger pictures, but this,
as we shall shew in a future chapter, is altogether an erroneous
assertion.

Description of the Lenticular Stereoscope.


Having found that the reflecting stereoscope, when intended to
produce accurate results, possessed the defects which I have
described, and was ill fitted for general use, both from its size and its
price, it occurred to me that the union of the dissimilar pictures could
be better effected by means of lenses, and that a considerable
magnifying power would be thus obtained, without any addition to
the instrument.
Fig. 11.
If we suppose a, b, Fig. 11, to be two portraits,—a a portrait of a
gentleman, as seen by the left eye of a person viewing him at the
proper distance and in the best position, and b his portrait as seen
by the right eye, the purpose of the stereoscope is to place these two
pictures, or rather their images, one above the other. The method of
doing this by lenses may be explained, to persons not acquainted
with optics, in the following manner:—
If we look at a with one eye through the centre of a convex glass,
with which we can see it distinctly at the distance of 6 inches, which
is called its focal distance, it will be seen in its place at a. If we now
move the lens from right to left, the image of a will move towards b;
and when it is seen through the right-hand edge of the lens, the
image of a will have reached the position c, half-way between a and
b. If we repeat this experiment with the portrait b, and move the lens
from left to right, the image of b will move towards a; and when it is
seen through the left-hand edge of the lens, the image of b will have
reached the position c. Now, it is obviously by the right-hand half of
the lens that we have transferred the image of a to c, and by the left-
