Automatic Text Location in Images and Video Frames

A. K. Jain and B. Yu

Pattern Recognition, pp. 2055-2076 (1998)
Abstract—Textual data is very important in a number of applications such as image database indexing and document understanding. The goal of automatic text location without character recognition capabilities is to extract image regions that contain only text. These regions can then be either fed to an optical character recognition module or highlighted for a user. Text location is a very difficult problem because the characters in text can vary in font, size, spacing, alignment, orientation, color and texture. Further, characters are often embedded in a complex background in the image. We propose a new text location algorithm that is suitable for a number of applications, including conversion of newspaper advertisements from paper documents to their electronic versions, World Wide Web search, color image indexing and video indexing. In many of these applications, it is not necessary to extract all the text, so we emphasize extracting important text with large size and high contrast. Our algorithm is very fast and has been shown to be successful in extracting important text in a large number of test images.
is sensitive to character font size and style. Further, this method is generally time-consuming and cannot always accurately give the location of text, which may reduce the performance of OCR when applied to the extracted characters. Figure 3(b) shows the horizontal spatial variance proposed by Zhong et al.(9) for the image in Fig. 3(a). The text location results are shown in Fig. 3(c), where there is some unpredictable offset.

Fig. 3. Text location by texture analysis: (a) original image; (b) horizontal spatial variance; (c) text location (shown in rectangular blocks).

The second method of text location uses connected component analysis.(2,3,5,10,11) This method, which has a higher processing speed and localization accuracy, is, however, applicable only to binary images. Most black and white documents can be regarded as two-valued images. On the other hand, color documents, video frames, and pictures of natural scenes are multivalued images. To handle various types of documents, we localize text through multivalued image decomposition. In this paper we will introduce: (i) multivalued image decomposition, (ii) foreground image generation and selection, (iii) color space reduction, and (iv) text location using statistical features. The proposed method has been applied to the problem of locating text in a number of different domains, including classified advertisements, embedded text in synthetic web images, color images and video frames. The significance of automatic text location in these problems is summarized below.

1.1. Conversion of newspaper advertisements

The World Wide Web (WWW) is now recognized as an excellent medium for information exchange. As a result, the number of applications which require converting paper-based documents to hypertext is growing rapidly. Most newspaper and advertisement agencies would like to put a customer's advertisements onto their web sites at the same time as they appear in the newspaper. Figure 4(a) shows an example of a typical newspaper advertisement. Since the advertisements sent to these agencies are not always in the form of coded text, there is a need to automatically convert them to electronic versions which can be further used in automatically generating Web pages. Although these images are mostly binary, both black and white objects can be regarded as foreground due to text reversal. The text in advertisements varies in terms of font, size, style and spacing. In addition to text, the advertisements also contain some graphics, logos and symbolized rulers. We use a relatively high scan resolution (150 dpi) for these images because (i) they are all binary, so storage requirements are not severe, and
(ii) all the text in the advertisement, irrespective of its font, size and style, must be located for this application.

1.2. Web search

Since 1993, the number of web servers has been doubling nearly every three months(12) and now exceeds 476,000.(13) Text is one of the most important components of a web page and can be either coded text or pixel text. Through the information superhighway, users can access any authorized site to obtain information of interest. This has created the problem of automatically and efficiently finding useful pages on the web. To obtain desired information from this humongous source, a coded text-based search engine (e.g. Yahoo, Infoseek, Lycos, and AltaVista) is commonly used. For instance, the AltaVista search engine processes more than 29 million requests each day.(13) Because of the massive increase in network bandwidths and disk capacities, more and more web pages now contain images for better visual appearance and rich information content. These images, especially the pixel text embedded in them, provide search engines with additional cues to accurately retrieve the desired information. Figure 4(c) shows one such example. Therefore, a multimedia search engine which can use the information from coded text and pixel text, as well as from image, video and audio, is desired for the information superhighway.

Most web images are computer created and are called synthetic images. Text in web page images varies in font, color, size and style, even within the same page. Furthermore, the color and texture of the text and its background may also vary from one part of the page to the other. For these reasons, it is very difficult to locate text in Web images automatically without utilizing character recognition capabilities. Only a few simple approaches have been published for text location in Web images.(14)

1.3. Color image databases

A color image can be captured by a scanner or a camera. Figure 4(b) shows a color image scanned from a magazine cover. Automatically locating text in color images has many applications, including image database search, automatic annotation and image database organization. Some related work can be found in vehicle license plate recognition.(15)

1.4. Video indexing

The goal of video indexing is to retrieve a small number of video frames based on user queries. A number of approaches have been proposed which retrieve video frames using texture,(16) shape(17) and color(18) information contained in the query. At the same time, word spotting(19) and speech recognition(20) techniques have been used in searching for dialogue and narration for video indexing. Both caption text and non-caption text on objects contained in video can be used in interactive indexing and automatic indexing, which is the major objective of text location for video. Figure 4(d) shows a video frame which contains text. Some related work has been done for image and video retrieval where the search cues use visual properties of specific objects and captions in video databases.(9,21-23) Lienhart and Stuber(23) assume that text is monochromatic and is generated by video title machines.

Fig. 4. Examples of input images for automatic text location applications: (a) classified advertisement in a newspaper; (b) color scanned image; (c) web image; (d) video frame.

1.5. Summary

There are essentially two different classes of applications involved in our work on automatic text location: (i) document conversion and (ii) web searching and image and video indexing. The first class of applications, which mostly involves binary images, requires that all the text in the input image be located. This necessitates a higher image resolution. On the other hand, the most important requirements for the second class of applications are (i) high speed of text location and (ii) extraction of only the important text in the input image. Usually, the larger the font size of the text, the more important it is. Text which is very small in size cannot be recognized easily by OCR engines anyway.(24) Since the important text in images appears mainly in the
horizontal direction, our method tries to extract only horizontal text of relatively large size. Because some non-text objects can be subsequently rejected by an OCR module, we minimize the probability of missing text (false dismissal) at the cost of increasing the probability of detecting spurious regions (false alarm). Figure 5 gives an overview of the proposed system. The input can be a binary image, a synthetic web image, a color image or a video frame. After color reduction (bit dropping and color clustering) and multivalued image decomposition, the input image is decomposed into multiple foreground images. Individual foreground images go through the same processing steps, so the connected component analysis and text identification modules can be implemented in parallel on a multiprocessor system to speed up the algorithm. Finally, the outputs from all the channels are composed together to locate the text in the input image. Text location is represented in terms of the coordinates of its bounding box.
In Section 2 we describe the decomposition method for multivalued images, including color space reduction. The connected component analysis method, which is applied to the foreground images, is explained in Section 3. Section 4 introduces textual features, text identification and text composition. Finally, we report the results of experiments in a number of applications and discuss the performance of the proposed system in Section 5.

2. MULTIVALUED IMAGE DECOMPOSITION

text is shown in Fig. 7(b). Therefore, an image I can always be completely separated into a foreground image I_F and a background image I_B, where I_F ∪ I_B = I and I_F ∩ I_B = ∅. Theoretically, a U-valued image can generate up to (2^U - 2) different foreground images. A foreground image is called a real foreground image if it is produced such that
Fig. 6. A multivalued image and its element images: (a) color image; (b) nine element images.
Fig. 7. Examples of text: (a) a real foreground text; (b) a background-complementary foreground text.
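As a small illustration of element images and the 2^U - 2 bound mentioned above, consider the sketch below. The toy three-valued image is ours, and an element image is read here as the binary mask of a single pixel value, consistent with the binary case described next (two element images: the image and its inverse).

import numpy as np

# a toy 3-valued image (U = 3)
img = np.array([[0, 0, 1],
                [2, 1, 1],
                [2, 2, 0]])

values = np.unique(img)
element_images = {int(v): (img == v) for v in values}   # one binary mask per value
U = len(values)
print("U =", U, "-> at most", 2**U - 2, "foreground images")
for v, mask in element_images.items():
    print("element image for value", v, "has", int(mask.sum()), "object pixels")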
In our system, each element image can be selected as a real foreground image if there is a sufficient number of object pixels in it. On the other hand, we generate at most one background-complementary image for each multivalued image, such that the background image I_B is set as the element image with the largest number of object pixels, or as the union of this element image with the element image having the second largest number of object pixels if that number is larger than a threshold.

Fig. 8. Foreground images of the multivalued image in Fig. 6(a): (a) a real foreground image; (b) a background-complementary foreground image.

2.1. Binary images

The advertisement images of interest to us are binary images (see Fig. 4(a)), for which U = 2. A binary image has only two element images, the given image and its inverse, each being a real foreground image or a background-complementary image with respect to the other.

2.2. Pseudo-color images

For web images, GIF and JPEG are the two most popular image formats because they both have high compression rates and simple decoding methods. The latter is commonly used for images or videos of natural scenes.(25) Most of the web images containing meaningful text are created synthetically and are stored in GIF format. A GIF image is an 8-bit
pseudo-color image whose pixel values are bounded between 0 and 255. A local color map and/or a global color map is attached to each GIF file to map the 8-bit image to a full color space. The GIF format has two versions, GIF87a and GIF89a. The latter can encode an image by interlacing, in order to display it in a coarse-to-fine manner during transmission, and can indicate a color as a transparent background. As far as the data structure is concerned, an 8-bit pseudo-color image is no different from an 8-bit gray scale image. However, they are completely different in terms of visual perception. The pixel values in a gray scale image have a physical interpretation in terms of light reflectance, so the difference between two gray values is meaningful. However, a pixel value in a pseudo-color image is an index into a full color map. Therefore, two pixels with similar pseudo-color values may have distinct colors.

We extract text in pseudo-color images by combining two methods. One is based on foreground information and the other is based on background information. Although the pixel values in a GIF image can range from 0 to 255, most images contain values only in a small interval, i.e., U ≪ 256. Figure 9 is the histogram of the pseudo-color image in Fig. 4(c), which shows that a large number of bins are empty. First, we regard each element image as a real foreground image. Furthermore, the number of distinct values shared by a large number of pixels is small due to the nature of synthetic images. We assume that the characters in a text are of reasonable size and that they occupy a sufficiently large number of pixels. Therefore, we retain those real foreground images in which the number of foreground pixels is larger than a threshold T_np (= 400). Further, we empirically choose N = 8 as the number of real foreground images. For text without a unique color value, we assume that its background has a unique color value. The area of the background should be large enough, so we regard the color with the largest number of pixels as the background. We also regard the color value with the second largest number of pixels as background if this number is larger than a threshold T_bg (= 10,000). Thus, a background-complementary foreground image can be generated. At most, we consider only nine foreground images (eight real foreground images plus one background-complementary foreground image). Each foreground is tagged with a foreground identification (FI). The image in Fig. 4(c) has 117 element images (see the histogram in Fig. 9) and only six of them are selected as real foreground images, which are shown in Fig. 10(a)-(f). One background-complementary foreground image is shown in Fig. 10(g).

Fig. 10. Decomposition of web image of Fig. 4(c): (a)-(f) real foreground images; (g) background-complementary foreground image.

2.3. Color images and video frames

A color image or a video frame is a 24-bit image, so the value of U can be very large. To extract only a small number of foregrounds from a full color image, with the presumption that the color of text is distinct from the color of its background, we implement (i) bit dropping for the RGB color bands and (ii) color quantization. A 24-bit color image consists of three 8-bit red, green and blue images. For our task of text location, we simply use the highest two bits of each band image, which has the same effect as color re-scaling. Therefore, a 24-bit color image is correspondingly reduced to a 6-bit color image and the value of U is reduced to 64. Figure 11(b) shows the bit dropping results for the input color image shown in Fig. 11(a), where only the highest two bits have been retained from each color band. The retained color prototypes

Fig. 11. Foreground extraction from a full color video frame: (a) original frame; (b) bit dropping; (c) color quantization reduces the number of distinct colors to four; (d)-(g) real foreground images; (h) background-complementary foreground image.
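The bit-dropping step just described can be stated in a few lines of numpy. The sketch below reflects our reading of it; the packing order of the three 2-bit values into a single 6-bit code is an arbitrary choice and is not specified above.

import numpy as np

def bit_drop(rgb):
    """Keep the highest two bits of each 8-bit band of an RGB image and pack the
    result into a single 6-bit code per pixel (values 0..63)."""
    rgb = np.asarray(rgb, dtype=np.uint8)
    r, g, b = rgb[..., 0] >> 6, rgb[..., 1] >> 6, rgb[..., 2] >> 6
    return (r << 4) | (g << 2) | b   # 6-bit color code; U is reduced to 64

# example: a 24-bit color frame becomes an image with at most 64 distinct values
frame = (np.random.rand(120, 160, 3) * 255).astype(np.uint8)
codes = bit_drop(frame)
print(codes.min(), codes.max(), len(np.unique(codes)))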
Fig. 12. Color prototypes: (a) after bit-dropping; (b) after color quantization.
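Once an image has been reduced to a small set of color prototypes (or palette indices, in the pseudo-color case of Section 2.2), real foreground and background-complementary foreground images can be selected as sketched below. The thresholds T_np = 400 and T_bg = 10,000 and the limit of eight real foreground images are taken from Section 2.2; selecting the most populated values first is our assumption, not a rule stated in the text.

import numpy as np

T_NP = 400      # minimum object pixels for a real foreground image (Section 2.2)
T_BG = 10_000   # minimum pixels for a second background color (Section 2.2)
N_FG = 8        # at most eight real foreground images

def select_foregrounds(index_image):
    """Split an indexed image (palette indices or quantized color codes) into real
    foreground images and one background-complementary foreground image."""
    values, counts = np.unique(index_image, return_counts=True)
    order = np.argsort(counts)[::-1]                  # values sorted by pixel count
    # real foreground images: element images with more than T_NP object pixels
    real = [(index_image == values[i]) for i in order[:N_FG] if counts[i] > T_NP]
    # background: the most populated value, plus the second one if it is large enough
    bg_values = [values[order[0]]]
    if len(order) > 1 and counts[order[1]] > T_BG:
        bg_values.append(values[order[1]])
    bg_complement = ~np.isin(index_image, bg_values)  # background-complementary foreground
    return real, bg_complement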
Each run length in the first row of the input image is regarded as a block with a corresponding FI.
For the successive rows in the image {
    For each run length r_c in the current row {
        If r_c is 8-connected to a run length in the preceding row and they have the same FI {
            If r_c is 8-connected to only one run length r_l with the same FI, and the differences of the horizontal positions of their beginning and end pixels are, respectively, within a given tolerance T_a, then r_c is merged into the block node n_i involving r_l.
            Else, r_c is regarded as a new block node n_{i+1} with a corresponding FI, initialized with edges e(n_{i+1}, n_j) to those block nodes {n_j} which are 8-connected to r_c.
        }
        Else, r_c is regarded as a new block node n_{i+1} with a corresponding FI.
    }
}
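The merging rules above can be implemented with a union-find over run lengths. The following sketch is a simplified illustration rather than the algorithm as implemented in our system: it handles a single binary foreground, ignores the FI tags, and omits the merge tolerance T_a. It returns the bounding box coordinates (X_u, Y_u, X_l, Y_l) of each connected component, as used below.

import numpy as np
from collections import defaultdict

def row_runs(row):
    # (start, end) column indices of each horizontal run of foreground (non-zero) pixels
    runs, start = [], None
    for x, v in enumerate(row):
        if v and start is None:
            start = x
        elif not v and start is not None:
            runs.append((start, x - 1))
            start = None
    if start is not None:
        runs.append((start, len(row) - 1))
    return runs

def connected_components(mask):
    """Group 8-connected runs of a binary mask into components and return their
    bounding boxes (X_u, Y_u, X_l, Y_l)."""
    parent = {}
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra
    runs, prev, next_id = [], [], 0
    for y, row in enumerate(mask):
        cur = []
        for s, e in row_runs(row):
            rid, next_id = next_id, next_id + 1
            parent[rid] = rid
            runs.append((y, s, e, rid))
            for ps, pe, pid in prev:          # 8-connectivity with runs of the previous row
                if s <= pe + 1 and e >= ps - 1:
                    union(pid, rid)
            cur.append((s, e, rid))
        prev = cur
    groups = defaultdict(list)
    for y, s, e, rid in runs:
        groups[find(rid)].append((y, s, e))
    return [(min(s for _, s, _ in g), min(y for y, _, _ in g),
             max(e for _, _, e in g), max(y for y, _, _ in g))
            for g in groups.values()]

# toy example: two separate blobs
mask = np.array([[1, 1, 0, 0, 1],
                 [0, 1, 0, 0, 1],
                 [0, 0, 0, 0, 1]])
print(connected_components(mask))   # two boxes: (0, 0, 1, 1) and (4, 0, 4, 2)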
Fig. 15. Connected component analysis for the foreground image in Fig. 11(f): (a) connected components; (b) connected component thresholding; (c) candidate text lines.
which satisfy the following conditions: (i) c_i ⊂ B; (ii) ∀ n_j, n_k ∈ c_i, there is a path (n_j, n_{j1}, n_{j2}, ..., n_{jp}, n_k) such that n_{jl} ∈ c_i for l = 1, 2, ..., p and e(n_j, n_{j1}), e(n_{j1}, n_{j2}), ..., e(n_{jp-1}, n_{jp}), e(n_{jp}, n_k) ∈ E; and (iii) if ∃ e(n_j, n_k) ∈ E and n_j ∈ c_i, then n_k ∈ c_i. The upper left and lower right coordinates of a connected component c_i = {n_j} are

X_u(c_i) = min_{n_j ∈ c_i} X_u(n_j),   X_l(c_i) = max_{n_j ∈ c_i} X_l(n_j),
Y_u(c_i) = min_{n_j ∈ c_i} Y_u(n_j),   Y_l(c_i) = max_{n_j ∈ c_i} Y_l(n_j).

The extracted connected components for the foreground image shown in Fig. 11(f) are depicted in Fig. 15(a). Very small connected components are deleted, as shown in Fig. 15(b). Assuming that we are looking for horizontal text, we cluster connected components in the horizontal direction and the resulting components are called candidate text lines, as shown in Fig. 15(c).

4. TEXT IDENTIFICATION

Without character recognition capabilities, it is not easy to distinguish characters from non-characters simply based on the size of connected components. A line of text consisting of several characters can provide additional information for this classification. The text identification module in our system determines whether candidate text lines contain text or non-text based on statistical features of connected components. A candidate text line containing a number of characters will usually consist of several connected components. The number of such connected components may not be the same as the number of characters in this text line because some of the characters may be touching each other. Figure 16(b) illustrates the text lines and connected components for the text in Fig. 16(a), where the characters are well separated. On the other hand, many characters shown in Fig. 16(c) are touching each other, and a connected component shown in Fig. 16(d) may include more than one character. We have designed two different recognition strategies for touching and non-touching characters. A candidate line is recognized as a text line if it is accepted by any one of the strategies.

4.1. Inter-component features

For separated characters, their corresponding connected components should be well aligned. Therefore, we preserve those text lines in which the top and bottom edges of the contained connected components are respectively aligned, or in which both the width and the height values of these connected components are close to each other. In addition, the number of connected components should be in proportion to the length of the text line.
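One possible reading of these inter-component tests is sketched below. The specific tolerances (edge-alignment spread, size similarity and component density) are illustrative assumptions, since their exact values are not given here; each component is the bounding box (X_u, Y_u, X_l, Y_l) produced by the connected component analysis.

import numpy as np

def passes_inter_component_test(boxes, top_tol=0.2, size_tol=0.3, min_density=0.5):
    """Accept a candidate text line (a list of component boxes (X_u, Y_u, X_l, Y_l))
    if its components are aligned or uniformly sized, and their count is in
    proportion to the line length. Tolerances are illustrative assumptions."""
    if len(boxes) < 2:
        return False
    boxes = np.asarray(boxes, dtype=float)
    widths = boxes[:, 2] - boxes[:, 0] + 1
    heights = boxes[:, 3] - boxes[:, 1] + 1
    line_height = boxes[:, 3].max() - boxes[:, 1].min() + 1
    line_length = boxes[:, 2].max() - boxes[:, 0].min() + 1
    # (a) top and bottom edges of the components are respectively aligned
    aligned = (boxes[:, 1].std() < top_tol * line_height and
               boxes[:, 3].std() < top_tol * line_height)
    # (b) or the widths and heights of the components are close to each other
    similar = (widths.std() < size_tol * widths.mean() and
               heights.std() < size_tol * heights.mean())
    # (c) the number of components should be in proportion to the line length
    dense_enough = len(boxes) >= min_density * line_length / max(widths.mean(), 1.0)
    return (aligned or similar) and dense_enough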
Fig. 16. Characters in a text line: (a) well separated characters; (b) connected components and text lines for (a); (c) characters touching each other; (d) connected components and text line for (c); (e) X-axis projection profile and signature of the text in (c); (f) Y-axis projection profile and signature of the text in (c).
Fig. 17. Text composition: (a) text lines extracted from the foreground image in Fig. 10(b); (b) text line
extracted from the foreground image in Fig. 10(g); (c) composed result.
Table 1. Performance of the proposed system (columns: text carrier, number of test images, typical size, accuracy (%), average CPU time (s)).
4.2. Projection profile features

For characters touching each other, features are extracted based on the projection profiles of the text line in both the horizontal and vertical directions. The basic idea is that if there are characters in a candidate text line, then there will be a certain number of humps in its X-axis projection profile and one significant hump in its Y-axis projection profile. Figure 16(e) and (f) depict the X-axis and Y-axis projection profiles of the text shown in Fig. 16(c). The signatures of the projection profiles in both directions are generated by means of thresholding, and they are also shown in Fig. 16(e) and (f). The threshold for the X profile is its mean value and the threshold for the Y profile is chosen as one third of its highest value. The signatures can be viewed as run lengths of 1s and 0s, where a 1 represents a profile value larger than the threshold and a 0 represents a profile value below the threshold. Therefore, we consider the following features to characterize text: (i) because text should have many humps in the X profile, but only a few humps in the Y profile, the number of 1-run lengths in the X signature is required to be larger than 5 and the number of 1-run lengths in the Y signature should be less than 3; (ii) since a very wide hump in the X profile of text is not expected, the maximum length of the 1-run lengths in the X signature should be less than 1.4 times the height of the text line; and (iii) the humps in the X profile should be regular in width, i.e. the standard deviation of the length of the 1-run lengths should be less than 1.2 times their mean, and the mean should be less than 0.11 times the height of the text line.
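These rules translate almost directly into code. The sketch below is an illustration of the tests as stated above, applied to a binary text-line image; it is not the implementation used in our experiments.

import numpy as np

def one_runs(signature):
    # lengths of the runs of 1s in a binary signature
    runs, n = [], 0
    for v in signature:
        if v:
            n += 1
        elif n:
            runs.append(n)
            n = 0
    if n:
        runs.append(n)
    return runs

def is_text_by_profiles(line_mask):
    """Apply the projection-profile tests of Section 4.2 to a binary text-line image."""
    line_mask = np.asarray(line_mask, dtype=bool)
    height = line_mask.shape[0]
    x_profile = line_mask.sum(axis=0)              # X-axis (column-wise) projection
    y_profile = line_mask.sum(axis=1)              # Y-axis (row-wise) projection
    x_sig = x_profile > x_profile.mean()           # threshold: mean of the X profile
    y_sig = y_profile > y_profile.max() / 3.0      # threshold: one third of the Y profile peak
    x_runs, y_runs = one_runs(x_sig), one_runs(y_sig)
    if not x_runs:
        return False
    x_runs = np.array(x_runs, dtype=float)
    return (len(x_runs) > 5 and                    # (i) many humps in X, few in Y
            len(y_runs) < 3 and
            x_runs.max() < 1.4 * height and        # (ii) no very wide hump in X
            x_runs.std() < 1.2 * x_runs.mean() and # (iii) humps regular in width
            x_runs.mean() < 0.11 * height)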
4.3. Text composition

The connected component analysis and text identification modules are applied to the individual foreground images. Ideally, the union of the outputs from the individual foreground images should provide the location of the text. However, the text lines extracted from different foreground images may be overlapping and, therefore, they need to be merged. Two text lines are merged and replaced by a new text line if their horizontal distance is small and their vertical overlap is large. Figure 17(c) shows the final text location results for the image in Fig. 4(c). Figure 17(a) and (b) are the text lines extracted from the two foreground images shown in Fig. 10(b) and (g); Fig. 17(c) is the union of Fig. 17(a) and (b).

5. EXPERIMENTAL RESULTS

The proposed system for automatic text location has been tested on a number of binary images, pseudo-color images, color images and video frames. Since different applications need different heuristics, the modules and parameters used in the algorithm shown in Fig. 5 change accordingly. Table 1 lists the performance of our system. We compute the accuracy for advertisement images by manually counting the number of correctly located characters. The accuracies for the other images are subjectively computed based on the number of correctly located important text regions in the image. The false alarm rate is relatively high for color images and is the lowest for advertisement images. At the same time, the accuracy for color images is the lowest because of the high complexity of the background. The processing time is reported for a Sun UltraSPARC I system (167 MHz)
with 64 MB of memory. More details of our experiments for the different text carriers are explained in the following sub-sections.

5.1. Advertisement images

The test images were scanned from a newspaper at 150 dpi. Some of the text location results are shown in Fig. 18, where both normal text and reversed text are located and illustrated with red bounding boxes. The line of white blocks in the upper part of Fig. 18(b) is detected as text because the blocks are regularly arranged in terms of size and alignment. However, this region should be easily rejected by an OCR module. The text along a semicircle at the top of Fig. 18(e) cannot be detected by our algorithm. More complicated heuristics are needed to locate such text. Some punctuation and dashed lines are missed, as expected, because of their small size.

5.2. Web images

The 22 representative web images shown in Fig. 19 were downloaded through the Internet. The corresponding results of text location are shown in gray scale in Fig. 19. The text in Fig. 19(a) is not completely aligned along a straight line. The data for the image in Fig. 19(h) could not be completely
Fig. 23. Video frames containing both caption and non-caption text.
conversion and database indexing are two major applications of the proposed text location algorithm. A method for text location based on multivalued image processing is proposed. A multivalued image, including a binary image, gray-scale image, pseudo-color image or full color image, can be decomposed into multiple real foreground and background-complementary foreground images. For full color images, a color reduction method is presented, including bit dropping and color clustering. Therefore, the connected component analysis for binary images can be used in multivalued image processing to find text lines. We have also proposed an approach to text identification which is applicable to both separated and touching characters. The text location algorithm has been applied to advertisement images, Web images, color images and video frames. The application to classified advertisement conversion demands a higher accuracy. Therefore, we use a higher scan resolution of 150 dpi. For the other applications, the goal is to find all the important text for searching or indexing. Compared to the texture-based method(8) and the motion-based approach for video,(2,3) our method has a higher speed and accuracy in terms of finding a bounding box around important text regions. Because of the diversity of colors, the text location accuracy for color images is not as good as that for the other input sources. Our method does not work well where the three-dimensional color histogram is sparse and there are no dominant prototypes.

REFERENCES

1. S. Mori, C. Y. Suen and K. Yamamoto, Historical review of OCR research and development, Proc. IEEE 80, 1029-1058 (1992).
2. A. Jain and B. Yu, Document representation and its application to page decomposition, IEEE Trans. Pattern Anal. Machine Intell. 20, 294-308 (1998).
3. B. Yu, A. Jain and M. Mohiuddin, Address block location on complex mail pieces, Proc. 4th Int. Conf. on Document Analysis and Recognition, Ulm, pp. 897-901 (1997).
4. S. N. Srihari, C. H. Wang, P. W. Palumbo and J. J. Hull, Recognizing address blocks on mail pieces: specialized tools and problem-solving architectures, Artificial Intelligence 8, 25-35, 38-40 (1987).
5. B. Yu and A. Jain, A generic system for form dropout, IEEE Trans. Pattern Anal. Machine Intell. 18, 1127-1134 (1996).
6. L. A. Fletcher and R. Kasturi, A robust algorithm for text string separation from mixed text/graphics images, IEEE Trans. Pattern Anal. Machine Intell. 10, 910-918 (1988).
7. I. Pitas and C. Kotropoulos, A texture-based approach to the segmentation of seismic images, Pattern Recognition 25, 929-945 (1992).
8. A. Jain and S. Bhattacharjee, Text segmentation using Gabor filters for automatic document processing, Mach. Vision Applic. 5, 169-184 (1992).
9. Y. Zhong, K. Karu and A. Jain, Locating text in complex color images, Pattern Recognition 28, 1523-1535 (1995).
10. B. Yu and A. Jain, A robust and fast skew detection algorithm for generic documents, Pattern Recognition 29, 1599-1629 (1996).
11. Y. Tang, S. Lee and C. Suen, Automatic document processing: a survey, Pattern Recognition 29, 1931-1952 (1996).
12. M. Gray, Internet statistics: growth and usage of the Web and the Internet, at http://www.mit.edu/people/mkgray/net/.
13. AltaVista Web page, at http://altavista.digital.com/.
14. D. Lopresti and J. Zhou, Document analysis and the World Wide Web, Proc. Workshop on Document Analysis Systems, Malvern, pp. 417-424 (1996).
15. E. R. Lee, P. K. Kim and H. J. Kim, Automatic recognition of a car license plate using color image processing, Proc. 1st IEEE Conf. on Image Processing, Austin, pp. 301-305 (1994).
16. R. W. Picard and T. P. Minka, Vision texture for annotation, Multimedia Systems 3, 3-14 (1995).
17. S. Sclaroff and A. Pentland, Modal matching for correspondence and recognition, IEEE Trans. Pattern Anal. Machine Intell. 17, 545-561 (1995).
18. H. Sakamoto, H. Suzuki and A. Uemori, Flexible montage retrieval for image data, Proc. SPIE Conf. on Storage and Retrieval for Image and Video Databases II, Vol. SPIE 2185, San Jose, pp. 25-33 (1994).
19. A. S. Gordon and E. A. Domeshek, Conceptual indexing for video retrieval, Proc. Int. Joint Conf. on Artificial Intelligence, Montreal, pp. 23-38 (1995).
20. P. Schauble and M. Wechsler, First experiences with a system for content based retrieval of information from speech, Proc. Int. Joint Conf. on Artificial Intelligence, Montreal, pp. 59-70 (1995).
21. A. Jain and A. Vailaya, Image retrieval using color and shape, Pattern Recognition 29, 1233-1244 (1996).
22. B. Shahraray and D. Gibbon, Automatic generation of pictorial transcripts of video programs, Proc. SPIE Conf. on Multimedia Computing and Networking, Vol. SPIE 2417, San Jose, pp. 2417-2447 (1995).
23. R. Lienhart and F. Stuber, Automatic text recognition in digital videos, Proc. SPIE 2666, San Jose, pp. 180-188 (1996).
24. J. Zhou, D. Lopresti and Z. Lei, OCR for World Wide Web images, Proc. IS&T/SPIE Electronic Imaging: Document Recognition IV, San Jose (1997).
25. W. B. Pennebaker and J. L. Mitchell, JPEG: Still Image Compression Standard, Van Nostrand Reinhold, New York, NY (1993).
26. A. K. Jain and R. C. Dubes, Algorithms for Clustering Data, Prentice Hall, Englewood Cliffs, NJ (1988).
About the Author—ANIL JAIN is a University Distinguished Professor and Chair of the Department of Computer Science at Michigan State University. His research interests include statistical pattern recognition, Markov random fields, texture analysis, neural networks, document image analysis, fingerprint matching and 3D object recognition. He received the best paper awards in 1987 and 1991 and certificates for outstanding contributions in 1976, 1979, 1992, and 1997 from the Pattern Recognition Society. He also received the 1996 IEEE Trans. Neural Networks Outstanding Paper Award. He was the Editor-in-Chief of the IEEE Trans. on Pattern Analysis and Machine Intelligence (1990-94). He is the co-author of Algorithms for Clustering Data, Prentice-Hall, 1988, has edited the book Real-Time Object Measurement and Classification, Springer-Verlag, 1988, and co-edited the books, Analysis and Interpretation of Range Images, Springer-Verlag, 1989, Markov Random Fields, Academic Press, 1992, Artificial Neural Networks and Pattern Recognition, Elsevier, 1993, 3D Object Recognition, Elsevier, 1993, and BIOMETRICS:
About the Author—BIN YU received his Ph.D. degree in Electronic Engineering from Tsinghua University in 1990, M.S. degree in Electrical Engineering from Tianjin University in 1986 and B.S. degree in Mechanical Engineering from Hefei Polytechnic University in 1983. Dr. Yu was a visiting scientist in the Pattern Recognition and Image Processing Laboratory of the Department of Computer Science at Michigan State University from 1995 to 1997. Since 1992, he has been an Associate Professor in the Institute of Information Science at Northern Jiaotong University, where he worked as a Postdoctoral Fellow from 1990 to 1992. He is now working as a Senior Staff Vision Engineer at Electroglas, Inc., Santa Clara. His research interests include Image Processing, Pattern Recognition and Computer Vision. Dr. Yu has authored more than 50 journal and conference papers. He is a Member of the IEEE, a Member of the Youth Board of the Chinese Institute of Electronics, and a Senior Member of the Chinese Institute of Electronics.