Applying probabilistic term weighting to OCR text in the case of a large alphabetic library catalogue
E Mittendorf, P Schäuble, P Sheridan - Proceedings of the 18th annual …, 1995 - dl.acm.org
Proceedings of the 18th annual international ACM SIGIR conference on …, 1995 • dl.acm.org
Abstract
We report on a probabilistic weighting approach to indexing the scanned images of very short documents. This fully automatic process copes with short and very noisy texts (67% word accuracy) derived from the images by Optical Character Recognition (OCR). The probabilistic term weighting approach is based on a theoretical proof explaining how retrieval effectiveness is affected by recognition errors. We have evaluated our probabilistic weighting approach on a sample of index cards from an alphabetic library catalogue where, on average, a card contains only 23 terms. We have demonstrated over 30% improvement in retrieval effectiveness over a conventional weighted retrieval method in which recognition errors are not taken into account. We also show how we can take advantage of the ordering information of the alphabetic library catalogue.
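
The abstract does not spell out the weighting formula, so the following is only an illustrative sketch of the general idea of letting recognition uncertainty influence term weights. It is not the authors' method. It assumes, hypothetically, that each OCR token comes with an estimated probability of being correctly recognized, and it replaces raw term counts with expected counts before applying a smoothed inverse document frequency. The names `expected_tf` and `score`, the per-token probabilities, and the smoothing are all assumptions made for this example.

```python
import math
from collections import defaultdict


def expected_tf(tokens):
    """Sum per-token recognition probabilities into an expected term count.

    tokens: list of (term, p_correct) pairs; p_correct is a hypothetical
    estimate that the OCR output for this token is correct.
    """
    tf = defaultdict(float)
    for term, p_correct in tokens:
        tf[term] += p_correct
    return dict(tf)


def score(query_terms, doc_tokens, doc_freq, n_docs):
    """Score one document: expected term frequency times a smoothed IDF."""
    tf = expected_tf(doc_tokens)
    total = 0.0
    for term in query_terms:
        if term in tf:
            # Add-one smoothing keeps the logarithm finite for unseen terms.
            idf = math.log((n_docs + 1) / (doc_freq.get(term, 0) + 1))
            total += tf[term] * idf
    return total


# Two short "index cards" containing the same terms, recognized with
# different (made-up) confidence values.
card_a = [("goethe", 0.9), ("faust", 0.4)]
card_b = [("goethe", 0.5), ("faust", 0.95)]
doc_freq = {"goethe": 10, "faust": 2}  # made-up collection statistics
n_docs = 100

score_a = score(["faust"], card_a, doc_freq, n_docs)
score_b = score(["faust"], card_b, doc_freq, n_docs)
```

Under these assumptions, `card_b` ranks above `card_a` for the query `faust`, because its occurrence of the term is more likely to be a correct recognition. A conventional weighting that ignores recognition errors would treat both cards identically for this query.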