Automatic content extraction of filled-form images based on clustering component block projection vectors
15 December 2003
Proceedings Volume 5296, Document Recognition and Retrieval XI; (2003) https://doi.org/10.1117/12.527345
Event: Electronic Imaging 2004, 2004, San Jose, California, United States
Abstract
Automatic understanding of document images is a hard problem. Here we consider a sub-problem: automatically extracting content from filled-form images. Rather than relying on pre-selected templates or sophisticated structural or semantic analysis, we propose a novel approach based on clustering component block projection vectors. By combining spectral clustering with minimal spanning tree clustering, we generate highly accurate clusters, from which adaptive templates are constructed to extract the filled-in content. Our experiments show that this approach is effective on a set of 1040 US IRS tax form images belonging to 208 types.
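The abstract does not define the projection vector or the clustering details, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes that a component block is a small binary pixel grid, that its projection vector is the concatenation of row and column pixel counts, and that all blocks share the same size. The function names and the `cut_threshold` parameter are hypothetical, and the spectral clustering step mentioned in the abstract is omitted; only a plain minimum-spanning-tree clustering is shown.

```python
from itertools import combinations


def projection_vector(block):
    """Concatenate row sums and column sums of a binary block.

    Assumption: `block` is a list of equal-length rows of 0/1 pixels.
    """
    row_sums = [sum(row) for row in block]
    col_sums = [sum(col) for col in zip(*block)]
    return row_sums + col_sums


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def mst_clusters(vectors, cut_threshold):
    """Cluster by growing a minimum spanning forest with Kruskal's algorithm.

    Edges longer than `cut_threshold` (a hypothetical parameter) are never
    added, which is equivalent to building the full MST and cutting its
    long edges. Returns one cluster label per input vector.
    """
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (euclidean(vectors[i], vectors[j]), i, j)
        for i, j in combinations(range(len(vectors)), 2)
    )
    for length, i, j in edges:
        if length > cut_threshold:
            break  # every remaining edge is at least this long
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j

    # Relabel roots as consecutive cluster ids in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(vectors))]


# Two nearly identical "box-like" blocks and one vertical-stroke block.
block_a = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 1]]
block_b = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 0]]
block_c = [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]

vectors = [projection_vector(b) for b in (block_a, block_b, block_c)]
labels = mst_clusters(vectors, cut_threshold=2.0)
print(labels)  # [0, 0, 1]
```

In this toy example the two similar blocks land in one cluster and the dissimilar block in another. A real pipeline would also need component segmentation, size normalization, and the spectral step, none of which are specified in the abstract.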
© (2003) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Hanchuan Peng, Xiaofeng He, and Fuhui Long "Automatic content extraction of filled-form images based on clustering component block projection vectors", Proc. SPIE 5296, Document Recognition and Retrieval XI, (15 December 2003); https://doi.org/10.1117/12.527345
KEYWORDS
Image processing
Infrared imaging
Binary data
Image analysis
Image quality
Cognitive neuroscience
Computer science