A model based framework for table processing in degraded document images

Z Shi, S Setlur, V Govindaraju - 2013 12th International …, 2013 - ieeexplore.ieee.org
This paper describes a model-based framework for detecting and extracting the contents of table cells from degraded handwritten document images that contain tables. Given the very poor quality of the target documents, the table cell detection problem is formulated conceptually as a two-step process. The first step is to identify the location of the table and extract the content of table cells, given a model of the structure of the table present in the image. The second step is to identify the model of the table present in a document image from a list of given table models. A model-based representation for tables is introduced and used to match table candidates against the given model in order to identify and extract the contents of table cells. Potential table candidates are detected from horizontal and vertical table line candidates. A table is represented as a matrix of horizontal and vertical table line crossings, and matching is formulated as a minimization problem: the optimal table candidate is the one with the minimal distance between the candidate and model table matrices, and it is then used to extract the table cell contents. A similar approach solves the model selection problem: for each candidate model, the best-fitting location in the document page is identified using the same distance minimization along with a confidence score, and the model with the highest confidence score is selected as the correct model. The approach was tested on document page images containing tables from the challenge set of the DARPA MADCAT handwritten document image data. Results indicate that the method is effective for both model selection and table cell content extraction.
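
The matching and model selection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the distance measure, the search strategy, or the confidence score. The sketch assumes that line candidates are given as pixel coordinates, that the distance is the Frobenius norm after aligning the top-left crossings, that the search is a brute-force enumeration of line subsets, and that confidence is `1 / (1 + distance)`. All function names (`crossing_matrix`, `distance`, `best_candidate`, `select_model`) are hypothetical.

```python
from itertools import combinations

import numpy as np


def crossing_matrix(ys, xs):
    """Return an (R, C, 2) array of (x, y) crossings for horizontal
    lines at heights `ys` and vertical lines at positions `xs`."""
    gx, gy = np.meshgrid(np.asarray(xs, float), np.asarray(ys, float))
    return np.stack([gx, gy], axis=-1)


def distance(candidate, model):
    """Assumed distance: Frobenius norm between the two crossing
    matrices after shifting each top-left crossing to the origin."""
    c = candidate - candidate[0, 0]
    m = model - model[0, 0]
    return float(np.linalg.norm(c - m))


def best_candidate(h_lines, v_lines, model):
    """Brute-force search over subsets of detected line candidates for
    the crossing matrix closest to `model`. Returns (distance, matrix)."""
    rows, cols = model.shape[:2]
    best = (float("inf"), None)
    for ys in combinations(sorted(h_lines), rows):
        for xs in combinations(sorted(v_lines), cols):
            cand = crossing_matrix(ys, xs)
            d = distance(cand, model)
            if d < best[0]:
                best = (d, cand)
    return best


def select_model(h_lines, v_lines, models):
    """Fit every model, score it with an assumed confidence of
    1 / (1 + distance), and return (name, confidence, matrix)."""
    scored = {}
    for name, model in models.items():
        # Skip models that need more lines than were detected.
        if model.shape[0] > len(h_lines) or model.shape[1] > len(v_lines):
            continue
        d, cand = best_candidate(h_lines, v_lines, model)
        scored[name] = (1.0 / (1.0 + d), cand)
    best = max(scored, key=lambda n: scored[n][0])
    return best, scored[best][0], scored[best][1]


# Two hypothetical table models and some noisy detected line candidates.
models = {
    "a": crossing_matrix([0, 50, 100], [0, 40, 120]),
    "b": crossing_matrix([0, 30], [0, 200]),
}
h_lines = [12, 61, 113, 300]   # y-coordinates of horizontal candidates
v_lines = [20, 61, 139, 400]   # x-coordinates of vertical candidates
name, confidence, table = select_model(h_lines, v_lines, models)
```

Once a candidate matrix is selected, adjacent crossings in `table` bound each cell, so cell contents could be cropped from the page image using those coordinates. The distance here is not normalized by table size, so a small model can fit a sub-grid of a larger table well; a practical scoring function would need to account for that.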