PageNet: Page Boundary Extraction in Historical Handwritten Documents

Tensmeyer, Chris; Davis, Brian; Wigington, Curtis; Lee, Iain; Barrett, Bill

arXiv:1709.01618 (cs)

[Submitted on 5 Sep 2017]

Abstract: When digitizing a document into an image, it is common to include a surrounding border region to visually indicate that the entire document is present in the image. However, this border should be removed prior to automated processing. In this work, we present a deep-learning-based system, PageNet, which identifies the main page region in an image in order to segment content from both textual and non-textual border noise. In PageNet, a Fully Convolutional Network obtains a pixel-wise segmentation, which is post-processed into the output quadrilateral region. We evaluate PageNet on 4 collections of historical handwritten documents, obtain over 94% mean intersection over union on all datasets, and approach human performance on 2 of these collections. Additionally, we show that PageNet can segment documents that are overlaid on top of other documents.
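The abstract reports results as mean intersection over union (IoU) between the predicted page region and the ground-truth page region. As a rough illustration of that metric only, the following minimal Python sketch assumes both regions have already been rasterized into same-sized binary masks (1 = page pixel). The function names `iou` and `mean_iou` are hypothetical, and this is not the authors' evaluation code, which may compute IoU directly on quadrilaterals.

```python
from typing import Iterable, Sequence, Tuple

# A binary mask as a list of rows; 1 marks a page pixel, 0 marks border/background.
Mask = Sequence[Sequence[int]]


def iou(pred: Mask, gt: Mask) -> float:
    """Intersection over union of two same-sized binary masks (hypothetical helper)."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("masks must have identical shapes")
    inter = 0
    union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            p, g = bool(p), bool(g)
            inter += int(p and g)  # pixel is page in both masks
            union += int(p or g)   # pixel is page in at least one mask
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if union == 0 else inter / union


def mean_iou(pairs: Iterable[Tuple[Mask, Mask]]) -> float:
    """Average per-image IoU over (prediction, ground truth) pairs."""
    scores = [iou(p, g) for p, g in pairs]
    if not scores:
        raise ValueError("need at least one (prediction, ground truth) pair")
    return sum(scores) / len(scores)
```

For example, two 4x4 masks whose page regions each cover three of the four columns, offset by one column, overlap on 8 pixels out of a 16-pixel union, giving an IoU of 0.5. Averaging per image (rather than pooling pixels across the whole dataset) is one reading of "mean" IoU and is an assumption here.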

Comments:	HIP 2017 (in submission)
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1709.01618 [cs.CV]
	(or arXiv:1709.01618v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1709.01618

Submission history

From: Chris Tensmeyer
[v1] Tue, 5 Sep 2017 22:54:49 UTC (3,555 KB)
