A Neural Model for Text Localization, Transcription and Named Entity Recognition in Full Pages

Carbonell, Manuel; Fornés, Alicia; Villegas, Mauricio; Lladós, Josep

arXiv:1912.10016 (cs)

[Submitted on 20 Dec 2019 (v1), last revised 4 May 2020 (this version, v2)]

Abstract: In recent years, the consolidation of deep neural network architectures for information extraction from document images has brought substantial improvements in each of the tasks involved: text localization, transcription, and named entity recognition. However, this process is traditionally performed with separate methods for each task. In this work we propose an end-to-end model that combines a one-stage object detection network with branches for text recognition and named entity recognition, so that shared features can be learned simultaneously from the training error of each task. The model thus jointly performs handwritten text detection, transcription, and named entity recognition at page level in a single feed-forward step. We exhaustively evaluate our approach on different datasets and discuss its advantages and limitations compared to sequential approaches. The results show that the model can benefit from shared features when simultaneously solving interdependent tasks.
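
As a rough illustration of how shared features can be trained from the error of several tasks at once, the sketch below combines three per-task losses into a single training objective. It is only a minimal sketch under stated assumptions: the weighted-sum form, the names `TaskLosses` and `joint_loss`, the weights, and the example values are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class TaskLosses:
    """Per-task training errors for one batch (illustrative values only)."""
    localization: float   # e.g. a box regression / objectness error
    transcription: float  # e.g. a sequence loss on the recognized text
    entity: float         # e.g. a classification loss on named-entity tags


def joint_loss(losses: TaskLosses,
               w_loc: float = 1.0,
               w_txt: float = 1.0,
               w_ner: float = 1.0) -> float:
    """Weighted sum of the three task losses.

    Assumption: the tasks are combined as a weighted sum. If every term
    depends on the same shared features, minimizing this sum sends error
    signal from all three tasks into the shared layers. The weights are
    hypothetical, not values from the paper.
    """
    return (w_loc * losses.localization
            + w_txt * losses.transcription
            + w_ner * losses.entity)


# Example batch with made-up loss values.
batch = TaskLosses(localization=0.5, transcription=2.0, entity=0.25)
total = joint_loss(batch)
print(total)
```

Setting one weight to zero (for example `w_ner=0.0`) removes that task's contribution to the objective, which is one simple way to compare joint training against training on a subset of the tasks.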

Comments:	To be published in Pattern Recognition Letters
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1912.10016 [cs.CV]
	(or arXiv:1912.10016v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1912.10016

Submission history

From: Manuel Carbonell
[v1] Fri, 20 Dec 2019 18:45:19 UTC (7,915 KB)
[v2] Mon, 4 May 2020 17:34:13 UTC (5,307 KB)
