A Permuted Autoregressive Approach to Word-Level Recognition for Urdu Digital Text

Mustafa, Ahmed; Rafique, Muhammad Tahir; Baig, Muhammad Ijlal; Sajid, Hasan; Khan, Muhammad Jawad; Kallu, Karam Dad

arXiv:2408.15119 (cs)

[Submitted on 27 Aug 2024 (v1), last revised 30 Aug 2024 (this version, v3)]

Abstract: This research paper introduces a novel word-level Optical Character Recognition (OCR) model designed specifically for digital Urdu text. It uses a transformer-based architecture with attention mechanisms to address the distinct challenges of Urdu script recognition, including its diverse text styles, fonts, and variations. The model employs a permuted autoregressive sequence (PARSeq) architecture, which is trained on multiple token permutations to enable context-aware inference and iterative refinement. This allows the model to handle character reordering and overlapping characters, both commonly encountered in Urdu script. Trained on a dataset of approximately 160,000 Urdu text images, the model achieves a character error rate (CER) of 0.178. Despite ongoing challenges in handling certain text variations, the model exhibits superior accuracy and effectiveness in practical applications. Future work will focus on refining the model through advanced data augmentation techniques and on integrating context-aware language models to further improve its performance and robustness in Urdu text recognition.
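
The abstract reports its result as a character error rate (CER). As a rough illustration of how such a metric is commonly computed, the sketch below takes the total Levenshtein edit distance between predicted and reference words and divides it by the total number of reference characters. This is an assumption about the usual definition, not the authors' evaluation code: the function names `levenshtein` and `cer` are hypothetical, and the paper may normalize differently (for example per word, or over grapheme clusters instead of Unicode code points).

```python
from typing import Sequence


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to turn `hyp` into `ref` (counted over Unicode code points)."""
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters.

    Assumption: characters are code points, so combining marks count
    as separate characters.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


# Toy example (not data from the paper): one character is missing
# from a four-character reference word.
score = cer(["کتاب"], ["کتب"])
```

In this toy case `score` is 0.25, since one edit is needed against four reference characters. The example does not attempt to reproduce the 0.178 figure reported in the abstract.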

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2408.15119 [cs.CV]
	(or arXiv:2408.15119v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2408.15119

Submission history

From: Ahmed Mustafa
[v1] Tue, 27 Aug 2024 14:58:13 UTC (1,153 KB)
[v2] Wed, 28 Aug 2024 09:11:55 UTC (1,153 KB)
[v3] Fri, 30 Aug 2024 15:29:08 UTC (177 KB)