International Journal of Computational Intelligence Systems

Volume 9, Issue 5, September 2016, Pages 900 - 916

A New Approach to Extract Text from Images based on DWT and K-means Clustering

Authors
Deepika Ghai, Divya Gera, Neelu Jain
ECE Department, PEC University of Technology, Sector-12, Chandigarh 160 012, UT, India, Tel. No. +91-9463647809, +91-9467626310, +91-9888014575
Received 16 March 2015, Accepted 18 May 2016, Available Online 1 September 2016.
DOI
10.1080/18756891.2016.1237189
Keywords
Text extraction; Texture features; DWT; K-means clustering; sliding window; voting decision
Abstract

Text present in images provides important information for automatic annotation, indexing and retrieval, so its extraction is a well-known research area in computer vision. However, variations in text orientation, alignment, font and size, together with low image contrast and complex backgrounds, make text extraction extremely challenging. In this paper, we propose a texture-based text extraction method that combines the discrete wavelet transform (DWT) with K-means clustering. First, edges are detected in the image using the DWT. Then, a small overlapping sliding window scans the high-frequency sub-bands, from which texture features of text and non-text regions are extracted. Based on these features, K-means clustering classifies the image into text, simple-background and complex-background clusters. Finally, a voting decision process and area-based filtering are used to locate the text regions precisely. Experiments are carried out on the public ICDAR 2013 dataset and on our own dataset of English, Hindi and Punjabi text images, for different numbers of clusters. The results show that the proposed method performs promisingly across languages in terms of detection rate (DR), precision rate (PR) and recall rate (RR).
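
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the wavelet, the features, the window size or the cluster labelling rule, so everything below is an assumption made for illustration. The sketch assumes a one-level Haar wavelet in its unnormalised averaging form, a 4x4 window with a stride of 2, the mean and standard deviation of absolute coefficients as texture features, a deterministic K-means initialisation, and the rule that the cluster with the highest-energy centre is the text cluster. The function names (`haar_dwt2`, `window_features`, `kmeans`, `text_windows`) are hypothetical, and the voting decision and area-based filtering steps are omitted.

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2-D Haar DWT (averaging form); returns LL and three detail bands."""
    img = img[: img.shape[0] // 2 * 2, : img.shape[1] // 2 * 2].astype(float)
    # Filter along rows: pairwise average (low-pass) and difference (high-pass).
    lo = (img[:, 0::2] + img[:, 1::2]) / 2.0
    hi = (img[:, 0::2] - img[:, 1::2]) / 2.0
    # Filter along columns. Sub-band naming conventions vary between libraries.
    ll = (lo[0::2] + lo[1::2]) / 2.0
    lh = (lo[0::2] - lo[1::2]) / 2.0
    hl = (hi[0::2] + hi[1::2]) / 2.0
    hh = (hi[0::2] - hi[1::2]) / 2.0
    return ll, lh, hl, hh

def window_features(bands, win=4, step=2):
    """Mean and std of |coefficients| per overlapping window (assumed features)."""
    h, w = bands[0].shape
    feats, coords = [], []
    for r in range(0, h - win + 1, step):
        for c in range(0, w - win + 1, step):
            row = []
            for band in bands:
                patch = np.abs(band[r:r + win, c:c + win])
                row += [patch.mean(), patch.std()]
            feats.append(row)
            coords.append((r, c))
    return np.array(feats), coords

def nearest(x, centers):
    """Index of the closest centre (squared Euclidean distance) for each row of x."""
    dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

def kmeans(x, k=3, iters=50):
    """Plain K-means; centres start at evenly spaced ranks of feature energy."""
    order = np.argsort(np.linalg.norm(x, axis=1))
    centers = x[order[np.linspace(0, len(x) - 1, k).astype(int)]]
    for _ in range(iters):
        labels = nearest(x, centers)
        new = centers.copy()
        for j in range(k):
            if np.any(labels == j):  # an empty cluster keeps its old centre
                new[j] = x[labels == j].mean(axis=0)
        if np.allclose(new, centers):
            break
        centers = new
    return nearest(x, centers), centers

def text_windows(img, k=3, win=4, step=2):
    """Flag each window as text or not; coordinates are in sub-band space."""
    _, lh, hl, hh = haar_dwt2(img)
    feats, coords = window_features((lh, hl, hh), win, step)
    labels, centers = kmeans(feats, k)
    # Assumption: the cluster with the highest-energy centre is the text cluster.
    text_cluster = np.linalg.norm(centers, axis=1).argmax()
    return coords, labels == text_cluster
```

As a usage example, a flat image containing a high-contrast checkerboard patch, which stands in for a textured text region, can be passed to `text_windows`. Each returned window coordinate refers to the half-resolution sub-bands, so it must be scaled by 2 to map back to pixel positions.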

Copyright
© 2016, the authors. Co-published by Atlantis Press and Taylor & Francis.
Open Access
This is an open access article under the CC BY-NC license (http://creativecommons.org/licences/by-nc/4.0/).

Journal
International Journal of Computational Intelligence Systems
Volume-Issue
9 - 5
Pages
900 - 916
Publication Date
2016/09/01
ISSN (Online)
1875-6883
ISSN (Print)
1875-6891

Cite this article

TY  - JOUR
AU  - Deepika Ghai
AU  - Divya Gera
AU  - Neelu Jain
PY  - 2016
DA  - 2016/09/01
TI  - A New Approach to Extract Text from Images based on DWT and K-means Clustering
JO  - International Journal of Computational Intelligence Systems
SP  - 900
EP  - 916
VL  - 9
IS  - 5
SN  - 1875-6883
UR  - https://doi.org/10.1080/18756891.2016.1237189
DO  - 10.1080/18756891.2016.1237189
ID  - Ghai2016
ER  -