
Improving offline HTR in small datasets by purging unreliable labels


JC Aradillas, JJ Murillo-Fuentes, PM Olmos

2020 17th International Conference on Frontiers in Handwriting …, 2020•ieeexplore.ieee.org

This paper focuses on the offline handwritten text recognition (HTR) problem with small training data sets. Techniques such as transfer learning and data augmentation have recently been applied to this problem, improving recognition performance. In these scenarios, we found that errors in the labelling of the training samples, present in some databases, have a great impact on the character error rate (CER). Accordingly, we propose a novel cross-validation technique to remove incorrectly labelled lines. In this approach, after a first training stage, transcribed lines with a CER above a threshold are discarded, where the threshold is a function of the available data. With less available data, the CER tends to be larger even for correctly labelled ("healthy") lines, which suggests using higher thresholds when fewer lines are available. This new technique and the validation of the threshold are analyzed on the ICFHR 2018 competition on automated HTR and on other well-known databases such as Washington and Parzival. For the Ricordi database in the ICFHR 2018 competition, which contains transcription errors, we report a reduction of CER by 2%.
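The purging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `example_threshold` formula, and its constants are hypothetical, chosen only so that the threshold grows as fewer lines are available. The sketch assumes that predictions for each training line have already been produced by a model from the first training stage (for instance on held-out folds), and it computes the CER as the Levenshtein distance divided by the label length.

```python
from typing import Callable, List, Tuple


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(label: str, prediction: str) -> float:
    """Character error rate of a prediction with respect to its label."""
    if not label:
        return 0.0 if not prediction else 1.0
    return edit_distance(label, prediction) / len(label)


def example_threshold(n_lines: int) -> float:
    """Hypothetical threshold: higher when fewer lines are available.

    The formula and constants are illustrative assumptions, not the
    function used in the paper.
    """
    return 0.3 + 0.5 / (1.0 + n_lines / 100.0)


def purge_unreliable(
    samples: List[Tuple[str, str]],
    predictions: List[str],
    threshold_fn: Callable[[int], float] = example_threshold,
) -> List[Tuple[str, str]]:
    """Keep only the (line_id, label) pairs whose CER does not exceed the threshold.

    `predictions[i]` is assumed to be the first-stage model output for `samples[i]`.
    """
    threshold = threshold_fn(len(samples))
    return [
        (line_id, label)
        for (line_id, label), pred in zip(samples, predictions)
        if cer(label, pred) <= threshold
    ]
```

In this sketch the surviving lines would then be used to retrain the recognizer; how the folds are built and how the threshold is validated are left out, since the abstract does not specify them.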
