Identification of misspelled words without a comprehensive dictionary using prevalence analysis

A Turchin, JT Chu, M Shubina, JS Einbinder
AMIA Annual Symposium Proceedings, 2007 - pmc.ncbi.nlm.nih.gov
Misspellings are common in medical documents and can be an obstacle to information retrieval. We evaluated an algorithm that identifies misspelled words by analyzing their prevalence in a representative body of text. We measured the algorithm’s accuracy in identifying misspellings of 200 anti-hypertensive medication names among 2,000 potentially misspelled words randomly selected from narrative medical documents. For each word, the software computed the prevalence ratio (the frequency of the potentially misspelled word divided by the frequency of the non-misspelled word) in physician notes. The software results were compared to manual assessment by an independent reviewer. The area under the ROC curve for identification of misspelled words was 0.96. At the prevalence ratio threshold with the highest F-measure (threshold 0.32768, F-measure 0.903), sensitivity, specificity, and positive predictive value were 99.25%, 89.72%, and 82.9%, respectively. Prevalence analysis can be used to identify and correct misspellings with high accuracy.
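The prevalence ratio and F-measure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' software: the function names, the toy word counts, and the decision rule (flag a candidate when its ratio is at or below the threshold, on the assumption that a misspelling is rarer than the correct word) are all assumptions made for illustration. Only the ratio definition, the threshold 0.32768, and the reported metrics come from the abstract.

```python
from collections import Counter

# Threshold reported in the abstract as giving the highest F-measure.
BEST_THRESHOLD = 0.32768


def prevalence_ratio(candidate, reference, counts):
    """Frequency of the candidate word divided by the frequency of the
    reference (non-misspelled) word, as defined in the abstract."""
    reference_count = counts[reference]
    if reference_count == 0:
        raise ValueError(f"reference word {reference!r} not found in corpus")
    return counts[candidate] / reference_count


def is_likely_misspelling(candidate, reference, counts, threshold=BEST_THRESHOLD):
    """ASSUMPTION: a low ratio suggests a misspelling, and the comparison is
    inclusive. The abstract does not state the direction or inclusivity."""
    return prevalence_ratio(candidate, reference, counts) <= threshold


def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical token counts standing in for a corpus of physician notes.
counts = Counter(
    ["lisinopril"] * 50 + ["lisinipril"] * 2 + ["atenolol"] * 30 + ["metoprolol"] * 20
)

print(prevalence_ratio("lisinipril", "lisinopril", counts))
print(is_likely_misspelling("lisinipril", "lisinopril", counts))
print(is_likely_misspelling("atenolol", "metoprolol", counts))

# Positive predictive value (precision) and sensitivity (recall) from the abstract.
print(round(f_measure(0.829, 0.9925), 3))
```

In the toy data, the rare variant "lisinipril" has a ratio far below the threshold relative to "lisinopril" and is flagged, while "atenolol" compared with "metoprolol" (two distinct, comparably frequent drug names) is not. Plugging the reported positive predictive value and sensitivity into the harmonic-mean formula gives a value consistent with the F-measure of 0.903 stated in the abstract.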