28 January 2008 Hybrid approach combining contextual and statistical information for identifying MEDLINE citation terms
Proceedings Volume 6815, Document Recognition and Retrieval XV; 68150X (2008) https://doi.org/10.1117/12.766660
Event: Electronic Imaging, 2008, San Jose, California, United States
Abstract
There is strong demand for automated tools that extract pertinent information from the biomedical literature, a rich, complex, and rapidly growing resource that is increasingly accessed via the web. This paper presents a hybrid method, based on contextual and statistical information, for automatically identifying two MEDLINE citation terms, NIH grant numbers and databank accession numbers, in HTML-formatted online biomedical documents. Detecting these terms is challenging because their formats vary widely and inconsistently (although recommended formats exist) and because they resemble other technical or biological terms. The proposed method first extracts potential candidates for these terms using a rule-based method. The candidates are then scored, and the final candidates are submitted to a human operator for verification. The confidence score for each term is calculated from statistical, morphological, and contextual information. Experiments on more than ten thousand HTML-formatted online biomedical documents show that the proposed method identifies most NIH grant numbers and databank accession numbers, with recall rates of 99.8% and 99.6%, respectively. However, owing to a high false alarm rate, it yields F-measures of 86.6% and 87.9% for NIH grants and databanks, respectively.
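The abstract reports recall and F-measure but not precision. As a rough reading aid, the sketch below back-computes the precision those figures would imply. It assumes the reported F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state; the function name `precision_from_f1` and the dictionary names are illustrative only and do not come from the paper.

```python
def precision_from_f1(recall: float, f1: float) -> float:
    """Invert F1 = 2*P*R / (P + R) to recover precision P.

    Assumption: the F-measure is the balanced F1 score.
    Solving for P gives P = F1 * R / (2*R - F1).
    """
    denom = 2 * recall - f1
    if denom <= 0:
        raise ValueError("no valid precision exists for this recall/F1 pair")
    return f1 * recall / denom


# (recall, F-measure) pairs as reported in the abstract.
reported = {
    "NIH grant numbers": (0.998, 0.866),
    "databank accession numbers": (0.996, 0.879),
}

# Precision implied by each pair, under the balanced-F1 assumption.
implied = {name: precision_from_f1(r, f) for name, (r, f) in reported.items()}

for name, p in implied.items():
    print(f"{name}: implied precision ~ {p:.1%}")
```

Under that assumption, the implied precision is about 76.5% for NIH grant numbers and about 78.7% for databank accession numbers, which is consistent with the high false alarm rate the abstract mentions. These are derived estimates, not figures reported by the authors.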
© (2008) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
In Cheol Kim, Daniel X. Le, and George R. Thoma "Hybrid approach combining contextual and statistical information for identifying MEDLINE citation terms", Proc. SPIE 6815, Document Recognition and Retrieval XV, 68150X (28 January 2008); https://doi.org/10.1117/12.766660
KEYWORDS
Biomedical optics, Proteins, Error analysis, Medicine, Statistical analysis, Associative arrays, Biological research