Paper

Authors: Monah Hatoum 1 ; Jean-Claude Charr 1 ; Christophe Guyeux 1 ; David Laiymani 1 and Alia Ghaddar 2

Affiliations: 1 University of Bourgogne Franche-Comté, UBFC, CNRS, 90000 Belfort, France ; 2 Department of Computer Science, International University of Beirut, Beirut P.O. Box 146404, Lebanon

Keyword(s): Deep Learning, Natural Language Processing (NLP), Computer-Aid Diagnosis, Chief Complaints, Text Mining, Abbreviations, Negations, Phrases.

Abstract: Downstream tasks like clinical textual data classification perform best when given good-quality datasets. Most of the existing clinical textual data preparation techniques rely on one of two main approaches: removing irrelevant data using cleansing techniques, or extracting valuable data using feature extraction techniques. However, they still have limitations, mainly when applied to real-world datasets. This paper proposes a cleansing approach (called EMTE) which extracts phrases (medical terms, abbreviations, and negations) using pattern-matching rules based on the linguistic processing of the clinical textual data. Without requiring training, EMTE extracts valuable medical data from clinical textual records even if they have different writing styles. Furthermore, since EMTE relies on dictionaries to store abbreviations and pattern-matching rules to detect phrases, it can be easily maintained and extended for industrial use. To evaluate our approach, we compared the performance of EMTE with that of three other techniques. All four cleansing techniques were applied to a large, imbalanced industrial dataset consisting of 2.21M samples from different specialties with 1,050 ICD-10 codes. The experimental results on several Deep Neural Network (DNN) algorithms showed that our cleansing approach significantly improves the trained models’ performance compared to the other tested techniques, according to different metrics.
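The dictionary-plus-rules idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' EMTE implementation: the abbreviation dictionary, the negation cues, the function name `extract_phrases`, and the single "cue followed by one term" rule are all hypothetical, and a plain regular-expression tokenizer stands in for the linguistic processing the paper relies on.

```python
import re

# Hypothetical abbreviation dictionary (assumption: not taken from the paper).
ABBREVIATIONS = {
    "sob": "shortness of breath",
    "cp": "chest pain",
    "n/v": "nausea and vomiting",
}

# Hypothetical negation cues (assumption: not taken from the paper).
NEGATION_CUES = {"no", "denies", "without"}

# Simplified tokenizer: lowercase words, optionally joined by one slash ("n/v").
TOKEN_RE = re.compile(r"[a-z]+(?:/[a-z]+)?")


def extract_phrases(text):
    """Return a list of (phrase, is_negated) pairs from a short clinical note."""
    tokens = TOKEN_RE.findall(text.lower())
    # Dictionary lookup: expand known abbreviations, keep other tokens as-is.
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in tokens]

    phrases = []
    i = 0
    while i < len(tokens):
        # Illustrative rule: a negation cue negates the term that follows it.
        if tokens[i] in NEGATION_CUES and i + 1 < len(tokens):
            phrases.append((tokens[i + 1], True))
            i += 2
        else:
            phrases.append((tokens[i], False))
            i += 1
    return phrases


if __name__ == "__main__":
    for phrase, negated in extract_phrases("SOB, denies CP"):
        print(phrase, "(negated)" if negated else "")
```

Because the abbreviations and cues live in plain data structures, extending this sketch means adding entries rather than retraining a model, which is the maintainability property the abstract attributes to a dictionary- and rule-based design.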

CC BY-NC-ND 4.0

Paper citation in several formats:
Hatoum, M.; Charr, J.; Guyeux, C.; Laiymani, D. and Ghaddar, A. (2023). EMTE: An Enhanced Medical Terms Extractor Using Pattern Matching Rules. In Proceedings of the 15th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART; ISBN 978-989-758-623-1; ISSN 2184-433X, SciTePress, pages 301-311. DOI: 10.5220/0011717300003393

@conference{icaart23,
author={Monah Hatoum and Jean{-}Claude Charr and Christophe Guyeux and David Laiymani and Alia Ghaddar},
title={EMTE: An Enhanced Medical Terms Extractor Using Pattern Matching Rules},
booktitle={Proceedings of the 15th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2023},
pages={301-311},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011717300003393},
isbn={978-989-758-623-1},
issn={2184-433X},
}

TY - CONF

JO - Proceedings of the 15th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - EMTE: An Enhanced Medical Terms Extractor Using Pattern Matching Rules
SN - 978-989-758-623-1
IS - 2184-433X
AU - Hatoum, M.
AU - Charr, J.
AU - Guyeux, C.
AU - Laiymani, D.
AU - Ghaddar, A.
PY - 2023
SP - 301
EP - 311
DO - 10.5220/0011717300003393
PB - SciTePress