A Benchmark Evaluation of Clinical Named Entity Recognition in French

Bannour, Nesrine; Servan, Christophe; Névéol, Aurélie; Tannier, Xavier

arXiv:2403.19726 (cs)

[Submitted on 28 Mar 2024]

Authors:Nesrine Bannour (STL), Christophe Servan (STL), Aurélie Névéol (STL), Xavier Tannier (LIMICS)

Abstract: Background: Transformer-based language models have shown strong performance on many Natural Language Processing (NLP) tasks. Masked Language Models (MLMs) attract sustained interest because they can be adapted to different languages and sub-domains through training or fine-tuning on specific corpora while remaining lighter than modern Large Language Models (LLMs). Recently, several MLMs have been released for the biomedical domain in French, and experiments suggest that they outperform standard French counterparts. However, no systematic evaluation comparing all models on the same corpora is available. Objective: This paper presents an evaluation of masked language models for biomedical French on the task of clinical named entity recognition. Material and methods: We evaluate biomedical models CamemBERT-bio and DrBERT and compare them to standard French models CamemBERT, FlauBERT and FrALBERT as well as multilingual mBERT using three publicly available corpora for clinical named entity recognition in French. The evaluation set-up relies on gold-standard corpora as released by the corpus developers. Results: Results suggest that CamemBERT-bio outperforms DrBERT consistently while FlauBERT offers competitive performance and FrALBERT achieves the lowest carbon footprint. Conclusion: This is the first benchmark evaluation of biomedical masked language models for French clinical entity recognition that compares model performance consistently on nested entity recognition using metrics covering performance and environmental impact.

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM)
Cite as:	arXiv:2403.19726 [cs.CL]
	(or arXiv:2403.19726v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2403.19726
Journal reference:	The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024, Torino, Italy

Submission history

From: Christophe Servan [via CCSD proxy]
[v1] Thu, 28 Mar 2024 07:59:58 UTC (40 KB)
