Most existing biomedical language models are trained on plain text with general learning objectives such as random word infilling, and therefore fail to capture the knowledge in the biomedical corpus sufficiently. Since biomedical articles usually contain many tables summarising the main entities and their relations, in this paper we propose a Tabular knowledge enhanced bioMedical pretrained language model, called TabMedBERT. Specifically, we align entities between table cells and article text spans using pre-defined rules. We then add two table-related self-supervised tasks to integrate tabular knowledge into the language model: Entity Infilling (EI) and Table Cloze Test (TCT). EI masks the tokens within aligned entities in the article text, while TCT converts the aligned entities in the table layout into a cloze-style query by erasing one entity and prompts the model to extract the appropriate span from the article to fill in the blank. Experimental results demonstrate that TabMedBERT surpasses all competing language models without adding extra parameters, establishing a new state-of-the-art performance of 85.59% (+1.29%) on the BLURB biomedical NLP benchmark and seven additional information extraction datasets. Moreover, the model architecture used for TCT provides a straightforward way to reformulate information extraction with paired entities.
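To make the two pretraining objectives concrete, the following is a minimal sketch (not the authors' released code) of how entity-level masking for EI and the cloze conversion for TCT could be constructed; the tokenization, function names, and data layout are hypothetical simplifications.

```python
# Toy illustration of the EI and TCT data construction described in the abstract.
# Whitespace tokenization stands in for a real subword tokenizer; all names here
# are hypothetical and only meant to show the shape of the two objectives.
import random

MASK = "[MASK]"
BLANK = "[BLANK]"

def entity_infilling(tokens, entity_spans, mask_prob=0.5):
    """Entity Infilling (EI): mask every token inside a sampled subset of the
    entity spans that were aligned to table cells, producing (inputs, labels)."""
    masked = list(tokens)
    labels = [None] * len(tokens)
    for start, end in entity_spans:          # spans are [start, end) token indices
        if random.random() < mask_prob:
            for i in range(start, end):
                labels[i] = masked[i]        # original token becomes the target
                masked[i] = MASK
    return masked, labels

def table_cloze_test(row, erase_key):
    """Table Cloze Test (TCT): erase one cell of an aligned table row and build a
    cloze-style query; the model should extract the erased entity as a span
    from the article text."""
    answer = row[erase_key]
    query = " | ".join(
        f"{k}: {BLANK if k == erase_key else v}" for k, v in row.items()
    )
    return query, answer

if __name__ == "__main__":
    tokens = "aspirin inhibits cyclooxygenase enzymes in platelets".split()
    spans = [(0, 1), (2, 4)]                 # aligned entities: "aspirin", "cyclooxygenase enzymes"
    print(entity_infilling(tokens, spans, mask_prob=1.0))

    row = {"Drug": "aspirin", "Target": "cyclooxygenase"}
    print(table_cloze_test(row, erase_key="Target"))
```

In this reading, EI differs from ordinary random word infilling by masking whole aligned entities rather than independent tokens, and TCT turns each table row into an extraction prompt, which is also why the same architecture can be reused for information extraction over paired entities.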