Anaphora Resolution for Bengali: An Experiment with Domain Adaptation
Abstract
In this paper we present our first attempt at anaphora resolution for a resource-poor language, namely Bengali. We address the issue of adapting a state-of-the-art system, BART, which was originally developed for English. The overall performance of co-reference resolution depends greatly on highly accurate mention detectors. We develop a number of models that differ in the heuristics used and in the particular machine learning approach employed. Thereafter we perform a series of experiments for adapting BART to Bengali. Our evaluation shows that a language-dependent system (designed primarily for English) can achieve a good performance level when re-trained and tested on a new language with proper subsets of features. The system achieves recall, precision and F-measure values of 56.00%, 46.50% and 50.80%, respectively.
The contribution of this work is two-fold: (i) an attempt to build a machine learning based anaphora resolution system for a resource-poor Indian language; and (ii) domain adaptation of a state-of-the-art English co-reference resolution system to Bengali, which has a completely different orthography and different linguistic characteristics.
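As a consistency check on the reported scores, the F-measure can be recomputed from the precision and recall. This is a sketch that assumes the F-measure is the balanced harmonic mean (F1) of precision P and recall R, which the abstract does not state explicitly:

```latex
% Assumption: F-measure is the balanced F1 score (equal weight on P and R).
F = \frac{2PR}{P + R}
  = \frac{2 \times 46.50 \times 56.00}{46.50 + 56.00}
  = \frac{5208}{102.5}
  \approx 50.81
```

Under that assumption the result agrees with the reported 50.80% to within rounding of the underlying precision and recall values.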
Keywords
Anaphora/Co-reference resolution, CRF based mention detection, Bengali, BART.