Improving Bilingual Lexicon Induction for Low Frequency Words

Jiaji Huang, Xingyu Cai, Kenneth Church


Abstract
This paper designs a Monolingual Lexicon Induction task and observes that two factors accompany the degraded accuracy of bilingual lexicon induction for rare words: first, a diminishing margin between similarities in the low-frequency regime, and second, exacerbated hubness at low frequency. Based on these observations, we propose two methods, one addressing each factor. Hubness is the larger issue; addressing it improves induction accuracy significantly, especially for low-frequency words.
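The two factors named in the abstract can be made concrete with a small sketch. This is not the authors' code or their proposed methods. It only illustrates the two diagnostics under common assumptions: similarity is taken to be cosine similarity between embedding vectors, the "margin" is taken to be the gap between the best and second-best similarity for a query, and hubness is measured by k-occurrence (how often each target lands in some query's top-k list). The function names `top1_margin` and `k_occurrence` are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def top1_margin(query, targets):
    """Gap between the highest and second-highest similarity (assumed definition).

    A small gap means the nearest neighbor is barely separated from the runner-up.
    Requires at least two targets.
    """
    sims = sorted((cosine(query, t) for t in targets), reverse=True)
    return sims[0] - sims[1]


def k_occurrence(queries, targets, k=1):
    """Count how many queries rank each target among their k nearest neighbors.

    Targets with unusually high counts are "hubs".
    """
    counts = Counter()
    for q in queries:
        ranked = sorted(
            range(len(targets)),
            key=lambda j: cosine(q, targets[j]),
            reverse=True,
        )
        counts.update(ranked[:k])
    return [counts[j] for j in range(len(targets))]


# Toy data (made up for illustration, not from the paper).
targets = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
queries = [(1.0, 0.1), (0.9, 1.0), (0.2, 1.0), (1.0, 0.8)]

print(k_occurrence(queries, targets, k=1))
print([round(top1_margin(q, targets), 3) for q in queries])
```

In this toy data the third target is the nearest neighbor of two of the four queries, so it acts as a mild hub. With real embeddings one would compare these statistics across word-frequency bins to see whether margins shrink and hub counts grow for rare words.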
Anthology ID:
2020.emnlp-main.100
Volume:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month:
November
Year:
2020
Address:
Online
Editors:
Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
1310–1314
URL:
https://aclanthology.org/2020.emnlp-main.100
DOI:
10.18653/v1/2020.emnlp-main.100
Cite (ACL):
Jiaji Huang, Xingyu Cai, and Kenneth Church. 2020. Improving Bilingual Lexicon Induction for Low Frequency Words. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1310–1314, Online. Association for Computational Linguistics.
Cite (Informal):
Improving Bilingual Lexicon Induction for Low Frequency Words (Huang et al., EMNLP 2020)
PDF:
https://aclanthology.org/2020.emnlp-main.100.pdf
Video:
https://slideslive.com/38939203