Computer Science (计算机科学), 2017, Vol. 44, Issue (Z6): 14-18. doi: 10.11896/j.issn.1002-137X.2017.6A.003

• Survey Research •

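Survey on Cross-language Named Entity Translation Pairs Extraction

WANG Zhi-juan, LI Fu-xian

  1. School of Information Engineering, Minzu University of China, Beijing 100081; Minority Languages Branch, National Language Resources Monitoring and Research Center, Beijing 100081; Data Intelligence Department, Heyi Network Technology Co., Ltd., Beijing 100080
  • Online: 2017-12-01 Published: 2018-12-01
  • Supported by:
    the Key Program of the National Natural Science Foundation of China (61331013) and the Research Project of the State Language Commission (WT125-46)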

Abstract: Cross-language named entity translation pairs are important for machine translation, cross-language information retrieval, and related tasks. This paper surveys research on cross-language named entity translation pair extraction from three aspects: transliteration of named entities, named entity alignment based on parallel/comparable corpora, and translation pair extraction by Web mining. First, transliteration is a key part of cross-language named entity translation pair extraction. Rule-based, machine learning, and deep learning methods have been applied to named entity translation in many languages. Transliteration models based on deep learning perform very well and will be the focus of future research. Second, named entity alignment based on parallel/comparable corpora is a useful way to obtain cross-language named entity translation pairs, but the construction and annotation of cross-language corpora are bottlenecks that directly limit further research in this direction. Third, cross-language named entity translation pairs can be extracted by Web mining. Extraction based on cross-language information retrieval and on knowledge bases such as Wikipedia will be the future trend.

Key words: Named entity translation pairs, Transliteration, Named entity alignment, Web mining

