Computer Science (计算机科学) ›› 2017, Vol. 44 ›› Issue (10): 259-264. doi: 10.11896/j.issn.1002-137X.2017.10.047
李金廷,侯宏旭,武静,王洪彬,樊文婷
LI Jin-ting, HOU Hong-xu, WU Jing, WANG Hong-bin and FAN Wen-ting
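Abstract: Traditional Mongolian morphological analysis mainly segments words into stems and affixes and keeps only the stems, which discards the large amount of semantic information carried by the affixes. Mongolian affixes include many case suffixes, which mainly mark the structural features of a sentence. Segmenting them off does not affect the semantics of the word, whereas leaving them unprocessed causes severe data sparsity and therefore degrades translation quality. Building on existing theory, this paper surveys corpus preprocessing methods and focuses on how the handling of Mongolian case affects translation results, with the aim of improving Mongolian-Chinese statistical machine translation by starting from the particular characteristics of Mongolian morphology. With the optimized preprocessing method, the BLEU score of the translation output improves by 3.22 points over baseline system 1.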
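The contrast drawn in the abstract, between keeping only the stem and splitting the case suffix off as its own token, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the suffix inventory `TOY_CASE_SUFFIXES`, the `~gen`-style notation, and the function names are hypothetical placeholders, not real Mongolian suffixes and not the authors' preprocessing pipeline.

```python
# Hypothetical placeholder suffixes in a toy notation (NOT real Mongolian data).
TOY_CASE_SUFFIXES = ("~gen", "~dat", "~abl")


def split_case(token, suffixes=TOY_CASE_SUFFIXES):
    """Return [stem, suffix] if the token ends in a known suffix, else [token]."""
    # Try longer suffixes first so the longest match wins.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if token.endswith(suffix) and len(token) > len(suffix):
            return [token[: -len(suffix)], suffix]
    return [token]


def preprocess(sentence, keep_suffix=True):
    """Tokenize on whitespace and split case suffixes off each token.

    keep_suffix=True  -> the suffix survives as a separate token.
    keep_suffix=False -> stem-only output (the suffix is discarded).
    """
    tokens = []
    for token in sentence.split():
        parts = split_case(token)
        tokens.extend(parts if keep_suffix else parts[:1])
    return tokens


def vocabulary(corpus, **kwargs):
    """Set of distinct token types in the corpus after preprocessing."""
    return {tok for sentence in corpus for tok in preprocess(sentence, **kwargs)}


if __name__ == "__main__":
    corpus = ["book~gen book~dat book~abl", "house~gen house~dat house~abl"]
    raw_types = {tok for sentence in corpus for tok in sentence.split()}
    print(len(raw_types), "types without preprocessing")
    print(len(vocabulary(corpus)), "types with suffixes split off")
    print(len(vocabulary(corpus, keep_suffix=False)), "types with stems only")
```

In this toy corpus, splitting gives every inflected form of a stem a shared stem token while the case marker remains visible to the translation model. The stem-only variant yields the smallest vocabulary but drops the case information altogether.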