MultiCoder: Multi-Programming-Lingual Pre-Training for Low-Resource Code Completion

Z Gong, Y Guo, P Zhou, C Gao, Y Wang, Z Xu
arXiv preprint arXiv:2212.09666, 2022
Code completion is a valuable topic in both academia and industry. Recently, large-scale mono-programming-lingual (MonoPL) pre-training models have been proposed to boost code completion performance. However, code completion for low-resource programming languages (PLs) is difficult under the data-driven paradigm, even though many developers use low-resource PLs. Meanwhile, few studies have explored the effects of multi-programming-lingual (MultiPL) pre-training on code completion, especially its impact on low-resource programming languages. To this end, we propose MultiCoder, which enhances low-resource code completion via MultiPL pre-training and MultiPL Mixture-of-Experts (MoE) layers. We further propose a novel PL-level MoE routing strategy (PL-MoE) to improve code completion on all PLs. Experimental results on CodeXGLUE and MultiCC demonstrate that 1) the proposed MultiCoder significantly outperforms the MonoPL baselines on low-resource programming languages, and 2) the PL-MoE module further boosts performance on six programming languages. In addition, we analyze the effects of the proposed method in detail and explore its effectiveness in a variety of scenarios.