StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback

Dou, Shihan; Liu, Yan; Jia, Haoxiang; Xiong, Limao; Zhou, Enyu; Shen, Wei; Shan, Junjie; Huang, Caishuang; Wang, Xiao; Fan, Xiaoran; Xi, Zhiheng; Zhou, Yuhao; Ji, Tao; Zheng, Rui; Zhang, Qi; Huang, Xuanjing; Gui, Tao

arXiv:2402.01391v2 (cs)

[Submitted on 2 Feb 2024 (v1), last revised 5 Feb 2024 (this version, v2)]

Abstract: The advancement of large language models (LLMs) has significantly propelled the field of code generation. Previous work integrated reinforcement learning (RL) with compiler feedback to explore the output space of LLMs and enhance code generation quality. However, the lengthy code that LLMs generate in response to complex human requirements makes RL exploration challenging. Also, since unit tests may not cover all of the complicated code, optimizing LLMs on these unexecuted code snippets is ineffective. To tackle these challenges, we introduce StepCoder, a novel RL framework for code generation consisting of two main components: CCCS addresses the exploration challenge by breaking the long-sequence code generation task into a Curriculum of Code Completion Subtasks, while FGO provides Fine-Grained Optimization by masking the unexecuted code segments so that only executed code is used to optimize the model. In addition, we construct the APPS+ dataset for RL training, which is manually verified to ensure the correctness of unit tests. Experimental results show that our method improves the ability to explore the output space and outperforms state-of-the-art approaches on the corresponding benchmarks. Our dataset APPS+ and StepCoder are available online.
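
The two components named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the equal-sized line-based split of a reference solution, and the plain list of per-token losses are all assumptions made for illustration. The abstract specifies only that CCCS builds a curriculum of code completion subtasks and that FGO masks unexecuted code segments during optimization.

```python
from typing import List, Sequence


def completion_subtasks(solution_lines: Sequence[str], num_stages: int) -> List[dict]:
    """Hypothetical curriculum: each stage reveals less of a reference solution.

    Early stages hand the model a long prefix and ask it to complete the rest;
    the final stage reveals nothing, which is the full generation task.
    """
    n = len(solution_lines)
    stages = []
    for stage in range(num_stages):
        # The number of reference lines shown to the model shrinks per stage.
        keep = n - (n * (stage + 1)) // num_stages
        stages.append({
            "prefix": list(solution_lines[:keep]),
            "lines_to_generate": n - keep,
        })
    return stages


def masked_mean_loss(token_losses: Sequence[float], executed: Sequence[bool]) -> float:
    """Hypothetical fine-grained loss: average only over tokens in executed code."""
    if len(token_losses) != len(executed):
        raise ValueError("token_losses and executed must have the same length")
    kept = [loss for loss, ran in zip(token_losses, executed) if ran]
    # If no token was executed, there is nothing to optimize on.
    return sum(kept) / len(kept) if kept else 0.0


if __name__ == "__main__":
    reference = ["a = int(input())", "if a > 0:", "    print(a)", "else:", "    print(-a)"]
    for i, task in enumerate(completion_subtasks(reference, num_stages=3)):
        print(i, len(task["prefix"]), task["lines_to_generate"])
    # Tokens flagged False (e.g. an untaken branch) do not contribute to the loss.
    print(masked_mean_loss([1.0, 2.0, 3.0, 5.0], [True, False, True, False]))
```

In this sketch the `executed` flags would come from running the unit tests and recording which parts of the generated program actually ran; how that coverage is mapped onto tokens is left open here.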

Comments:	13 pages, 5 figures
Subjects:	Software Engineering (cs.SE); Computation and Language (cs.CL)
Cite as:	arXiv:2402.01391 [cs.SE]
	(or arXiv:2402.01391v2 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2402.01391

Submission history

From: Shihan Dou
[v1] Fri, 2 Feb 2024 13:14:31 UTC (323 KB)
[v2] Mon, 5 Feb 2024 13:28:23 UTC (324 KB)
