Evaluating GPT's Programming Capability through Codewars' Katas
Z Zhang, L Wen, S Zhang, D Chen, Y Jiang
International Conference on Knowledge Science, Engineering and Management, 2024 • Springer
Abstract
Understanding the capabilities and limitations of programming-oriented AI models is crucial. This paper evaluates the programming proficiency of GPT-3.5 and GPT-4 using Codewars coding problems of varying difficulty. The experiments reveal a distinct boundary at the 3kyu level, beyond which these models struggle. This led to proposing a complexity measure that includes problem difficulty and solution time. The research emphasizes the need for validation and creative thinking in AI models to better emulate human problem-solving. Future work aims to refine the complexity measure, enhance AI capabilities, and develop an objective programming problem difficulty measure. These insights are valuable for advancing AI programming and problem-solving abilities.