CRQBench: A Benchmark of Code Reasoning Questions

Dinella, Elizabeth; Chandra, Satish; Maniatis, Petros

arXiv:2408.08453 (cs)

[Submitted on 15 Aug 2024]

Abstract: Large Language Models have demonstrated exceptional proficiency on coding tasks, but it is challenging to precisely evaluate their code reasoning ability. Existing benchmarks are insufficient as they are unrealistic and conflate semantic reasoning ability with performance on software engineering tasks. We introduce CRQBench, a benchmark of 100 C++ code reasoning questions and answers derived from contextualized code review comments. To curate CRQBench, we use an LLM assistant alongside human inspection, reducing manual effort. We conduct an evaluation of GPT-4 on CRQBench and find that it produces correct responses grounded in the given context for 65 of the 100 questions.

Subjects:	Software Engineering (cs.SE)
Cite as:	arXiv:2408.08453 [cs.SE]
	(or arXiv:2408.08453v1 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2408.08453

Submission history

From: Elizabeth Dinella
[v1] Thu, 15 Aug 2024 23:30:47 UTC (12,294 KB)
