Do Names Echo Semantics? A Large-Scale Study of Identifiers Used in C++'s Named Casts

Petrescu, Constantin Cezar; Smith, Sam; Giavrimis, Rafail; Dash, Santanu Kumar

doi:10.1016/j.jss.2023.111693

arXiv:2111.01577 (cs)

[Submitted on 2 Nov 2021 (v1), last revised 3 Apr 2023 (this version, v2)]

Abstract: Developers relax restrictions on a type to reuse methods with other types. Type casts are prevalent, and in weakly typed languages such as C++ they are also extremely permissive. Assignments in which a source expression is cast into a new type and assigned to a target variable of that type can lead to software bugs if performed without care. In this paper, we propose an information-theoretic approach to identify poor implementations of explicit cast operations. Our approach measures accord between the source expression and the target variable using conditional entropy. We collect casts from 34 components of the Chromium project, which collectively account for 27 MLOC, and sample this dataset uniformly at random to create a manually labelled dataset of 271 casts. Information-theoretic vetting of these 271 casts achieves a peak precision of 81% and a recall of 90%. We additionally present the findings of an in-depth investigation of notable explicit casts, two of which were fixed in recent releases of the Chromium project.
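
The abstract names conditional entropy as the measure of accord but does not say how the probabilities are estimated. As a rough illustration only, the sketch below computes the textbook quantity H(target | source) = -Σ p(s, t) log2 p(t | s) from empirical counts over (source name, target name) pairs. The function name, the pairing of whole identifiers, and the example data are all assumptions made here for illustration, not the authors' pipeline.

```python
import math
from collections import Counter


def conditional_entropy(pairs):
    """Return H(target | source) in bits for (source, target) name pairs.

    Probabilities are plain empirical frequencies; this is an assumed,
    simplified estimator, not the one used in the paper.
    """
    pairs = list(pairs)
    total = len(pairs)
    if total == 0:
        return 0.0
    joint = Counter(pairs)                            # counts of (source, target)
    source_counts = Counter(src for src, _ in pairs)  # counts of source alone
    entropy = 0.0
    for (src, _tgt), count in joint.items():
        p_joint = count / total
        p_target_given_source = count / source_counts[src]
        entropy -= p_joint * math.log2(p_target_given_source)
    return entropy


# Hypothetical data: each source name always maps to the same target name.
consistent = [("width", "width"), ("height", "height")] * 2
# Hypothetical data: one source name is cast into four unrelated target names.
mixed = [("size", "width"), ("size", "count"),
         ("size", "offset"), ("size", "length")]

print(conditional_entropy(consistent))
print(conditional_entropy(mixed))
```

Under this simplified reading, a low value means the target name is largely predictable from the source name, while a high value means the source name says little about the target it is cast into.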

Comments:	The manuscript has 27 pages and contains 4 figures, 18 listings and 4 tables. The preprint has been accepted at the Journal of Systems and Software (Elsevier).
Subjects:	Software Engineering (cs.SE); Information Theory (cs.IT); Programming Languages (cs.PL)
Cite as:	arXiv:2111.01577 [cs.SE]
	(or arXiv:2111.01577v2 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2111.01577
Related DOI:	https://doi.org/10.1016/j.jss.2023.111693

Submission history

From: Constantin Cezar Petrescu
[v1] Tue, 2 Nov 2021 13:05:04 UTC (227 KB)
[v2] Mon, 3 Apr 2023 14:37:57 UTC (335 KB)
