Cross-Project Software Defect Prediction Based on SMOTE and Deep Canonical Correlation Analysis

Xin Fan; Shuqing Zhang; Kaisheng Wu; Wei Zheng; Yu Ge

doi:10.32604/cmc.2023.046187

Open Access

ARTICLE

Xin Fan^1,2, Shuqing Zhang^1,2,*, Kaisheng Wu^1,2, Wei Zheng^1,2, Yu Ge^1,2

1 School of Software, Nanchang Hangkong University, Nanchang, 330063, China
2 Software Testing and Evaluation Center, Nanchang Hangkong University, Nanchang, 330063, China

* Corresponding Author: Shuqing Zhang.

Computers, Materials & Continua 2024, 78(2), 1687-1711. https://doi.org/10.32604/cmc.2023.046187

Received 21 September 2023; Accepted 04 December 2023; Issue published 27 February 2024

Abstract

Cross-Project Defect Prediction (CPDP) uses historical data from other (source) projects to train models that predict defects in a target project. However, existing CPDP methods consider only linear correlations between the features (indicators) of the source and target projects. They cannot evaluate non-linear correlations between features when these exist, for example, when the data distributions of the source and target projects differ. As a result, the performance of such CPDP models is compromised. This paper proposes a novel CPDP method based on the Synthetic Minority Oversampling Technique (SMOTE) and Deep Canonical Correlation Analysis (DCCA), referred to as S-DCCA. To address non-linear correlations between the features of the source and target projects, S-DCCA extends Canonical Correlation Analysis (CCA) by incorporating the MlpNet model for feature extraction from the dataset. Redundant features are then eliminated by maximizing the correlated feature subset using the CCA loss function. Finally, cross-project defect prediction is achieved through the application of the SMOTE data sampling technique. Area Under Curve (AUC) and F1 score (F1) are used as evaluation metrics. Experiments on 27 projects from four public datasets were conducted to validate the proposed method. The results show that, on average, the method outperforms all baseline approaches by at least 1.2% in AUC and 5.5% in F1 score, indicating that it performs favorably.
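The SMOTE step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a simplified, standard-library-only version of the general SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), and the function name `smote`, the parameter `k`, and the sample data `defective` are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Create n_synthetic points, each on the line segment between a
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-metric feature vectors of defective (minority) modules.
defective = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(defective, n_synthetic=6)
```

Because every synthetic point is a convex combination of two real minority samples, the oversampled class stays inside the region already occupied by defective modules, which is why this kind of sampling is commonly used to balance defect datasets before training a classifier.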

Keywords

Cross-project defect prediction; deep canonical correlation analysis; feature similarity

Cite This Article

APA Style

Fan, X., Zhang, S., Wu, K., Zheng, W., & Ge, Y. (2024). Cross-project software defect prediction based on SMOTE and deep canonical correlation analysis. Computers, Materials & Continua, 78(2), 1687–1711. https://doi.org/10.32604/cmc.2023.046187

Vancouver Style

Fan X, Zhang S, Wu K, Zheng W, Ge Y. Cross-project software defect prediction based on SMOTE and deep canonical correlation analysis. Comput Mater Contin. 2024;78(2):1687–1711. https://doi.org/10.32604/cmc.2023.046187

IEEE Style

X. Fan, S. Zhang, K. Wu, W. Zheng, and Y. Ge, “Cross-Project Software Defect Prediction Based on SMOTE and Deep Canonical Correlation Analysis,” Comput. Mater. Contin., vol. 78, no. 2, pp. 1687–1711, 2024. https://doi.org/10.32604/cmc.2023.046187

Copyright © 2024 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.