Journal of Supercomputing, (ISI), Volume (76), No (1), Year (2020-1), Pages (602-635)

Title: (Software defect prediction using over-sampling and feature extraction based on Mahalanobis distance)

Authors: Mohammadmahdi Nshokoohi, SeyedMohammadAli MajidiAnvari, Abbas Rasoolzadegan



Abstract

As software projects grow larger, Software Defect Prediction (SDP) plays a key role in allocating testing resources reasonably, reducing testing costs, and speeding up the development process. Most SDP methods use machine learning techniques based on software metrics such as the Halstead metrics and McCabe's cyclomatic complexity. However, many of these metrics do not follow a Gaussian distribution, and the defective and non-defective classes overlap. In addition, in many software defect datasets, the number of defective modules (minority class) is much smaller than the number of non-defective modules (majority class). In this situation, the performance of machine learning methods drops dramatically. Therefore, we first need to balance the minority and majority classes and then transform the samples into a new space in which sample pairs of the same class (the must-link set) are as close to each other as possible and sample pairs of different classes (the cannot-link set) are as far apart as possible. To achieve these objectives, in this paper we use the Mahalanobis distance in two ways. First, the minority class is oversampled based on the Mahalanobis distance such that the generated synthetic data are more diverse with respect to the other minority data, while the minority class distribution is not changed significantly. Second, a feature extraction method based on Mahalanobis distance metric learning is used, which tries to minimize the distances between sample pairs in the must-link set and maximize the distances between sample pairs in the cannot-link set. To demonstrate the effectiveness of the proposed method, we performed experiments on 12 publicly available datasets collected from NASA repositories and compared its results with those of some powerful previous methods. Performance is evaluated in terms of F-measure, G-Mean, and Matthews Correlation Coefficient (MCC).
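
The three evaluation measures named at the end of the abstract can be illustrated with a minimal sketch. The abstract does not give their formulas, so the definitions below are assumptions based on common usage: F-measure is taken as the F1 score, G-Mean as the geometric mean of recall and specificity, and the defective (minority) class is treated as the positive class. The function name `defect_metrics` and the example counts are hypothetical and do not come from the paper.

```python
import math


def defect_metrics(tp, fp, tn, fn):
    """Return (F-measure, G-Mean, MCC) from confusion-matrix counts.

    Assumption: the defective (minority) class is the positive class.
    Ratios with a zero denominator are reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    # F-measure, assumed here to be F1: harmonic mean of precision and recall.
    pr_sum = precision + recall
    f_measure = 2 * precision * recall / pr_sum if pr_sum else 0.0

    # G-Mean, assumed here to be sqrt(recall * specificity).
    g_mean = math.sqrt(recall * specificity)

    # Matthews Correlation Coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return f_measure, g_mean, mcc


# Hypothetical imbalanced test set: 10 defective and 90 non-defective modules.
balanced_model = defect_metrics(tp=8, fp=2, tn=88, fn=2)

# A classifier that labels every module as non-defective.
majority_only = defect_metrics(tp=0, fp=0, tn=90, fn=10)
```

In this made-up example the majority-only classifier is 90% accurate, yet all three measures come out as zero. That is one reason such measures are often preferred over plain accuracy when defective modules are rare.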

Keywords

Software Defect Prediction; Software Metrics; Mahalanobis distance; Oversampling; Feature extraction.

@article{paperid:1076479,
author = {Nshokoohi, Mohammadmahdi and MajidiAnvari, SeyedMohammadAli and Rasoolzadegan, Abbas},
title = {Software defect prediction using over-sampling and feature extraction based on Mahalanobis distance},
journal = {Journal of Supercomputing},
year = {2020},
volume = {76},
number = {1},
month = {January},
issn = {0920-8542},
pages = {602--635},
numpages = {34},
keywords = {Software Defect Prediction; Software Metrics; Mahalanobis distance; Oversampling; Feature extraction.},
}


%0 Journal Article
%T Software defect prediction using over-sampling and feature extraction based on Mahalanobis distance
%A Nshokoohi, Mohammadmahdi
%A MajidiAnvari, SeyedMohammadAli
%A Rasoolzadegan, Abbas
%J Journal of Supercomputing
%@ 0920-8542
%D 2020
