An ensemble oversampling model for class imbalance problem in software defect prediction

Huda, MD Shamsul, Liu, Kevin, Abdelrazek, Mohamed, Ibrahim, Amani, Alyahya, Sultan, Al-Dossari, Hmood and Ahmad, Shafiq 2018, An ensemble oversampling model for class imbalance problem in software defect prediction, IEEE access, vol. 6, pp. 24184-24195, doi: 10.1109/ACCESS.2018.2817572.


Title An ensemble oversampling model for class imbalance problem in software defect prediction
Author(s) Huda, MD Shamsul
Liu, Kevin
Abdelrazek, Mohamed
Ibrahim, Amani
Alyahya, Sultan
Al-Dossari, Hmood
Ahmad, Shafiq
Journal name IEEE access
Volume number 6
Start page 24184
End page 24195
Total pages 12
Publisher Institute of Electrical and Electronics Engineers
Place of publication Piscataway, N.J.
Publication date 2018
ISSN 2169-3536
Keyword(s) software quality and fault detection
imbalanced metric data
ensemble model of detection
highly accurate detection
science & technology
computer science, information systems
engineering, electrical & electronic
computer science
Summary Software systems are now ubiquitous and are used every day for automation in personal and enterprise applications; they are also essential to many safety-critical and mission-critical systems, e.g., air traffic control systems, autonomous cars, and SCADA systems. With the availability of massive storage capacity, high-speed Internet, and the advent of Internet of Things devices, modern software systems are growing in both size and complexity. Maintaining high quality in such complex systems while manually keeping the error rate to a minimum is a challenge. Automated detection of faulty components is therefore important both during software development and after delivery. Fault detection models usually need to be trained on a labeled, balanced dataset containing both faulty and non-faulty samples. Earlier work, e.g., Mohsin et al. (2016), showed that most real fault detection training datasets are imbalanced. As a result, the trained model overfits to the majority class and misclassifies faulty components as non-faulty. The consequences of a high false negative rate are cumulative: when the model is applied to other, previously unseen software systems, it generates further errors, which is very expensive. In this paper, we propose a software defect prediction ensemble model that addresses the class imbalance problem in real software datasets. We use different oversampling techniques to build an ensemble classifier that reduces the effect of the small number of minority (defective) samples in the data. The proposed approach is verified using the PROMISE software engineering datasets. The results show that our ensemble oversampling technique reduces the false negative rate more than standard classification techniques do and identifies faulty components more accurately, resulting in a less expensive detection system (a lower rate of faulty modules predicted as non-faulty).
Language eng
DOI 10.1109/ACCESS.2018.2817572
HERDC Research category C1 Refereed article in a scholarly journal
ERA Research output type C Journal article
Copyright notice ©2018, IEEE
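The Summary describes building an ensemble classifier from several oversampling techniques so that the minority (faulty) class is not swamped, with the false negative rate as the key metric. The record does not specify which oversamplers, base classifiers, or combination rule the authors use, so the following is only a minimal sketch under stated assumptions, not the paper's method: it assumes binary labels (1 = faulty, 0 = non-faulty), uses plain random oversampling plus a SMOTE-style interpolation as the oversamplers, a hypothetical k-nearest-neighbour base learner, and majority voting. All function names (`random_oversample`, `smote_like_oversample`, `build_ensemble`, etc.) are illustrative.

```python
import random
from collections import Counter

# Assumption: each sample is a (feature_vector, label) pair with binary labels,
# label 1 = faulty (minority) and label 0 = non-faulty (majority).


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _minority_and_deficit(data):
    """Return the minority label and how many samples it lacks vs. the majority."""
    counts = Counter(y for _, y in data)
    minority = min(counts, key=counts.get)
    return minority, max(counts.values()) - counts[minority]


def random_oversample(data, rng):
    """Duplicate randomly chosen minority samples until both classes are equal."""
    minority, deficit = _minority_and_deficit(data)
    pool = [s for s in data if s[1] == minority]
    return data + [rng.choice(pool) for _ in range(deficit)]


def smote_like_oversample(data, rng, k=3):
    """SMOTE-style: interpolate between a minority sample and one of its k
    nearest minority neighbours to create synthetic minority samples."""
    minority, deficit = _minority_and_deficit(data)
    pool = [x for x, y in data if y == minority]
    synthetic = []
    for _ in range(deficit):
        x = rng.choice(pool)
        others = [p for p in pool if p is not x]
        if not others:  # a single minority sample can only be duplicated
            synthetic.append((list(x), minority))
            continue
        neighbour = rng.choice(sorted(others, key=lambda p: dist2(x, p))[:k])
        gap = rng.random()  # random position along the segment x -> neighbour
        synthetic.append(([a + gap * (b - a) for a, b in zip(x, neighbour)], minority))
    return data + synthetic


def knn_predict(train, x, k=3):
    """Hypothetical base learner: majority label among the k nearest samples."""
    nearest = sorted(train, key=lambda s: dist2(s[0], x))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


def build_ensemble(data, seed=0):
    """One rebalanced training set per oversampling variant (an odd count,
    so a binary majority vote cannot tie)."""
    rng = random.Random(seed)
    return [
        random_oversample(data, rng),
        smote_like_oversample(data, rng, k=3),
        smote_like_oversample(data, rng, k=1),
    ]


def ensemble_predict(members, x):
    """Majority vote of k-NN models, each trained on one rebalanced set."""
    votes = Counter(knn_predict(train, x) for train in members)
    return votes.most_common(1)[0][0]


def false_negative_rate(y_true, y_pred, positive=1):
    """Fraction of truly faulty samples that were predicted as non-faulty."""
    predicted_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not predicted_for_positives:
        return 0.0
    misses = sum(1 for p in predicted_for_positives if p != positive)
    return misses / len(predicted_for_positives)
```

For example, with 20 non-faulty and 3 faulty samples, `build_ensemble` returns three training sets that each contain 20 samples of each class; `ensemble_predict` then lets the three k-NN models vote, and `false_negative_rate` measures how many faulty modules slip through as non-faulty. Training each member on a differently rebalanced set is what makes this an ensemble over oversampling techniques rather than a single resampled model.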

Document type: Journal Article
Collection: School of Information Technology
Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Created: Thu, 21 Jun 2018, 10:54:50 EST
