NUST Institutional Repository

Software Defect Prediction using Ensemble Learning


dc.contributor.author Shafique Ahmed
dc.date.accessioned 2021-01-28T05:53:22Z
dc.date.available 2021-01-28T05:53:22Z
dc.date.issued 2016
dc.identifier.uri http://10.250.8.41:8080/xmlui/handle/123456789/21967
dc.description Supervisor: Muhammad Usman Akram en_US
dc.description.abstract Over the last decade, extensive research has sought to improve software defect prediction methods, with the goals of cost-effectiveness, reduced effort and reduced time, in order to achieve good quality control and deliver bug-free software to the end user. Several machine learning techniques have been applied toward this end. This research aims to demonstrate the positive effects of data sampling, feature subset selection and an ensemble learning model on defect prediction classification. Alongside data sampling of defect datasets and feature subset selection, an ensemble model algorithm is proposed that is robust to both feature redundancy and data imbalance. We carefully combine a variety of strong learning algorithms into ensemble models and use data sampling techniques with effective feature subset selection to address these issues and offset their effects on classification performance. Forward and backward feature selection showed that only a few features contribute to a high area under the curve (AUC). On the tested datasets, the genetic forward selection method outperformed other feature selection techniques such as correlation-based feature selection and Info Gain attribute selection. This suggests that the features considered are extremely unbalanced. However, ensemble learners such as the proposed algorithm, random forests and the average probability ensemble are less affected by poor features than support vector machines (SVM). In addition, the proposed model combined with genetic forward selection achieved area under the receiver operating characteristic curve (AUC) values of almost 1 on the NASA datasets. This research shows that software defect datasets must be well balanced for training, and that features must be selected in a way that ensures optimized classification of defective components. Moreover, addressing the above-mentioned data issues together with the proposed model resulted in exceptional performance, leading to nearly perfect methods for quality control. en_US
dc.publisher CEME-NUST-National University of Science and Technology en_US
dc.subject Computer Engineering en_US
dc.title Software Defect Prediction using Ensemble Learning en_US
dc.type Thesis en_US
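
The abstract refers to data sampling, an average probability ensemble and AUC without giving implementation details. The following is a minimal sketch of those three ideas in plain Python, not the thesis's actual algorithm: the function names, the random oversampling strategy and the toy probabilities are all assumptions made for illustration.

```python
import random

def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until both classes are equal in size.

    Assumed sampling strategy; the abstract does not say which one was used.
    """
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for row, y in zip(rows, labels):
        by_class[y].append(row)
    if not by_class[0] or not by_class[1]:
        raise ValueError("both classes must be present")
    minority = min(by_class, key=lambda c: len(by_class[c]))
    need = len(by_class[1 - minority]) - len(by_class[minority])
    extra = [rng.choice(by_class[minority]) for _ in range(need)]
    return list(rows) + extra, list(labels) + [minority] * need

def average_probability(prob_lists):
    """Average each module's defect probability across learners."""
    n = len(prob_lists)
    return [sum(ps) / n for ps in zip(*prob_lists)]

def auc(labels, scores):
    """AUC as the chance that a defective module outranks a clean one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy data (hypothetical): 1 = defective module, 0 = clean module.
labels = [1, 0, 0, 1, 0]
learner_a = [0.9, 0.2, 0.6, 0.4, 0.1]  # made-up defect probabilities
learner_b = [0.3, 0.1, 0.2, 0.9, 0.4]
ensemble = average_probability([learner_a, learner_b])

balanced_rows, balanced_labels = random_oversample([[1], [2], [3], [4]], [0, 0, 0, 1])
```

In this toy case each learner misranks one defective module, while the averaged probabilities rank both defective modules above all clean ones, which is the kind of robustness the abstract attributes to ensemble learners.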


This item appears in the following Collection(s)

  • MS [331]
