Abstract:
Software bug prediction is an active research area that is being widely explored with the help of machine learning techniques. The goal of bug prediction models is to identify potential software defects early in the development process, enabling developers to take preventive action and improve software quality. Since bug prediction is now considered an important part of the software development life cycle (SDLC), there is a need for an efficient bug prediction model. Transfer learning, class imbalance handling, and ensemble learning are currently active research directions. In this research work, an efficient model design is proposed and implemented. The proposed design addresses class imbalance in the datasets, an issue that has received little attention in earlier work. Class imbalance can distort model accuracy because the model overfits to the majority class. The proposed design employs feature engineering to add domain information to the dataset for more accurate prediction. Transfer learning is used to train and test the model on different datasets, in order to analyze how much of the learning carries over to another dataset in cross-project defect prediction, and ensemble methods are used to explore the performance gain from combining multiple classifiers in one model. The proposed model thus combines feature engineering, class imbalance handling, and ensemble methods using machine learning for cross-project defect prediction. Five NASA and four PROMISE datasets are used for experimental analysis. Decision Tree (DT) and Random Forest (RF) are used as individual base classifiers, and three ensemble methods (bagging, boosting, and stacking) are applied. The results show that the model attains its best accuracy with RF classifiers, both individually and within ensemble methods. On the NASA datasets, the model reaches its highest accuracy of 84% with RF as an individual classifier and also 84% with AdaBoost among the ensemble methods. On the PROMISE datasets, RF again has the highest accuracy as an individual classifier at 77%, and the stacked ensemble reaches 79%. Additional experiments evaluate the recall of the buggy class and show that, with class imbalance handling applied, buggy-class recall is high, which indicates that the model is effective at predicting bugs in the datasets.
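As a minimal illustration of why buggy-class recall is examined in addition to accuracy, the sketch below compares the two metrics on an imbalanced label set. The labels, the 9:1 class ratio, and both sets of predictions are hypothetical values chosen for illustration; they are not data or results from this study.

```python
# Hypothetical labels for illustration only: 1 = buggy, 0 = clean.
# 90 clean modules followed by 10 buggy ones (an assumed 9:1 imbalance).
y_true = [0] * 90 + [1] * 10

# A degenerate model that always predicts "clean".
y_always_clean = [0] * 100

# A hypothetical model that finds 8 of the 10 buggy modules
# at the cost of 15 false alarms on clean modules.
y_bug_aware = [1] * 15 + [0] * 75 + [1] * 8 + [0] * 2


def accuracy(y_true, y_pred):
    """Fraction of modules whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def buggy_recall(y_true, y_pred):
    """Fraction of truly buggy modules (label 1) that were predicted buggy."""
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == 1 for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


print(accuracy(y_true, y_always_clean), buggy_recall(y_true, y_always_clean))
print(accuracy(y_true, y_bug_aware), buggy_recall(y_true, y_bug_aware))
```

In this toy setting the always-clean model scores higher on accuracy (0.9 versus 0.83) while detecting no buggy modules at all, whereas the second model recovers 80% of them. This is the sense in which recall on the buggy class complements overall accuracy on imbalanced defect data.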