Abstract:
Defect prediction, a critical quality assurance activity in the early phases of the
software development life cycle, has been researched extensively over the last few
decades. Detecting defective modules early helps the development team use the
available resources efficiently and effectively, and produce high-quality software in
less time. To date, numerous researchers have built defect prediction models using
statistical and machine learning (ML) methods.
Because ML methods can identify hidden patterns among software features, they are
well suited to locating defective modules. This thesis proposes a hybrid model, built
from ensemble-based ML algorithms, that predicts faults in software modules. Three
widely used NASA datasets serve as the benchmark for measuring the accuracy of the
models. In the results, the AdaBoost classifier achieved the best accuracy, 99.95%,
among the ML techniques evaluated, which also include NB, RF, XGBoost, Bagging, and
CatBoost. The effectiveness of the classification approaches is assessed using
precision, recall, F-measure, accuracy, and support.
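
As an illustration of the evaluation metrics named above, the following is a minimal sketch in plain Python of how they can be computed for a binary defect classifier. The function name `classification_metrics`, the label convention (1 = defective, 0 = non-defective), and the toy labels are assumptions made for illustration; they are not code or data from this thesis.

```python
from collections import Counter


def classification_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F-measure, accuracy, and support.

    Hypothetical helper for illustration: `positive` is the label that
    marks a defective module (assumed to be 1 here).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Count each (actual, predicted) label pair.
    pairs = Counter(zip(y_true, y_pred))
    tp = pairs[(positive, positive)]
    fp = sum(n for (t, p), n in pairs.items() if p == positive and t != positive)
    fn = sum(n for (t, p), n in pairs.items() if t == positive and p != positive)
    correct = sum(n for (t, p), n in pairs.items() if t == p)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": correct / len(y_true),
        # Support is the number of actual positive (defective) samples.
        "support": tp + fn,
    }


# Toy labels (assumed, not taken from the NASA datasets).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Accuracy alone can look strong when defective modules are rare, which is why precision, recall, and F-measure for the defective class are usually reported alongside it.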