Abstract:
In the modern era of science and technology, the use of software and computer-aided
programs has increased very rapidly. With this increase in use, the size of software and
the volume of data associated with it have also grown. As a result, collecting the large
amounts of software testing data needed to support software development and maintenance
has become difficult. As software is developed, its quality must also be assured.
Software testing is the primary means of assessing software quality, and defects need to
be found before the software is delivered to clients. Almost 50% of projects fail because
of low quality and poor software testing. Motivated by this problem, we recognized the
need to use recent data mining techniques for predicting defects in software.
In this research study, we used Data Mining (DM) techniques to predict defects from
software testing data. Such techniques can help improve the reliability and quality of
software. First, we identified available software testing datasets and selected data based
on the parameters and requirements of the study. For this purpose, we explored related
studies in the Literature Review (LR) and identified several defect prediction datasets
and techniques. From these, we chose the most suitable technique for designing and
implementing the research methodology. After data selection, we found the correlation between the different
parameters of the software testing dataset using correlation analysis. We then applied data
cleaning and transformation to preprocess the data. Because the processed data consisted of
continuous values, we transformed it into discrete data using clustering (grouping) techniques.
We then implemented the Apriori algorithm, an Association Rule Mining (ARM) technique,
to predict defects in the software testing data. The Apriori algorithm computed support and
confidence values over multiple iterations, which gave more accurate results. The proposed
framework is based on Market Basket Analysis (MBA) and finds the most frequent defects using
Association Rule Mining.
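
To illustrate the Association Rule Mining step, the following is a minimal Python sketch of
Apriori-style frequent itemset mining with support and confidence. It is not the
implementation used in this study. The function names (apriori, rules), the thresholds, and
the five example records are hypothetical, and the records are assumed to be already
discretized into labels such as "complexity=high".

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset (hypothetical helper)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every distinct item as a singleton itemset.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Keep the candidates whose support reaches the threshold.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates and keep only those
        # whose k-item subsets are all frequent (the Apriori pruning step).
        k += 1
        candidates = {
            a | b
            for a, b in combinations(level, 2)
            if len(a | b) == k
            and all(frozenset(s) in level for s in combinations(a | b, k - 1))
        }
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) for each strong rule."""
    out = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # confidence(A -> B) = support(A and B) / support(A)
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    out.append((antecedent, itemset - antecedent, support, confidence))
    return out


# Hypothetical, already-discretized testing records (assumed for illustration only).
transactions = [
    {"complexity=high", "loc=large", "defect=yes"},
    {"complexity=high", "loc=large", "defect=yes"},
    {"complexity=low", "loc=small", "defect=no"},
    {"complexity=high", "loc=small", "defect=yes"},
    {"complexity=low", "loc=large", "defect=no"},
]

frequent = apriori(transactions, min_support=0.4)
strong = rules(frequent, min_confidence=0.9)
for antecedent, consequent, support, confidence in strong:
    print(sorted(antecedent), "->", sorted(consequent), support, confidence)
```

Each pass of the loop corresponds to one iteration of the algorithm: itemsets of size k are
counted, filtered by minimum support, and joined into candidates of size k + 1. Rules are
then kept only if their confidence reaches the chosen threshold.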