Abstract:
Regression testing involves re-executing the test suite whenever software undergoes an
update. The cost of doing so has increased manifold with the continuous integration
practices now common in the software industry. Researchers have proposed many approaches
to reduce the execution time of a test suite. This thesis examines two variants of the K-means clustering
algorithm, K-means and K-means++, for test suite reduction. K-means has been widely used
to group objects by similar features and to organize data. The proposed algorithms
cluster test cases that test the same requirements, using Euclidean distance to measure
the distance between test cases. Once the test cases are clustered, the algorithm selects
one representative vector (test case) from each cluster and adds it to the reduced test
suite. We have
tested the approach on 43 versions of seven different NPM packages. Of these, 20
package versions yield a test suite reduction for all alpha values (5, 10, 15, 20, 25,
30, 35, 40, 45, and 50), whereas the other package versions yield a reduction
for only some alpha values. The experimental results show that both K-means and
K-means++ clustering algorithms produce approximately similar results for test suite
reduction, with a slight difference in fault detection loss. For some package versions, the
fault detection loss is slightly higher with K-means, whereas for others it is slightly
higher with K-means++. The async package has the highest test suite reduction, between
70% and 80%, with a fault detection loss between 4% and 30% for alpha values of 5 to 50.
Hence, we can state that these algorithms can effectively reduce the size of a test suite
with limited loss of fault detection capability. However, we cannot generalize the results because the algorithm still needs
to be evaluated on JavaScript programs with larger test suites. In the future, we plan
to implement this algorithm with other distance metrics, such as Hamming distance and
Levenshtein edit distance.
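
The cluster-then-select procedure described above can be illustrated with a minimal sketch. It is not the thesis implementation, and it rests on several assumptions: each test case is represented as a binary requirement-coverage vector, the number of clusters k is passed in directly (the abstract does not define how alpha maps to k), and the representative is taken to be the cluster member closest to its centroid. The function names are hypothetical. Initial centroids are sampled at random as in plain K-means; K-means++ would differ only in how those initial centroids are chosen.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, iterations=100, seed=0):
    """Plain K-means (Lloyd's algorithm). Returns (labels, centroids)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k randomly sampled test vectors.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each test case joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centroids


def reduce_test_suite(coverage, k, seed=0):
    """Keep one representative test case per non-empty cluster.

    coverage maps a test name to its (assumed) requirement-coverage vector.
    """
    names = sorted(coverage)
    vectors = [coverage[n] for n in names]
    labels, centroids = kmeans(vectors, k, seed=seed)
    reduced = []
    for c in range(k):
        members = [n for n, lab in zip(names, labels) if lab == c]
        if members:
            # Assumption: the representative is the member nearest the centroid.
            reduced.append(
                min(members, key=lambda n: euclidean(coverage[n], centroids[c]))
            )
    return sorted(reduced)


if __name__ == "__main__":
    # Toy data: t1 and t2 cover requirements 0 and 1; t3 to t5 cover 2 and/or 3.
    toy = {
        "t1": [1, 1, 0, 0],
        "t2": [1, 1, 0, 0],
        "t3": [0, 0, 1, 1],
        "t4": [0, 0, 1, 1],
        "t5": [0, 0, 1, 0],
    }
    print(reduce_test_suite(toy, k=2))
```

On this toy input the reduced suite keeps one test case from each of the two requirement groups, so five test cases shrink to two. Whether the dropped test cases would have detected additional faults is exactly the fault detection loss that the experiments measure.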