Test Suite Reduction Using K-means Clustering

Saleem, Jabran

URI: http://10.250.8.41:8080/xmlui/handle/123456789/29845

Date: 2022

Abstract:

Regression testing involves re-executing the test suite whenever software undergoes an update. This cost is multiplied many times over by the continuous integration practices recently adopted in the software industry, and researchers have proposed many approaches to reduce the execution time of test suites. This thesis examines two variants of the K-means clustering algorithm, K-means and K-means++, for test suite reduction. K-means has been used to group objects with similar features and to organize data. The proposed test suite reduction algorithms cluster together test cases that test the same requirements, using Euclidean distance to measure the distance between test cases. After clustering, the algorithm selects one representative vector, i.e. one test case, from each cluster and adds it to the reduced test suite. We evaluated this algorithm on 43 versions of seven different NPM packages. Of these, 20 package versions achieve test suite reduction for all alpha values (5, 10, 15, 20, 25, 30, 35, 40, 45, and 50), whereas the other package versions achieve test suite reduction only for some alpha values. The experimental results show that K-means and K-means++ produce approximately similar test suite reduction, with a slight difference in fault detection loss: for some package versions the fault detection loss is slightly higher with K-means, and for others it is slightly higher with K-means++. The async package has the highest test suite reduction, between 70% and 80%, with a fault detection loss between 4% and 30% for alpha values from 5 to 50. Hence, these algorithms can effectively reduce the size of a test suite while keeping the loss of fault detection capability minimal. However, the results cannot be generalized, because the algorithm still needs to be evaluated on JavaScript programs with larger test suites.
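The clustering-based reduction described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the thesis's implementation: it assumes each test case is represented as a 0/1 vector of the requirements it covers, that the number of clusters `k` is supplied directly (the abstract does not say how the alpha values map to `k`), and that each cluster's representative is the test case nearest its centroid. The names `reduce_suite`, `reduction_pct`, `fault_detection_loss_pct`, the percentage definitions, and the toy coverage and fault data are all hypothetical. Setting `plus_plus=True` replaces random seeding with K-means++ seeding; everything else is standard Lloyd iteration with Euclidean distance.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length coverage vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def init_centroids(points, k, rng, plus_plus):
    """Plain K-means: k random points. K-means++: D^2-weighted seeding."""
    if not plus_plus:
        return [list(p) for p in rng.sample(points, k)]
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        # Weight each point by its squared distance to the nearest seed so far.
        weights = [min(euclidean(p, c) for c in centroids) ** 2 for p in points]
        if sum(weights) == 0:
            centroids.append(list(rng.choice(points)))  # every point is already a seed
        else:
            centroids.append(list(rng.choices(points, weights=weights)[0]))
    return centroids


def kmeans(points, k, plus_plus=False, seed=0, max_iter=100):
    """Lloyd's iterations; returns (cluster label per point, centroids)."""
    rng = random.Random(seed)
    centroids = init_centroids(points, k, rng, plus_plus)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [min(range(k), key=lambda j: euclidean(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no test case changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def reduce_suite(coverage, k, plus_plus=False, seed=0):
    """Cluster tests; keep one representative per non-empty cluster.

    Assumption: the representative is the test nearest its cluster centroid.
    """
    names = list(coverage)
    points = [coverage[n] for n in names]
    labels, centroids = kmeans(points, k, plus_plus, seed)
    reduced = []
    for j in range(k):
        members = [n for n, lab in zip(names, labels) if lab == j]
        if members:
            reduced.append(min(members,
                               key=lambda n: euclidean(coverage[n], centroids[j])))
    return reduced


def reduction_pct(full, reduced):
    """Assumed definition: percentage of test cases removed from the suite."""
    return 100.0 * (len(full) - len(reduced)) / len(full)


def fault_detection_loss_pct(faults_by_test, full, reduced):
    """Assumed definition: % of faults found by the full suite but not the reduced one."""
    found_full = set().union(*(faults_by_test[t] for t in full))
    found_reduced = set().union(*(faults_by_test[t] for t in reduced))
    if not found_full:
        return 0.0
    return 100.0 * len(found_full - found_reduced) / len(found_full)


# Hypothetical data: rows are test cases, columns are requirements (1 = covered).
coverage = {
    "t1": [1, 1, 0, 0], "t2": [1, 1, 0, 0], "t3": [1, 0, 0, 0],
    "t4": [0, 0, 1, 1], "t5": [0, 0, 1, 1], "t6": [0, 0, 0, 1],
}
# Hypothetical faults each test case detects.
faults_by_test = {"t1": {"f1"}, "t2": {"f1"}, "t3": {"f2"},
                  "t4": {"f3"}, "t5": {"f3"}, "t6": {"f4"}}

reduced = reduce_suite(coverage, k=2, plus_plus=True, seed=0)
print(sorted(reduced))
print(reduction_pct(list(coverage), reduced))
print(fault_detection_loss_pct(faults_by_test, list(coverage), reduced))
```

On this toy data, K-means++ seeding separates the two requirement groups and keeps one test from each, a reduction of about 66.7%. Because the dropped tests t3 and t6 each detect a fault that no kept test finds, the fault detection loss is 50%, which illustrates the reduction versus fault-detection trade-off that the abstract measures.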
In the future, we plan to implement this algorithm with other distance metrics, such as Hamming distance and Levenshtein edit distance.
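The two distance metrics named for future work can be sketched as follows. This is a hedged illustration only: the abstract does not say what these metrics would be applied to, so the sketch assumes Hamming distance over equal-length 0/1 coverage vectors and Levenshtein distance over sequences such as hypothetical ordered call traces of a test case. The function names are illustrative.

```python
def hamming(a, b):
    """Number of positions where two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length vectors")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(s, t):
    """Minimum insertions, deletions, and substitutions turning s into t.

    Works on strings or lists (e.g. a hypothetical trace of called functions).
    Uses the classic dynamic program, keeping only the previous row.
    """
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,               # delete a
                            curr[j - 1] + 1,           # insert b
                            prev[j - 1] + (a != b)))   # substitute (free if equal)
        prev = curr
    return prev[-1]


print(hamming([1, 1, 0, 0], [1, 0, 0, 0]))
print(levenshtein("kitten", "sitting"))
print(levenshtein(["open", "read", "close"], ["open", "close"]))
```

One design point worth noting: for 0/1 coverage vectors, the squared Euclidean distance equals the Hamming distance, so the two metrics rank pairs of test cases identically on such data. Levenshtein distance differs more substantially because it compares variable-length, ordered sequences.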