Abstract:
Feature selection is a data mining technique that extracts the relevant features of a dataset so
that the data can be better understood. It has been applied widely in supervised learning, but it
remains a challenging task in unsupervised learning because class labels are absent. Selecting
features by attribute dependency, a concept from rough set theory, has recently
gained popularity. Unsupervised Incremental Dependency Classes (UIDC) is an algorithm that
calculates the dependency of attributes by eliminating the positive-region calculation. In this
thesis, we propose UIDC for unsupervised datasets, calculating the dependency of attributes as
records are added, deleted, or merged. Because unsupervised datasets lack a decision attribute and
class labels, calculating attribute dependency is difficult: the conventional calculation involves
the decision attribute in every step. UIDC has been applied successfully to unsupervised datasets,
generating positive results. The dependency formula is applied to the unlabeled datasets to
extract features and calculate attribute dependency. Unsupervised datasets from UCI and Kaggle
are used, with normalization and preprocessing techniques applied for better performance.
Furthermore, parallel computing is applied to reduce execution time by almost 50%. Compared with
conventional methods, the proposed approach achieves the highest classification accuracy.
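
For orientation, the conventional rough-set dependency of a decision attribute D on a set of
condition attributes C is the fraction of records that fall in the positive region, that is,
|POS_C(D)| / |U|. The sketch below illustrates only this standard, labeled-data definition, which
is the calculation that needs a decision attribute at every step. It is not the UIDC algorithm;
the function name `dependency`, its parameters, and the toy table are assumptions made for
illustration.

```python
from collections import defaultdict


def dependency(records, condition_idx, decision_idx):
    """Conventional rough-set dependency |POS_C(D)| / |U| (illustrative, not UIDC)."""
    # Group records into equivalence classes by their condition-attribute values.
    decisions_per_class = defaultdict(set)
    size_per_class = defaultdict(int)
    for row in records:
        key = tuple(row[i] for i in condition_idx)
        decisions_per_class[key].add(row[decision_idx])
        size_per_class[key] += 1
    # A class belongs to the positive region when all its records share one decision value.
    positive = sum(
        size_per_class[key]
        for key, decisions in decisions_per_class.items()
        if len(decisions) == 1
    )
    return positive / len(records) if records else 0.0


# Hypothetical toy table: columns 0 and 1 are condition attributes, column 2 is the decision.
toy = [
    ("a", "x", 1),
    ("a", "x", 1),
    ("a", "y", 0),
    ("b", "y", 0),
    ("b", "y", 1),
]

print(dependency(toy, [0, 1], 2))  # 3 of the 5 records lie in the positive region
```

Without a decision column, as in an unlabeled dataset, this function cannot be called at all,
which is the difficulty the abstract describes.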