Abstract:
Robust Clustering using Links (ROCK) is a hierarchical clustering algorithm for categorical data. Rather than relying only on pairwise similarity, it considers the neighbors that pairs of points have in common and groups points that share many neighbors, i.e., it clusters on the basis of links. Two points are neighbors if their similarity is at least a chosen threshold, and the number of links between two points is the number of neighbors they share. However, when the data contain latent ordinality, ROCK tends to perform poorly. Such ordinality in categorical data is often hidden, i.e., not known beforehand. In this study, we present a novel ROCK-based algorithm that handles latent ordinality, and we show that it improves on traditional ROCK. We apply the technique to infant mortality and investigate the impact of an expectant mother's food intake on infant mortality.
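To make the link idea concrete, the following is a minimal Python sketch of ROCK's neighbor and link computation. It assumes Jaccard similarity between sets of food items and a threshold `theta`. The diets and the threshold value are made up for illustration. The sketch excludes a point from its own neighbor set, a convention that varies between implementations, and it stops before ROCK's hierarchical merging step.

```python
from itertools import combinations


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity between two transactions (sets of categorical items)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def neighbor_sets(points: list[frozenset], theta: float) -> list[set[int]]:
    """Point j is a neighbor of point i (i != j) if sim(i, j) >= theta."""
    n = len(points)
    nbrs = [set() for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if jaccard(points[i], points[j]) >= theta:
            nbrs[i].add(j)
            nbrs[j].add(i)
    return nbrs


def links(points: list[frozenset], theta: float) -> dict[tuple[int, int], int]:
    """link(i, j) is the number of neighbors that points i and j have in common."""
    nbrs = neighbor_sets(points, theta)
    return {(i, j): len(nbrs[i] & nbrs[j])
            for i, j in combinations(range(len(points)), 2)}


# Illustrative (made-up) food-intake records, one set of items per mother.
diets = [
    frozenset({"milk", "eggs", "fruit"}),
    frozenset({"milk", "eggs", "vegetables"}),
    frozenset({"milk", "fruit", "vegetables"}),
    frozenset({"rice", "tea"}),
]
example = links(diets, theta=0.5)
print(example[(0, 1)], example[(0, 3)])  # prints "1 0"
```

The first three mothers are pairwise neighbors (each pair shares two of four items), so every pair among them has one common neighbor. The fourth mother shares no items with the others and therefore has no links.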
Neonatal mortality is the death of a newborn within the first 28 days of life. Deaths within the first 7 days (days 0 to 6) are early neonatal deaths, and deaths from day 7 to day 27 are late neonatal deaths. Apart from other factors, the food consumed by an expectant mother is believed to affect the pregnancy outcome. Using ROCK, we cluster expectant mothers according to their food intake. As expected, some food items have a greater impact on the pregnancy outcome than others. When clustering categorical data such as those used in this study, it is difficult to estimate the latent ordinality inherent in the data, i.e., which food items have more impact than others. As this work shows, this hidden ordinality affects the accuracy of the clustering. To address this issue, we propose a novel technique, ROCK for Latent Inherent Ordinality (ROCKLIO). The technique clusters in two stages. The first stage determines which food groups have a greater impact on the clustering. In the second stage, a new link measure is derived that weights each food item by its significance, which improves clustering accuracy. ROCKLIO yields better clustering results than standard ROCK.
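The abstract does not specify ROCKLIO's weighted link measure. As an illustration only, the sketch below assumes that the first stage expresses each food item's significance as a numeric weight and that these weights enter a weighted Jaccard similarity before links are counted. The `ITEM_WEIGHTS` values and the helper names are hypothetical, and the authors' actual measure may differ. The example shows how up-weighting a high-impact item can turn pairs that plain Jaccard treats as unrelated into linked points.

```python
from itertools import combinations

# Hypothetical item weights standing in for the significance estimated in
# the first stage; the values are illustrative, not taken from the study.
ITEM_WEIGHTS = {"milk": 3.0, "fruit": 1.0, "eggs": 1.0, "tea": 0.5}


def weighted_jaccard(a: frozenset, b: frozenset, weights: dict[str, float]) -> float:
    """Total weight of shared items divided by total weight of all items.

    Items missing from `weights` get weight 1.0, so an empty weight table
    reduces this to the ordinary Jaccard coefficient."""
    total = sum(weights.get(x, 1.0) for x in a | b)
    if total == 0:
        return 0.0
    return sum(weights.get(x, 1.0) for x in a & b) / total


def weighted_links(points: list[frozenset], theta: float,
                   weights: dict[str, float]) -> dict[tuple[int, int], int]:
    """ROCK-style links (common-neighbor counts) over the weighted similarity."""
    n = len(points)
    nbrs = [set() for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if weighted_jaccard(points[i], points[j], weights) >= theta:
            nbrs[i].add(j)
            nbrs[j].add(i)
    return {(i, j): len(nbrs[i] & nbrs[j]) for i, j in combinations(range(n), 2)}


# Illustrative (made-up) food-intake records.
diets = [
    frozenset({"milk", "fruit"}),
    frozenset({"milk", "tea"}),
    frozenset({"tea", "fruit"}),
    frozenset({"milk", "eggs"}),
]

plain = weighted_links(diets, theta=0.5, weights={})  # ordinary Jaccard
weighted = weighted_links(diets, theta=0.5, weights=ITEM_WEIGHTS)
print(plain[(0, 1)], weighted[(0, 1)])  # prints "0 1"
```

With plain Jaccard every pair scores at most 1/3, so no point has neighbors and all links are zero. With milk up-weighted, the three mothers who report milk become mutual neighbors (similarities 0.6 to 0.67) and acquire links, while the mother without milk stays unlinked.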