dc.description.abstract |
This work presents a detailed study of majority-based decisions among clustering algorithms on three
data sets describing the antimicrobial activity of chemical compounds: an antimicrobial evaluation of
antibacterial activity, the minimum inhibitory concentration (MIC) of antibacterials, and antibacterial
and antifungal activity, measured against four bacteria (E. coli, P. aeruginosa, S. aureus, S. pyogenes)
and two fungi (C. albicans, A. fumigatus).
Clustering is an unsupervised machine learning method, used here to group the chemical
compounds on the basis of their similarity. In this thesis we applied K-means clustering,
the Gaussian mixture model (GMM), and mixtures of multivariate t distributions to the antibacterial
activity data sets. To determine the optimal number of clusters and which clustering
algorithm performs best, we used four cluster validation indices (CVIs): the
within-cluster sum of squares (to be minimized), connectivity (to be minimized), silhouette
width (to be maximized), and the Dunn index (to be maximized). Based on a majority vote across
these indices, we conclude that K-means and the mixture of multivariate t distributions
satisfy the most, and the Gaussian mixture model the fewest, of the cluster validation
indices. K-means and the mixture of multivariate t distributions give an optimal number of
3 clusters for the antimicrobial evaluation of antibacterial activity data set and 5 clusters
for the antibacterial minimum inhibitory concentration (MIC) data set. K-means, the mixture of
multivariate t distributions, and the Gaussian mixture model all give 3 as the optimal number of
clusters for the antibacterial and antifungal activity data set. The
K-means clustering algorithm gives the best performance under the majority-based
decision. This study may help the pharmaceutical industry, chemists, and doctors in
the future. |
en_US |
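The majority-based decision summarized in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes each algorithm's CVI scores have already been computed and stored in a nested dictionary, and the names `DIRECTION`, `best_per_index`, and `majority_decision` are hypothetical. For each index it picks the winning algorithm according to the direction stated in the abstract (minimize within-cluster sum of squares and connectivity, maximize silhouette width and the Dunn index), then counts how many indices each algorithm wins.

```python
from collections import Counter

# Hypothetical mapping: direction of optimisation for each cluster
# validation index (CVI), following the abstract: WSS and connectivity
# are minimised, silhouette width and the Dunn index are maximised.
DIRECTION = {
    "wss": min,
    "connectivity": min,
    "silhouette": max,
    "dunn": max,
}


def best_per_index(scores):
    """Return {index_name: winning_algorithm}.

    `scores` is assumed to be a nested dict of the form
    scores[algorithm][index_name] -> float.
    """
    winners = {}
    for index, pick in DIRECTION.items():
        # min/max over algorithm names, comparing their score on this index
        winners[index] = pick(scores, key=lambda alg: scores[alg][index])
    return winners


def majority_decision(scores):
    """Count how many CVIs each algorithm wins; return (winner, tally).

    Ties in the tally go to the algorithm that first won an index,
    in the order the indices are listed in DIRECTION.
    """
    tally = Counter(best_per_index(scores).values())
    winner, _ = tally.most_common(1)[0]
    return winner, dict(tally)
```

The same vote could be applied to candidate numbers of clusters instead of algorithms by keying `scores` on k, or on (algorithm, k) pairs. The first-encountered tie rule is an arbitrary choice made for this sketch; a real analysis would likely want to report ties explicitly.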