Abstract:
DNA microarray technology is a valuable advancement in the medical field, but it gives rise
to many challenges, such as the curse of dimensionality and heavy storage and computational requirements.
Feature selection is one way to handle these issues. To overcome the challenges
associated with microarray cancer datasets without compromising relevancy or optimality, and
to improve the performance of metaheuristic Genetic Algorithm (GA) based wrappers, in this paper we
propose a multiple-filter and GA-wrapper based hybrid feature selection approach (MF-GARF) that incorporates Random Forest as the fitness evaluator of features. The proposed hybrid
approach, MF-GARF, comprises three phases. The first phase is the Relevancy block, which contains the information
theory based filters Information Gain, Gain Ratio and Gini Index and is responsible for ensuring
relevancy by removing irrelevant and noisy features. The second phase is the Redundancy block,
which uses Pearson correlation statistics to remove redundancy among features. The final
phase is the Optimization block, which contains a Genetic Algorithm wrapper with Random Forest as the fitness
evaluator and is responsible for generating an optimal feature subset with high predictive power.
Random Forest, kNN, Naïve Bayes and SVM classifiers within a 10-fold cross-validation setup are used to
calculate the classification accuracy of the selected optimal feature subset. Experiments are carried
out on 7 publicly available benchmark binary and multiclass microarray gene expression
cancer datasets, and the proposed algorithm achieves good accuracy with a minimal number of selected
features on all datasets. A thorough comparison with other state-of-the-art GA-based and other
metaheuristic hybrid techniques validates the effectiveness of the proposed approach in terms of
feature count and classification accuracy.
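
As an illustration of the two filter phases described above, the following is a minimal sketch in plain Python, not the implementation evaluated in this paper. It assumes a single relevancy filter (Information Gain on features binarized at their median) followed by a Pearson correlation redundancy check; Gain Ratio, Gini Index and the GA wrapper with Random Forest are omitted. The function names, the toy gene names g1 to g3, and the parameters top_k and max_corr are hypothetical choices made only for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Information gain of one feature, binarized at its median (an assumption)."""
    threshold = sorted(values)[len(values) // 2]
    low = [y for v, y in zip(values, labels) if v < threshold]
    high = [y for v, y in zip(values, labels) if v >= threshold]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part) for part in (low, high) if part)
    return entropy(labels) - conditional


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # constant feature: treat as uncorrelated
    return cov / math.sqrt(var_a * var_b)


def filter_features(features, labels, top_k=3, max_corr=0.9):
    """Keep the top_k features by information gain, then drop redundant ones.

    features: dict mapping a feature name to its list of values per sample.
    A feature is dropped when its absolute correlation with an already
    kept feature reaches max_corr (hypothetical threshold).
    """
    ranked = sorted(features, key=lambda f: information_gain(features[f], labels),
                    reverse=True)[:top_k]
    kept = []
    for name in ranked:
        if all(abs(pearson(features[name], features[k])) < max_corr for k in kept):
            kept.append(name)
    return kept


# Toy data: g1 separates the classes, g2 is a scaled copy of g1, g3 is noise.
labels = [0, 0, 0, 1, 1, 1]
features = {
    "g1": [1.0, 1.2, 0.9, 3.1, 3.0, 2.8],
    "g2": [2.0, 2.4, 1.8, 6.2, 6.0, 5.6],
    "g3": [0.5, 2.0, 1.1, 0.7, 1.9, 1.0],
}
selected = filter_features(features, labels, top_k=2)
```

On this toy data, g1 and g2 rank highest by information gain, and g2 is then discarded because it is perfectly correlated with g1, so selected contains only g1. In a full pipeline, the surviving features would be passed to the optimization phase.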