Abstract:
Classifying microarray gene expression data is crucial because of the data's high-dimensional
nature and its significant impact on disease diagnosis and personalized treatment strategies.
Timely and accurate classification of gene expression data greatly influences treatment
outcomes and patient survival rates. Traditionally, gene expression data analysis has relied
on various statistical methods. However, with the emergence of advanced machine learning
techniques, automated classification of these datasets has become increasingly important.
Current methods typically use SVM classifiers with different kernel functions to classify
diverse gene expression profiles. Nonetheless, the varied characteristics of gene expression
data present notable classification challenges.
In this study, we introduce a comprehensive dataset comprising thousands of gene expression
profiles from leukemia. We seek the best-performing classification method by fine-tuning
Support Vector Machine (SVM) parameters and selecting the most appropriate kernel function.
We apply both standard and refined SVMs with linear, polynomial, radial basis function (RBF),
and sigmoid kernels, alongside penalized SVM models using L1, Smoothly Clipped Absolute
Deviation (SCAD), and SCAD + L2 penalties, to improve classification performance.
Notably, our approach, when applied to the refined SVM with linear and polynomial kernels,
achieves superior performance, and the L1 penalty yields the best classification accuracy
among the penalized models. These results contribute to the literature on gene expression
data classification and highlight the potential of SVMs, particularly with linear and
polynomial kernels combined with appropriate penalty terms, for precise and efficient
disease classification.
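
For readers unfamiliar with the kernels and penalties named above, the following is a minimal
Python sketch of their textbook forms. It is not the implementation used in this study. The
function names, the default values for gamma, coef0, and degree, and the SCAD shape parameter
a = 3.7 are illustrative assumptions, and the SCAD + L2 penalty is assumed here to be the SCAD
term plus a squared L2 term weighted by a second parameter.

```python
import math


def dot(x, y):
    """Inner product of two equal-length sequences of numbers."""
    return sum(a * b for a, b in zip(x, y))


# Kernel functions (parameter names and defaults are illustrative assumptions).
def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(x, y) + coef0)


# Penalty terms evaluated on a weight vector w.
def l1_penalty(w, lam):
    """L1 penalty: lam times the sum of absolute weights."""
    return lam * sum(abs(wj) for wj in w)


def scad_penalty(w, lam, a=3.7):
    """SCAD penalty summed over coordinates; requires a > 2 and lam > 0."""
    total = 0.0
    for wj in w:
        t = abs(wj)
        if t <= lam:
            # Behaves like L1 for small weights.
            total += lam * t
        elif t <= a * lam:
            # Quadratic transition region.
            total += -(t * t - 2 * a * lam * t + lam * lam) / (2 * (a - 1))
        else:
            # Constant for large weights, so they are not shrunk further.
            total += (a + 1) * lam * lam / 2
    return total


def scad_l2_penalty(w, lam1, lam2, a=3.7):
    """Assumed form of SCAD + L2: SCAD term plus lam2 times the squared L2 norm."""
    return scad_penalty(w, lam1, a) + lam2 * sum(wj * wj for wj in w)
```

Unlike the L1 penalty, which grows without bound, the SCAD penalty levels off once a weight
exceeds a times lam, which is why it penalizes large coefficients less heavily.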