Abstract:
In the era of advanced analytics and big data, machine learning (ML) algorithms are increasingly pivotal for
addressing complex classification problems across diverse domains. This study explores
and compares the performance of several ML algorithms for a multiclass classification
task using the UC Merced Land-Use Scene Classification dataset from Kaggle. This
dataset consists of satellite images categorized into 21 different land-use classes, with
a total of 8400 images (400 images per class). The primary objective of this study was
to identify the most effective ML model based on key performance metrics, including
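precision, recall, F1-score, accuracy, and error rate. To achieve this, five classifiers
were evaluated: k-nearest neighbors (KNN), support vector machines (SVM) with linear
(SVM-Linear), radial basis function (SVM-RBF), and polynomial (SVM-POLY) kernels, and
random forest (RF).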
The models were assessed for their ability to classify images accurately into their
respective land-use categories. The evaluation revealed that the RF model emerged as
the most effective algorithm, achieving a macro-average precision, recall, and F1-score of
83% each (the highest of the five models, with precision tied with SVM-RBF), along with
the highest accuracy (83%) and the lowest error rate (0.17). The SVM with the RBF kernel
(SVM-RBF) also performed strongly, with a macro-average precision of 83%, recall of 82%,
F1-score of 82%, accuracy of 82%, and error rate of 0.18. In contrast, KNN and SVM-Linear
exhibited identical performance, with an accuracy of 80% and an error rate of 0.20, while
SVM-POLY performed slightly better than these two but below SVM-RBF, with an accuracy
of 81% and an error rate of 0.19. These results underscore the effectiveness of
the RF model for this multiclass classification task and highlight SVM-RBF as
a strong alternative. The study’s findings offer valuable insights for practitioners,
emphasizing that the choice of the optimal ML model depends on a careful consideration
of performance metrics and specific application requirements. Overall, RF stands out
as the top-performing algorithm for this problem, although its lead over SVM-RBF is
narrow (83% versus 82% accuracy), which illustrates the importance of weighing
precision, recall, accuracy, and error rate together to achieve successful
classification outcomes.
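The five reported metrics can be computed directly from lists of true and predicted labels. What follows is a minimal sketch that uses only the Python standard library. The function name `macro_metrics` and the toy land-use labels are hypothetical. The sketch assumes that error rate is defined as 1 minus accuracy, which is consistent with the reported figures (for example, 83% accuracy and 0.17 error rate for RF). Macro-averaging computes precision, recall, and F1 for each class and then takes their unweighted mean, so every land-use class counts equally.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Hypothetical helper: macro-averaged precision/recall/F1 plus accuracy
    and error rate (assumed here to be 1 - accuracy)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    classes = sorted(set(y_true) | set(y_pred))
    # Per-class true positives, predicted counts, and actual counts.
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_count = Counter(y_pred)
    true_count = Counter(y_true)

    precisions, recalls, f1s = [], [], []
    for c in classes:
        p = tp[c] / pred_count[c] if pred_count[c] else 0.0
        r = tp[c] / true_count[c] if true_count[c] else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(classes)
    accuracy = sum(tp.values()) / len(y_true)
    return {
        "precision": sum(precisions) / n,  # unweighted mean over classes
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,                # mean of per-class F1 scores
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
    }


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the study.
    y_true = ["forest", "forest", "river", "river", "beach", "beach"]
    y_pred = ["forest", "river", "river", "river", "beach", "forest"]
    print({k: round(v, 3) for k, v in macro_metrics(y_true, y_pred).items()})
```

Note that macro F1 is taken here as the mean of the per-class F1 scores rather than as the harmonic mean of macro precision and macro recall. The two can differ, so it is worth stating which definition a study uses.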
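A comparison of the five classifiers named in this abstract could be set up along the following lines with scikit-learn. This is a sketch under stated assumptions, not the study's pipeline. The abstract does not say how the satellite images were converted to feature vectors, which train/test split or hyperparameters were used, or whether features were scaled. The code therefore makes four substitutions: scikit-learn's bundled 8x8 digits dataset as stand-in data, a stratified 80/20 split, library-default hyperparameters, and standard scaling for the distance- and kernel-based models. Its numbers will not reproduce the study's results.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data (assumption): the image feature extraction is not described,
# so the digits dataset replaces the UC Merced features here.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# The five model types from the abstract. Default hyperparameters and
# scaling for KNN/SVM are assumptions, not settings reported by the study.
models = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "SVM-Linear": make_pipeline(StandardScaler(), SVC(kernel="linear")),
    "SVM-RBF": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "SVM-POLY": make_pipeline(StandardScaler(), SVC(kernel="poly")),
    "RF": RandomForestClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Macro averaging weights every class equally, as in the reported metrics.
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="macro", zero_division=0
    )
    accuracy = accuracy_score(y_test, y_pred)
    results[name] = {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
    }

# Rank models by accuracy, highest first.
for name, m in sorted(results.items(), key=lambda kv: kv[1]["accuracy"], reverse=True):
    print(f"{name:<11} acc={m['accuracy']:.2f} err={m['error_rate']:.2f} "
          f"P={m['precision']:.2f} R={m['recall']:.2f} F1={m['f1']:.2f}")
```

To apply the same loop to the land-use task, replace `X` and `y` with image features and class labels. Averaging over several random splits or using cross-validation would make one-percentage-point gaps, such as the gap between RF and SVM-RBF, easier to interpret.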