Abstract:
Object classification is essential to the reliability and safety of environmental
perception in autonomous driving, especially under adverse weather conditions
such as rain, fog, or snow, which can significantly reduce classification accuracy.
Although many methods have been proposed to overcome these challenges,
state-of-the-art models tend to fail under changing weather conditions. Such
inconsistency is a concern for the operational safety of autonomous vehicles
in real-world settings. In addition, computationally light and efficient models are
needed to meet the hardware constraints commonly encountered in real-world
deployments, where real-time computation is of the highest priority. To overcome these
limitations, we propose a Vision Transformer-based method that is carefully
designed to be computationally light and efficient so that it remains
functional and reliable under a wide range of typical and extreme weather conditions.
The strength of our method lies in the novel use of object-detection features from
the YOLOv8 algorithm, which are fed seamlessly into a custom Vision Transformer
model optimized for the object classification task.
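For illustration only, a minimal sketch of this kind of pipeline is shown below: a detector backbone feature map (such as one exported from a YOLOv8 model) is treated as a sequence of tokens and classified by a small Transformer encoder. All dimensions, layer counts, and names in the sketch are assumptions for the example, not the architecture reported in this work.

```python
import torch
import torch.nn as nn

class DetectionFeatureViT(nn.Module):
    """Sketch: spatial positions of a detector backbone feature map
    (e.g. from a YOLOv8 model) become tokens for a small Transformer
    encoder; classification uses a [CLS]-style token.
    All sizes are illustrative assumptions, not the reported model."""

    def __init__(self, feat_dim=256, embed_dim=192, num_heads=3,
                 depth=4, num_classes=10):
        super().__init__()
        self.proj = nn.Linear(feat_dim, embed_dim)        # backbone channels -> token dim
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)     # e.g. ten BDD100K classes

    def forward(self, feat_map):
        # feat_map: (B, C, H, W) feature map taken from the detector backbone
        b = feat_map.size(0)
        tokens = self.proj(feat_map.flatten(2).transpose(1, 2))    # (B, H*W, D)
        x = torch.cat([self.cls_token.expand(b, -1, -1), tokens], dim=1)
        x = self.encoder(x)                  # positional encoding omitted for brevity
        return self.head(x[:, 0])            # class logits from the [CLS] token

# Dummy 256-channel 20x20 feature map standing in for a YOLOv8 backbone output
logits = DetectionFeatureViT()(torch.randn(2, 256, 20, 20))        # -> shape (2, 10)
```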
To train a robust model, we use the BDD100K dataset, a rich collection of thousands of
labeled images spanning ten classes. It offers extensive coverage of diverse driving
and environmental conditions, providing a solid basis for training. The data are split
into 70% for training, 10% for validation, and 20% for testing, and the model achieves
an accuracy of 91.47% on the test set.
Based on the same data, we compared our model with reference models. It outperforms
the ViT baselines (B-16: 81.61%, B-32: 80.41%, L-16: 78.14%, L-32: 81.07%, H-14:
82.46%) as well as VGG16 (86.75%), ResNet-101 (84.23%),
Inception-V3 (85.16%), and Xception (87.23%). We also evaluated the proposed model
on four additional datasets: ACDC, CADC, Cityscapes, and ONCE, where it achieved
accuracies of 88.37%, 89.89%, 87.00%, and 89.20%, respectively. Our framework is
robust enough to recognize objects in both normal and adverse weather conditions,
and its lightweight architecture makes it suitable for real-world applications such
as security and surveillance.