Detection and Classification of Vehicles in Varying Complexity of Urban Traffic Scenes using Vision and Deep Learning

Arif, Muhammad Umair

URI: http://10.250.8.41:8080/xmlui/handle/123456789/39588

Date: 2022

Abstract:

In today's world, traffic monitoring and surveillance are becoming increasingly widespread. Traffic data analysis relies heavily on computer vision and its tools. One of the challenging tasks in this domain is the real-time detection and classification of moving vehicles. Varying image quality, vehicle size, occlusion, similar vehicle shapes, and camera angle are some of the primary problems in the classification phase. Additionally, moving cast shadows make it challenging to design robust foreground detection and identification algorithms. The absence of standardized urban traffic data is noted as a major problem that the research community must address in order to detect and remove moving cast shadows correctly. This work systematically analyzes existing vision and deep learning models on multiple standard and custom datasets. Moreover, a variety of computer vision and deep learning-based algorithms are explored and examined in this dissertation before settling on the state-of-the-art pipeline for vehicle detection and accurate classification. The pipeline incorporates queue length estimation at a signal intersection using a comprehensive and complex urban dataset acquired locally. To begin the research, four different datasets with various challenging conditions were acquired: the NIPA dataset, the Toll Plaza dataset, the urban dataset (I, II, III), and the university road dataset. For vehicle detection, several conventional techniques (Blob statistics and the Haar cascade method) and advanced techniques (MobileNet, ResNet, Inception, and different variants of YOLO) were investigated on these datasets under a variety of challenging conditions. Shadow pre-processing was investigated using different GAN-based and gamma-correction-based methods.
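The gamma correction step mentioned above can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function name `gamma_correct`, the lookup-table approach, and the convention that gamma greater than 1 brightens dark (shadowed) regions are assumptions made here for illustration only.

```python
import numpy as np


def gamma_correct(image: np.ndarray, gamma: float) -> np.ndarray:
    """Apply power-law (gamma) correction to an 8-bit image.

    Assumed convention: out = 255 * (in / 255) ** (1 / gamma),
    so gamma > 1 brightens dark regions and gamma < 1 darkens them.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    if image.dtype != np.uint8:
        raise TypeError("expected an 8-bit (uint8) image")
    # Precompute the mapping for all 256 intensity levels once.
    levels = np.arange(256, dtype=np.float64) / 255.0
    table = np.rint(255.0 * levels ** (1.0 / gamma)).astype(np.uint8)
    # Integer-array indexing applies the table to every pixel.
    return table[image]


# Tiny synthetic "frame" used only to exercise the function.
frame = np.array([[0, 64], [128, 255]], dtype=np.uint8)
brightened = gamma_correct(frame, gamma=2.0)
```

Black and white pixels are left unchanged, while mid-tones are lifted. In the abstract, gamma correction is only one part of shadow handling; it is combined with a pre-trained model, which this sketch does not cover.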
Furthermore, both fine-grained and coarse classification problems were examined using several simple and advanced classification algorithms, including ANN, ResNet, MobileNet, and EfficientNet models. Fine-grained classification was also examined using one-shot learning and compared with a novel fusion-based technique, fused edge features (FE-CNN), for both coarse and fine-grained classification. Consideration of all these techniques paved the way toward the development of the proposed state-of-the-art, end-to-end solution. A comprehensive and complex dataset (the Karachi signal dataset) was acquired; vehicle detection and tracking were performed using binary YOLO and DeepSORT, and shadow artifacts were removed using a combination of gamma correction and the pre-trained ghost-free model. Fine-grained and coarse vehicle classification were then carried out using EfficientNet and FE-CNN, respectively. Finally, queue length at the signal intersection was estimated with both classification methods in a complex urban environment, using the exact dimensions of the detected and predicted vehicles in a particular lane. Because data from roadside sensors were unavailable, binary YOLO, which forecasts the vehicle count as effectively as road sensors, is used as the benchmark for queue length. The proposed end-to-end system is independent of camera calibration and road parameters and works as a complete solution for an outdoor, complex, shadow-centered urban traffic scenario, giving a queue length accuracy of more than 93%. In the low traffic environment, binary YOLO, FE-CNN, and EfficientNet reported average root mean square errors (RMSE) of 13.61, 6.79, and 1.22, respectively. In medium traffic, the corresponding RMSE values were 22.39, 15.38, and 2.29; in dense traffic, they were 28.81, 4.69, and 10.16. Since binary YOLO only provides information on the total count of vehicles, it can be regarded as equivalent to the results of ground sensors.
The RMSE of binary YOLO is the highest among all queue length estimation algorithms, which shows that a queue length based only on vehicle count is less accurate than the proposed approach. The proposed queue length estimation pipeline, based on vehicle count and classification, produced excellent queue estimates. Fine-grained classification based on EfficientNet outperforms the other techniques, with the lowest RMSE in low and medium traffic scenarios, while coarse classification based on FE-CNN has the lowest error in dense traffic. The proposed system will help in reducing road accidents, identifying vehicles for security purposes, categorizing heavy traffic on urban roads, and adjusting traffic signal timings. It is believed that this research will lay the groundwork for the development of more reliable algorithms for use in real-world systems. As future work, an in-depth investigation can be conducted to improve the developed algorithms by including vehicle make, model, and logo recognition, which would increase the proposed model's stability for diverse applications. The suggested technique can also be expanded to incorporate the vehicle manufacturer (make) or the manufacturing year (model), increasing the system's flexibility. Additionally, the proposed approaches are easily adaptable to situations in which the camera is not permanently installed, such as an onboard camera on a mobile surveillance vehicle.
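To make the contrast between count-based and classification-based queue estimates concrete, the following is a rough sketch rather than the method used in the dissertation. The class names, the nominal lengths in `CLASS_LENGTH_M`, the gap `GAP_M`, and the function names are all hypothetical; only the RMSE formula is standard.

```python
import math

# Hypothetical nominal vehicle lengths in metres (illustrative values only).
CLASS_LENGTH_M = {"motorcycle": 2.0, "car": 4.5, "bus": 12.0}
GAP_M = 1.0  # hypothetical bumper-to-bumper gap in metres


def class_based_queue_length(classes, lengths=CLASS_LENGTH_M, gap=GAP_M):
    """Sum per-class lengths of the vehicles in one lane, plus gaps between them."""
    if not classes:
        return 0.0
    return sum(lengths[c] for c in classes) + gap * (len(classes) - 1)


def count_based_queue_length(count, mean_length=4.5, gap=GAP_M):
    """Estimate queue length from the vehicle count alone (one assumed mean length)."""
    if count <= 0:
        return 0.0
    return count * mean_length + gap * (count - 1)


def rmse(estimates, references):
    """Root mean square error between two equal-length sequences."""
    if len(estimates) != len(references) or not estimates:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(e - r) ** 2 for e, r in zip(estimates, references)]
    return math.sqrt(sum(squared) / len(squared))


# One lane containing a car, a bus, and a motorcycle.
lane = ["car", "bus", "motorcycle"]
by_class = class_based_queue_length(lane)       # 4.5 + 12.0 + 2.0 + 2 gaps
by_count = count_based_queue_length(len(lane))  # 3 * 4.5 + 2 gaps
```

With these made-up numbers, the count-only estimate misses the extra length of the bus, which is the kind of error the abstract attributes to queue estimates based only on vehicle count.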