Abstract:
Traffic monitoring and surveillance are becoming increasingly widespread.
Traffic data analysis relies heavily on computer vision and its tools. One of the challenging tasks
in this domain is the real-time detection and classification of moving vehicles. Variations in image
quality and size, occlusion, similar vehicle shapes, and camera angle are among the primary
problems in the classification phase. Additionally, moving cast shadows make it challenging to design
robust foreground detection and identification algorithms. The lack of standardized urban
traffic data is a major problem that the research community must address before moving cast
shadows can be detected and removed correctly. This work
systematically analyzes existing computer vision and deep learning models on multiple standard and
custom datasets. A variety of computer vision and deep learning-based algorithms were
explored and examined in this dissertation before settling on a state-of-the-art pipeline for
vehicle detection and accurate classification. The pipeline incorporates queue length estimation
at a signal intersection using a comprehensive and complex urban dataset acquired locally.
To begin the research, four datasets covering various challenging conditions were
acquired: the NIPA dataset, the Toll Plaza dataset, the urban dataset (I, II, III), and the
university road dataset. For vehicle detection, several conventional techniques (blob statistics and
the Haar cascade method) and advanced techniques (MobileNet, ResNet, Inception, and different
variants of YOLO) were investigated on different datasets under a variety of challenging conditions.
Shadow pre-processing was investigated using different GAN-based and gamma-correction-based
methods. Furthermore, both fine-grained and coarse classification problems were
examined using several simple and advanced classification algorithms, including ANN, ResNet,
MobileNet, and EfficientNet models. Fine-grained classification was also examined using
one-shot learning, and the results were compared with those of a novel fusion-based technique,
fused edge features (FE-CNN), which was applied to both coarse and fine-grained classification.
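As a rough illustration of the gamma-correction step mentioned above, the following minimal sketch applies the standard power-law mapping out = 255 * (in / 255) ** gamma to an 8-bit grayscale image through a lookup table. The function names and the gamma value of 0.5 are assumptions chosen for illustration; the abstract does not state the parameters or the implementation actually used.

```python
def gamma_lut(gamma: float) -> list[int]:
    """Build a 256-entry lookup table for out = 255 * (in / 255) ** gamma."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return [round(255 * (v / 255) ** gamma) for v in range(256)]


def gamma_correct(image: list[list[int]], gamma: float) -> list[list[int]]:
    """Apply gamma correction to an 8-bit grayscale image given as rows of pixels.

    With gamma < 1, dark pixels (such as shadowed regions) are brightened more
    than bright pixels, which reduces the contrast between shadow and road.
    """
    lut = gamma_lut(gamma)
    return [[lut[p] for p in row] for row in image]


# Hypothetical 2x3 image: the left column stands in for a shadowed region.
image = [[30, 120, 200],
         [40, 130, 210]]
corrected = gamma_correct(image, gamma=0.5)  # gamma = 0.5 is an assumed value
```

In this sketch the shadowed pixels are lifted proportionally more than the bright ones, while the endpoints 0 and 255 are left unchanged.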
Evaluating all these techniques paved the way for the development of the proposed
state-of-the-art, end-to-end solution. A comprehensive and complex dataset (the Karachi
signal dataset) was acquired; vehicle detection and tracking were performed using the binary YOLO
and DeepSORT algorithms, and shadow artifacts were removed using a combination of gamma
correction and the pre-trained ghost-free model. Fine-grained and coarse vehicle
classification were then carried out using EfficientNet and FE-CNN, respectively. Finally, queue length
at the signal intersection was estimated with both classification methods in a complex urban
environment, using the exact dimensions of the detected and predicted vehicles in a particular lane.
Because data from roadside sensors were unavailable, binary YOLO, which estimates the vehicle
count as effectively as road sensors, is used to provide the benchmark queue length. The proposed
end-to-end system is independent of camera calibration and road parameters and works as a complete
solution for an outdoor, complex, shadow-centered urban traffic scenario, giving a queue length
accuracy of more than 93%. In the low traffic scenario, binary YOLO, FE-CNN, and EfficientNet
reported average root mean square errors (RMSE) of 13.61, 6.79, and 1.22, respectively. The
corresponding RMSE values were 22.39, 15.38, and 2.29 in the medium traffic scenario and 28.81, 4.69,
and 10.16 in the dense traffic scenario. Since binary YOLO only provides information on the total
count of vehicles, its output can be considered equivalent to the results of ground sensors. The RMSE
of binary YOLO is the highest among all queue length estimation algorithms, which shows that
a queue length based only on vehicle count is less accurate than the proposed approach. The proposed
queue length estimation pipeline, which combines vehicle count and classification, produced more
accurate queue estimates. The fine-grained classification based on EfficientNet outperforms the other
techniques, with the lowest RMSE in the low and medium traffic scenarios, while
coarse classification based on FE-CNN has the lowest error in dense traffic.
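The difference between a count-only queue estimate and a classification-based estimate can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's actual formula: the per-class vehicle lengths, the standstill gap, the mean vehicle length used by the count-only baseline, and all function names are hypothetical values introduced here.

```python
import math

# Hypothetical nominal vehicle lengths in metres (illustrative, not from the source).
CLASS_LENGTH_M = {"motorbike": 2.0, "car": 4.5, "bus": 12.0}
GAP_M = 1.0  # assumed standstill gap between consecutive vehicles, in metres


def queue_length_from_classes(labels, gap_m=GAP_M):
    """Queue length as the sum of per-class vehicle lengths plus inter-vehicle gaps."""
    if not labels:
        return 0.0
    return sum(CLASS_LENGTH_M[label] for label in labels) + gap_m * (len(labels) - 1)


def queue_length_from_count(count, mean_length_m=4.5, gap_m=GAP_M):
    """Count-only baseline: every vehicle is assumed to have the same mean length."""
    if count <= 0:
        return 0.0
    return count * mean_length_m + gap_m * (count - 1)


def rmse(estimates, references):
    """Root mean square error between estimated and reference queue lengths."""
    if not estimates or len(estimates) != len(references):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(e - r) ** 2 for e, r in zip(estimates, references)]
    return math.sqrt(sum(squared) / len(squared))


# One lane with a car, a bus, and a motorbike queued at the signal.
lane = ["car", "bus", "motorbike"]
by_class = queue_length_from_classes(lane)      # 4.5 + 12.0 + 2.0 + 2 gaps = 20.5 m
by_count = queue_length_from_count(len(lane))   # 3 * 4.5 + 2 gaps = 15.5 m
```

In this toy lane the count-only baseline underestimates the queue because it cannot account for the bus, which is the kind of discrepancy that grows when the traffic mix is heterogeneous.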
The proposed system can help reduce road accidents, identify vehicles for security
purposes, categorize heavy traffic on urban roads, and adjust traffic signal timings. It is
believed that this research will lay the groundwork for the development of more reliable algorithms
for use in real-world systems. As future work, an in-depth investigation can be conducted to
improve the developed algorithms by including vehicle make, model, and logo recognition,
which would increase the proposed model's stability for diverse applications. The suggested
technique can also be extended to incorporate the vehicle manufacturer (make) or the
manufacturing year (model), thereby making the system more dynamic. Additionally, the
proposed approaches are easily adaptable to situations in which the camera is not permanently
installed, such as an onboard camera on a mobile surveillance vehicle.