dc.description.abstract |
Accurate and fast object detection models are in high demand because of their wide variety of
applications in computer vision, such as pedestrian detection, video surveillance and
especially crowd counting. Automated crowd counting has been a difficult
problem in autonomous visual surveillance for many years. In the relevant literature, a substantial
amount of research has been undertaken on the subject of crowd-counting and different
architectures have been proposed for accurate and timely detection of heads in a crowd. Most of
the approaches are based on regression, segmentation, image processing, machine learning
techniques, counters and sensor-based models. Although advancements in infrastructure have
significantly improved prediction accuracy, small heads are still often missed by most of the
architectures proposed in the literature. Scale variation and high miss rates for small
objects lead to inaccurate results. The purpose of this research is to provide an accurate and
fast detection model for crowd counting by focusing on human head detection in real-time
scenarios drawn from the publicly available Casablanca, Hollywood-Heads and SCUT-HEAD datasets.
In this study, we tuned YOLOv5, a deep convolutional neural network (CNN)
based object detection architecture, to improve its mAP, precision and recall. The losses
are reduced and accurate results are achieved through careful tuning of hyper-parameters. A transfer
learning approach is used to fine-tune the architecture. From the experimental results, it can be
seen that the tuned YOLOv5 architecture shows significant improvements in small-head detection in
crowded scenes compared with baseline approaches such as Faster R-CNN and the
VGG-16 based SSD MultiBox Detector. In Faster R-CNN, features are extracted in the last layer,
so image resolution is reduced and small objects are not detected, whereas YOLOv5 performs
slicing of feature maps in the backbone region, so small heads are detected accurately.
Another main contribution of our research is the use of a merged dataset that includes heads of every
size: small, medium and large. |
en_US |