Abstract:
Recent advances in object detection focus on finding configurations and training criteria
that achieve better mean Average Precision (mAP) or F1-score on various datasets.
Among the datasets considered for object detection is a subset in which a broad class
category, such as cars, pedestrians, or fish, must be detected in every frame of video
sequences that are usually extracted in real time from a static camera feed.
On these sequence-based datasets, conventional detectors, whether one-stage such as YOLO
and RetinaNet or two-stage such as Faster R-CNN, make no use of the sequential nature of
the frames and instead treat each frame as a stand-alone input image. In this work, these
sequence-based datasets are evaluated both with conventional techniques and with a
modification that exploits the sequential nature of the frames. The modification combines
two existing, independent techniques, optical flow and Gaussian Mixture Model (GMM)
background subtraction, which extract motion and foreground information from a video
sequence, respectively. The extracted information is coupled with RetinaNet, and
experimental results on Pedestrian Detection and Fish Detection show that the
modification improves detection on both datasets.
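
As a rough illustration of the idea, the following Python sketch builds a three-channel
detector input from two consecutive grayscale frames. It is not the method used in this
work: the abstract does not specify how the extracted information is coupled with the
detector, so stacking the cues as extra input channels is an assumption. To keep the
example dependency-light, a per-pixel running single-Gaussian model stands in for GMM
background subtraction and an absolute frame difference stands in for optical flow. All
class, function, and parameter names are hypothetical.

```python
import numpy as np


class RunningGaussianBackground:
    """Per-pixel single-Gaussian background model.

    Simplified stand-in for GMM background subtraction (one component
    per pixel instead of a mixture). Names and defaults are hypothetical.
    """

    def __init__(self, first_frame, learning_rate=0.05, threshold=2.5,
                 initial_var=25.0, min_var=1.0):
        self.mean = first_frame.astype(np.float64)
        self.var = np.full(first_frame.shape, initial_var, dtype=np.float64)
        self.learning_rate = learning_rate
        self.threshold = threshold
        self.min_var = min_var

    def apply(self, frame):
        """Return a float32 foreground mask (1 = foreground) and update the model."""
        diff = frame.astype(np.float64) - self.mean
        # A pixel is foreground if it lies more than `threshold` standard
        # deviations away from the background mean.
        foreground = diff ** 2 > (self.threshold ** 2) * self.var
        # Update the model only where the pixel was classified as background.
        bg = ~foreground
        self.mean[bg] += self.learning_rate * diff[bg]
        self.var[bg] += self.learning_rate * (diff[bg] ** 2 - self.var[bg])
        self.var = np.maximum(self.var, self.min_var)
        return foreground.astype(np.float32)


def motion_magnitude(prev_gray, gray):
    """Absolute temporal difference in [0, 1]; a crude proxy for optical-flow magnitude."""
    return np.abs(gray.astype(np.float32) - prev_gray.astype(np.float32)) / 255.0


def build_detector_input(prev_gray, gray, background):
    """Stack appearance, motion, and foreground cues as channels (assumed coupling)."""
    appearance = gray.astype(np.float32) / 255.0
    motion = motion_magnitude(prev_gray, gray)
    foreground = background.apply(gray)
    return np.stack([appearance, motion, foreground], axis=-1)


if __name__ == "__main__":
    # Synthetic static scene with a bright 2x2 object appearing in the second frame.
    frame0 = np.full((8, 8), 50, dtype=np.uint8)
    frame1 = frame0.copy()
    frame1[2:4, 2:4] = 200
    model = RunningGaussianBackground(frame0)
    stacked = build_detector_input(frame0, frame1, model)
    print(stacked.shape)
```

In practice the two stand-ins would be replaced by real optical flow and GMM background
subtraction implementations, and the resulting array would be fed to the detector in
place of the stand-alone frame.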