Efficient Classification of Motion in Video Data by Using Deep Neural Networks

Irfan, Muntaha

URI: http://10.250.8.41:8080/xmlui/handle/123456789/35623

Date: 2021

Abstract:

Video has become more popular in many applications in recent years due to increased storage capacity, more advanced network architectures, and easy access to digital cameras, especially in mobile phones. Classifying the type of motion in a video sequence is an area targeted by many researchers for purposes such as traffic control, video scene classification, event prediction, sports analysis, and management of web videos. There are several conventional and unconventional techniques for motion classification in videos, but with the advent of sophisticated algorithms and high computational capabilities, deep learning architectures are now used for almost every image and video processing task, including motion classification. Deep learning methodology is more reliable and effective than other approaches. Training a deep learning architecture for motion classification requires that all of the frames (pixel by pixel) be fed to the network along with their corresponding labels; once the network has learned the classification task, it can be used for inference. However, this method requires a lot of memory and computational resources, because a large amount of data (all the frames in a video) must be processed by the architecture. We aim to reduce the amount of data to be processed by the deep learning architecture for the motion classification task, which in turn results in lower memory requirements and reduced computational complexity. At the same time, we strive to maintain the classification accuracy. A video is a sequence of individual frames, hence there is a lot of temporal redundancy between consecutive frames. This redundancy can be exploited by traditional motion estimation, which captures the motion information in a video sequence.
Instead of inputting the standard video frames to the deep learning architectures, we feed them the motion information, so that the architectures have to process far less data for the motion classification task. In our work, the motion information in a video sequence is retrieved using the three-step search, a block matching algorithm. This algorithm yields the motion vectors, which contain the motion information in a video sequence, and we therefore train our network on these motion vectors instead of the standard frames to perform motion classification. Experimental results show that with our proposed method, the motion classification task can be carried out by processing far less data while maintaining good accuracy.
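
To make the block matching step concrete, the following is a minimal sketch of a three-step search in Python with NumPy. It is an illustration only and not the implementation used in the thesis: the function names (`sad`, `three_step_search`, `motion_field`), the 16x16 block size, the initial step of 4 pixels, and the sum of absolute differences as the matching cost are all assumptions, since the abstract does not state them. Frames are assumed to be 2-D grayscale arrays.

```python
import numpy as np


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return np.abs(a.astype(np.int32) - b.astype(np.int32)).sum()


def three_step_search(prev, curr, top, left, block=16, step=4):
    """Estimate the motion vector (dy, dx) of one block.

    The block of `curr` whose top-left corner is (top, left) is matched
    against `prev`; the returned (dy, dx) points to the best match found.
    `block` and `step` are assumed values, not taken from the thesis.
    """
    h, w = prev.shape
    target = curr[top:top + block, left:left + block]
    best_dy, best_dx = 0, 0
    best_cost = sad(target, prev[top:top + block, left:left + block])
    while step >= 1:
        centre_dy, centre_dx = best_dy, best_dx
        # Probe the 3x3 grid of candidates spaced `step` pixels apart.
        for ddy in (-step, 0, step):
            for ddx in (-step, 0, step):
                dy, dx = centre_dy + ddy, centre_dx + ddx
                y, x = top + dy, left + dx
                # Skip candidates that fall outside the reference frame.
                if y < 0 or x < 0 or y + block > h or x + block > w:
                    continue
                cost = sad(target, prev[y:y + block, x:x + block])
                if cost < best_cost:
                    best_cost, best_dy, best_dx = cost, dy, dx
        step //= 2  # halve the step: 4 -> 2 -> 1 gives the three steps
    return best_dy, best_dx


def motion_field(prev, curr, block=16, step=4):
    """Compute one (dy, dx) vector per non-overlapping block of `curr`."""
    h, w = prev.shape
    rows, cols = h // block, w // block
    field = np.zeros((rows, cols, 2), dtype=np.int16)
    for r in range(rows):
        for c in range(cols):
            field[r, c] = three_step_search(
                prev, curr, r * block, c * block, block, step
            )
    return field
```

Under these assumed settings, each 16x16 block of 256 pixels is summarized by two integers, so a 64x64 frame of 4096 pixels reduces to a 4x4x2 array of 32 values. This is the kind of reduction in network input that the abstract describes, although the actual block size and search range used in the work may differ. Note that the three-step search only probes a subset of candidate positions, so it can miss the best match that an exhaustive search would find.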