Abstract:
The combination of spatiotemporal video information and essential features can improve the performance of human action recognition (HAR); however, any individual type of feature usually degrades performance in the presence of similar actions and complex backgrounds. In recent years, deep convolutional neural networks have improved performance for several computer vision applications, such as video surveillance, owing to their ability to capture spatial information. This dissertation proposes three techniques for human action recognition using deep learning. The first technique, named HybridHR-Net, targets action recognition in video surveillance. Deep transfer learning is employed to adapt the pre-trained EfficientNet-b0 model, and Bayesian optimization is used to tune the hyperparameters of the fine-tuned model. Instead of fully connected layer features, the activations of the average pooling layer are used for feature extraction. Two feature selection techniques, an improved artificial bee colony algorithm and an entropy-based approach, are employed to select the best features. Using a serial fusion technique, the selected features are combined into a single vector, which is then classified by machine learning classifiers.
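A minimal sketch of this feature extraction and serial fusion stage is given below, assuming frames are already resized to 224x224 RGB; the Bayesian hyperparameter search and the improved artificial bee colony and entropy-based selectors are specific to the proposed method and are only stubbed here with placeholder index sets.

```python
# Sketch: average-pooling features from pre-trained EfficientNet-b0,
# serial fusion of two selected subsets, and a classical classifier.
import numpy as np
import tensorflow as tf
from sklearn.svm import SVC

# Average-pooling activations instead of fully connected layer features.
backbone = tf.keras.applications.EfficientNetB0(
    include_top=False, weights="imagenet", pooling="avg")

def extract_features(frames):
    """frames: (N, 224, 224, 3) uint8 array -> (N, 1280) feature matrix."""
    x = tf.keras.applications.efficientnet.preprocess_input(frames.astype("float32"))
    return backbone.predict(x, verbose=0)

def serial_fusion(feats, abc_idx, entropy_idx):
    """Concatenate the two selected feature subsets into one vector per sample."""
    return np.concatenate([feats[:, abc_idx], feats[:, entropy_idx]], axis=1)

# Example usage with placeholder selections (stand-ins for the improved ABC
# and entropy-based selectors described above).
frames = np.random.randint(0, 255, (8, 224, 224, 3), dtype=np.uint8)
labels = np.array([0, 1, 0, 1, 0, 1, 0, 1])
feats = extract_features(frames)
abc_idx = np.arange(0, 512)          # hypothetical ABC-selected indices
entropy_idx = np.arange(512, 1024)   # hypothetical entropy-selected indices
fused = serial_fusion(feats, abc_idx, entropy_idx)
clf = SVC(kernel="rbf").fit(fused, labels)
```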
In the second technique, a new framework is proposed for accurate HAR based on deep learning and an improved feature optimization algorithm. The proposed framework comprises several important steps, from deep feature extraction to classification. Before the fine-tuned deep learning models, MobileNet-V2 and Darknet53, are trained, the original video frames are normalized. The pre-trained deep models are then used for feature extraction, and the extracted features are fused using the canonical correlation approach. Next, an improved particle swarm optimization (IPSO)-based algorithm is used to select the best features. Finally, the selected features are used to classify actions with various classifiers.
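The fusion step can be illustrated as follows; this is a hedged sketch in which random matrices stand in for the MobileNet-V2 and Darknet53 feature streams, and a simple variance-based mask replaces the IPSO selector purely for illustration.

```python
# Sketch: canonical correlation fusion of two deep feature streams,
# followed by a placeholder feature selection step and a classifier.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.neighbors import KNeighborsClassifier

def cca_fusion(feats_a, feats_b, n_components=64):
    """Project both feature matrices into a shared canonical space and
    concatenate the projections into a single fused descriptor."""
    cca = CCA(n_components=n_components)
    proj_a, proj_b = cca.fit_transform(feats_a, feats_b)
    return np.concatenate([proj_a, proj_b], axis=1)

# Placeholder matrices standing in for MobileNet-V2 / Darknet53 features.
rng = np.random.default_rng(0)
feats_mobilenet = rng.normal(size=(100, 1280))
feats_darknet = rng.normal(size=(100, 1024))
labels = rng.integers(0, 6, size=100)

fused = cca_fusion(feats_mobilenet, feats_darknet)
# Stand-in for the IPSO step: keep the highest-variance fused dimensions.
keep = np.argsort(fused.var(axis=0))[-64:]
clf = KNeighborsClassifier(n_neighbors=5).fit(fused[:, keep], labels)
```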
In the third technique, human detection is performed using correlation filtering and traditional features. Humans are recognized in static images through omega shapes (the head-shoulder contour). For this purpose, correlation filters are combined with pre-processing algorithms to recognize a human in video imagery. Background extraction is performed to remove extra details that would otherwise complicate recognition. Moreover, optimized correlation values are computed through a Hierarchical Particle Swarm Optimization (HPSO) algorithm for the final classification.
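An illustrative sketch of correlation filtering after background extraction is shown below, assuming an omega-shape (head-shoulder) template is available as a small grayscale image; the HPSO optimization of the correlation values is not reproduced here.

```python
# Sketch: background subtraction followed by frequency-domain correlation
# of the foreground with an omega-shape template.
import cv2
import numpy as np

bg_subtractor = cv2.createBackgroundSubtractorMOG2(history=200, detectShadows=False)

def correlation_response(frame_gray, template):
    """Circular cross-correlation of the foreground with the template via the FFT."""
    fg_mask = bg_subtractor.apply(frame_gray)
    foreground = cv2.bitwise_and(frame_gray, frame_gray, mask=fg_mask)
    # Zero-pad the template to the frame size before correlating.
    padded = np.zeros_like(foreground, dtype=np.float32)
    padded[: template.shape[0], : template.shape[1]] = template
    response = np.real(np.fft.ifft2(np.fft.fft2(foreground.astype(np.float32)) *
                                    np.conj(np.fft.fft2(padded))))
    return response

def detect_peak(response):
    """Return the location and value of the strongest correlation peak."""
    idx = np.unravel_index(int(response.argmax()), response.shape)
    return idx, float(response[idx])
```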
Five publicly accessible datasets are utilized in the experimental evaluation of the first methodology, achieving notable accuracies of 97%, 98.7%, 100%, 99.7%, and 96.8%, respectively. For the second methodology, six datasets, namely KTH, UT-Interaction, UCF Sports, Hollywood, IXMAS, and UCF YouTube, are used, attaining accuracies of 98.3%, 98.9%, 99.8%, 99.6%, 98.6%, and 100%, respectively. The third method is evaluated on the KTH dataset and achieves improved detection accuracy. Additionally, the proposed frameworks are compared with contemporary methods to demonstrate the increase in accuracy.