dc.description.abstract |
Surveillance using aerial videos has emerged as a critical area of study with diverse
applications in domains such as military operations, law enforcement activities,
traffic monitoring, and disaster management. Accurate detection and recognition of
human actions play a pivotal role in aerial surveillance, enabling the identification of
potential threats and suspicious behavior. This thesis introduces a novel framework
for action recognition in aerial videos, employing the YOLO-Pose algorithm and
an LSTM network. The proposed framework consists of two primary stages. Firstly, the
YOLO-Pose algorithm is utilized to extract 17 key points, representing the body
pose, from each video frame. Secondly, an LSTM network is employed to classify
human actions. To evaluate the framework, the Drone Action Dataset, comprising 13
distinct human action classes, is employed. The dataset is divided into three subsets
for training and testing purposes. To capture the temporal dynamics of human
actions, the extracted key points are normalized and segmented into 30-frame
chunks. Subsequently, the LSTM network processes these chunks as input,
generating a probability distribution over the 13 action classes. The evaluation of the
proposed framework on the Drone Action Dataset demonstrates its effectiveness,
achieving a notable accuracy of 80%. Furthermore, the experimental results establish
that the proposed framework surpasses existing state-of-the-art methods for action
recognition in aerial videos. Hence, the proposed framework presents a robust and
efficient solution for action recognition in aerial videos, particularly within
surveillance scenarios. By combining the YOLO-Pose algorithm and an LSTM network,
it effectively captures both spatial and temporal information of human actions,
resulting in enhanced accuracy. The framework holds promising potential for
application across various domains, including military operations, law enforcement,
and disaster management. |
en_US |
dc.subject |
Action recognition, Aerial videos, Surveillance scenarios, YOLO-Pose algorithm, LSTM network, Drone Action Dataset, Human actions, Key points, Temporal dynamics |
en_US |
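
The preprocessing step described in the abstract (17 key points per frame, normalized and segmented into 30-frame chunks) could look roughly like the sketch below. It is illustrative only and is not the thesis code. The abstract does not state which normalization is used, whether chunks overlap, or how leftover frames are handled, so the bounding-box min-max scaling, the non-overlapping windows, the dropping of trailing frames, and the helper names `normalize_frame` and `make_chunks` are all assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

NUM_KEYPOINTS = 17  # key points per frame, as stated in the abstract
CHUNK_LEN = 30      # frames per chunk, as stated in the abstract


def normalize_frame(keypoints: Sequence[Point]) -> List[float]:
    """Scale one frame's key points into [0, 1] within their bounding box.

    Assumption: min-max scaling per frame; the abstract does not name
    the normalization actually used.
    """
    if len(keypoints) != NUM_KEYPOINTS:
        raise ValueError(
            f"expected {NUM_KEYPOINTS} key points, got {len(keypoints)}"
        )
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    min_x, min_y = min(xs), min(ys)
    # Fall back to 1.0 when the box is degenerate on an axis.
    width = (max(xs) - min_x) or 1.0
    height = (max(ys) - min_y) or 1.0
    flat: List[float] = []
    for x, y in keypoints:
        flat.append((x - min_x) / width)
        flat.append((y - min_y) / height)
    return flat  # 34 values: x0, y0, x1, y1, ...


def make_chunks(
    frames: Sequence[Sequence[Point]], chunk_len: int = CHUNK_LEN
) -> List[List[List[float]]]:
    """Split per-frame key points into non-overlapping fixed-length chunks.

    Assumption: trailing frames that do not fill a whole chunk are dropped.
    """
    features = [normalize_frame(frame) for frame in frames]
    n_chunks = len(features) // chunk_len
    return [
        features[i * chunk_len:(i + 1) * chunk_len] for i in range(n_chunks)
    ]
```

Each resulting chunk is a 30 x 34 sequence (30 frames, 17 key points with x and y coordinates), which is the kind of input an LSTM classifier would map to a probability distribution over the 13 action classes.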