Abstract:
This research presents a distinct spatio-temporal deep learning model for fall detection
that combines two spatio-temporal models. The first is a skeletal model that extracts
human pose landmarks using MediaPipe pose estimation and feeds them to one-dimensional
convolutional neural networks (1DCNNs) and gated recurrent units (GRUs), which extract
the spatial and temporal features of the skeletal data. The second is a video-based model
that takes the video data directly as input and learns the spatio-temporal features of
the video with three-dimensional convolutional neural networks (3DCNNs). The outputs of
the two models are concatenated and passed through a dense classification network with a
threshold parameter, yielding a binary output. The input data are preprocessed differently
for the two models, but the same window size and stride are used in order to keep them
synchronized. The two models are also evaluated separately, each with its own
classification network and a final threshold producing a binary category. Only the
skeletal model is trained with windowing and stride; the video-based model is trained
without them. The skeletal model, the video-based model, and the hybrid model combining
the two are tested on the UP-Fall Detection Dataset and achieve 98.36%, 99.01%, and
99.66% test accuracy, respectively. We use the vision part of the dataset, which contains
five types of falls and six types of activities of daily living (ADLs), but for this work
we group them into two classes: fall and non-fall. Our model outperforms recently
presented state-of-the-art models.
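The synchronization described above hinges on both streams being windowed with identical parameters. A minimal sketch of that shared windowing scheme is given below; the specific window size and stride values (30 and 10) are hypothetical placeholders, as the abstract does not state the actual parameters used.

```python
def make_windows(frames, window_size, stride):
    """Slice a frame-indexed sequence into fixed-size, possibly overlapping windows."""
    windows = []
    for start in range(0, len(frames) - window_size + 1, stride):
        windows.append(frames[start:start + window_size])
    return windows

# Stand-ins for 100 frames of pose landmarks and the same 100 raw video frames.
pose_seq = list(range(100))
video_seq = list(range(100))

# Both streams use the same window size and stride, so window i of the
# skeletal stream covers exactly the same frames as window i of the video stream.
pose_windows = make_windows(pose_seq, window_size=30, stride=10)
video_windows = make_windows(video_seq, window_size=30, stride=10)

assert len(pose_windows) == len(video_windows)  # streams stay in sync
```

With 100 frames, a window of 30, and a stride of 10, each stream yields eight aligned windows starting at frames 0, 10, ..., 70.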