Abstract:
The incorporation of Internet of Things (IoT) technology has considerably improved emotion detection, enabling real-time emotion recognition through inertial sensors and wearable devices. To identify human emotions from gait data, we propose a Transformer-based deep learning model with an encoder-only architecture. The Time Series Transformer consists of an input projection layer that reduces dimensionality, a multi-layer Transformer encoder with self-attention mechanisms, and a fully connected layer for classification. This architecture enables the model to capture the contextual and temporal characteristics of human gait data. The proposed model is evaluated on a closed-access dataset, the SEECS emotions dataset, which contains six emotions: anger, fear, sadness, surprise, happiness, and disgust. In the key experiments, emotions were grouped as "cold" (disgust, fear, and sadness) and "hot" (surprise, happiness, and anger), and performance was examined for each category. The model achieved an accuracy of 70.40% for cold emotions and 69.84% for hot emotions. We also carried out additional experiments to assess the impact of a full (6D) sensor configuration, which includes both acceleration and gyroscope signals, compared with limited (3D) configurations that retain only the acceleration or only the gyroscope signals. These experiments demonstrate the flexibility of our model and the influence of input dimensionality on recognition performance. The findings demonstrate the model's effectiveness in identifying intricate emotional states and suggest potential applications in emotion-aware IoT environments.
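The encoder-only architecture described above (input projection, multi-layer Transformer encoder with self-attention, fully connected classification head) can be sketched as follows. This is a minimal illustrative implementation in PyTorch; all dimensions (model width, head count, layer count, sequence length) are assumptions for illustration, not values taken from the paper.

```python
import torch
import torch.nn as nn

class GaitEmotionTransformer(nn.Module):
    """Hypothetical sketch of an encoder-only Time Series Transformer
    for gait-based emotion recognition. Hyperparameters are assumed."""

    def __init__(self, n_channels=6, d_model=64, n_heads=4,
                 n_layers=3, n_classes=6):
        super().__init__()
        # Input projection: maps raw sensor channels (e.g. 6D = 3-axis
        # acceleration + 3-axis gyroscope) into the model dimension.
        self.proj = nn.Linear(n_channels, d_model)
        # Multi-layer Transformer encoder with self-attention.
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Fully connected classification head over the six emotions.
        self.fc = nn.Linear(d_model, n_classes)

    def forward(self, x):
        # x: (batch, time_steps, sensor_channels)
        h = self.encoder(self.proj(x))
        # Mean-pool over the time dimension before classifying.
        return self.fc(h.mean(dim=1))

model = GaitEmotionTransformer()
logits = model(torch.randn(8, 128, 6))  # 8 gait windows, 128 time steps
print(logits.shape)
```

For the limited 3D configurations described above, only `n_channels` changes (3 instead of 6), which is one reason an input projection layer makes the architecture flexible across sensor setups.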