Abstract:
Recent technological advancements have enabled a modern and flexible
mode of learning, i.e., e-learning. Several e-learning models have been proposed; however, monitoring learners' emotional engagement levels
through their expressions during e-learning remains a challenging area of research. Hence, in
this work, we propose a tri-emotion engagement level detection model that
first captures learners' audio-visual data through the ubiquitous hardware built into every laptop computer, i.e., webcams and microphones. Our model then applies Computer Vision (CV), Machine Learning (ML), and Deep Learning (DL) techniques to analyze expressions and recognize learner
emotions, i.e., facial, upper-body gesture, and speech emotions. Information
about the learner's emotions is fused for efficient detection of engagement
levels (high, medium, low). We also present a gesture emotion dataset for
our upper-body gesture emotion analysis. To validate our contribution, we
tested our model's accuracy on a multi-modal emotion dataset and also evaluated it on an e-learning dataset collected during the COVID-19 pandemic.
Our model can further be integrated with any learning management system (LMS) to expand its usability, and it can assist teachers in judging
learners' engagement. We believe that this work paves a new
direction for engagement-level detection in e-learning scenarios.