Abstract:
Advances in computer vision and deep learning have driven rapid progress in human pose
estimation, with applications spanning diverse fields such as healthcare, animation, and
autonomous driving. 3D pose estimation plays a pivotal role in activity recognition, yet
recovering precise 3D joint positions remains difficult, particularly in dynamic
environments, owing to uncertain body proportions, occlusion, and the loss of depth
information. In this research, we propose a framework that estimates the 3D pose of an
individual from videos captured by multiple cameras mounted at different angles, drawing on
multi-view geometry to achieve more accurate 3D pose estimation. The framework estimates 2D
poses for each view and then reconstructs the 3D pose through techniques such as
triangulation.
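As a brief illustration of this reconstruction step (a minimal sketch, not the exact implementation used in this work), confidence-weighted algebraic triangulation of a single joint can be posed as a direct linear transform (DLT) solved with SVD; the function name and the per-view confidence weighting below are illustrative assumptions.

import numpy as np

def triangulate_point(proj_matrices, points_2d, confidences=None):
    """Confidence-weighted algebraic (DLT) triangulation of one joint.

    proj_matrices : list of (3, 4) camera projection matrices, one per view
    points_2d     : list of (x, y) 2D joint detections, one per view
    confidences   : optional per-view confidence weights in [0, 1]
    Returns the triangulated 3D joint position as a length-3 array.
    """
    n_views = len(proj_matrices)
    if confidences is None:
        confidences = np.ones(n_views)

    # Two linear constraints per view: x * P[2] - P[0] and y * P[2] - P[1],
    # each scaled by that view's 2D detection confidence.
    A = []
    for P, (x, y), w in zip(proj_matrices, points_2d, confidences):
        A.append(w * (x * P[2] - P[0]))
        A.append(w * (y * P[2] - P[1]))
    A = np.stack(A)

    # The homogeneous 3D point is the right singular vector associated
    # with the smallest singular value of A.
    _, _, vh = np.linalg.svd(A)
    X = vh[-1]
    return X[:3] / X[3]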
We evaluate the approach on the 'Fit 3D' dataset using the Mean Per Joint Position Error
(MPJPE) metric, obtaining 14.4 mm with confidence-weighted algebraic triangulation.
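For clarity, MPJPE is the mean Euclidean distance between estimated and ground-truth joint positions; the formulation below is the standard definition rather than notation quoted from this work:

\mathrm{MPJPE} = \frac{1}{N_f N_j} \sum_{f=1}^{N_f} \sum_{j=1}^{N_j} \left\lVert \hat{\mathbf{p}}_{f,j} - \mathbf{p}_{f,j} \right\rVert_2

where \hat{\mathbf{p}}_{f,j} and \mathbf{p}_{f,j} denote the estimated and ground-truth 3D positions of joint j in frame f, N_j is the number of joints, and N_f is the number of frames.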
The proposed methodology will assist physiotherapists, fitness advisors and trainers, and
orthopedic specialists in analyzing a person's movement from different angles using 3D pose
estimation.