dc.description.abstract |
3D Pose Estimation (detection of human joints in 3D) from a single 2D RGB image, without depth
information, is an ill-posed problem: a given 2D pose does not correspond to a unique 3D pose.
Many techniques based on hand-crafted feature extractors have been proposed so far, but none
has reported impressive results, especially for in-the-wild human interaction scenarios.
Neural Networks, though introduced decades ago, have only recently gained wide popularity across
many fields, owing to the availability of large data corpora and computing resources. 3D pose
estimation from a 2D RGB image is an interesting challenge to tackle with Neural Networks.
A 50-layer Convolutional Neural Network based on Microsoft's Residual Network architecture
is proposed for the extraction of 2D Heat-maps (images showing, at each pixel, the probability
of each joint) and 3D Location-maps (images showing, at each pixel, the coordinates of each
joint relative to the detected pelvis). The location of each joint is then read from the
Location-maps at the pixel where the corresponding 2D Heat-map gives its maximum probability.
This output is temporally filtered to take advantage of the correlation between the joints
detected in the current frame and those detected in previous frames. Skeleton fitting is then
performed on the output to bring the coordinates back from pelvis-relative coordinates to camera
coordinates by optimizing the proposed objective function, which also stabilizes the output
further. |
en_US |
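
The joint read-out step described in the abstract (taking each joint's coordinates from the Location-maps at the pixel where its Heat-map peaks) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names `argmax_2d` and `read_joint_positions`, the nested-list data layout, and the split into separate x, y and z Location-maps are all assumptions made for illustration.

```python
def argmax_2d(heatmap):
    """Return (row, col) of the largest value in a 2D list of probabilities."""
    best_pos = (0, 0)
    best_val = heatmap[0][0]
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if value > best_val:
                best_val = value
                best_pos = (r, c)
    return best_pos


def read_joint_positions(heatmaps, loc_x, loc_y, loc_z):
    """Read pelvis-relative (x, y, z) for each joint at its heat-map maximum.

    Assumed layout (hypothetical): heatmaps[j], loc_x[j], loc_y[j] and
    loc_z[j] are 2D lists of identical size for joint j.
    """
    joints = []
    for j, heatmap in enumerate(heatmaps):
        r, c = argmax_2d(heatmap)
        joints.append((loc_x[j][r][c], loc_y[j][r][c], loc_z[j][r][c]))
    return joints


# Toy example: one joint on a 2x2 grid; the heat-map peaks at row 0, col 1.
heatmaps = [[[0.1, 0.9], [0.2, 0.3]]]
loc_x = [[[1.0, 2.0], [3.0, 4.0]]]
loc_y = [[[10.0, 20.0], [30.0, 40.0]]]
loc_z = [[[100.0, 200.0], [300.0, 400.0]]]
positions = read_joint_positions(heatmaps, loc_x, loc_y, loc_z)
```

In this toy example `positions` holds a single pelvis-relative coordinate triple taken from row 0, column 1 of each Location-map. The temporal filtering and skeleton-fitting stages are not sketched here, since the abstract does not specify the filter or the objective function.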