Abstract:
Video content is ubiquitous in the modern world and there is a growing need for automated methods to extract information from videos. Face-based person retrieval is a
particularly interesting task in this domain which involves the use of face recognition to
track the appearances of people in video data. It is useful in a variety of applications,
from video analytics and indexing to video surveillance and crowd analysis. Feature
representation and identity classification are two key components of a face recognition
system and largely determine its accuracy. However, existing representation and identification methods are ill-equipped for large-scale, unsupervised video face recognition.
Despite open challenges, development of novel methods has been slow while test scores
on most benchmark datasets have saturated.
The availability of good quality datasets is a prerequisite for any deep learning-based
face recognition system. Most face datasets are based on web images of celebrities and
do not represent the challenges of video face recognition. The few video face datasets
that do exist are curated from short-form video content such as movies and television
shows, and generally contain a small number of identities while also being limited to the
demographics of international celebrities. This research is focused on the development
of a large-scale dataset of face images extracted from videos, in order to renew interest in
and promote the development of face representation and identification models capable
of large-scale, unsupervised face recognition in videos.
We present TVFace, a large-scale dataset of face images extracted from public live
streams of international news channels. It consists of 22 subsets, one for each channel, containing a total of 2.6 million face images and 33 thousand identities. Identity
labeling is performed using a clustering-based, semi-automatic annotation framework
designed to facilitate manual annotation of large collections of face images. Each image is also annotated for 6 facial attributes (mask, age, gender, ethnicity, expression and pose) using state-of-the-art face analysis models. The dataset can be used for the
evaluation of face representation and identity classification components in both image
and video domains, as well as for multiple tasks including face verification, identification, and clustering. It effectively represents the challenges of the video domain, such as
variations in photometric properties and non-discriminative facial attributes like pose
and expression, while maintaining a diverse demographic distribution. We also design a
hierarchical retrieval index for online clustering in order to demonstrate the effectiveness
of the proposed dataset in evaluating real-time person retrieval systems.