Towards Large-scale Unsupervised Face Recognition in Videos

Khurshid, Atif

dc.contributor.author	Khurshid, Atif
dc.date.accessioned	2023-05-02T10:53:51Z
dc.date.available	2023-05-02T10:53:51Z
dc.date.issued	2023
dc.identifier.uri	http://10.250.8.41:8080/xmlui/handle/123456789/32816
dc.description.abstract	Video content is ubiquitous in the modern world and there is a growing need for automated methods to extract information from videos. Face-based person retrieval is a particularly interesting task in this domain which involves the use of face recognition to track the appearances of people in video data. It is useful in a variety of applications, from video analytics and indexing to video surveillance and crowd analysis. Feature representation and identity classification are two key components of a face recognition system and largely determine its accuracy. However, existing representation and identification methods are ill-equipped for large-scale, unsupervised video face recognition. Despite open challenges, development of novel methods has been slow while test scores on most benchmark datasets have saturated. The availability of good quality datasets is a prerequisite for any deep learning-based face recognition system. Most face datasets are based on web images of celebrities and do not represent the challenges of video face recognition. The few video face datasets that do exist are curated from short-form video content such as movies and television shows, and generally contain a small number of identities while also being limited to the demographics of international celebrities. This research is focused on the development of a large-scale dataset of face images extracted from videos, in order to renew interest in and promote the development of face representation and identification models capable of large-scale, unsupervised face recognition in videos. We present TVFace, a large-scale dataset of face images extracted from public live streams of international news channels. It consists of 22 subsets, one for each channel, containing a total of 2.6 million face images and 33 thousand identities.
Identity labeling is performed using a clustering-based, semi-automatic annotation framework designed to facilitate manual annotation of large collections of face images. Each image is also annotated for 6 facial attributes (mask, age, gender, ethnicity, expression and pose) using state-of-the-art face analysis models. The dataset can be used for the evaluation of face representation and identity classification components in both image and video domains, as well as for multiple tasks including face verification, identification, and clustering. It effectively represents the challenges of the video domain, such as variations in photometric properties and non-discriminatory facial attributes like pose and expression, while maintaining a diverse demographic distribution. We also design a hierarchical retrieval index for online clustering in order to demonstrate the effectiveness of the proposed dataset in evaluating real-time person retrieval systems.	en_US
dc.description.sponsorship	Dr. Muhammad Moazam Fraz	en_US
dc.language.iso	en	en_US
dc.publisher	School of Electrical Engineering and Computer Sciences (SEECS) NUST	en_US
dc.title	Towards Large-scale Unsupervised Face Recognition in Videos	en_US
dc.type	Thesis	en_US