Abstract:
Supervised techniques for recognizing detected objects (faces, vehicles, animals, birds,
etc.) are inherently constrained: they can only recognize the objects they were trained to
recognize. This constraint is most limiting in real-time or live video, where little is
known in advance about future encounters. Current unsupervised techniques mitigate this
issue to some extent, but they are limited by the assumptions they make (either that the
number of clusters is known, or that enough samples exist to represent the true
underlying data distribution). In many real-life problems,
such as computer-vision-based TV analytics, no prior knowledge is available about the
entities that will appear. Relaxing these prior assumptions would allow real-time
unsupervised recognition of detected objects. In this paper, we
present Clustering Large Online Unrecognized Detections (CLOUD), a technique that
is both unsupervised and dynamic (it makes no assumption about the number of classes).
CLOUD is dynamic enough to be applied to any detection problem; in this paper, we apply
it to the problem of face detection. Face detection is one of the toughest problems in
computer vision because of the attention to detail it requires. Moreover, new faces
appear all the time in live streams, which makes it an adequately challenging task.
CLOUD introduces the concept of Dynamic Clustering (DC), which uses Dynamic
Database Population (DDP) to maintain a dictionary of reference faces. We run CLOUD
on live video from Pakistani news channels. Our method recognized 1000 entities in
11 hours of video and achieved a Cluster Purity (CP) of 90%, which is comparable to
other unsupervised techniques.
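
For readers unfamiliar with the metric, the following is a minimal sketch of how cluster purity is commonly computed: each cluster is credited with the count of its most frequent ground-truth identity, and the credits are summed and divided by the total number of detections. It assumes the standard definition of purity, which the abstract does not spell out. The function name `cluster_purity` and the toy identities "A", "B", "C" are hypothetical and are not taken from the paper or its data.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, true_labels):
    """Standard cluster purity: fraction of items that carry the majority
    ground-truth label of the cluster they were assigned to."""
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have equal length")
    if not cluster_ids:
        raise ValueError("at least one item is required")

    # Count how often each ground-truth label occurs inside each cluster.
    label_counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        label_counts[cluster][label] += 1

    # Each cluster contributes the size of its majority label.
    majority_total = sum(max(counts.values()) for counts in label_counts.values())
    return majority_total / len(cluster_ids)


# Toy data (hypothetical): 10 face detections grouped into 3 clusters.
clusters = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
identities = ["A", "A", "B", "B", "B", "C", "C", "C", "C", "A"]
print(cluster_purity(clusters, identities))  # 8 of 10 match their cluster's majority
```

Purity is trivially maximized when every detection forms its own cluster, so it is usually read together with the number of clusters produced.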