Abstract:
Vehicle re-identification plays an important role in intelligent traffic monitoring systems.
Its application is not limited to vehicle monitoring or surveillance: an efficient vehicle
re-identification procedure allows a system to detect and track a vehicle accurately and in a
timely manner, which can also play an important part in forensic analysis. The procedure is
based on a deep neural network: given a vehicle image from an arbitrary camera, the system
learns to distinguish between different vehicles. Most current algorithms solve this problem
in a fully supervised manner and therefore require a large amount of labeled training data.
However, collecting such a large labeled dataset is almost impossible due to its high cost.
Moreover, in practical scenarios the test data contains unseen vehicle images on which the
model was never trained, so a more robust model is required to handle unseen data. Zero-Shot
Vehicle Re-Identification, an unsupervised model, is therefore proposed to handle unseen data
in real-time scenarios.
Two consistencies are proposed to make the model work on unseen data: cross-view support
consistency (CVSC) and cross-view projection consistency (CVPC).
Suppose we have vehicle images from two cameras, Ca and Cb. Despite significant viewpoint
distortion and object occlusion, the visual appearance of images going from Ca to Cb undergoes
similar illumination changes and blur variation. Hence, a vehicle image from camera Ca can be
represented by images from Cb, and a vehicle image from camera Cb can be represented by images
from Ca. Cross-view support consistency states that one image can be represented by other
images through sparse coding: the sparse representatives of the probe image and of the gallery
images are computed, and those gallery images whose representatives have the maximum overlap
with the representatives of the probe image are selected.
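As an illustration of this step, the following minimal sketch (not from the thesis) sparse-codes
probe and gallery features over a shared reference set with scikit-learn's Lasso and ranks
gallery images by support overlap with the probe; the helper names `sparse_support` and
`cvsc_scores`, the shared reference set, and all parameter values are assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_support(x, reference, alpha=0.05, top=10):
    """Sparse-code feature x over the columns of `reference` and return
    the indices of its largest non-zero coefficients (its support)."""
    coder = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
    coder.fit(reference, x)
    coef = np.abs(coder.coef_)
    nz = np.flatnonzero(coef)
    return set(nz[np.argsort(coef[nz])[::-1][:top]])

def cvsc_scores(probe, gallery_feats, reference):
    """Score each gallery image by the overlap between its support and
    the probe's support, both coded over the same reference set."""
    probe_sup = sparse_support(probe, reference)
    return np.array([len(probe_sup & sparse_support(g, reference))
                     for g in gallery_feats])

# Toy example: 128-D features, 50 reference images, 20 gallery images.
rng = np.random.default_rng(0)
reference = rng.normal(size=(128, 50))   # columns are reference images
probe = rng.normal(size=128)
gallery = rng.normal(size=(20, 128))     # rows are gallery images
print(cvsc_scores(probe, gallery, reference))
```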
The idea behind cross-view projection consistency is that a probe image and a gallery image of
the same vehicle should share more common neighbors than a probe and a gallery image of
different vehicles. The neighborhood of each vehicle image is identified by
calculating the Euclidean distance between gallery and probe images. The k nearest neighbors
(KNN) of the gallery and probe images are selected, and the gallery images whose neighborhoods
overlap more with the neighborhood of the probe image have stronger projection consistency
with the probe image.
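A minimal sketch of this neighborhood-overlap step, assuming plain Euclidean k-NN over feature
vectors and a common reference set against which neighbors are found (the helpers `knn_indices`
and `cvpc_scores` are illustrative, not from the thesis):

```python
import numpy as np

def knn_indices(query, refs, k=10):
    """Indices of the k nearest neighbors of `query` among the rows
    of `refs`, using Euclidean distance."""
    d = np.linalg.norm(refs - query, axis=1)
    return set(np.argsort(d)[:k])

def cvpc_scores(probe, gallery_feats, reference_feats, k=10):
    """Score each gallery image by how many of its k nearest neighbors
    (within the reference set) it shares with the probe image."""
    probe_nn = knn_indices(probe, reference_feats, k)
    return np.array([len(probe_nn & knn_indices(g, reference_feats, k))
                     for g in gallery_feats])
```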
The neighborhoods of images from camera Ca are calculated directly with the Euclidean distance;
for images from Cb, however, the Cb images and a basic reference subset from Ca are first
projected to a virtual camera Cv, and the distances are then calculated with the learnt metric.
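One common way to realize such a projection, assumed here for illustration, is to factor a
learnt Mahalanobis metric as M = LᵀL and map features through L, so that ordinary Euclidean
distance in the virtual camera space equals the metric distance; the projection matrix `L`
below is a random placeholder, not a learnt one:

```python
import numpy as np

def project_to_virtual_camera(feats, L):
    """Map features into the virtual camera Cv via the projection L, so
    Euclidean distance in Cv equals the Mahalanobis distance under
    M = L.T @ L."""
    return feats @ L.T

rng = np.random.default_rng(1)
L = rng.normal(size=(64, 128))             # placeholder "learnt" projection
cb_feats = rng.normal(size=(20, 128))      # features from camera Cb
ca_reference = rng.normal(size=(30, 128))  # basic reference subset from Ca

cb_v = project_to_virtual_camera(cb_feats, L)
ref_v = project_to_virtual_camera(ca_reference, L)
# Neighborhoods on the Cb side are then found with Euclidean distance
# in the virtual camera space.
dists = np.linalg.norm(cb_v[:, None, :] - ref_v[None, :, :], axis=2)
neighbours = np.argsort(dists, axis=1)[:, :10]
```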