Abstract:
Person re-identification (re-id) deals with matching images of the same person across multiple non-overlapping camera views, and is applicable to searching for a specific individual over multiple cameras. Commonly, the re-id task is broken down into three modules: detection, tracking, and matching. Most existing techniques use manually annotated bounding boxes and focus only on matching between queries and candidates, which is not suitable for a real-time environment where annotations of object boundaries are not available. The target person must instead be identified from whole scene images that may contain many distractors. To address this issue, we investigate how localization and matching of the target person can be performed without prior bounding-box annotations. Our proposed method is based on end-to-end deep learning and handles detection and re-id jointly. To train the model effectively under the supervision of sparse and unbalanced labels, a random sampling softmax loss is proposed.
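The abstract does not spell out the random sampling softmax; the following is a minimal sketch of one plausible formulation in PyTorch, in which the cross-entropy is computed over the ground-truth identity plus a random subset of negative identities rather than the full, unbalanced label set. The function name, the `num_sampled` parameter, and the per-proposal interface are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def random_sampling_softmax_loss(logits, target, num_sampled=100):
    """Cross-entropy over the target identity plus a random subset of
    negative identities, instead of the full sparse, unbalanced label set.

    logits: (num_classes,) identity scores for one labeled proposal
    target: index of the ground-truth identity
    num_sampled: number of negative classes drawn per example (assumed value)
    """
    num_classes = logits.size(0)
    # Sample negative identity indices uniformly, excluding the target.
    neg = torch.randperm(num_classes)
    neg = neg[neg != target][:num_sampled]
    # Reduced logit vector: target first, then the sampled negatives.
    sampled_logits = torch.cat([logits[target].unsqueeze(0), logits[neg]])
    # Softmax cross-entropy with the target at index 0 of the reduced set.
    return F.cross_entropy(sampled_logits.unsqueeze(0),
                           torch.zeros(1, dtype=torch.long))
```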
In this research, we first thoroughly analyze the re-id literature and show that most re-id work is carried out under the assumption of pre-cropped images, with identification based largely on hand-crafted feature extraction and matching, which does not reflect the real-time person recognition task. In particular, in a surveillance system only raw video frames are available and no prior information about object localization exists; the proposed approach is designed to work well under these conditions. The research provides an end-to-end implementation of person tracking across multiple cameras. The model is tested under diverse conditions and achieves high retrieval accuracy. The whole network is jointly optimized using the CIFM loss and fine-tuned to further improve accuracy. Finally, a user interface is designed for better visualization: for a specified suspect image, the system displays a ranked list of the top matching images.
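The final ranking step can be viewed as a nearest-neighbour search over learned embeddings. Below is a minimal sketch, assuming cosine similarity between L2-normalized feature vectors; the function name, array shapes, and `top_k` parameter are illustrative assumptions rather than the system's actual interface.

```python
import numpy as np

def rank_gallery(query_feat, gallery_feats, top_k=10):
    """Rank gallery detections by cosine similarity to the query feature.

    query_feat: (d,) embedding of the suspect image
    gallery_feats: (n, d) embeddings of persons detected across cameras
    Returns indices of the top_k closest gallery entries, best first.
    """
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    scores = g @ q                      # cosine similarity per gallery entry
    return np.argsort(-scores)[:top_k]  # highest similarity first
```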