dc.description.abstract |
In computer vision, object detection and classification are active fields of research. Their applications span a diverse range of fields, including surveillance, autonomous cars, robotic vision, search and rescue, driver assistance systems, and military applications. Researchers have built many intelligent systems that aim for the accuracy of human perception but have not yet achieved it. Over the last couple of decades, the Convolutional Neural Network (CNN) has emerged as one of the most active areas of research; CNN architectures are used to improve accuracy and efficiency in many fields. In this research, we use CNNs to fuse visible and thermal camera images and to detect persons present in those images for a reliable surveillance application. Various image fusion methods exist for multi-sensor, multi-modal, multi-focus, and multi-view fusion. Our proposed methodology uses an encoder-decoder architecture for fusing visible and thermal images and the ResNet-152 architecture for image classification. The KAIST multispectral dataset, consisting of 95,000 visible and thermal images, is used to train the CNNs. In our experiments, the fused architecture outperforms the individual visible-only and thermal-only architectures, achieving 99.2% accuracy compared with 99.01% for visible and 98.98% for thermal. Images classified by ResNet-152 are then fed into Mask R-CNN, which uses a ResNet-101 backbone, for person localization. The results clearly show that the fused localization model outperforms the visible-only model and gives promising person-detection performance for surveillance purposes. Our proposed localization module achieves a miss rate of 5.25%, about 5 percentage points better than the previous best techniques. |
en_US |