dc.description.sponsorship |
This thesis addresses the problem of visual indoor place recognition: for example, automatically recognizing different places in an office setting, such as offices, corridors, and washrooms. Potential applications include robot navigation, augmented reality, and image retrieval. However, the task is highly challenging because appearance varies widely in such dynamic environments (e.g., changes in viewpoint, occlusion, illumination, and scale). While local-feature-based methods (e.g., bag-of-features) have been promising, they remain limited in their ability to handle severe visual variations. Recently, Convolutional Neural Networks (CNNs) have emerged as a powerful learning mechanism, able to learn higher-level deep features when provided with a relatively large amount of labeled training data. Such networks have shown state-of-the-art object and scene recognition results on the ImageNet and Places datasets. Here, we exploit the generic nature of CNN features by employing CNNs pre-trained on objects and scenes for deep feature extraction on the challenging COLD dataset. We demonstrate that these off-the-shelf deep features, when combined with a simple linear SVM classifier, outperform their bag-of-features counterpart. Moreover, a simple scheme that combines the local bag-of-features and higher-level deep CNN features highlights their complementary nature. We benchmark our results against two other methods and present superior results on the COLD dataset. |
en_US |