Large Scale Fine-Grained Visual Recognition

Usman, Omer

URI: http://10.250.8.41:8080/xmlui/handle/123456789/36951

Date: 2021

Abstract:

Computer vision depends on pattern recognition techniques to self-train and understand visual data. While classical machine learning algorithms were previously used for computer vision applications, deep learning methods have since emerged as a better solution for this domain. Deep learning relies on neural networks and training samples for problem solving: it learns from labeled data to recognize common patterns in a given data set. The wide availability of data for training computer vision algorithms has helped drive the growth of the field. Image classification is a fundamental task that attempts to comprehend an image as a whole, with the goal of assigning the image to a specific label. Typically, image classification refers to images in which only one object appears and is analyzed. CNN-based models hold state-of-the-art performance in various computer vision tasks, including image classification. Deep CNNs are well suited to large-scale supervised visual recognition tasks because their training algorithm is highly scalable. Two main approaches are used for image classification with deep CNNs: Large Scale Visual Recognition and Fine-Grained Visual Recognition. The availability of highly advanced models from competitions such as ImageNet has made it possible to explore fine-grained classification and other non-deep learning classifiers for classification. This research presents a combination of models in hierarchical layers that first distinguishes between meta categories (currently the task of Large Scale Recognition algorithms) and then goes further in depth to classify individual objects (currently the task of Fine-Grained Recognition algorithms). The aim is to develop an approach in which the large-scale visual recognition problem is solved first and fine-grained visual recognition is then performed in the same pipeline, in one go. This research thus presents a single solution to both problems.
The model was evaluated by submitting the predicted scores to Kaggle. The pipeline requires annotating the data into broad groups and then a second set of annotations within each broad group, such as species class. In this way the problem was broken down so that each part could be solved independently. The experiments were performed with state-of-the-art deep learning architectures such as ResNet-18, ResNet-50 and ResNeXt-101. The results show a visible improvement over the single-model approach. In terms of resources, the proposed methodology has high requirements for memory space and training time. The trade-off between resource requirements and performance improvement therefore has to be weighed.
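The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. The function names, the group and class labels, and the toy scoring functions are all hypothetical stand-ins for trained CNNs such as the ResNet variants mentioned above. Multiplying the two scores into a joint score is also an assumption made here for illustration.

```python
from typing import Callable, Dict, Sequence, Tuple

Scores = Dict[str, float]
Classifier = Callable[[Sequence[float]], Scores]


def argmax_label(scores: Scores) -> Tuple[str, float]:
    """Return the label with the highest score, together with that score."""
    label = max(scores, key=scores.get)
    return label, scores[label]


def hierarchical_predict(
    features: Sequence[float],
    coarse_model: Classifier,
    fine_models: Dict[str, Classifier],
) -> Tuple[str, str, float]:
    """Pick a broad group first, then a class within that group."""
    group, group_score = argmax_label(coarse_model(features))
    # Only the fine-grained model belonging to the chosen group is consulted.
    fine_label, fine_score = argmax_label(fine_models[group](features))
    # Joint score as a product of stage scores (an illustrative assumption).
    return group, fine_label, group_score * fine_score


# Hypothetical toy classifiers standing in for trained networks.
def toy_coarse(features: Sequence[float]) -> Scores:
    return {"birds": features[0], "plants": 1.0 - features[0]}


def toy_birds(features: Sequence[float]) -> Scores:
    return {"sparrow": features[1], "finch": 1.0 - features[1]}


def toy_plants(features: Sequence[float]) -> Scores:
    return {"oak": features[1], "maple": 1.0 - features[1]}


FINE_MODELS = {"birds": toy_birds, "plants": toy_plants}

result = hierarchical_predict([0.9, 0.2], toy_coarse, FINE_MODELS)
print(result)
```

In this arrangement each fine-grained model only has to separate classes within its own broad group, which is what allows the two problems to be annotated and trained independently. It also means one model per group must be stored and trained, which is consistent with the higher memory and training-time cost noted above.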