Abstract:
Computer Vision depends on pattern recognition techniques to self-train
and understand visual data. While machine learning algorithms were previously used for computer vision applications, now deep learning methods have
evolved as a better solution for this domain. Deep learning counts on neural
networks and samples for problem solving. It self-learns by using labeled
data to recognize common patterns in the given data set. The extensive
availability of the data used for training computer vision algorithms has contributed in driving the growth of computer vision. Image Classification is a
fundamental task that attempts to comprehend an entire image as a whole.
The goal is to classify the image by assigning it to a specific label. Typically,
Image Classification refers to images in which only one object appears and
is analyzed. CNN-based models hold state-of-the-art performance in various computer vision tasks, including image classification. Deep CNNs are
well suited for large-scale supervised visual recognition tasks because of their
highly scalable training algorithm. Two main methods are used for image
classification via deep CNN namely Large Scale Visual Recognition and Fine
Grained Visual Recognition. The availability of highly advanced models from
competitions such as ImageNet has made it possible to explore fine grained
classification and other non-deep learning classifiers for classification.
This research presents a combination of models in hierarchical layers to first
distinguish between meta categories (currently Large Scale Recognition algorithms) and then go further in depth to classify into individual objects
(currently Fine grained Recognition algorithms). The aim is to develop an
approach in which initially the large scale visual recognition problem is solved
and then in the same pipeline fine grained visual recognition is also performed
in one go. This research presents a single solution to both the problems. The
evaluation of the model was done by submitting the predicted score on Kaggle. The pipeline requires annotation of the data in broad groups and then
another set of annotation within the broad groups like specie class. This way
the problem was broken down to solve it independently. The simulations
were performed on state of art deep learning techniques such as ResNet-18, ResNet-50 and ResNeXt-101. The results show visible improvement with
respect to single model approach. It is concluded that as far as the resource
requirements are concerned, the proposed methodology has high resource re quirements in terms of memory space and training time. However, it is the
trade-of between resource requirement and performance improvement that
needs to be decided.