Abstract:
Normally, we use Softmax after the last fully connected layer in a
convolutional neural network to get the probability of the classes. With the number
of increasing classes, the accuracy for classification decreases. Also, Multiple
datasets cannot be jointly trained because they are not mutually exclusive. A joint
training algorithm that combines multiple datasets in a hierarchical structure and
uses hierarchical softmax as the output layer is proposed. We use this technique to
train the detection and classification datasets like COCO and ImageNet together.
We train YOLO on 9000 classes from the detection dataset of COCO and
classification dataset of ImageNet. Our trained model was able to predict classes
that were not in the training dataset. While testing on the 200 detection classes of
ImageNet, we were able to achieve 19 % mean Average Precision. Out of this 200
classes the model was trained only on 44 classes that were present in the COCO
detection dataset. We achieved 16 % mAP on the 156 classes that were never in
the training dataset. Apart from this, Our model was trained to predict over 9000
classes.