Abstract:
Visual recognition in aerial imagery plays an important role in a wide range of applications
such as surveillance, monitoring, and the detection and management of natural and man-made
disasters. Recent advances in deep learning show promising results for the computer
vision tasks of classification, detection, segmentation and tracking. The meta-learning
branch of deep learning seeks to train models that learn new concepts with few labeled
examples, to reduce data annotation cost and address the problem of data scarcity. Recent
work shows that metric-learning approaches outperform other approaches to meta-learning.
This research evaluates two state-of-the-art metric-learning methods,
namely Prototypical Networks and Relation Networks, on remote sensing imagery and
explores two avenues to improve performance: using efficient networks of different
depths for feature extraction, and jointly training on multi-domain data. The performance
of the same efficient networks is also evaluated for object detection in satellite
imagery, to inform the choice of feature extraction backbone for a meta-learning
object detector. Our results suggest that Prototypical Networks are faster to train
and more accurate than Relation Networks when the number of training classes is
limited. Furthermore, jointly training on natural and satellite imagery for few-shot
classification is shown to slightly improve accuracy, given a suitable feature extraction
backbone. Finally, we conclude that MobileNet v2 is a promising starting point for
design space exploration of feature extraction backbones targeted at
accurate and efficient meta-learning, as it outperforms its competitors in both
object detection and few-shot classification.