Abstract:
The quantity and quality of remote sensing images have dramatically increased due to
the rapid growth of remote sensing technology, which greatly aids in the advancement
of remote sensing image interpretation. However, interpreting this imagery manually is
infeasible because of the sheer volume of data. Therefore,
a quick and precise method of image interpretation is required. Our research aims
to solve this problem. The techniques used in this research can efficiently
recognize man-made objects in remote sensing images. To demonstrate our method,
we chose aircraft. Aircraft type recognition plays an important role in many civil and
military applications. We propose using a Vision Transformer[44] with a carefully crafted augmentation pipeline to recognize man-made objects in remote sensing imagery. We also
created Aircraft101, a challenging dataset for evaluating state-of-the-art models. Aircraft101
contains 1,752 images of 20 aircraft types, varying in pose, illumination, weather, background, scale, and resolution. We compare the performance of state-of-the-art
models on MTARSI[43] and on Aircraft101 and discuss why specific models perform better on remote sensing interpretation tasks. We also analyze the MTARSI[43] dataset in detail and document its shortcomings with evidence. By using
our augmentation pipeline, we achieved benchmark classification
accuracies of 99.78% with DenseNet[27] and 99.63% with the Vision Transformer ViT-B/16 on MTARSI.