Abstract:
Semantic segmentation of aerial images is vital for Unmanned Aerial Vehicle (UAV) applications, such as land cover mapping, surveillance, and identifying flood-affected areas for effective
natural disaster management and flood impact mitigation. Traditional CNN-based techniques
struggle to retain fine-grained spatial information in their deeper layers. Moreover, existing transformer-based architectures often demand high computational resources or
produce single-scale, low-resolution features. To address these limitations, we propose a novel
transformer-based model named SwinSegFormer that harnesses the strengths of SegFormer and
Swin Transformer (SwinT). We trained and evaluated our model on the FloodNet benchmark, focusing on challenging classes such as vehicles, pools, and flooded and non-flooded roads, whose accurate segmentation is crucial for effective disaster management and potentially enables the model to support first-aid efforts during floods. The proposed model achieves a validation mIoU of 71.99%, mDice of 82.86%, and mAcc of 82.69%, which
represents an 8-10% improvement compared to state-of-the-art methods.