Abstract:
Forest conservation is essential to combat global warming, climate change, and threats to global
biodiversity. This necessitates innovative technological solutions for effective forest monitoring
and management. This study focuses on forest segmentation of 15 districts of the Khyber
Pakhtunkhwa (KPK) province in Pakistan using multispectral Landsat-8 imagery. The dataset comprises
images with 11 bands, including RGB.
This work presents an advanced approach to forest segmentation utilizing transformer-based
architectures. Transformers, known for their powerful feature extraction and attention mechanisms,
are leveraged to improve segmentation accuracy. They excel at handling long-range
dependencies and process entire sequences in parallel, which strengthens global context
modeling. In this study, we investigated two transformer-based
models, SegFormer and SegForest. The SegForest model is inspired by the SegFormer architecture,
featuring an encoder-decoder structure. The encoder is akin to that of SegFormer, enabling
multiscale feature extraction without requiring positional encoding. The decoder employs an
advanced methodology, integrating multi-feature fusion and multi-scale multi-decoder modules
for enhanced performance. The key innovation of this work is the introduction of reduction
ratios in the encoder’s CNN layers, which optimize performance by selectively downsampling
feature maps while retaining fine details. This approach enhances
the model’s capability to process multiscale features more effectively.
We trained and tested our models on subsets of this dataset, the RGB bands and 11 selected bands,
to evaluate performance across different spectral inputs. Our transformer-based models with pretrained
weights demonstrated superior segmentation accuracy compared to existing methods,
improving both accuracy and F1-score, including a 4 percent improvement
in accuracy on our dataset, showing the effectiveness of our approach in handling diverse
spectral inputs.
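
The effect of the reduction ratios mentioned above can be illustrated with a small calculation. This is only a sketch under stated assumptions: it assumes the ratios act like SegFormer-style spatial reduction, where the keys and values of each encoder stage are downsampled before attention. The function name `attention_cost`, the 512x512 input size, and the per-stage strides and ratios (taken from the published SegFormer defaults, not from this study) are all illustrative.

```python
def attention_cost(height, width, stride, spatial_reduction):
    """Return (queries, keys, score_entries) for one encoder stage.

    Assumption: keys/values are downsampled by `spatial_reduction`
    along each spatial axis before attention, as in SegFormer.
    """
    h, w = height // stride, width // stride
    queries = h * w  # one query per feature-map position
    keys = (h // spatial_reduction) * (w // spatial_reduction)
    return queries, keys, queries * keys


# Illustrative (stride, spatial_reduction) pairs for four stages;
# these are assumed defaults, not values reported in this work.
stages = [(4, 8), (8, 4), (16, 2), (32, 1)]

for stride, sr in stages:
    q, k, cost = attention_cost(512, 512, stride, sr)
    full = q * q  # cost without any reduction
    print(f"stride={stride:2d} sr={sr} queries={q} keys={k} "
          f"saving={full // cost}x")
```

Under these assumptions, the early high-resolution stages receive the largest reduction, while the last stage keeps every position, which is one way to trade attention cost against detail retention.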