Abstract:
Accurately estimating crop emergence prior to harvesting is increasingly critical for ensuring the long-term sustainability of natural resources. This estimate serves multiple purposes, including yield estimation, seed quality prediction, identification of regions prone to yield losses, and formulation of effective agricultural plans. By maximizing crop population within the constraints of limited land and resources, crop emergence estimation contributes to the sustainable utilization of these valuable resources. However, existing plant counting frameworks often require extensive offline image processing with licensed software to generate orthomosaics through multiview stereo, resulting in significant computational demands. To address this challenge, this study proposes a comprehensive plant counting framework that estimates plant counts directly from aerial images. The framework comprises three essential modules: overlap detection, plant detection, and plant counting. The overlap detection module eliminates the need for computationally intensive orthomosaic generation by using only visual cues to mask overlapping areas, thereby preventing duplicate plant counting. Three distinct methods are evaluated as core modules to identify an optimal, generalized solution for plant counting with respect to both time complexity and accuracy. The first method employs semantic segmentation with U-Net for plant detection after overlap removal, followed by counting connected pixel regions. The second method uses YOLOv7 object detection for plant detection after overlap removal. The third method introduces a real-time plant counting framework based on multiple object tracking, employing YOLOv7 for object detection and SORT for object tracking as a replacement for the overlap detection module. The proposed algorithm is evaluated on high-resolution aerial data collected from two separate tobacco fields near Peshawar, Pakistan.
The first and second methods achieve average F1 scores of 0.947 and 0.9667, respectively. Notably, the third method shows promising potential for real-time application, achieving an average F1 score of 0.967.
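The counting idea behind the third method can be illustrated with a minimal sketch. Here, per-frame detections (mock bounding boxes standing in for YOLOv7 output) are associated across frames by greedy IoU matching, a simplified SORT-style association without the Kalman filter; a plant is counted only when a detection cannot be matched to an existing track, so plants visible in several overlapping frames are counted once. The data, threshold, and function names are illustrative assumptions, not the paper's implementation.

```python
def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def count_unique_plants(frames, iou_thresh=0.3):
    # frames: list of frames, each a list of detection boxes.
    # Simplified SORT-style association: no motion model, greedy matching.
    tracks = []   # last known box of each active track
    total = 0     # number of distinct plants seen so far
    for detections in frames:
        matched = set()
        for det in detections:
            best, best_iou = None, iou_thresh
            for i, trk in enumerate(tracks):
                if i in matched:
                    continue
                v = iou(det, trk)
                if v > best_iou:
                    best, best_iou = i, v
            if best is None:
                total += 1          # unmatched detection => new plant
            else:
                matched.add(best)   # same plant re-detected in a new frame
        tracks = list(detections)   # carry current detections forward
    return total

# Two overlapping frames: the first plant reappears slightly shifted,
# the second frame also contains a previously unseen plant.
frames = [
    [(0, 0, 10, 10), (20, 0, 30, 10)],
    [(1, 0, 11, 10), (40, 0, 50, 10)],
]
print(count_unique_plants(frames))  # -> 3
```

A full SORT tracker would add a Kalman filter per track and Hungarian assignment; the sketch keeps only the association-and-count logic that replaces the overlap detection module.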