Abstract:
Remote sensing data are widely used in crop management to predict crop yield, and freely available data products (Landsat, Sentinel) are among the most commonly used sources. This study explores the potential of remote sensing and machine learning for estimating wheat yield in the Faisalabad division of Punjab using Landsat-8 surface reflectance data. Time series of two vegetation indices, the Normalized Difference Vegetation Index (NDVI) and the Enhanced Vegetation Index (EVI), were extracted for the years 2019-2020 and 2020-2021. Several machine learning models were tested, and two of them, Random Forest Regression (RFR) and Decision Tree Regression (DTR), were selected for the final yield prediction after feature selection by correlation analysis. Feature selection was critical for reducing the input data and limiting uncertainty, so that only the most relevant variables were passed to the models. The 8th time step was found to correlate strongly with yield, and data from this step were used as model input. Feature selection was carried out separately for each of the two years, and meteorological variables from other time steps were also found to be correlated with yield. Model accuracy in training and testing was assessed with the root mean square error (RMSE) and the coefficient of determination (R2). For DTR, the training results (RMSE = 0.062 t/ha, R2 = 0.952; RMSE = 0.062 t/ha, R2 = 0.952) and testing results (RMSE = 0.150 t/ha, R2 = 0.700; RMSE = 0.120 t/ha, R2 = 0.799) for the two years, respectively, show that the model overfitted in the training phase. For RFR, the corresponding training results were RMSE = 0.076 t/ha, R2 = 0.929 and RMSE = 0.075 t/ha, R2 = 0.930, and the testing results were RMSE = 0.144 t/ha, R2 = 0.725 and RMSE = 0.106 t/ha, R2 = 0.842. These findings suggest that the RFR model resists overfitting and adapts well to the input variables for wheat yield prediction.
This study demonstrates the potential of remote sensing and machine learning in precision agriculture and provides insights into the selection of relevant input data for accurate yield prediction.
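
The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it only shows the standard NDVI and EVI formulas, a Pearson-correlation filter of the kind that correlation-based feature selection relies on, and the RMSE and R2 metrics. The feature names (`ndvi_t8`, `evi_t2`), the 0.5 correlation threshold, and all numeric values are hypothetical and chosen purely for illustration; the regression models themselves (RFR, DTR) are not reproduced here.

```python
import math


def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue):
    """Enhanced Vegetation Index with the standard coefficients
    (gain 2.5, C1 = 6, C2 = 7.5, L = 1), for reflectance scaled to 0..1."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def select_features(features, target, threshold=0.5):
    """Keep the names of features whose |r| with the target meets the
    threshold. The threshold value is an assumption, not from the study."""
    return [name for name, values in features.items()
            if abs(pearson(values, target)) >= threshold]


def rmse(y_true, y_pred):
    """Root mean square error, in the units of the target (e.g. t/ha)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Coefficient of determination (dimensionless)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data: one feature that tracks yield, one that does not.
toy_features = {
    "ndvi_t8": [0.60, 0.65, 0.70, 0.75, 0.80],
    "evi_t2": [0.30, 0.10, 0.35, 0.12, 0.28],
}
toy_yields = [2.8, 3.0, 3.2, 3.4, 3.6]  # t/ha, made-up values

selected = select_features(toy_features, toy_yields)
print("selected features:", selected)
```

A gap between training and testing scores computed with `rmse` and `r2`, such as the one reported for DTR, is the usual signal of overfitting.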