Abstract:
Diabetes mellitus is a global health challenge, requiring early detection to prevent
severe complications. This study utilizes machine learning for diabetes diagnosis,
leveraging a dataset collected from the Pakistani population to ensure demographic
relevance. Features included invasive parameters (e.g., fasting blood glucose, blood
pressure) and non-invasive factors (e.g., age, gender, BMI, waist circumference). The
data was split into training (70%) and testing (30%) sets, and nine classifiers were
evaluated, including Logistic Regression, Random Forest, XGBoost, and LightGBM.
Ensemble models, particularly XGBoost, achieved superior performance, with testing
accuracy reaching 93%. This model demonstrated robustness in capturing complex
feature interactions without requiring extensive feature selection. Integration into a
mobile app and GUI further demonstrated the practical utility of these models,
allowing users to input health parameters and receive instant predictions.
This research highlights the importance of combining machine learning with
region-specific data for accurate and accessible diabetes prediction. It demonstrates the
potential of predictive modeling to complement traditional diagnostics and improve
early detection. Future work may focus on publicizing the mobile application and
collecting additional data to enhance model performance.
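
The evaluation protocol described above (a 70/30 train/test split followed by a comparison of classifiers on test accuracy) can be illustrated with a minimal sketch. This is not the study's code: the data below is synthetic, the function names (`split_70_30`, `accuracy`, `evaluate`) are hypothetical, and `MajorityBaseline` is only a placeholder standing in for the nine classifiers. Any model exposing `fit` and `predict` methods could be passed in its place.

```python
import random
from collections import Counter


def split_70_30(rows, labels, seed=0):
    """Shuffle sample indices and hold out 30% of them for testing."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(round(0.7 * len(idx)))
    train, test = idx[:cut], idx[cut:]
    return (
        [rows[i] for i in train],
        [labels[i] for i in train],
        [rows[i] for i in test],
        [labels[i] for i in test],
    )


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


class MajorityBaseline:
    """Placeholder model: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_] * len(X)


def evaluate(models, X_train, y_train, X_test, y_test):
    """Fit each model on the training set and report its test accuracy."""
    return {
        name: accuracy(y_test, model.fit(X_train, y_train).predict(X_test))
        for name, model in models.items()
    }


# Synthetic stand-in data (NOT the study's dataset): two numeric features
# per sample and an arbitrary binary label.
rng = random.Random(42)
X = [[rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)] for _ in range(100)]
y = [rng.randint(0, 1) for _ in range(100)]

X_train, y_train, X_test, y_test = split_70_30(X, y, seed=0)
results = evaluate({"baseline": MajorityBaseline()}, X_train, y_train, X_test, y_test)
print(results)
```

Fixing the shuffle seed keeps the split reproducible, so that every classifier is scored on the same held-out 30%.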