Abstract:
Diabetes is a chronic disease that poses a major challenge for health systems worldwide. Globally, about 8.3% of people have been diagnosed with diabetes. In the United States alone, 25.8 million individuals had diabetes in 2011, and 79 million more were at high risk of developing the disease.
The aim of this project is to reduce the time needed to diagnose diabetes by giving each patient an estimated probability of having the disease, based on their data. To do this, we used machine learning techniques and statistical modelling. The project focuses on producing the most accurate diabetes predictions possible from a set of diagnostic measurements in a dataset obtained from Kaggle. For the literature review and background on the healthcare domain, we searched for relevant papers on healthcare analytics in popular databases such as Google Scholar and Springer using specific keywords.
The most significant and obvious benefits of using such technology in the healthcare sector are lower costs and faster diagnosis. Because of its reduced cost, electronic information is one of the factors with the greatest impact on healthcare predictive analytics.
For the implementation, we used the JetBrains PyCharm IDE for development in Python 3.6 and RStudio for statistical modelling in R 3.4. The Python libraries that proved most useful for our experiments were NumPy, pandas and scikit-learn (sklearn).
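As a rough illustration of what a per-patient probability looks like with these libraries, the sketch below fits a classifier and reads off the predicted probability of diabetes. It is not the project's actual pipeline. The logistic regression model, the column names (glucose, bmi, age, outcome) and the synthetic data are all assumptions made for the example, and the Kaggle dataset is not used.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data, NOT the Kaggle dataset; column names are hypothetical.
rng = np.random.default_rng(0)
n = 500
data = pd.DataFrame({
    "glucose": rng.normal(120, 30, n),
    "bmi": rng.normal(32, 7, n),
    "age": rng.integers(21, 80, n),
})

# Hypothetical labelling rule, present only so the model has a signal to learn.
logit = 0.04 * (data["glucose"] - 120) + 0.08 * (data["bmi"] - 32)
data["outcome"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = data[["glucose", "bmi", "age"]]
y = data["outcome"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Assumed model choice: standardise the features, then fit logistic regression.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# predict_proba returns one row per patient: [P(no diabetes), P(diabetes)].
probabilities = model.predict_proba(X_test)[:, 1]
print(probabilities[:5])
```

Each value in probabilities is the model's estimate, between 0 and 1, that the corresponding held-out patient has diabetes. Any scikit-learn classifier that implements predict_proba could be substituted for the logistic regression used here.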