Diabetes is one of the leading diseases in the world. Given its seriousness and the complexity of its diagnosis, we aimed to build a model to help predict the onset of diabetes. Three models (logistic regression, gradient boosting, and random forest) were trained and evaluated for this task. We used a dataset of 768 records describing women who are at least 21 years old and of Pima Indian heritage. Data preparation and model optimization steps, including standardization, the Synthetic Minority Oversampling Technique (SMOTE), and hyperparameter tuning, were performed.
Random forest performed best, with an accuracy of 81.8%, followed by gradient boosting (78%) and logistic regression (76%). Glucose, BMI, and age were the top predictors of diabetes according to random forest feature importance. Because the dataset used in this study is small, we hope that additional datasets will become available to improve the accuracy of the models and provide more information about the onset of diabetes. Moreover, this dataset is specific to a single group; future datasets covering broader populations (a wider range of ages, genders, and races) might give more insight into this issue.
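The workflow summarized above (standardization, SMOTE, hyperparameter tuning, and a comparison of the three models) can be illustrated with a minimal sketch. It is not the authors' code. It assumes scikit-learn and NumPy, uses synthetic data from `make_classification` as a stand-in for the 768-record dataset, and the class ratio, parameter grids, and the helper names `smote_oversample` and `run_sketch` are all hypothetical. The SMOTE step is a simplified hand-written version, so the accuracies it prints will not match the figures reported here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler


def smote_oversample(X, y, k=5, seed=0):
    """Simplified SMOTE (hypothetical helper): create synthetic minority
    samples by interpolating between a minority point and one of its k
    nearest minority neighbours, until both classes have equal size."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_new = counts.max() - counts.min()
    X_min = X[y == minority]
    # Ask for k + 1 neighbours because each point is its own nearest neighbour.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neighbours = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    X_new = X_min[base] + gap * (X_min[picked] - X_min[base])
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, minority)])


def run_sketch(seed=0):
    # Assumption: synthetic stand-in with 768 rows, 8 features, and an
    # imbalanced (roughly 65/35) class split; not the real dataset.
    X, y = make_classification(n_samples=768, n_features=8, n_informative=5,
                               weights=[0.65, 0.35], random_state=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)

    # Standardize using statistics from the training split only.
    scaler = StandardScaler().fit(X_train)
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

    # Balance the training split only; the test split stays untouched.
    # Simplification: oversampling before cross-validation lets synthetic
    # points leak across folds, which a stricter setup would avoid.
    X_bal, y_bal = smote_oversample(X_train, y_train, seed=seed)

    # Hypothetical, deliberately small parameter grids.
    models = {
        "logistic_regression": (LogisticRegression(max_iter=1000),
                                {"C": [0.1, 1.0, 10.0]}),
        "gradient_boosting": (GradientBoostingClassifier(random_state=seed),
                              {"n_estimators": [50, 100]}),
        "random_forest": (RandomForestClassifier(random_state=seed),
                          {"n_estimators": [100, 200]}),
    }
    scores = {}
    for name, (model, grid) in models.items():
        search = GridSearchCV(model, grid, cv=3, scoring="accuracy")
        search.fit(X_bal, y_bal)
        scores[name] = accuracy_score(y_test, search.predict(X_test))
    return scores, (X_bal, y_bal)


if __name__ == "__main__":
    scores, _ = run_sketch()
    for name, acc in scores.items():
        print(f"{name}: test accuracy = {acc:.3f}")
```

On real data, the fitted random forest's `feature_importances_` attribute (available from `search.best_estimator_`) is one way to obtain a predictor ranking of the kind reported above.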
Annals of Medical and Health Sciences Research received 24805 citations as per google scholar report