Research Article
Evaluating Traditional Machine Learning Models for Predicting Diabetes Onset Using the Pima Indians Dataset
Author(s):
Faith Nassiwa* and Jiahui Zeng
Diabetes is one of the leading diseases worldwide. Given its seriousness and the complexity
of its diagnosis, we aimed to build a model to help predict the onset of diabetes. Three
models (logistic regression, gradient boosting, and random forest) were trained and evaluated
for this task. The dataset contains 768 records and is specific to women who are at least
21 years old and of Pima Indian heritage. Preprocessing included standardization and the
Synthetic Minority Oversampling Technique (SMOTE), and hyperparameter tuning was performed.
Random forest performed best with an accuracy of 81.8%, followed by gradient boosting
(78%) and logistic regression (76%). According to the random forest model, glucose, BMI, and
age are the top predictors of diabetes.
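The abstract names SMOTE but does not describe how it was implemented. As a rough illustration only, the core idea (creating synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours) can be sketched in plain Python. The function name `smote`, the parameters `k` and `seed`, and the toy (glucose, BMI) values are assumptions made for this sketch and are not taken from the study.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority samples
    and their k nearest minority-class neighbours (illustrative sketch)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: (glucose, BMI) pairs, NOT values from the study.
majority = [(85.0, 22.1), (90.0, 24.3), (99.0, 26.0),
            (105.0, 27.5), (110.0, 25.2), (95.0, 23.8)]
minority = [(150.0, 33.0), (165.0, 35.5), (172.0, 31.2)]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

In practice, oversampling of this kind is usually applied to the training split only, so that synthetic samples do not leak into the data used to measure accuracy.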
Annals of Medical and Health Sciences Research received 24805 citations as per Google Scholar report