Early Prediction of Diabetes Using a Stacking Ensemble of Tree-Based Classifiers
ID:99
Updated time: 2025-12-23 13:10:49
Online
Abstract
Early prediction of diabetes can significantly improve patient outcomes by enabling timely intervention. In this study, we propose a stacking ensemble for early diabetes prediction that combines four tree-based base classifiers (Random Forest, XGBoost, LightGBM, and CatBoost) through a logistic regression meta-learner. The model is trained and evaluated on the PIMA Indians Diabetes Dataset. Preprocessing treats zero-valued entries as missing values, replaces them with the feature median, and then standardizes all features. We use a stratified 80/20 train-test split and tune the classification decision threshold. On the test set, the stacking ensemble reaches an accuracy of about 85% and a ROC-AUC of about 0.85, outperforming the individual classifiers. These results improve on baseline non-ensemble methods (around 77% accuracy) and are competitive with, or exceed, prior ensemble approaches in the literature such as AdaBoost and XGBoost. Our findings suggest that stacking heterogeneous tree-based learners is a promising approach for early diabetes detection.
Keywords
Diabetes Mellitus, Stacking Ensemble, Random Forest, XGBoost, LightGBM, CatBoost, Early Prediction, Machine Learning, Classification, ROC-AUC