Predictive Analytics for Early Disease Detection Using Machine Learning: A Multi-Model Ensemble Approach with SHAP Explainability

Sanjivani Sanjay Meshram, Student, Department of Computer Science and Engineering, Shri Sai College of Engineering and Technology (SSCET), DBATU University, Bhadrawati, Chandrapur, Maharashtra, India
Trushna Shankar Sandrawar, Student, Department of Computer Science and Engineering, Shri Sai College of Engineering and Technology (SSCET), DBATU University, Bhadrawati, Chandrapur, Maharashtra, India
Priya Shamrao Tajne, Student, Department of Computer Science and Engineering, Shri Sai College of Engineering and Technology (SSCET), DBATU University, Bhadrawati, Chandrapur, Maharashtra, India
Vaishnavi Sonu Satimeshram, Student, Department of Computer Science and Engineering, Shri Sai College of Engineering and Technology (SSCET), DBATU University, Bhadrawati, Chandrapur, Maharashtra, India
Suraj S. Bankar, Assistant Professor, Department of Computer Science and Engineering, Shri Sai College of Engineering and Technology (SSCET), DBATU University, Bhadrawati, Chandrapur, Maharashtra, India

Vol 10 No 5 (2026): Volume 10, Issue 5, May 2026 | Pages: 29-39

International Research Journal of Innovations in Engineering and Technology

OPEN ACCESS | Research Article | Published Date: 05-05-2026

DOI: https://doi.org/10.47001/IRJIET/2026.105005

Abstract

Non-communicable diseases (NCDs), including diabetes mellitus, cardiovascular disease, breast cancer, liver disease, chronic kidney disease, and skin malignancies, account for 74% of global mortality annually (WHO, 2023). Early detection is the single most effective intervention for improving survival rates and reducing treatment burden, yet conventional diagnostic pathways rely predominantly on symptomatic presentation, which delays detection to advanced disease stages where treatment efficacy is substantially diminished. Predictive analytics powered by machine learning (ML) offers a transformative alternative: by analysing patterns in clinical, biochemical, imaging, and genetic data, ML models can identify individuals at elevated disease risk months or years before clinical symptoms manifest. This paper presents a comprehensive predictive analytics system for early multi-disease detection developed at Shri Sai College of Engineering and Technology (SSCET), DBATU University, Chandrapur. The system implements a multi-model ML pipeline comprising data preprocessing (KNN imputation, SMOTE oversampling, StandardScaler normalisation), classical ML models (Random Forest, XGBoost, SVM, Logistic Regression), deep learning models (CNN for medical imaging, LSTM for temporal EHR sequences), and a stacked ensemble meta-learner that combines the base-model predictions. Evaluated on six benchmark healthcare datasets (Pima Indians Diabetes, Cleveland Heart Disease, Wisconsin Breast Cancer, Indian Liver Patient, Chronic Kidney Disease, and HAM10000 Skin Lesion), the system achieved accuracies ranging from 88.4% (liver disease) to 97.5% (chronic kidney disease), with the proposed ensemble achieving 96.8% overall accuracy and an AUC-ROC of 0.98. SHAP (SHapley Additive exPlanations) explainability analysis provides clinically interpretable feature importance rankings aligned with established biomedical knowledge, addressing the 'black box' critique of ML in healthcare. A web-based clinical dashboard enables risk score visualisation and model explanation for non-technical medical practitioners. The results establish that a multi-model ensemble approach, trained on publicly available datasets without specialised hardware, can deliver clinically relevant early disease detection performance comparable to expert clinical judgment.
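
To make the SHAP idea in the abstract concrete, the following is a minimal sketch of how Shapley-value attributions split a risk score across input features. It is not the authors' implementation: the toy logistic model, its weights and bias, the feature names, and the patient and baseline values are all hypothetical and chosen only for illustration. The SHAP library itself relies on faster approximations, whereas this sketch enumerates every feature subset exactly, which is only practical for a handful of features.

```python
from itertools import combinations
from math import exp, factorial

# Hypothetical weights for a toy logistic risk model (NOT from the paper).
WEIGHTS = {"glucose": 0.03, "bmi": 0.05, "age": 0.02}
BIAS = -6.0


def risk_score(x):
    """Toy risk model: logistic function of a weighted feature sum."""
    z = BIAS + sum(WEIGHTS[name] * x[name] for name in WEIGHTS)
    return 1.0 / (1.0 + exp(-z))


def masked(x, baseline, present):
    """Keep features in `present` from x; take all others from the baseline."""
    return {name: (x[name] if name in present else baseline[name]) for name in x}


def shapley_values(model, x, baseline):
    """Exact Shapley values of each feature for model(x) relative to a baseline."""
    names = list(x)
    n = len(names)
    phi = {}
    for name in names:
        others = [other for other in names if other != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_feature = model(masked(x, baseline, set(subset) | {name}))
                without_feature = model(masked(x, baseline, set(subset)))
                total += weight * (with_feature - without_feature)
        phi[name] = total
    return phi


# Hypothetical patient and reference ("average") profile.
patient = {"glucose": 160.0, "bmi": 33.0, "age": 50.0}
baseline = {"glucose": 100.0, "bmi": 25.0, "age": 35.0}

phi = shapley_values(risk_score, patient, baseline)
ranking = sorted(phi, key=phi.get, reverse=True)
for name in ranking:
    print(f"{name}: {phi[name]:+.4f}")
```

The attributions sum to the difference between the patient's score and the baseline score, which is the additivity property that lets a dashboard present each feature's share of a predicted risk. In this toy setup the ranking simply follows the size of each feature's weighted deviation from the baseline.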

Keywords

Predictive Analytics; Early Disease Detection; Machine Learning; Random Forest; XGBoost; SVM; Deep Learning; CNN; LSTM; Ensemble Learning; SMOTE; SHAP Explainability; Diabetes; Cardiovascular Disease; Breast Cancer; Chronic Kidney Disease; Healthcare AI; EHR; SSCET; DBATU University


Citation of this Article

Sanjivani Sanjay Meshram, Trushna Shankar Sandrawar, Priya Shamrao Tajne, Vaishnavi Sonu Satimeshram, & Suraj S. Bankar. (2026). Predictive Analytics for Early Disease Detection Using Machine Learning: A Multi-Model Ensemble Approach with SHAP Explainability. International Research Journal of Innovations in Engineering and Technology - IRJIET, 10(5), 29-39. Article DOI https://doi.org/10.47001/IRJIET/2026.105005

References
  1. World Health Organization, "Noncommunicable Diseases Progress Monitor 2023," WHO, Geneva, Switzerland, 2023. [Online]. Available: https://www.who.int/publications/i/item/9789240073104
  2. P. Saeedi et al., "Global and regional diabetes prevalence estimates for 2019 and projections for 2030 and 2045: Results from the International Diabetes Federation Diabetes Atlas, 9th edition," Diabetes Res. Clin. Pract., vol. 157, p. 107843, Nov. 2019.
  3. D. Bloom, E. Cafiero, E. Jané-Llopis, S. Abrahams-Gessel, L. Bloom, S. Fathima, et al., "The Global Economic Burden of Noncommunicable Diseases," World Economic Forum, Geneva, 2011.
  4. R. Miotto, F. Wang, S. Wang, X. Jiang, and J. T. Dudley, "Deep learning for healthcare: Review, opportunities and challenges," Briefings Bioinformatics, vol. 19, no. 6, pp. 1236–1246, Nov. 2018.
  5. E. J. Topol, "High-performance medicine: The convergence of human and artificial intelligence," Nat. Med., vol. 25, no. 1, pp. 44–56, Jan. 2019.
  6. D. Sisodia and D. S. Sisodia, "Prediction of diabetes using classification algorithms," Procedia Comput. Sci., vol. 132, pp. 1578–1585, 2018.
  7. M. M. Islam, F. Islam, M. M. Asiful Islam, and M. R. Islam, "Likelihood prediction of diabetes at early stage using data mining techniques," in Proc. IEEE Int. Conf. Comput. Commun. Chem. Mater. Electron. Eng. (IC4ME2), Rajshahi, Bangladesh, pp. 1–4, 2020.
  8. A. Choudhury and N. Gupta, "A survey on medical diagnosis of diabetes using machine learning techniques," in Recent Developments in Machine Learning and Data Analytics, Springer, pp. 67–78, 2019.
  9. S. Mohan, C. Thirumalai, and G. Srivastava, "Effective heart disease prediction using hybrid machine learning techniques," IEEE Access, vol. 7, pp. 81542–81554, 2019.
  10. K. Deepika and S. Seema, "Predictive analytics to prevent and control chronic diseases," in Proc. Int. Conf. Appl. Theor. Comput. Commun. Technol. (iCATccT), Tumkur, India, pp. 381–386, 2016.
  11. M. M. Ali, B. K. Paul, K. Ahmed, F. M. Bui, J. M. W. Quinn, and M. A. Moni, "Heart disease prediction using supervised machine learning algorithms: Performance analysis and comparison," Comput. Biol. Med., vol. 136, p. 104672, Sep. 2021.
  12. A. Osareh and B. Shadgar, "Machine learning techniques to diagnose breast cancer," in Proc. 5th Int. Symp. Health Informatics Bioinformatics, Antalya, Turkey, pp. 114–120, 2010.
  13. P. Tschandl, C. Rinner, Z. Apalla, G. Argenziano, N. Codella, A. Halpern, et al., "Human–computer collaboration for skin cancer recognition," Nat. Med., vol. 26, no. 8, pp. 1229–1234, 2020.
  14. S. M. Lundberg and S.-I. Lee, "A unified approach to interpreting model predictions," in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 30, pp. 4765–4774, 2017.
  15. G. Stiglic, P. Kocbek, N. Fijacko, M. Zitnik, K. Dostal, and L. Cilar, "Interpretability of machine learning-based prediction models in healthcare," WIREs Data Mining Knowl. Discovery, vol. 10, no. 5, p. e1379, 2020.
  16. P. Rajpurkar, E. Chen, O. Banerjee, and E. J. Topol, "AI in health and medicine," Nat. Med., vol. 28, no. 1, pp. 31–38, Jan. 2022.
  17. O. Sagi and L. Rokach, "Ensemble learning: A survey," WIREs Data Mining Knowl. Discovery, vol. 8, no. 4, p. e1249, 2018.
  18. D. H. Wolpert, "Stacked generalization," Neural Netw., vol. 5, no. 2, pp. 241–259, 1992.
  19. J. Sterne, I. R. White, J. B. Carlin, M. Spratt, P. Royston, M. G. Kenward, A. M. Wood, and J. R. Carpenter, "Multiple imputation for missing data in epidemiological and clinical research," BMJ, vol. 338, p. b2393, 2009.
  20. American Diabetes Association, "2. Classification and diagnosis of diabetes: Standards of Medical Care in Diabetes — 2023," Diabetes Care, vol. 46, Suppl. 1, pp. S19–S40, Jan. 2023.