Impact Factor (2025): 6.9
DOI Prefix: 10.47001/IRJIET
Vol 10 No 5 (2026): Volume 10, Issue 5, May 2026 | Pages: 29-39
International Research Journal of Innovations in Engineering and Technology
OPEN ACCESS | Research Article | Published Date: 05-05-2026
Non-communicable diseases (NCDs), including diabetes mellitus, cardiovascular disease, breast cancer, liver disease, chronic kidney disease, and skin malignancies, account for 74% of global mortality annually (WHO, 2023). Early detection is the single most effective intervention for improving survival rates and reducing treatment burden, yet conventional diagnostic pathways rely predominantly on symptomatic presentation, which delays detection to advanced disease stages where treatment efficacy is substantially diminished. Predictive analytics powered by machine learning (ML) offers a transformative alternative: by analysing patterns in clinical, biochemical, imaging, and genetic data, ML models can identify individuals at elevated disease risk months or years before clinical symptoms manifest. This paper presents a comprehensive predictive analytics system for early multi-disease detection developed at Shri Sai College of Engineering and Technology (SSCET), DBATU University, Chandrapur. The system implements a multi-model ML pipeline comprising data preprocessing (KNN imputation, SMOTE oversampling, StandardScaler normalisation), classical ML models (Random Forest, XGBoost, SVM, Logistic Regression), deep learning models (CNN for medical imaging, LSTM for temporal EHR sequences), and a stacked ensemble meta-learner that combines the base-model predictions. Evaluated on six benchmark healthcare datasets (Pima Indians Diabetes, Cleveland Heart Disease, Wisconsin Breast Cancer, Indian Liver Patient, Chronic Kidney Disease, and HAM10000 Skin Lesion), per-dataset accuracy ranged from 88.4% (liver disease) to 97.5% (chronic kidney disease), with the proposed ensemble achieving 96.8% overall accuracy and an AUC-ROC of 0.98. SHAP (SHapley Additive exPlanations) explainability analysis provides clinically interpretable feature importance rankings aligned with established biomedical knowledge, addressing the 'black box' critique of ML in healthcare.
A web-based clinical dashboard enables risk score visualisation and model explanation for non-technical medical practitioners. The results establish that a multi-model ensemble approach, trained on publicly available datasets without specialised hardware, can deliver clinically relevant early disease detection performance comparable to expert clinical judgment.
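The tabular part of the pipeline summarised in the abstract (KNN imputation, standardisation, classical base models, and a stacked meta-learner) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It uses only scikit-learn, runs on synthetic data rather than any of the six benchmark datasets, omits XGBoost and the CNN/LSTM models, and swaps in `class_weight="balanced"` as a stand-in for SMOTE oversampling. All hyperparameters, variable names, and the missing-value rate are assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.impute import KNNImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for a tabular clinical dataset (assumption:
# this is NOT one of the datasets evaluated in the paper).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5,
    weights=[0.7, 0.3], random_state=0,
)

# Blank out roughly 5% of the values to mimic missing measurements.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Classical base learners; class weighting stands in for SMOTE here.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, class_weight="balanced",
                                  random_state=0)),
    ("svm", SVC(class_weight="balanced")),
    ("lr", LogisticRegression(max_iter=1000, class_weight="balanced")),
]

# The meta-learner is fitted on out-of-fold outputs of the base learners
# (probabilities or decision scores, whichever each model exposes).
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

# Imputation and scaling live inside the pipeline, so they are fitted on
# the training split only and then applied to the test split.
model = make_pipeline(KNNImputer(n_neighbors=5), StandardScaler(), stack)
model.fit(X_train, y_train)

proba = model.predict_proba(X_test)[:, 1]
accuracy = accuracy_score(y_test, model.predict(X_test))
auc = roc_auc_score(y_test, proba)
print(f"accuracy={accuracy:.3f}  auc={auc:.3f}")
```

Keeping the imputer and scaler inside the pipeline avoids leaking test-set statistics into training. If SMOTE were used as in the paper, it would likewise need to be applied to the training data only, never to the held-out test data.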
Predictive Analytics; Early Disease Detection; Machine Learning; Random Forest; XGBoost; SVM; Deep Learning; CNN; LSTM; Ensemble Learning; SMOTE; SHAP Explainability; Diabetes; Cardiovascular Disease; Breast Cancer; Chronic Kidney Disease; Healthcare AI; EHR; SSCET; DBATU University
Sanjivani Sanjay Meshram, Trushna Shankar Sandrawar, Priya Shamrao Tajne, Vaishnavi Sonu Satimeshram, & Suraj S. Bankar. (2026). Predictive Analytics for Early Disease Detection Using Machine Learning: A Multi-Model Ensemble Approach with SHAP Explainability. International Research Journal of Innovations in Engineering and Technology - IRJIET, 10(5), 29-39. Article DOI https://doi.org/10.47001/IRJIET/2026.105005
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International Licence.