Machine Learning-Based Road Accident Fatality Prediction Under Imbalanced Data Conditions: An Evaluation of Resampling and Classification Techniques

Olusola Theophilus Faboya, Department of Computing and Information Science, Bamidele Olumilua University of Education, Science and Technology, Ikere-Ekiti, Nigeria

Volume 10, Issue 5, May 2026 | Pages: 705-712

International Research Journal of Innovations in Engineering and Technology

OPEN ACCESS | Research Article | Published Date: 31-05-2026

DOI: https://doi.org/10.47001/IRJIET/2026.105094

Abstract

Road traffic accidents remain a leading cause of mortality worldwide and a major public safety concern, particularly in developing regions, where class imbalance in crash data degrades predictive modelling performance. This study proposes a machine learning framework for fatal accident prediction under imbalanced class conditions. Several classification algorithms, including Logistic Regression, Random Forest, XGBoost, and LightGBM, were evaluated with both data-level and algorithm-level imbalance-handling techniques, including Synthetic Minority Oversampling Technique (SMOTE), ADASYN, Random Oversampling, and Random Undersampling. Performance was measured using accuracy together with imbalance-aware metrics: precision, recall, F1-score, and Area Under the Receiver Operating Characteristic Curve (ROC-AUC). Statistical significance was assessed using the Friedman and Wilcoxon signed-rank tests. To enhance interpretability, SHAP analysis was used to explain model decisions at both global and local levels. Experimental results indicate that ensemble boosting models significantly outperformed conventional methods, with LightGBM achieving the best overall performance without resampling: 99.35% accuracy, a 99.25% F1-score, and a ROC-AUC of 0.999736. XGBoost also demonstrated strong robustness to class imbalance. Although resampling techniques slightly improved minority class learning, the improvements were not statistically significant. SHAP analysis revealed that vehicle make, crash location, and engine type were key determinants of fatal outcomes. The findings demonstrate that advanced boosting algorithms possess strong predictive capability for imbalanced crash severity prediction and provide policymakers and intelligent transportation systems with insight for developing robust predictive systems for traffic safety management. They also demonstrate that integrating imbalance-aware learning with explainable AI enhances both predictive performance and interpretability in road safety analytics.
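
As an illustration of two of the data-level techniques named above (Random Oversampling and Random Undersampling) and of why accuracy alone can mislead under class imbalance, the following is a minimal sketch in plain Python. It is not the author's implementation: the function names (`random_oversample`, `random_undersample`, `binary_metrics`) are hypothetical, the 95/5 toy dataset is synthetic, and SMOTE and ADASYN, which synthesise new minority samples rather than duplicating existing ones, are not shown.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            j = rng.choice(idx)
            X_out.append(X[j])
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(idx, target))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs),
            "precision": precision, "recall": recall, "f1": f1}


# Synthetic toy data (assumption): 95 non-fatal (0) and 5 fatal (1) records.
X = [[i] for i in range(100)]
y = [0] * 95 + [1] * 5

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)

# A trivial model that always predicts "non-fatal" looks accurate
# but never detects a fatal crash.
baseline = binary_metrics(y, [0] * len(y))
```

On this toy data the always-non-fatal baseline reaches high accuracy while its recall and F1-score for the fatal class are zero, which is the reason precision, recall, F1-score, and ROC-AUC are reported alongside accuracy. In practice a maintained library would normally be used for resampling instead of hand-written helpers like these.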

Keywords

Imbalance handling, resampling, classification, road accident fatality, machine learning.


Citation of this Article

Olusola Theophilus Faboya. (2026). Machine Learning-Based Road Accident Fatality Prediction Under Imbalanced Data Conditions: An Evaluation of Resampling and Classification Techniques. International Research Journal of Innovations in Engineering and Technology - IRJIET, 10(5), 705-712. Article DOI https://doi.org/10.47001/IRJIET/2026.105094
