Predictive Analysis of Pharmaceutical Compounds Using Kernel Naive Bayes in Clinical Informatics

Marwa Mawfaq Mohamedsheet Al-Hatab, Technical Engineering College, Northern Technical University, Mosul, Iraq
Mohamedshet Mwfq Mohmdsht, Middle East University, Faculty of Pharmacy, Amman, Jordan
Murtadha A. Salim, Middle East University, Faculty of Pharmacy, Amman, Jordan
Hussein M. Gatea, Middle East University, Faculty of Pharmacy, Amman, Jordan
Ghaith Z. Ihsan, Technical Engineering College, Northern Technical University, Mosul, Iraq
Muataz Z. Ahmed, Technical Engineering College, Northern Technical University, Mosul, Iraq
Ibrahim M. Hussein, Technical Engineering College, Northern Technical University, Mosul, Iraq
Omar A. Abdullah, Technical Engineering College, Northern Technical University, Mosul, Iraq
Alaq M. Zaki, College of Dentistry, University of Mosul, Iraq
Wameedh R. Fathel, Ministry of Education, General Directorate of Education in Nineveh, Iraq

Vol 9 No 5 (2025): Volume 9, Issue 5, May 2025 | Pages: 88-97

International Research Journal of Innovations in Engineering and Technology

OPEN ACCESS | Research Article | Published Date: 19-05-2025

DOI: https://doi.org/10.47001/IRJIET/2025.905012

Abstract

Machine learning (ML) can transform several areas of the pharmaceutical sciences, including the classification of drugs into clinically useful categories, clinical decision-making, and pharmacovigilance. This study presents a Kernel Naive Bayes (KNB) model that classifies drugs on the basis of a wide range of pharmacological and therapeutic properties. Drawing on a drug product data repository, the model integrates core drug-related features, such as dosage forms, routes of administration, adverse reactions, drug interactions, and indications for use, which are basic elements of pharmaceutical research and clinical pharmacy. It uses a Gaussian kernel to model continuous variables and a multivariate multinomial (MVMN) distribution to model categorical features, which lets each feature follow a more flexible class-conditional distribution than a single parametric Gaussian would allow. To improve interpretability and reduce noise, irrelevant or sparse attributes (e.g., regulatory codes and precautionary labels) were excluded. The final model achieved an accuracy of 83.2% and a prediction speed of approximately 1,600 observations/s, suggesting that it can handle large-scale pharmaceutical data efficiently. These results support the relevance of kernel-based probabilistic models to pharmacy-related problems, especially drug safety screening, automated classification, and pharmacological data mining.
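The abstract does not include code, so the following is only a minimal, standard-library Python sketch of the decision rule it describes, under stated assumptions: class-conditional Gaussian kernel density estimates for continuous features, Laplace-smoothed per-feature category frequencies for categorical features, and the two combined under the naive (conditional independence) assumption. The class name `KernelNaiveBayes`, the fixed `bandwidth`, the smoothing constant `alpha`, and the toy drug records are all hypothetical and are not taken from the study or its dataset. Tools such as MATLAB's `fitcnb` (whose distribution names include `kernel` and `mvmn`) select kernel widths automatically; this sketch does not attempt that.

```python
import math
import time
from collections import Counter


class KernelNaiveBayes:
    """Hypothetical minimal naive Bayes with per-feature likelihoods:
    Gaussian kernel density estimates for continuous features and
    Laplace-smoothed category frequencies for categorical features."""

    def __init__(self, continuous_idx, categorical_idx, bandwidth=1.0, alpha=1.0):
        self.continuous_idx = list(continuous_idx)
        self.categorical_idx = list(categorical_idx)
        self.bandwidth = bandwidth  # fixed kernel width (assumption: no automatic selection)
        self.alpha = alpha          # additive smoothing so unseen categories keep nonzero probability

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        # Category levels observed anywhere in the training data, per categorical feature.
        self.levels_ = {j: {row[j] for row in X} for j in self.categorical_idx}
        self.log_prior_, self.cont_, self.cat_ = {}, {}, {}
        for c in self.classes_:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior_[c] = math.log(len(rows) / len(y))
            # KDE is non-parametric: store the class's training values for each continuous feature.
            self.cont_[c] = {j: [float(r[j]) for r in rows] for j in self.continuous_idx}
            self.cat_[c] = {j: Counter(r[j] for r in rows) for j in self.categorical_idx}
        return self

    def _log_kde(self, value, samples):
        # Average of Gaussian kernels centred on each stored training value.
        h = self.bandwidth
        total = sum(math.exp(-0.5 * ((value - s) / h) ** 2) for s in samples)
        density = total / (len(samples) * h * math.sqrt(2.0 * math.pi))
        return math.log(max(density, 1e-300))  # floor avoids log(0) far from all samples

    def _log_cat(self, j, value, counts):
        # Smoothed relative frequency of this category within the class.
        k = len(self.levels_[j])
        n = sum(counts.values())
        return math.log((counts[value] + self.alpha) / (n + self.alpha * k))

    def predict_one(self, x):
        # Log prior plus a sum of independent per-feature log-likelihoods; pick the best class.
        def score(c):
            s = self.log_prior_[c]
            s += sum(self._log_kde(float(x[j]), self.cont_[c][j]) for j in self.continuous_idx)
            s += sum(self._log_cat(j, x[j], self.cat_[c][j]) for j in self.categorical_idx)
            return s
        return max(self.classes_, key=score)

    def predict(self, X):
        return [self.predict_one(x) for x in X]


# Hypothetical toy records: [dose_mg (continuous), route, dosage_form]
X_train = [
    [500.0, "oral", "tablet"],
    [650.0, "oral", "tablet"],
    [400.0, "oral", "capsule"],
    [1000.0, "iv", "injection"],
    [1200.0, "iv", "injection"],
    [900.0, "iv", "powder"],
]
y_train = ["analgesic"] * 3 + ["antibiotic"] * 3

model = KernelNaiveBayes(continuous_idx=[0], categorical_idx=[1, 2], bandwidth=100.0)
model.fit(X_train, y_train)

X_new = [[550.0, "oral", "tablet"], [1100.0, "iv", "injection"]]
print(model.predict(X_new))  # ['analgesic', 'antibiotic']

# Accuracy and a rough prediction-throughput figure (observations per second).
start = time.perf_counter()
preds = model.predict(X_train)
elapsed = time.perf_counter() - start
accuracy = sum(p == t for p, t in zip(preds, y_train)) / len(y_train)
print(f"accuracy={accuracy:.3f}, throughput={len(X_train) / max(elapsed, 1e-9):.0f} obs/s")
```

Because each feature's likelihood is estimated separately, the kernel only makes each continuous feature's class-conditional density more flexible (for example, multimodal); it does not model dependencies between features. Throughput measured this way depends heavily on hardware and on the number of stored training values, since every kernel evaluation loops over them.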

Keywords

Pharmaceutical Informatics, Drug Classification, Kernel Naive Bayes, Predictive Modeling, Feature Selection, Gaussian Kernel


Citation of this Article

Marwa Mawfaq Mohamedsheet Al-Hatab, Mohamedshet Mwfq Mohmdsht, Murtadha A. Salim, Hussein M. Gatea, Ghaith Z. Ihsan, Muataz Z. Ahmed, Ibrahim M. Hussein, Omar A. Abdullah, Alaq M. Zaki, & Wameedh R. Fathel. (2025). Predictive Analysis of Pharmaceutical Compounds Using Kernel Naive Bayes in Clinical Informatics. International Research Journal of Innovations in Engineering and Technology - IRJIET, 9(5), 88-97. Article DOI https://doi.org/10.47001/IRJIET/2025.905012
