Data-Centric Artificial Intelligence for Textual Understanding in Healthcare Decision Systems

Sabiha Tasneem
Senior Software Engineer, Stykkist Inc, New Jersey, USA

DOI: https://doi.org/10.47001/IRJIET/2025.911002

Volume 9, Issue 11, November 2025 | Pages: 12-25

International Research Journal of Innovations in Engineering and Technology

OPEN ACCESS | Research Article | Published Date: 05-11-2025

Abstract

Data-Centric Artificial Intelligence (DCAI) reframes clinical NLP by treating data quality, coverage, and governance as the primary levers of performance and safety, rather than model tuning alone. In healthcare, where actionable knowledge is embedded in unstructured narratives (progress notes, discharge summaries, radiology and pathology reports, referral letters, and patient messages), this paper proposes an end-to-end, practice-oriented framework for operationalizing DCAI for textual understanding in decision systems. We (1) anchor tasks to measurable clinical utility and harm profiles; (2) detail corpus assembly with stratified sampling across sites, specialties, and demographic groups; (3) formalize annotation schemas that link entities, assertions (negation/uncertainty), relations, and temporal qualifiers to SNOMED CT, ICD-10/11, RxNorm, and LOINC; (4) combine programmatic labeling (heuristics, ontologies, and prompts used as labeling functions) with clinician adjudication, active learning, and targeted augmentation; (5) outline privacy-preserving training via de-identification, federated learning, and differential privacy; (6) present model-agnostic evaluation that goes beyond accuracy to cover calibration, uncertainty, fairness, robustness, and decision-curve net benefit; and (7) specify deployment blueprints for monitoring drift, instituting human-in-the-loop overrides, and creating auditable feedback loops that continuously improve data assets. Four exemplar use cases (ICD code suggestion, adverse drug event extraction, radiology impression normalization, and patient-message triage) demonstrate concrete workflows, metrics, and governance checklists. Results show how continuous data refinement improves discrimination and calibration while reducing alert burden and subgroup disparities, enabling safer, more equitable, and more maintainable clinical decision support. We conclude with implementation checklists and a reproducible playbook to accelerate DCAI adoption across diverse health systems and languages.
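Item (4) pairs programmatic labeling with clinician adjudication. The sketch below illustrates that pattern under stated assumptions and is not the author's implementation: it assumes a toy adverse-drug-event (ADE) assertion task, hypothetical hard-coded lexicons standing in for RxNorm and SNOMED CT lookups, and three hypothetical labeling functions (`lf_negation_cue`, `lf_drug_plus_reaction`, `lf_ade_phrase`). Votes are combined by simple majority, used here in place of a learned label model; notes where no function fires, or where votes tie, are routed to a clinician instead of being auto-labeled.

```python
import re
from collections import Counter
from typing import Callable, List, Optional, Set, Tuple

# Label space for a toy ADE-assertion task (hypothetical).
ABSTAIN: Optional[int] = None
NEGATIVE, POSITIVE = 0, 1

# Hypothetical toy lexicons; a production system would query curated
# vocabularies (e.g., RxNorm for drugs, SNOMED CT for findings) instead.
DRUGS = {"amoxicillin", "lisinopril", "warfarin"}
REACTIONS = {"rash", "angioedema", "bleeding"}
ADE_PHRASES = ("drug-induced", "adverse reaction")
NEGATION = re.compile(r"\b(no|denies|negative for|without)\b")

LabelingFunction = Callable[[str], Optional[int]]


def tokens(text: str) -> Set[str]:
    """Lower-cased alphabetic tokens of a note."""
    return set(re.findall(r"[a-z]+", text.lower()))


def lf_negation_cue(text: str) -> Optional[int]:
    """Heuristic LF: an explicit negation cue suggests no asserted ADE."""
    return NEGATIVE if NEGATION.search(text.lower()) else ABSTAIN


def lf_drug_plus_reaction(text: str) -> Optional[int]:
    """Lexicon LF: a known drug co-occurs with a known reaction term."""
    words = tokens(text)
    return POSITIVE if (words & DRUGS) and (words & REACTIONS) else ABSTAIN


def lf_ade_phrase(text: str) -> Optional[int]:
    """Phrase LF standing in for an ontology lookup of ADE concepts."""
    lowered = text.lower()
    return POSITIVE if any(p in lowered for p in ADE_PHRASES) else ABSTAIN


def weak_label(text: str, lfs: List[LabelingFunction]) -> Tuple[Optional[int], str]:
    """Majority vote over non-abstaining LFs; silence or ties go to a clinician."""
    votes = [v for v in (lf(text) for lf in lfs) if v is not ABSTAIN]
    if not votes:
        return None, "adjudicate: no LF fired"
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None, "adjudicate: LF conflict"
    return ranked[0][0], "auto: majority vote"


LFS: List[LabelingFunction] = [lf_negation_cue, lf_drug_plus_reaction, lf_ade_phrase]

if __name__ == "__main__":
    notes = [
        "Patient developed rash after starting amoxicillin.",
        "Denies rash or bleeding on warfarin.",
        "Follow-up visit, vitals stable.",
        "Drug-induced angioedema attributed to lisinopril.",
    ]
    for note in notes:
        label, route = weak_label(note, LFS)
        print(f"{route:<24} label={label} | {note}")
```

In this toy run, the amoxicillin and lisinopril notes are labeled automatically, while the negated warfarin note (where the negation cue and the drug-reaction lexicon disagree) and the uninformative follow-up note are queued for a clinician. Sending ties and abstentions to adjudication keeps conflicting weak signals out of the training set until a human resolves them.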

Keywords

Data-Centric AI, Clinical NLP, Healthcare Decision Support, Programmatic Labeling, Federated Learning

Citation of this Article

Sabiha Tasneem. (2025). Data-Centric Artificial Intelligence for Textual Understanding in Healthcare Decision Systems. International Research Journal of Innovations in Engineering and Technology - IRJIET, 9(11), 12-25. Article DOI https://doi.org/10.47001/IRJIET/2025.911002

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
