Impact Factor (2025): 6.9
DOI Prefix: 10.47001/IRJIET
Vol 9 No 11 (2025): Volume 9, Issue 11, November 2025 | Pages: 12-25
International Research Journal of Innovations in Engineering and Technology
OPEN ACCESS | Research Article | Published Date: 05-11-2025
Data-Centric Artificial Intelligence (DCAI) reframes clinical NLP by treating data quality, coverage, and governance as the primary levers of performance and safety, rather than model tinkering alone. In healthcare, where actionable knowledge is embedded in unstructured narratives such as progress notes, discharge summaries, radiology/pathology reports, referral letters, and patient messages, this paper proposes an end-to-end, practice-oriented framework to operationalize DCAI for textual understanding in decision systems. We (1) anchor tasks to measurable clinical utility and harm profiles; (2) detail corpus assembly with stratified sampling across sites, specialties, and demographics; (3) formalize schemas linking entities, assertions (negation/uncertainty), relations, and temporal qualifiers to SNOMED CT, ICD-10/11, RxNorm, and LOINC; (4) combine programmatic labeling (heuristics, ontologies, prompts-as-LFs) with clinician adjudication, active learning, and targeted augmentation; (5) outline privacy-preserving training via de-identification, federated learning, and differential privacy; (6) present model-agnostic evaluation beyond accuracy, covering calibration, uncertainty, fairness, robustness, and decision-curve net benefit; and (7) specify deployment blueprints for monitoring drift, instituting human-in-the-loop overrides, and creating auditable feedback loops that continuously improve data assets. Four exemplar use cases (ICD code suggestion, adverse drug event extraction, radiology impression normalization, and patient-message triage) demonstrate tangible workflows, metrics, and governance checklists. Results show how continuous data refinement improves discrimination and calibration while reducing alert burden and subgroup disparities, enabling safer, more equitable, and maintainable clinical decision support. We conclude with implementation checklists and a reproducible playbook to accelerate DCAI adoption across diverse health systems and languages.
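As an illustration of the decision-curve net benefit named in item (6), the following is a minimal sketch using the standard definition, net benefit = TP/n - (FP/n) * p_t / (1 - p_t), where p_t is the threshold probability at which a clinician would act. The function name `net_benefit` and the toy labels and scores are assumptions made for this example; they are not taken from the paper.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Decision-curve net benefit at a single threshold probability.

    Hypothetical helper for illustration: y_true holds 0/1 outcomes,
    y_prob holds predicted probabilities, and a case is flagged when
    its probability is at or above the threshold.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    if n == 0 or n != len(y_prob):
        raise ValueError("y_true and y_prob must be non-empty and equal in length")

    # Count flagged cases that are true positives and false positives.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)

    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


# Toy data (assumed, not from the paper): four notes with outcomes and model scores.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.1]
nb_low = net_benefit(labels, scores, 0.2)   # lenient threshold, more alerts
nb_mid = net_benefit(labels, scores, 0.5)   # stricter threshold, fewer alerts
```

Plotting this quantity across a range of thresholds, alongside the "flag everyone" and "flag no one" baselines, gives the decision curve; a model is useful at a given threshold only if its net benefit exceeds both baselines there.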
Data-Centric AI, Clinical NLP, Healthcare Decision Support, Programmatic Labeling, Federated Learning
Sabiha Tasneem. (2025). Data-Centric Artificial Intelligence for Textual Understanding in Healthcare Decision Systems. International Research Journal of Innovations in Engineering and Technology - IRJIET, 9(11), 12-25. Article DOI https://doi.org/10.47001/IRJIET/2025.911002
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.