Large
language models (LLMs) have demonstrated remarkable capabilities in various
natural language processing tasks, including question answering, text
generation, and multimodal understanding. However, their performance in
specialized domains such as healthcare and their propensity to generate
hallucinated (false) information remain areas of active investigation. This
research paper explores the capabilities and limitations of Mistral's LLM,
mistral-large-2402, in tackling medical challenge problems, and assesses its
tendency to hallucinate. The study is motivated by the potential of LLMs to
augment medical decision-making processes and the need to evaluate their
reliability in critical domains like healthcare. We investigate
mistral-large-2402's performance on a curated dataset of medical challenge
problems, spanning diagnosis, treatment recommendation, and medical condition
analysis tasks. Additionally, we examine the model's propensity to
hallucinate by analyzing its responses for factual inconsistencies and
unsubstantiated claims. Through quantitative and qualitative analyses, we
provide insights into mistral-large-2402's strengths and weaknesses in handling
medical challenges. Our evaluation methodology measures the accuracy,
completeness, and coherence of the model's responses, as well as its ability to
recognize and mitigate hallucinations. The findings of this study contribute to
the ongoing discourse on the responsible deployment of LLMs in healthcare and
highlight potential areas for improvement in model design and training.
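To make the accuracy measurement in the methodology concrete, the short harness below sketches how such an evaluation might be wired up. It is illustrative only, not the paper's actual code: it assumes the mistralai v1.x Python client (Mistral, chat.complete), a single hypothetical stand-in item for the curated dataset (which the abstract does not publish), and a simple regex heuristic for recovering the chosen option letter.

import os
import re

from mistralai import Mistral  # assumes the mistralai v1.x Python client

# Hypothetical item standing in for the paper's curated dataset of
# diagnosis, treatment-recommendation, and condition-analysis problems.
MEDICAL_ITEMS = [
    {
        "question": (
            "A 58-year-old presents with crushing substernal chest pain "
            "radiating to the left arm. Which is the most likely diagnosis? "
            "A) Pulmonary embolism B) Myocardial infarction "
            "C) Costochondritis D) GERD"
        ),
        "answer": "B",
    },
]

def ask_model(client: Mistral, question: str) -> str:
    """Query mistral-large-2402 and return its raw answer text."""
    response = client.chat.complete(
        model="mistral-large-2402",
        messages=[
            {
                "role": "system",
                "content": "Answer with a single option letter (A-D), "
                           "then a one-sentence justification.",
            },
            {"role": "user", "content": question},
        ],
        temperature=0.0,  # deterministic decoding for reproducible scoring
    )
    return response.choices[0].message.content

def extract_choice(text: str) -> str | None:
    """Heuristic: pull the first standalone option letter from the reply."""
    match = re.search(r"\b([A-D])\b", text)
    return match.group(1) if match else None

def main() -> None:
    client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
    correct = 0
    for item in MEDICAL_ITEMS:
        reply = ask_model(client, item["question"])
        choice = extract_choice(reply)
        correct += int(choice == item["answer"])
        print(f"expected={item['answer']} got={choice} reply={reply!r}")
    print(f"accuracy: {correct}/{len(MEDICAL_ITEMS)}")

if __name__ == "__main__":
    main()

Only accuracy is mechanized here; the completeness and coherence scoring and the hallucination analysis described in the abstract are qualitative judgments that call for expert review of each response rather than a scripted check. Decoding at temperature 0.0 keeps runs reproducible, which matters when reporting accuracy on a fixed problem set.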