Exploring the Capabilities of Large Language Model Mistral Large (Mistral) on Medical Challenge Problems and Hallucinations

Abstract

Large language models (LLMs) have demonstrated remarkable capabilities across natural language processing tasks, including question answering, text generation, and multimodal understanding. However, their performance in specialized domains such as healthcare, and their propensity to generate hallucinated (false) information, remain areas of active investigation. This paper explores the capabilities and limitations of Mistral's LLM, mistral-large-2402, in tackling medical challenge problems and assesses its tendency to hallucinate. The study is motivated by the potential of LLMs to augment medical decision-making and by the need to evaluate their reliability in critical domains such as healthcare. We investigate mistral-large-2402's performance on a curated dataset of medical challenge problems spanning diagnosis, treatment recommendation, and medical condition analysis. Additionally, we examine the model's propensity to hallucinate by analyzing its responses for factual inconsistencies and unsubstantiated claims. Through quantitative and qualitative analyses, we provide insights into mistral-large-2402's strengths and weaknesses in handling medical challenges. Our evaluation methodology measures the accuracy, completeness, and coherence of the model's responses, as well as the model's ability to recognize and mitigate hallucinations. The findings contribute to the ongoing discourse on the responsible deployment of LLMs in healthcare and highlight potential areas for improvement in model design and training.
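To illustrate the quantitative side of this methodology, the two headline measures (answer accuracy and hallucination rate) reduce to simple ratios over graded responses. The sketch below is illustrative only, not the study's actual pipeline; the record schema and the sample data are hypothetical.

```python
# Illustrative sketch: computing accuracy and hallucination rate over a batch
# of graded model responses. Field names and data are hypothetical.
from dataclasses import dataclass

@dataclass
class GradedResponse:
    predicted: str        # model's chosen option, e.g. "B"
    gold: str             # reference answer from the dataset
    hallucinated: bool    # annotator flag: response makes unsupported claims

def accuracy(responses: list[GradedResponse]) -> float:
    """Fraction of responses whose predicted answer matches the gold answer."""
    return sum(r.predicted == r.gold for r in responses) / len(responses)

def hallucination_rate(responses: list[GradedResponse]) -> float:
    """Fraction of responses flagged as containing hallucinated content."""
    return sum(r.hallucinated for r in responses) / len(responses)

graded = [
    GradedResponse("B", "B", False),
    GradedResponse("A", "C", True),
    GradedResponse("D", "D", False),
    GradedResponse("C", "C", True),
]
print(accuracy(graded))            # 0.75
print(hallucination_rate(graded))  # 0.5
```

Completeness and coherence, by contrast, require rubric-based human or model-assisted grading and do not reduce to a single ratio in the same way.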

Country: India

1 Pooja Mishra, 2 Rutuja Bhujbal, 3 Tushar Singh

  1. Dr. D. Y. Patil Institute of Engineering Management and Research, Pune, Maharashtra, India
  2. Dr. D. Y. Patil Institute of Engineering Management and Research, Pune, Maharashtra, India
  3. Dr. D. Y. Patil Institute of Engineering Management and Research, Pune, Maharashtra, India

IRJIET, Volume 8, Issue 5, May 2024, pp. 156-164

https://doi.org/10.47001/IRJIET/2024.804024
