Comparative Evaluation of ChatGPT and Microsoft Copilot in Solving Clinical Vignette-style multiple-choice questions (MCQs) in Physiology

Authors

  • Rekha Prabhu, Manipal University College Malaysia
  • Girish Prabhu, Manipal University College Malaysia
  • Ramesh Holla, Manipal Academy of Higher Education, India

DOI:

https://doi.org/10.31436/imjm.v25i01.3213

Keywords:

ChatGPT, Large language models, Microsoft Copilot, United States Medical Licensing Examination, USMLE

Abstract

INTRODUCTION: Large language models (LLMs) are increasingly used by MBBS students as supplementary resources for exam preparation. The objective of this study was to evaluate the performance of ChatGPT and Microsoft Copilot in answering clinical vignette-style physiology multiple-choice questions (MCQs) from widely used resources for the United States Medical Licensing Examination (USMLE). MATERIALS AND METHODS: Fifty clinical vignette-style physiology MCQs from various USMLE question banks were submitted to ChatGPT and Microsoft Copilot, each of which was asked to choose the correct option. Their answers were assessed against the answer keys provided in the question banks. Two experienced physiologists independently reviewed the explanations provided by ChatGPT and Microsoft Copilot for each MCQ. The explanations were rated from one to three points based on whether they were completely incorrect, partially correct with inaccurate information, or correct with adequate information. RESULTS: ChatGPT and Microsoft Copilot correctly answered 48 and 47 of the 50 questions, reflecting accuracy rates of 96% and 94%, respectively. One MCQ each on hypothyroidism and arrhythmia was answered incorrectly by both ChatGPT and Microsoft Copilot. ChatGPT provided inaccurate explanations for two MCQs, and Microsoft Copilot provided inaccurate explanations for four. CONCLUSION: ChatGPT and Microsoft Copilot both demonstrated more than 90% accuracy in answering case-based MCQs from USMLE Step 1 resources. Their incorrect option choices for MCQs on hypothyroidism and their inaccurate explanations for some MCQs highlight the need for cautious use of AI by students.

Published

03.03.2026

How to Cite

Prabhu, R., Prabhu, G., & Holla, R. (2026). Comparative Evaluation of ChatGPT and Microsoft Copilot in Solving Clinical Vignette-style multiple-choice questions (MCQs) in Physiology. IIUM Medical Journal Malaysia, 25(01). https://doi.org/10.31436/imjm.v25i01.3213