Comparative Evaluation of ChatGPT and Microsoft Copilot in Solving Clinical Vignette-Style Multiple-Choice Questions (MCQs) in Physiology

Rekha Prabhu; Girish Prabhu; Ramesh Holla

doi:10.31436/imjm.v25i01.3213

Authors

Rekha Prabhu, Manipal University College Malaysia
Girish Prabhu, Manipal University College Malaysia
Ramesh Holla, Manipal Academy of Higher Education, India

Keywords:

ChatGPT, Large language models, Microsoft Copilot, United States Medical Licensing Examination, USMLE

Abstract

INTRODUCTION: Large language models (LLMs) are increasingly used by MBBS students as supplementary resources for exam preparation. The objective of this study was to evaluate the performance of ChatGPT and Microsoft Copilot in answering clinical vignette-style physiology multiple-choice questions (MCQs) from widely used resources for the United States Medical Licensing Examination (USMLE). MATERIALS AND METHODS: Fifty clinical vignette-style physiology MCQs drawn from various USMLE question banks were submitted to ChatGPT and Microsoft Copilot, each of which was asked to choose the correct option. The performance of both tools was assessed against the answer keys provided in the question banks. Two experienced physiologists independently reviewed the explanation given by ChatGPT and Microsoft Copilot for each MCQ. Each explanation was rated from one to three points according to whether it was completely incorrect, partially correct with inaccurate information, or correct with adequate information. RESULTS: ChatGPT and Microsoft Copilot correctly answered 48 and 47 of the 50 questions, respectively, corresponding to accuracy rates of 96% and 94%. Both ChatGPT and Microsoft Copilot answered one MCQ on hypothyroidism and one on arrhythmia incorrectly. ChatGPT provided inaccurate explanations for two MCQs, and Microsoft Copilot for four. CONCLUSION: ChatGPT and Microsoft Copilot both achieved more than 90% accuracy in answering case-based MCQs from USMLE Step 1 resources. However, their incorrect answers, such as on hypothyroidism, and their inaccurate explanations for some MCQs indicate that students should use these AI tools with caution.

Published

03.03.2026

How to Cite

Prabhu, R., Prabhu, G., & Holla, R. (2026). Comparative Evaluation of ChatGPT and Microsoft Copilot in Solving Clinical Vignette-style multiple-choice questions (MCQs) in Physiology. IIUM Medical Journal Malaysia, 25(01). https://doi.org/10.31436/imjm.v25i01.3213

Issue

Vol. 25 No. 01 (2026): Volume 25 Special Issue No 1

Section

Original Articles

License

All material submitted for publication is assumed to be submitted exclusively to the IIUM Medical Journal Malaysia (IMJM) unless the contrary is stated. Manuscript decisions are based on a double-blind peer-review process. The Editor retains the right to determine the style and, if necessary, to edit and shorten any material accepted for publication.

IMJM retains copyright to all articles published in the journal. All final ‘proof’ submissions must be accompanied by a completed Copyright Assignment Form, duly signed by all authors. The author(s) or copyright owner(s) irrevocably grant(s) to any third party, in advance and in perpetuity, the right to use, reproduce or disseminate the research article in its entirety or in part, in any format or medium, provided that no substantive errors are introduced in the process, proper attribution of authorship and correct citation details are given, and the bibliographic details are not changed. If the article is reproduced or disseminated in part, this must be clearly and unequivocally indicated.