Comparative Analysis of Four AI Platforms for Orthopedic Education: Evaluation of Accuracy and Explanation Quality

Authors

  • Simerjit Singh, Taylor's University, Malaysia
  • Avneet Kaur, University Tunku Abdul Razak, Malaysia
  • Harmanpreet Singh, Government Polytechnic College, Jalandhar, India

DOI:

https://doi.org/10.31436/imjm.v25i01.3215

Keywords:

Artificial intelligence, Orthopaedics, ChatGPT, Gemini, Perplexity

Abstract

INTRODUCTION: Artificial intelligence (AI) systems are increasingly used in medical education, which requires integrative clinical reasoning. Despite their rapid adoption, little is known about how different AI platforms compare in solving scenario-based orthopaedic multiple-choice questions (MCQs) and providing high-quality explanatory feedback. We evaluated four AI platforms (ChatGPT, Perplexity, Claude, and Gemini) on their ability to answer 45 validated orthopaedic MCQs accurately and to provide clear, logical explanations.

MATERIALS AND METHODS: Each platform received the same 45 MCQs under standardized conditions. Correctness was scored out of 45, and explanation quality was scored on a scale of 0 to 90 using a structured rubric. Platforms were compared using one-way ANOVA followed by Tukey's post hoc tests for pairwise comparisons. A composite score, weighted 70% for correctness and 30% for explanation quality, further contextualized overall performance.

RESULTS: ChatGPT and Perplexity achieved higher correctness scores than Claude and Gemini. Explanation quality ranged from 80% (72/90) for ChatGPT and Perplexity to 63% (57/90) for Gemini. Correctness and explanation quality scores were positively correlated (r = 0.84, p < 0.01). Composite scores paralleled these findings, placing ChatGPT and Perplexity above Claude and Gemini.

CONCLUSIONS: AI platforms vary substantially in accuracy and in the clarity of their explanations, underscoring the importance of carefully selecting a platform when integrating AI into orthopaedic education. Educators should take this inter-platform variability in correctness and explanation quality into account.
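
The abstract gives the composite weighting (70% correctness, 30% explanation quality) but not the exact formula. The sketch below shows one plausible reading, in which each component is first converted to a percentage of its maximum (45 and 90) and then weighted. That normalization, the function names, and the correctness value of 40 used in the example are assumptions for illustration only; they are not taken from the study.

```python
def as_percent(score: float, maximum: float) -> float:
    """Convert a raw score to a percentage of its maximum."""
    if not 0 <= score <= maximum:
        raise ValueError(f"score {score} is outside the range 0..{maximum}")
    return 100 * score / maximum


def composite_score(correct: float, explanation: float,
                    max_correct: float = 45, max_explanation: float = 90,
                    w_correct: float = 0.7, w_explanation: float = 0.3) -> float:
    """Weighted composite on a 0-100 scale.

    Assumption: both components are normalized to percentages before
    weighting. The abstract states only the 70/30 weights.
    """
    return (w_correct * as_percent(correct, max_correct)
            + w_explanation * as_percent(explanation, max_explanation))


# Explanation percentages reported in the abstract.
print(round(as_percent(72, 90)))  # 80
print(round(as_percent(57, 90)))  # 63

# Hypothetical platform: 40/45 correct (illustrative value, not a study result)
# combined with the reported 72/90 explanation score.
print(round(composite_score(40, 72), 1))
```

Under this reading a platform with perfect scores on both components reaches 100, and a one-point change in correctness (out of 45) moves the composite by about 1.56 points, against about 0.33 points for a one-point change in explanation quality (out of 90).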

Published

03.03.2026

How to Cite

Singh, S., Kaur, A., & Singh, H. (2026). Comparative Analysis of Four AI Platforms for Orthopedic Education: Evaluation of Accuracy and Explanation Quality. IIUM Medical Journal Malaysia, 25(01). https://doi.org/10.31436/imjm.v25i01.3215