Article

GPT-4 Artificial Intelligence Model Outperforms ChatGPT, Medical Students, and Neurosurgery Residents on Neurosurgery Written Board-Like Questions

Journal

WORLD NEUROSURGERY
Volume 179, Issue -, Pages E160-E165

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.wneu.2023.08.042

Keywords

Artificial intelligence; ChatGPT; GPT-4; Machine learning; Neurosurgical boards; Neurosurgical training; SANS question


This study examines the competence of GPT-4, an updated language model, on neurosurgical board-style questions. GPT-4 outperformed ChatGPT, medical students, and neurosurgery residents, suggesting potential applications in medical education and clinical decision-making.
BACKGROUND: Artificial intelligence (AI) and machine learning have transformed health care, with applications across many specialized fields. Neurosurgery can benefit from AI in surgical planning, prediction of patient outcomes, and analysis of neuroimaging data. GPT-4, an updated language model with additional training parameters, has exhibited exceptional performance on standardized exams. This study examines GPT-4's competence on neurosurgical board-style questions and compares its performance with that of medical students and residents, to explore its potential in medical education and clinical decision-making.

METHODS: GPT-4's performance was examined on 643 Congress of Neurological Surgeons Self-Assessment Neurosurgery Exam (SANS) board-style questions drawn from various neurosurgery subspecialties. Of these, 477 were text-based and 166 contained images. GPT-4 refused to answer 52 questions that contained no text. The remaining 591 questions were input into GPT-4, and its performance was evaluated based on first-time responses. Raw scores were analyzed across subspecialties and question types and then compared with previous findings on ChatGPT's performance against SANS users, medical students, and neurosurgery residents.

RESULTS: GPT-4 attempted 91.9% of the SANS questions and achieved 76.6% accuracy. The model's accuracy increased to 79.0% for text-only questions. GPT-4 outperformed ChatGPT (P < 0.001) and scored highest in the pain/peripheral nerve category (84%) and lowest in the spine category (73%). It exceeded the performance of medical students (26.3%), neurosurgery residents (61.5%), and the national average of SANS users (69.3%) across all categories.

CONCLUSIONS: GPT-4 significantly outperformed medical students, neurosurgery residents, and the national average of SANS users. The model's accuracy suggests potential applications in educational settings and clinical decision-making, enhancing provider efficiency and improving patient care.
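The first-response protocol in METHODS (each of the 591 text-containing questions submitted to GPT-4 once, with raw scores taken from the first reply) can be approximated in a short script. The following is a minimal sketch under stated assumptions, not the authors' code: it uses the OpenAI Python client with a "gpt-4" model name, and the `questions` list, its field names, and the counts in the z-test are hypothetical placeholders (the abstract does not report ChatGPT's raw counts).

# Minimal sketch of the study's first-response scoring protocol.
# Assumptions (not from the paper): OpenAI Python client v1.x, the
# model name "gpt-4", and a hypothetical `questions` list of dicts
# holding a stem, lettered options, and the keyed answer letter.
from openai import OpenAI
from statsmodels.stats.proportion import proportions_ztest

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_once(question: dict) -> str:
    """Submit one question; return the letter from the model's first reply."""
    prompt = question["stem"] + "\n" + "\n".join(
        f"{letter}. {text}" for letter, text in question["options"].items()
    )
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Answer with the single letter of the best option."},
            {"role": "user", "content": prompt},
        ],
        temperature=0,  # make the single first-time response reproducible
    )
    return reply.choices[0].message.content.strip()[:1].upper()

def raw_score(questions: list[dict]) -> float:
    """Accuracy over first-time responses, as reported in RESULTS."""
    correct = sum(ask_once(q) == q["answer"] for q in questions)
    return correct / len(questions)

# The GPT-4 vs. ChatGPT comparison (P < 0.001) corresponds to a
# two-proportion test. The counts below are placeholders: the abstract
# gives only GPT-4's 76.6% accuracy on 591 attempted questions
# (roughly 453 correct), not ChatGPT's raw counts.
stat, p_value = proportions_ztest(count=[453, 362], nobs=[591, 591])

A rerun over the same 591 questions would reproduce the 76.6% figure only up to model and API drift; the sketch fixes temperature at 0 so the single-response protocol is at least repeatable within one model version.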
