4.5 Article

Does Google's Bard Chatbot perform better than ChatGPT on the European hand surgery exam?

Journal

INTERNATIONAL ORTHOPAEDICS
Volume -, Issue -, Pages -

Publisher

SPRINGER
DOI: 10.1007/s00264-023-06034-y

Keywords

Bard; ChatGPT; Chatbot; Hand Surgery; Multiple-choice question; Artificial intelligence

This study investigated the performance of Google's chatbot Bard® on the European Board of Hand Surgery (EBHS) diploma examination and compared it with ChatGPT®. Neither chatbot, in its current version, was able to pass the first part of the EBHS diploma exam.

Purpose: According to previous research, the chatbot ChatGPT® V3.5 was unable to pass the first part of the European Board of Hand Surgery (EBHS) diploma examination. This study aimed to investigate whether Google's chatbot Bard® would perform better than ChatGPT® on the EBHS diploma examination.

Methods: The chatbots were asked to answer 18 EBHS multiple-choice questions (MCQs) published in the Journal of Hand Surgery (European Volume) in five trials (A1 to A5). After A3, the chatbots were given the correct answers, and after A4, incorrect answers. Their ability to modify their responses was then measured and compared.

Results: Bard® scored 3/18 (A1), 1/18 (A2), 4/18 (A3) and 2/18 (A4 and A5). The average percentage of correct answers was 61.1% for A1, 62.2% for A2, 64.4% for A3, 65.6% for A4, 63.3% for A5 and 63.3% for all trials combined. Agreement was moderate from A1 to A5 (kappa = 0.62, 95% CI [0.51, 0.73]) as well as from A1 to A3 (kappa = 0.60, 95% CI [0.47, 0.74]). The formulation of Bard® responses was homogeneous, but its learning capacity is still developing.

Conclusions: The main hypothesis of the study was not confirmed, since Bard® did not score significantly higher than ChatGPT® when answering the MCQs of the EBHS diploma exam. Neither ChatGPT® nor Bard®, in their current versions, can pass the first part of the EBHS diploma exam.
