Journal
INTERNATIONAL ORTHOPAEDICS
Volume -, Issue -, Pages -
Publisher
SPRINGER
DOI: 10.1007/s00264-023-06034-y
Keywords
Bard; ChatGPT; Chatbot; Hand Surgery; Multiple-choice question; Artificial intelligence
This study investigated the performance of Google's chatbot Bard® on the European Board of Hand Surgery (EBHS) diploma examination and compared it with that of ChatGPT. The results showed that neither ChatGPT nor Bard, in their current versions, could pass the first part of the EBHS diploma exam.
Purpose
According to previous research, the chatbot ChatGPT® V3.5 was unable to pass the first part of the European Board of Hand Surgery (EBHS) diploma examination. This study aimed to investigate whether Google's chatbot Bard® would perform better than ChatGPT on the EBHS diploma examination.

Methods
The chatbots were asked to answer 18 EBHS multiple-choice questions (MCQs) published in the Journal of Hand Surgery (European Volume) in five trials (A1 to A5). After A3, the chatbots were given the correct answers, and after A4, incorrect answers, so that their ability to modify their responses could be measured and compared.

Results
Bard® scored 3/18 (A1), 1/18 (A2), 4/18 (A3) and 2/18 (A4 and A5). The average percentage of correct answers was 61.1% for A1, 62.2% for A2, 64.4% for A3, 65.6% for A4, 63.3% for A5 and 63.3% for all trials combined. Agreement was moderate from A1 to A5 (kappa = 0.62, 95% CI [0.51; 0.73]) as well as from A1 to A3 (kappa = 0.60, 95% CI [0.47; 0.74]). Bard®'s responses were homogeneous in their formulation, but its learning capacity is still developing.

Conclusions
The main hypothesis of our study was not confirmed: Bard did not score significantly higher than ChatGPT when answering the MCQs of the EBHS diploma exam. Neither ChatGPT® nor Bard®, in their current versions, can pass the first part of the EBHS diploma exam.
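The abstract reports agreement across trials as a kappa statistic but does not say which variant was used. As a hedged illustration only, the sketch below assumes Fleiss' kappa, a common generalization of Cohen's kappa to more than two ratings per item, and treats each trial (A1 to A5) as a "rater" of each question. The function name `fleiss_kappa` and the sample answers are hypothetical, not taken from the study, and no confidence interval is computed.

```python
from collections import Counter
from typing import Hashable, Sequence


def fleiss_kappa(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Fleiss' kappa for items that were each rated the same number of times.

    ratings[i] holds the answers given to item i, one per trial; each trial
    is treated as a "rater" (an assumption made for illustration).
    """
    n_items = len(ratings)
    if n_items == 0:
        raise ValueError("need at least one item")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every item needs the same number (>= 2) of trials")

    counts = [Counter(row) for row in ratings]
    categories = {c for row in ratings for c in row}

    # Observed agreement for each item (share of agreeing trial pairs), then averaged.
    p_items = [
        (sum(n * n for n in c.values()) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_items) / n_items

    # Agreement expected by chance, from the overall proportion of each answer.
    total = n_items * n_raters
    p_e = sum((sum(c[k] for c in counts) / total) ** 2 for k in categories)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one answer is ever given")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical true/false answers to four items over five trials (A1..A5);
# these values are illustrative and are not data from the study.
answers = [
    ["T", "T", "T", "T", "T"],
    ["F", "F", "T", "F", "F"],
    ["T", "F", "T", "T", "T"],
    ["F", "F", "F", "F", "F"],
]
print(round(fleiss_kappa(answers), 2))  # 0.6
```

Treating trials as raters measures how consistently the chatbot answers the same question across repetitions, which matches the intent of the reported A1 to A5 agreement; other choices (for example, pairwise Cohen's kappa between two specific trials) would give different numbers.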