☆ 4.5 Article

Validity and reliability of artificial intelligence chatbots as public sources of information on endodontics

INTERNATIONAL ENDODONTIC JOURNAL (2023)

Journal

INTERNATIONAL ENDODONTIC JOURNAL

Volume -, Issue -, Pages -

Publisher

WILEY

DOI: 10.1111/iej.14014

Keywords

artificial intelligence; Bing; ChatGPT; endodontics; Google Bard; large language models

Category

Dentistry, Oral Surgery & Medicine

Summary

This study evaluated and compared the validity and reliability of responses provided by GPT-3.5, Google Bard, and Bing to frequently asked questions in the field of endodontics. The findings showed that GPT-3.5 provided more credible information than Google Bard and Bing.

Abstract

Aim: This study aimed to evaluate and compare the validity and reliability of responses provided by GPT-3.5, Google Bard, and Bing to frequently asked questions (FAQs) in the field of endodontics.

Methodology: FAQs were formulated by expert endodontists (n = 10) and collected through GPT-3.5 queries (n = 10), with every question posed to each chatbot three times. Responses (N = 180) were independently evaluated by two board-certified endodontists using a modified Global Quality Score (GQS) on a 5-point Likert scale (5: strongly agree; 4: agree; 3: neutral; 2: disagree; 1: strongly disagree). Disagreements on scoring were resolved through evidence-based discussions. The validity of responses was analysed by categorizing scores as valid or invalid at two thresholds: the low threshold was set at a score of >= 4 for all three responses, whilst the high threshold was set at a score of 5 for all three responses. Fisher's exact test was conducted to compare the validity of responses between chatbots. Cronbach's alpha was calculated to assess reliability, measured as the consistency of repeated responses for each chatbot.

Results: All three chatbots provided answers to all questions. Using the low-threshold validity test (GPT-3.5: 95%; Google Bard: 85%; Bing: 75%), there was no significant difference between the platforms (p > .05). When using the high-threshold validity test, the chatbot scores were substantially lower (GPT-3.5: 60%; Google Bard: 15%; Bing: 15%). The validity of GPT-3.5 responses was significantly higher than that of Google Bard and Bing (p = .008). All three chatbots achieved an acceptable level of reliability (Cronbach's alpha > 0.7).

Conclusions: GPT-3.5 provided more credible information on topics related to endodontics than Google Bard and Bing.
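The three analysis steps named in the methodology (threshold-based validity, Fisher's exact test, Cronbach's alpha) can be illustrated with a minimal Python sketch. It is not the authors' code. The function names are hypothetical, the per-chatbot counts (12/20 and 3/20) are derived here from the reported percentages over 20 questions, the pairwise 2x2 layout of the Fisher test is an assumption since the abstract does not state which tables were compared, and treating the three repeated responses as the "items" in Cronbach's alpha is likewise an assumed reading.

```python
from math import comb
from statistics import variance


def is_valid(scores, threshold):
    """A question counts as valid only if ALL repeated responses reach the threshold.

    threshold=4 corresponds to the low threshold, threshold=5 to the high one.
    """
    return all(s >= threshold for s in scores)


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table (with the same
    margins) that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))


def cronbach_alpha(rows):
    """Cronbach's alpha with rows = questions and columns = repeated responses.

    Assumes at least two rows, at least two columns, and non-zero variance
    of the row totals (otherwise the ratio is undefined).
    """
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical scores for one question asked three times.
example_scores = [5, 4, 4]
low_ok = is_valid(example_scores, 4)    # passes the low threshold
high_ok = is_valid(example_scores, 5)   # fails the high threshold

# High-threshold counts derived from the reported percentages (assumed n = 20):
# GPT-3.5 valid on 12 of 20 questions, Google Bard on 3 of 20.
p_high = fisher_exact_2x2(12, 8, 3, 17)
```

Under these assumptions, `p_high` comes out at roughly 0.008, which agrees with the reported value, although that agreement alone does not confirm the authors used this exact comparison. `cronbach_alpha` expects one row per question, for example `[[5, 5, 4], [4, 4, 4], ...]`, and would be run once per chatbot.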
