4.6 Article

Benchmarking the symptom-checking capabilities of ChatGPT for a broad range of diseases

Related References

Note: Only some of the references are listed here.
Editorial Material Medicine, General & Internal

Harnessing the Promise of Artificial Intelligence Responsibly

David A. Dorr et al.

Summary: This Viewpoint discusses the benefits and potential risks of using AI algorithms in healthcare and suggests the collaborative creation of a Code of Conduct for AI in Health Care.

JAMA-JOURNAL OF THE AMERICAN MEDICAL ASSOCIATION (2023)

Letter Medicine, General & Internal

Accuracy of a Generative Artificial Intelligence Model in a Complex Diagnostic Challenge

Zahir Kanjee et al.

Summary: This study evaluates the diagnostic accuracy of the GPT-4 AI model in challenging cases.

JAMA-JOURNAL OF THE AMERICAN MEDICAL ASSOCIATION (2023)

Article Multidisciplinary Sciences

Foundation models for generalist medical artificial intelligence

Michael Moor et al.

NATURE (2023)

Article Medicine, General & Internal

Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum

John W. Ayers et al.

Summary: The rapid expansion of virtual health care has increased the volume of patient messages and contributed to burnout among health care professionals. This study evaluated whether an AI chatbot assistant could provide high-quality, empathetic responses to patient questions. Evaluators rated the chatbot's responses higher than physicians' responses for both quality and empathy.

JAMA INTERNAL MEDICINE (2023)

Editorial Material Medicine, General & Internal

AI in Medicine-JAMA's Focus on Clinical Outcomes, Patient-Centered Care, Quality, and Equity

Rohan Khera et al.

JAMA-JOURNAL OF THE AMERICAN MEDICAL ASSOCIATION (2023)

Article Multidisciplinary Sciences

Large language models encode clinical knowledge

Karan Singhal et al.

Summary: This paper introduces a multi-domain benchmark for medical question answering, which evaluates the performance of models in terms of factuality, comprehension, reasoning, possible harm, and bias through human evaluation. In addition, it proposes instruction prompt tuning to align language models to new domains. The experimental results suggest the potential value of model scale and instruction prompt tuning in improving comprehension, knowledge recall, and reasoning abilities. The human evaluations reveal the limitations of current models and emphasize the importance of evaluation frameworks and method development in creating safe and helpful large language models for clinical applications.

NATURE (2023)

Letter Medicine, General & Internal

Chatbot vs Medical Student Performance on Free-Response Clinical Reasoning Examinations

Eric Strong et al.

JAMA INTERNAL MEDICINE (2023)

Letter Medicine, General & Internal

Use of GPT-4 to Analyze Medical Records of Patients With Extensive Investigations and Delayed Diagnosis

Yat-Fung Shea et al.

JAMA NETWORK OPEN (2023)

Article Medical Informatics

Operationalizing and Implementing Pretrained, Large Artificial Intelligence Linguistic Models in the US Health Care System: Outlook of Generative Pretrained Transformer 3 (GPT-3) as a Service Model

Emre Sezgin et al.

Summary: Generative pretrained transformer models, particularly GPT-3, have gained popularity for their ability to handle tasks such as writing essays and answering complex questions. However, implementing these models in health care remains challenging, particularly in operationalizing them for clinical practice and research. This viewpoint outlines considerations for implementing GPT-3 in clinical practice, including processing needs, operating costs, model biases, and evaluation metrics. It also examines factors driving adoption in the US health care system. These considerations can guide health care practitioners, developers, clinicians, and decision makers who seek to integrate powerful artificial intelligence tools into hospital systems and health care practices.

JMIR MEDICAL INFORMATICS (2022)

Article Medicine, General & Internal

What is the suitability of clinical vignettes in benchmarking the performance of online symptom checkers? An audit study

Austen El-Osta et al.

Summary: This study assessed the suitability of clinical vignettes in benchmarking the performance of online symptom checkers (OSCs). The results showed that clinical vignettes have inherent limitations, and real-world evidence studies involving real patients are recommended to benchmark the performance of OSCs.

BMJ OPEN (2022)

Review Health Care Sciences & Services

The diagnostic and triage accuracy of digital and online symptom checker tools: a systematic review

William Wallace et al.

Summary: This systematic review evaluates the accuracy of digital and online symptom checkers in providing diagnoses and triage advice. The results show that the diagnostic accuracy of symptom checkers is low and varies between different checkers. Triage accuracy is generally higher than diagnostic accuracy. The study highlights the potential patient safety hazards and calls for further research and regulation of these technologies.

NPJ DIGITAL MEDICINE (2022)

Article Multidisciplinary Sciences

Simulation of a machine learning enabled learning health system for risk prediction using synthetic patient data

Anjun Chen et al.

Summary: This study demonstrates the feasibility of developing a simulated machine learning (ML)-enabled learning health system (LHS) using synthetic patient data. It also shows that model performance improves as the data size increases. The study provides guidance and methods for other researchers to develop an LHS with real patient data.

SCIENTIFIC REPORTS (2022)

Article Multidisciplinary Sciences

Accuracy of online symptom checkers and the potential impact on service utilisation

Adam Ceney et al.

Summary: The diagnostic accuracy of symptom checkers is poor, with safety decreasing with condition urgency and half of the systems suggesting resource utilisation beyond national guidelines. There is substantial variation in diagnostic accuracy and appropriate resource recommendation between systems.

PLOS ONE (2021)

Article Medicine, General & Internal

Evaluation of symptom checkers for self diagnosis and triage: audit study

Hannah L. Semigran et al.

BMJ-BRITISH MEDICAL JOURNAL (2015)

Article Medicine, General & Internal

Comparison of vignettes, standardized patients, and chart abstraction - A prospective validation study of 3 methods for measuring quality

JW Peabody et al.

JAMA-JOURNAL OF THE AMERICAN MEDICAL ASSOCIATION (2000)