3.8 Article

Comparison of artificial intelligence and human-based prediction and stratification of the risk of long-term kidney allograft failure

Journal

COMMUNICATIONS MEDICINE
Volume 2, Issue 1, Pages -

Publisher

SPRINGERNATURE
DOI: 10.1038/s43856-022-00201-9

Keywords

-

Funding

  1. MSD Avenir and the Fondation AP-HP
  2. INSERM-Action thematique incitative sur programme Avenir (ATIP-Avenir), RHU KTDInnov [17-RHUS-0010]
  3. H2020 EUTRAIN [754995]
  4. Fondation Bettencourt Schueller
  5. French Foundation for Medical Research

This study evaluates the ability of transplant physicians to predict the risk of long-term allograft failure and compares them to a validated artificial intelligence (AI) prediction algorithm. The study finds that the overall performance of physicians in predicting individual long-term outcomes is limited compared to the AI system, and there is wide variability in physicians' predictions.
Background: Clinical decisions are mainly driven by the ability of physicians to apply risk stratification to patients. However, this task is difficult as it requires complex integration of numerous parameters and is impacted by patient heterogeneity. We sought to evaluate the ability of transplant physicians to predict the risk of long-term allograft failure and compare them to a validated artificial intelligence (AI) prediction algorithm.

Methods: We randomly selected 400 kidney transplant recipients from a qualified dataset of 4000 patients. For each patient, 44 features routinely collected during the first-year post-transplant were compiled in an electronic health record (EHR). We enrolled 9 transplant physicians at various career stages. At 1-year post-transplant, they blindly predicted the long-term graft survival with probabilities for each patient. Their predictions were compared with those of a validated prediction system (iBox). We assessed the determinants of each physician's prediction using a random forest survival model.

Results: Among the 400 patients included, 84 graft failures occurred at 7 years post-evaluation. The iBox system demonstrates the best predictive performance with a discrimination of 0.79 and a median calibration error of 5.79%, while physicians tend to overestimate the risk of graft failure. Physicians' risk predictions show wide heterogeneity with a moderate intraclass correlation of 0.58. The determinants of physicians' prediction are disparate, with poor agreement regardless of their clinical experience.

Conclusions: This study shows the overall limited performance and consistency of physicians to predict the risk of long-term graft failure, demonstrated by the superior performances of the iBox. This study supports the use of a companion tool to help physicians in their prognostic judgement and decision-making in clinical care.
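
The abstract reports "discrimination" as a single number without saying how it is computed. For survival predictions, discrimination is commonly summarized with Harrell's concordance index (C-index), and the sketch below assumes that is the kind of measure meant here; the abstract itself does not name the estimator. The function name, the tie handling, and the toy data are illustrative assumptions and are not taken from the study.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Simplified Harrell's C-index (illustrative, not the study's code).

    times  : follow-up time for each patient
    events : True/1 if graft failure was observed, False/0 if censored
    risks  : predicted risk of failure (higher means earlier failure expected)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter follow-up time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is comparable only if the shorter follow-up ended in an
        # observed failure. Pairs with equal times are skipped (assumption
        # made to keep the sketch short).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # higher risk failed first: concordant
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    times = [2, 4, 6, 8]
    events = [1, 1, 0, 1]
    risks = [0.9, 0.1, 0.5, 0.6]
    print(concordance_index(times, events, risks))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives some context for the reported 0.79.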

Plain language summary: The ability to predict the risk of a particular event is key to clinical decision-making, for example when predicting the risk of a poor outcome to help decide which patients should receive an organ transplant. Computer-based systems may help to improve risk prediction, particularly with the increasing volume and complexity of patient data available to clinicians. Here, we compare predictions of the risk of long-term kidney transplant failure made by clinicians with those made by our computer-based system (the iBox system). We observe that clinicians' overall performance in predicting individual long-term outcomes is limited compared to the iBox system, and demonstrate wide variability in clinicians' predictions, regardless of level of experience. Our findings support the use of the iBox system in the clinic to help clinicians predict outcomes and make decisions surrounding kidney transplants.

Divard, Raynaud et al. compare artificial intelligence (AI)-based predictions of kidney allograft failure based on electronic health records with those made by transplant physicians of varying levels of experience. The ability of physicians to predict allograft failure is limited, with superior performance seen for the AI system.
