4.6 Article

Machine learning reduced workload with minimal risk of missing studies: development and evaluation of a randomized controlled trial classifier for Cochrane Reviews

Journal

JOURNAL OF CLINICAL EPIDEMIOLOGY
Volume 133, Issue -, Pages 140-151

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.jclinepi.2020.11.003

Keywords

Machine learning; Study classifiers; Searching; Information retrieval; Methods/methodology; Randomized controlled trials; Systematic reviews; Automation; Crowdsourcing; Cochrane Library

Funding

  1. Australian National Health & Medical Research Council [APP1114605]
  2. U.S. National Library of Medicine [2R01LM01208605]
  3. Medical Research Council (UK) fellowship [MR/N015185/1]
  4. National Institute for Health Research (NIHR) Collaboration for Leadership in Applied Health Research and Care North Thames at Barts Health NHS Trust
  5. Cochrane via Project Transform
  6. MRC [MR/N015185/1, MR/J005037/1] Funding Source: UKRI

The study developed a machine learning classifier to reduce the study identification workload involved in producing Cochrane systematic reviews. The calibrated classifier showed high recall in evaluation, with older records more likely to be missed than newer ones.
Objectives: This study developed, calibrated, and evaluated a machine learning classifier designed to reduce study identification workload in Cochrane for producing systematic reviews.
Methods: A machine learning classifier for retrieving randomized controlled trials (RCTs) was developed (the "Cochrane RCT Classifier"), with the algorithm trained using a data set of title-abstract records from Embase, manually labeled by the Cochrane Crowd. The classifier was then calibrated using a further data set of similar records manually labeled by the Clinical Hedges team, aiming for 99% recall. Finally, the recall of the calibrated classifier was evaluated using records of RCTs included in Cochrane Reviews that had abstracts of sufficient length to allow machine classification.
Results: The Cochrane RCT Classifier was trained using 280,620 records (20,454 of which reported RCTs). A classification threshold was set using 49,025 calibration records (1,587 of which reported RCTs), and our bootstrap validation found the classifier had recall of 0.99 (95% confidence interval 0.98-0.99) and precision of 0.08 (95% confidence interval 0.06-0.12) in this data set. The final, calibrated RCT classifier correctly retrieved 43,783 (99.5%) of 44,007 RCTs included in Cochrane Reviews but missed 224 (0.5%). Older records were more likely to be missed than those more recently published.
Conclusions: The Cochrane RCT Classifier can reduce manual study identification workload for Cochrane Reviews, with a very low and acceptable risk of missing eligible RCTs. This classifier now forms part of the Evidence Pipeline, an integrated workflow deployed within Cochrane to help improve the efficiency of the study identification processes that support systematic review production.
© 2020 The Authors. Published by Elsevier Inc.
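The calibration step described in the abstract (setting a classification threshold that aims for 99% recall, then reporting recall and precision with bootstrap confidence intervals) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the percentile bootstrap, and the toy data are assumptions made purely for illustration. Only the final arithmetic uses figures from the abstract.

```python
import math
import random


def threshold_for_recall(scores, labels, target_recall=0.99):
    """Highest score threshold whose recall on the positives meets the target.

    Hypothetical helper: assumes higher scores mean "more likely an RCT".
    """
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("need at least one positive record")
    # Number of positives that must score at or above the threshold.
    # The small epsilon guards against floating-point overshoot in the product.
    k = max(1, math.ceil(target_recall * len(positives) - 1e-9))
    return positives[k - 1]


def recall_precision(scores, labels, threshold):
    """Recall and precision when records with score >= threshold are retrieved."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


def bootstrap_recall_ci(scores, labels, threshold, n_boot=1000, seed=0):
    """Percentile-bootstrap 95% interval for recall (an assumed method)."""
    rng = random.Random(seed)
    pairs = list(zip(scores, labels))
    recalls = []
    for _ in range(n_boot):
        sample = [rng.choice(pairs) for _ in pairs]
        if not any(y == 1 for _, y in sample):
            continue  # recall is undefined without positives; skip this resample
        s, y = zip(*sample)
        recalls.append(recall_precision(s, y, threshold)[0])
    recalls.sort()
    lo = recalls[int(0.025 * len(recalls))]
    hi = recalls[min(len(recalls) - 1, int(0.975 * len(recalls)))]
    return lo, hi


# Toy calibration data (invented for illustration only).
toy_scores = [0.9, 0.8, 0.3, 0.1, 0.85, 0.2]
toy_labels = [1, 1, 1, 1, 0, 0]
toy_threshold = threshold_for_recall(toy_scores, toy_labels, target_recall=1.0)

# Evaluation recall using the counts reported in the abstract.
evaluation_recall = 43783 / 44007
```

Lowering the threshold to protect recall lets more non-RCT records through, which is the trade-off behind the high recall and low precision reported for the calibration set.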
