Article

Towards automated generation of curated datasets in radiology: Application of natural language processing to unstructured reports exemplified on CT for pulmonary embolism

EUROPEAN JOURNAL OF RADIOLOGY (2020)

Journal

EUROPEAN JOURNAL OF RADIOLOGY

Volume 125, Issue -, Pages -

Publisher

ELSEVIER IRELAND LTD

DOI: 10.1016/j.ejrad.2020.108862

Keywords

Natural language processing; Data curation; Classification; Pulmonary embolism; Computed tomography angiography

Category

Radiology, Nuclear Medicine & Medical Imaging

Abstract

Purpose: To design and evaluate a self-trainable procedure based on natural language processing (NLP) for classifying unstructured radiology reports. The method, which enables the generation of curated datasets, is exemplified on CT pulmonary angiogram (CTPA) reports.

Method: We extracted the impression sections of CTPA reports created at our institution from 2016 to 2018 (n = 4397; language: German). The status (pulmonary embolism: yes/no) was manually labelled for all exams. Data from 2016/2017 (n = 2801) served as ground truth to train three NLP architectures, each of which needs only a subset of labelled reference data to become operational: a convolutional neural network (CNN), a support vector machine (SVM) and a random forest (RF) classifier. Impressions from 2018 (n = 1377) were held out and used to measure overall performance. In addition, we used multiple simulations to investigate how classification performance depends on the amount of training data.

Results: The classification performance of all three models was excellent (accuracies: 97-99 %; F1 scores: 0.88-0.97; AUCs: 0.993-0.997). The CNN reached the highest accuracy, 99.1 % (95 % CI 98.5-99.6 %). Training with 470 labelled impressions was sufficient for all three NLP architectures to reach an accuracy of > 93 %.

Conclusion: Our NLP-based approaches allow automated, highly accurate retrospective classification of CTPA reports with manageable effort, using only the unstructured impression sections. We demonstrated that this approach is also useful for classifying radiology reports that are not written in English. Moreover, excellent classification performance is achieved with relatively small training sets.
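The abstract reports accuracy, F1 score and a 95 % confidence interval for accuracy on the held-out 2018 impressions, but it does not state how the interval was computed. As an illustration only, the following Python sketch derives these metrics from binary predictions (1 = pulmonary embolism present, 0 = absent) and uses a Wilson score interval, which is one common choice for a binomial proportion and an assumption here. The function names are hypothetical and this is not the authors' code. AUC is left out because it requires continuous classifier scores rather than hard labels.

```python
import math
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels where 1 = pulmonary embolism present."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, i.e. 2*tp / (2*tp + fp + fn); 0.0 if undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def wilson_interval(correct: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95 %)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = correct / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """Accuracy with a Wilson 95 % CI, plus F1 for the positive (embolism) class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    n = tp + fp + fn + tn
    correct = tp + tn
    return {
        "accuracy": correct / n,
        "accuracy_ci95": wilson_interval(correct, n),
        "f1": f1_score(tp, fp, fn),
    }
```

For example, `evaluate([1, 1, 0, 0, 0, 1, 0, 0], [1, 0, 0, 0, 1, 1, 0, 0])` yields an accuracy of 0.75 and an F1 score of about 0.667 (two true positives, one false positive, one false negative, four true negatives). Computing F1 on the positive class rather than relying on accuracy alone makes errors on the less frequent class visible.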
